Machine Learning Engineer
Role summary
We are seeking an AI/Machine Learning Engineer to build and maintain production-grade ML systems at scale. The role owns the end-to-end ML lifecycle, from translating business needs into ML problems to deploying, monitoring, and continuously improving models. Responsibilities include designing scalable ML pipelines, developing and training models, engineering robust feature pipelines, deploying models as low-latency APIs or streaming services, and implementing monitoring and alerting systems. The ideal candidate has strong Python programming skills, a deep understanding of ML fundamentals, experience with ML frameworks, and proficiency in data processing, MLOps, and cloud platforms.
Role Overview
We are hiring an AI / Machine Learning Engineer to build production-grade ML systems that operate at scale. This role focuses on turning data and models into reliable, high-performance services that directly impact business outcomes.
You will own the end-to-end ML lifecycle, from problem framing to deployment, monitoring, and continuous improvement.
What You’ll Own
- Translate business problems into ML problem statements and measurable objectives
- Design and build scalable ML pipelines (batch + real-time)
- Develop and train models for prediction, ranking, and optimization
- Engineer robust feature pipelines and data transformations
- Deploy models as low-latency APIs or streaming services
- Implement continuous training and model retraining pipelines
- Monitor data drift, model drift, and system performance
- Conduct experimentation (A/B testing, offline/online validation)
- Optimize systems for accuracy, latency, and cost
- Ensure reliability with logging, monitoring, and alerting systems
Core Technical Requirements
- Strong programming in Python with production-level standards
- Deep understanding of machine learning fundamentals: supervised and unsupervised learning, ensemble methods (Random Forest, XGBoost)
- Solid foundation in statistics and probability
- Experience with feature engineering and data pipelines
- Hands-on with ML frameworks (Scikit-learn, TensorFlow, PyTorch)
- Experience deploying models using REST/gRPC APIs
- Understanding of system design (scalability, fault tolerance, latency)
Data & Systems Engineering Expectations
- Strong SQL and data modeling skills
- Experience with large-scale data processing (Spark or equivalent)
- Understanding of data pipelines (ETL/ELT workflows)
- Familiarity with streaming systems (Kafka or similar)
- Ability to debug and optimize data + model pipelines end-to-end
MLOps & Production Readiness
- Experience with CI/CD for ML systems
- Model versioning, experiment tracking, and reproducibility
- Monitoring pipelines for data quality and model performance
- Experience with containerization (Docker) and orchestration (Kubernetes)
- Handling rollback, failure recovery, and deployment strategies
Cloud & Infrastructure
- Experience with at least one cloud platform (AWS / Azure / GCP)
- Understanding of distributed systems and scalable architecture
- Ability to optimize compute cost vs performance trade-offs
Good to Have (Strong Differentiators)
- Deep learning (CNNs, transformers)
- Exposure to Generative AI / LLM systems
- Experience with recommendation systems or ranking models
- Knowledge of feature stores and online/offline consistency
- Familiarity with model explainability and fairness techniques
Qualifications
- Bachelor’s or Master’s in Computer Science, AI, Data Science, or related field
- 3–8+ years of experience building ML systems in production
- Strong fundamentals in math, algorithms, and data structures
How You Will Be Measured
- Model performance (accuracy, precision/recall, business metrics)
- System latency and throughput
- Reliability (uptime, failure rate, recovery time)
- Impact on business KPIs (revenue, cost, efficiency)
What Top Candidates Do Differently
- Think in systems, not just models
- Balance accuracy vs latency vs cost
- Build reusable, scalable ML infrastructure
- Communicate clearly with both engineers and business stakeholders
Typical High-Impact Projects
- Real-time recommendation and ranking systems
- Fraud detection and risk scoring pipelines
- Demand forecasting and supply optimization
- Customer churn prediction and targeting models
- Personalization engines for large-scale platforms
Reality of the Role (No Fluff)
This is not a “train model and done” role.
You are expected to:
- Work with messy, incomplete data
- Debug pipelines in production
- Handle scale and failures
- Deliver measurable business value
Why This Role Matters
You will directly influence:
- Product decisions
- Customer experience
- Revenue and cost optimization