Senior AI/ML Engineer
Role summary
We are seeking a Senior AI/ML Engineer with 4+ years of experience in building, fine-tuning, and deploying large language models (LLMs) in production. This role focuses on designing scalable ML systems, optimizing inference efficiency, and delivering production-grade AI solutions across the full ML lifecycle, from distributed GPU training to cloud deployment. Responsibilities include architecting end-to-end ML systems, optimizing LLM performance (latency, throughput, cost), implementing prompt engineering and RAG systems, and deploying APIs. The ideal candidate will have a Master's or PhD in a related field, strong Python skills, and experience with ML frameworks like PyTorch or TensorFlow, cloud platforms (AWS/GCP), and MLOps. Success involves delivering scalable, cost-efficient, and reliable LLM systems with robust monitoring and safety features.
Overview
We are seeking a highly motivated Senior AI/ML Engineer with 4+ years of experience building, fine-tuning, and deploying large language models (LLMs) in production. This role is focused on designing scalable, high-performance ML systems, improving inference efficiency, and delivering reliable, production-grade AI solutions.
You will work across the full ML lifecycle, from distributed GPU training to cloud deployment, optimizing cost and performance and collaborating cross-functionally to deliver impactful AI products.
Key Responsibilities
- Design, build, and deploy scalable AI/ML systems for production environments
- Optimize LLM performance across latency, throughput, memory usage, and cost
- Architect end-to-end ML systems, making tradeoffs across performance, scalability, and reliability
- Develop, fine-tune, and evaluate LLMs and deep learning models
- Implement advanced prompt engineering strategies to improve output quality, consistency, and reliability
- Build and optimize retrieval-augmented generation (RAG) systems, including integration with vector databases
- Apply model optimization and efficient inference techniques such as quantization, pruning, and batching
- Deploy and maintain production-grade APIs and model endpoints (e.g., FastAPI)
- Design and maintain distributed data pipelines and cloud-based ML infrastructure
- Build and maintain MLOps pipelines, including experiment tracking, model versioning, and CI/CD workflows
- Implement monitoring, logging, and alerting systems for model performance, drift detection, and system reliability
- Develop robust evaluation frameworks, including offline evaluation, online testing, and A/B experimentation
- Implement safety, alignment, and guardrail mechanisms to mitigate hallucinations, bias, and unsafe outputs
- Optimize infrastructure and deployment strategies for cost efficiency
- Partner with product, engineering, and leadership teams to translate business requirements into scalable AI solutions
- Stay current with emerging research, tools, and best practices in AI/ML
Required Qualifications
- Master’s or PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field
- 4+ years of hands-on experience building and deploying ML/LLM systems in production
- Strong proficiency in Python (required) and experience with C++ (preferred)
- Deep experience with ML frameworks such as PyTorch and/or TensorFlow
- Strong understanding of NLP, LLMs, and deep learning architectures
- Proven experience optimizing models for production, including GPU acceleration and efficient inference
- Hands-on experience with distributed training
- Experience deploying models at scale with AWS or Google Cloud
- Experience building APIs using FastAPI
- Strong experience with Linux and scripting
- Proficiency with Git
- Solid understanding of databases (PostgreSQL, MySQL)
Nice to Have
- Experience with TensorRT-LLM, vLLM, or DeepSpeed
- Experience with LangChain or LlamaIndex
- Experience with OpenAI, Anthropic, or open-weight models
- Familiarity with MLflow, Weights & Biases, or Kubeflow
- Experience with LLM evaluation frameworks
- Experience with RLHF or DPO
- Experience with multimodal models
- Contributions to open-source or research publications
What Success Looks Like
- Deliver scalable LLM systems in production
- Reduce latency and infrastructure costs while maintaining quality
- Build reliable systems with strong monitoring and safety
- Contribute to scalable architecture decisions
- Drive measurable improvements in model performance
Pay: $175,000.00 - $230,000.00 per year
Work Location: In person