Senior Machine Learning Engineer (Inference Platform)
Role summary
Wizard AI is seeking a Senior Machine Learning Engineer to lead its Inference Platform team. This role focuses on the end-to-end lifecycle of production ML serving systems, including model packaging, deployment, monitoring, optimization, and scaling. The engineer will own critical decisions regarding serving architecture, performance, reliability, and scalability for a live conversational shopping agent. Responsibilities include evolving the multi-engine inference platform, building production ML pipelines, defining lifecycle management strategies, implementing SLAs, and developing observability tooling. The ideal candidate has 5-8+ years of experience in Software, ML, Platform, or Infrastructure Engineering, with hands-on experience serving LLMs in production, strong Python skills, and cloud platform expertise.
About Wizard AI
At Wizard AI, we’re building the top-performing AI Shopping Agent that delivers the best products from across the web with unmatched accuracy, quality, and trust. Our ML models power the core of our platform, and we’re looking for a Senior Machine Learning Engineer to own how they run in production: reliably, efficiently, and at scale.
The Role
As a Senior ML Engineer on our Inference Platform, you’ll own the end-to-end lifecycle of production ML serving systems, from model packaging and deployment to monitoring, optimization, and scaling. This is not a traditional MLOps role focused solely on pipelines and tooling. You’ll be responsible for the inference infrastructure powering a live conversational shopping agent, operating multiple specialized serving engines under real-world production load.
You’ll own critical decisions around serving architecture, performance, reliability, and scalability, working closely with ML Engineers, Data teams, Product, and DevOps to ensure models move seamlessly from experimentation into high-performance production systems.
What You'll Do
What We're Looking For
What Success Looks Like
Reliable, Scalable Inference Systems
Production serving infrastructure operates with clear SLAs, strong observability, and minimal downtime. Latency, availability, throughput, and GPU utilization are actively measured and optimized as platform demands grow.
End-to-End Ownership
You own the complete serving lifecycle — from deployment and release management through monitoring, optimization, and scaling — enabling ML engineers to ship quickly while maintaining reliability and reproducibility.
Technical Leadership and Impact
You shape the future of Wizard's inference platform, driving key architectural decisions that improve performance, reduce infrastructure costs, and support the next generation of AI-powered shopping experiences.