Machine Learning Engineer
Role summary
We are seeking a Machine Learning Engineer to join our Inference & Reinforcement Learning Platform team. This hybrid role involves designing, deploying, and validating ML proof-of-concepts on GPU infrastructure, working directly with customers to translate research and business needs into performant systems. You will own customer POCs end-to-end, deploying and optimizing LLM inference and RL training, debugging complex issues, and providing feedback to the product team. The ideal candidate has a strong software engineering background with hands-on experience in ML inference or training systems, familiarity with distributed systems and GPUs, and comfort working directly with customers.
Location: Bay Area (frequent customer interaction)
Team: Inference & Reinforcement Learning Platform
About the Role
We’re looking for a Machine Learning Engineer (MLE) to work directly with customers and partners to design, deploy, and validate inference and reinforcement learning (RL) proof-of-concepts on GMI’s GPU infrastructure.
This is a high-impact, hybrid engineering role that sits at the intersection of platform engineering, applied ML, and customer success. You’ll be embedded with customers during early-stage deployments, turning research ideas, datasets, and business requirements into working, performant systems on real GPU clusters.
If you enjoy being close to users, debugging real systems, and shipping results fast (not just writing docs), this role is for you.
What You’ll Do
Own customer POCs end-to-end
- Deploy and optimize LLM inference, RL training, and post-training workflows on GMI clusters
- Translate customer requirements into concrete system designs and experiments
Forward-deploy with customers
- Work hands-on with research teams, startups, and enterprise customers
- Debug performance, stability, and correctness issues in real environments
Inference deployment
- Stand up and tune inference stacks (e.g. vLLM / SGLang / Ray Serve–style architectures)
- Optimize latency, throughput, GPU utilization, and cost efficiency
RL & post-training POCs
- Support RLHF / RFT / SFT workflows using customer-provided datasets
- Integrate SDKs, training APIs, and cluster resources to shorten “idea → experiment” cycles
Performance & reliability
- Diagnose GPU, networking, and distributed system bottlenecks
- Run benchmarks, profiling, and stress tests on multi-GPU / multi-node setups
Feedback loop to product
- Feed real-world customer learnings back into GMI’s platform, SDKs, and APIs
- Help shape reference architectures, cookbooks, and best practices
What We’re Looking For
Core Requirements
- Strong software engineering background (Python required; Go / Rust a plus)
- Hands-on experience with ML inference or training systems
- Familiarity with distributed systems and GPUs (multi-GPU, multi-node)
- Comfort working directly with customers and ambiguous requirements
- Ability to debug end-to-end systems (code, infra, networking, performance)
Nice to Have
- Experience with:
  - LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.)
  - RL or post-training workflows (RLHF, RFT, SFT)
  - PyTorch, DeepSpeed, Megatron-LM, or similar
  - Kubernetes-based ML platforms
  - GPU performance profiling and optimization
- Prior experience as:
  - Forward Deployed Engineer
  - Solutions Engineer
  - ML Platform Engineer
  - Applied Research Engineer
What Makes This Role Special
- You’re close to real users and real GPUs, not abstract roadmaps
- You’ll work on cutting-edge inference and RL workloads, not toy demos
- You’ll influence product direction through direct customer feedback
- Fast iteration, high ownership, and visible impact
Who Thrives Here
- Engineers who like shipping over theorizing
- People who enjoy being the “last mile” problem solver
- Builders who want exposure to both deep systems and applied ML
- Those excited by early-stage POCs that turn into real production systems