GMI Cloud
IT Services, Managed Services Provider (MSP), Cloud Computing, Cybersecurity

Machine Learning Engineer

Mountain View, California, United States | Hybrid | Full Time | Posted 2 months ago | Visa sponsorship available

Role summary

We are seeking a Machine Learning Engineer to join our Inference & Reinforcement Learning Platform team. This hybrid role involves designing, deploying, and validating ML proof-of-concepts on GPU infrastructure, working directly with customers to translate research and business needs into performant systems. You will own customer POCs end-to-end, deploying and optimizing LLM inference and RL training, debugging complex issues, and providing feedback to the product team. The ideal candidate has a strong software engineering background with hands-on experience in ML inference or training systems, familiarity with distributed systems and GPUs, and comfort working directly with customers.

Location: Bay Area (frequent customer interaction)

Team: Inference & Reinforcement Learning Platform

About the Role

We’re looking for a Machine Learning Engineer (MLE) to work directly with customers and partners to design, deploy, and validate inference and reinforcement learning (RL) proof-of-concepts on GMI’s GPU infrastructure.

This is a high-impact, hybrid engineering role that sits at the intersection of platform engineering, applied ML, and customer success. You’ll be embedded with customers during early-stage deployments, turning research ideas, datasets, and business requirements into working, performant systems on real GPU clusters.

If you enjoy being close to users, debugging real systems, and shipping results fast (not just writing docs), this role is for you.

What You’ll Do

Own customer POCs end-to-end

- Deploy and optimize LLM inference, RL training, and post-training workflows on GMI clusters
- Translate customer requirements into concrete system designs and experiments

Forward-deploy with customers

- Work hands-on with research teams, startups, and enterprise customers
- Debug performance, stability, and correctness issues in real environments

Inference deployment

- Stand up and tune inference stacks (e.g., vLLM / SGLang / Ray Serve–style architectures)
- Optimize latency, throughput, GPU utilization, and cost efficiency

RL & post-training POCs

- Support RLHF / RFT / SFT workflows using customer-provided datasets
- Integrate SDKs, training APIs, and cluster resources to shorten “idea → experiment” cycles

Performance & reliability

- Diagnose GPU, networking, and distributed system bottlenecks
- Run benchmarks, profiling, and stress tests on multi-GPU / multi-node setups

Feedback loop to product

- Feed real-world customer learnings back into GMI’s platform, SDKs, and APIs
- Help shape reference architectures, cookbooks, and best practices

What We’re Looking For

Core Requirements

- Strong software engineering background (Python required; Go / Rust a plus)
- Hands-on experience with ML inference or training systems
- Familiarity with distributed systems and GPUs (multi-GPU, multi-node)
- Comfort working directly with customers and ambiguous requirements
- Ability to debug end-to-end systems (code, infra, networking, performance)

Nice to Have

- Experience with:
  - LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.)
  - RL or post-training workflows (RLHF, RFT, SFT)
  - PyTorch, DeepSpeed, Megatron-LM, or similar
  - Kubernetes-based ML platforms
  - GPU performance profiling and optimization
- Prior experience as:
  - Forward Deployed Engineer
  - Solutions Engineer
  - ML Platform Engineer
  - Applied Research Engineer

What Makes This Role Special

- You’re close to real users and real GPUs, not abstract roadmaps
- You’ll work on cutting-edge inference and RL workloads, not toy demos
- You’ll influence product direction through direct customer feedback
- Fast iteration, high ownership, and visible impact

Who Thrives Here

- Engineers who like shipping over theorizing
- People who enjoy being the “last mile” problem solver
- Builders who want exposure to both deep systems and applied ML
- Those excited by early-stage POCs that turn into real production systems
