Machine Learning Engineer

San Francisco, California, United States · Onsite · Full Time · $180,000–$250,000/yr · Posted 2 months ago · Visa sponsorship available · Hidden Gem · YC Startup

Role summary

Osmosis is seeking a Machine Learning Engineer to develop high-performance distributed training infrastructure for reinforcement learning (RL) at scale. The role involves implementing RL algorithms, building post-training pipelines, optimizing GPU utilization, and working with customers on production deployments. Expertise in RL algorithms, distributed training, and low-level optimization is required; the stack centers on Python and AWS-based cloud infrastructure.

### **About Osmosis**

At Osmosis, we help companies use cutting-edge reinforcement learning techniques to fine-tune open-source language models that beat foundation models on performance, latency, and cost.

We’ve raised $7M in funding from Y Combinator, top institutional investors like CRV and Audacious Ventures, as well as angel investors including Paul Graham (Y Combinator), Erik Bernhardsson (Modal Labs), Misha Laskin (Reflection AI), and Guillermo Rauch (Vercel).

### **About the Role**

We're looking for a Machine Learning Engineer to contribute to high-performance distributed training infrastructure for RL at scale. You'll work directly with our founding team and design partners to push the boundaries of what's possible with post-training and continual learning systems.

This role requires expertise in RL algorithms, distributed training, and low-level optimization. You'll have exceptional agency to make impactful decisions while working in a fast-paced, customer-driven environment.

### **Responsibilities**

You’ll contribute to areas like:

* **Distributed Training Infrastructure:** implement new RL algorithms and build scalable post-training pipelines
* **Resource Management & Optimization:** design infrastructure systems for efficient GPU utilization and dynamic resource allocation
* **Customer-Facing Work:** work directly with customers on production deployments and custom model development

### **Technology**

* **Backend**: Python FastAPI, Golang
* **Frontend**: React, TypeScript, Next.js
* **Cloud Infrastructure**: AWS Fargate, Docker, Kubernetes, AWS SageMaker
* **ML Frameworks**: Verl / slime / Megatron-LM / SkyRL, PyTorch (FSDP experience is a plus), vLLM / SGLang
* **Databases**: DynamoDB, S3