Osmosis
EdTech, Healthcare, Medical Education, E-learning
Machine Learning Engineer
San Francisco, California, United States · Onsite · Full Time · $180,000–$250,000 /yr · Posted 2 months ago · Visa sponsorship available · Hidden Gem · YC Startup
Role summary
Osmosis is seeking a Machine Learning Engineer to develop high-performance distributed training infrastructure for reinforcement learning (RL) at scale. The role involves implementing RL algorithms, building post-training pipelines, optimizing GPU utilization, and working with customers on production deployments. Expertise in RL algorithms, distributed training, and low-level optimization is required, with a strong preference for Python and cloud infrastructure skills.
### **About Osmosis**
At Osmosis, we help companies use cutting-edge reinforcement learning techniques to fine-tune open-source language models that beat foundation models on performance, latency, and cost.
We’ve raised $7M in funding from Y Combinator, top institutional investors like CRV and Audacious Ventures, as well as angel investors including Paul Graham (Y Combinator), Erik Bernhardsson (Modal Labs), Misha Laskin (Reflection AI), and Guillermo Rauch (Vercel).
### **About the Role**
We're looking for a Machine Learning Engineer to contribute to high-performance distributed training infrastructure for RL at scale. You'll work directly with our founding team and design partners to push the boundaries of what's possible with post-training and continual learning systems.
This role requires expertise in RL algorithms, distributed training, and low-level optimization. You'll have exceptional agency to make impactful decisions while working in a fast-paced, customer-driven environment.
### **Responsibilities**
You’ll contribute to work in areas like:
* **Distributed Training Infrastructure**: implement new RL algorithms and build scalable post-training pipelines
* **Resource Management & Optimization**: design infrastructure systems for efficient GPU utilization and dynamic resource allocation
* **Customer-Facing Work**: work directly with customers on production deployments and custom model development
### **Technology**
* **Backend**: Python FastAPI, Golang
* **Frontend**: React, TypeScript, Next.js
* **Cloud Infrastructure**: AWS Fargate, Docker, Kubernetes, AWS SageMaker
* **ML Frameworks**: Verl / slime / Megatron-LM / SkyRL, PyTorch (FSDP experience is a plus), vLLM / SGLang
* **Databases**: DynamoDB, S3