Founding Engineer, ML Systems

San Francisco, California, United States · Onsite · Full Time · $250,000 /yr · Posted 2 months ago · Hidden Gem · YC Startup

Role summary

Hillclimb is seeking a Founding Engineer to build AI systems, focusing on Reinforcement Learning environments and end-to-end training infrastructure. This role involves working directly with researchers and scientists to teach AI to think and improve itself. The ideal candidate has a talent for systems and research engineering, with preferred experience in data structures, algorithms, concurrency, system design, and proficiency in languages like Python or C++ on Linux. Experience with distributed training frameworks, model training optimization, and distributed schedulers is also valued. This is an onsite, full-time position in San Francisco.

_To surpass the ceiling of human intelligence, the machine must first learn from it._

Hillclimb works with the world's best researchers, scientists, and engineers to teach AI to think like them so that it has the means to improve itself.

Founding engineers build RL environments, own training infra end-to-end, and work directly with frontier labs. You may be a good fit if you have a talent for systems / research engineering.

We’re a founding team of former DeepMind & quant researchers backed by Garry Tan, Tier 1 VCs & angel investors from OpenAI, Anthropic, DeepMind, xAI & Meta Superintelligence Labs. We achieved [SOTA for math](https://x.com/NousResearch/status/1998536543565127968) shortly after graduating YC.

**Preferred Experience:**

• Strong understanding of data structures, algorithms, concurrency, and system design.

• Proficiency in modern programming languages (Python, C++, or similar) on Linux systems, with experience building large-scale services, infrastructure tooling, or distributed systems.

• Ability to reason about trade-offs between performance, reliability, and maintainability.

• Experience with distributed training frameworks such as Megatron-LM, FSDP, DeepSpeed, or TorchTitan.

• Experience optimizing training throughput for large-scale models.

• Experience with distributed schedulers and orchestrators (Kubernetes, Slurm, Docker Swarm).

**Compensation starts at $250K + 2% equity. Must be ready to work in person in San Francisco, full-time.**
