Software Engineer, ML Infrastructure
Engineering · Full-time · San Francisco, New York
Our mission is to automate coding. The first step in our journey is to build the best tool for professional programmers, using a combination of inventive research, design, and engineering. Our organization is very flat, and our team is small and talent-dense. We particularly like people who are truth-seeking, passionate, and creative. We enjoy spirited debate, crazy ideas, and shipping code.
### About the role
The ML Infrastructure team builds large-scale compute, storage, and software infrastructure to support Cursor’s work building the world’s best agentic coding model. We’re looking for strong engineers who are interested in building high-performance infrastructure and the software to support it. This role works closely with ML researchers and engineers to enable their work through improvements to our training framework, systems reliability/performance, and developer experience.
### What you’ll do
- Collaborate with ML researchers to improve the throughput and reliability of training
- Work with OEMs, cloud service providers, and others to plan and build cutting-edge GPU infrastructure
- Improve the density and scalability of compute environments to enable increasingly large RL workloads
- Create software and systems to automate building, monitoring, and running GPU clusters
- Build workload scheduling and data movement systems to support Cursor’s growing training footprint
### You may be a fit if you have
- A strong background in systems and infrastructure-focused software engineering, particularly in Python, TypeScript, Rust, and Go
- Experience with distributed storage and networking infrastructure, particularly on Linux systems across cloud and bare metal environments
- Exposure to large-scale systems and their unique challenges, ideally across thousands of nodes with significant resource footprints
- Production experience with infrastructure-as-code and configuration management, across hosts and Kubernetes
### Nice to have
- Operational exposure to NVIDIA GPUs with InfiniBand or RoCE, particularly with Blackwell and Hopper-class hardware
- Exposure to Ray, Slurm, or other common compute and runtime schedulers