Member of Technical Staff

San Francisco, California, United States · Onsite · Full Time · $150,000–$250,000 /yr · Posted 2 months ago · Hidden Gem · YC Startup

Role summary

Wafer is building AI that optimizes AI itself, starting with GPU kernels and expanding into ML systems and AI infrastructure. They are seeking engineers to work at the intersection of AI agents and systems programming. The role involves building and improving GPU kernel optimization frameworks, developing integrations with GPU profilers and compiler toolchains, designing architectures for remote GPU execution, and working on trace analysis systems. The ideal candidate has strong technical intuition, can ship production code quickly while maintaining quality, and is comfortable working across the stack. Experience with GPU programming, profiling tools, compiler internals, AI/ML research, or agent systems is a plus.

## **About the Role**

Wafer's mission is to maximize intelligence per watt by building AI that optimizes AI itself. Our journey starts with GPU kernels but will expand into every corner of ML systems and AI infrastructure. We're a small team (4 people) backed by Fifty Years, Y Combinator, Jeff Dean, and Woj Zaremba (co-founder of OpenAI), and we're looking for engineers who want to work at the intersection of AI agents and systems programming.

You'll work directly with the founding team to build the systems that power our GPU optimization platform, from the agent framework that iterates on kernels, to the profiling infrastructure that connects to NCU and ROCprofiler, to the compiler tooling that analyzes PTX and SASS.

## **What You'll Do**

* Build and improve our framework for GPU kernel optimization (multi-turn tool use, state management, reward signals)

* Develop integrations with GPU profilers and compiler toolchains

* Design the architecture for remote GPU execution across cloud GPUs

* Work on trace analysis systems that help the agent diagnose performance bottlenecks

* Ship features that engineers use daily and that optimize the infrastructure running the world's AI (PyTorch, vLLM, NVIDIA, AMD, etc.)

## **What We Look For**

**You're a strong fit if you:**

* Have deep technical intuition and can learn new domains quickly

* Are comfortable working across the stack

* Can ship production code fast while maintaining quality

* Want to work on some of the most interesting AI infra problems at a small company with a no-bullshit, ship-fast culture

**Very nice to have:**

* GPU programming experience (CUDA, HIP, Triton)

* Experience with profiling tools or compiler internals

* Background in AI/ML research or agent systems

* Publications or open-source work in relevant areas
