CloudCruise
Member of Technical Staff – AI/ML
San Francisco, California, United States · Onsite · Full Time · Staff · $150,000–$250,000/yr · YC Startup
**Role Summary**
CloudCruise is seeking experienced engineers to build and enhance the intelligence layer for their AI/ML-powered coding agent platform. This role involves designing and improving a multi-agent system for enterprise computer automation, focusing on healthcare applications. Key responsibilities include fine-tuning vision-language models (VLMs), developing robust evaluation frameworks to ensure agent reliability and performance, and implementing techniques for handling edge cases and improving debuggability. The ideal candidate has experience training production models, is opinionated about agent design, and is comfortable researching and implementing novel approaches, especially in data-scarce environments.
**About CloudCruise**
CloudCruise is building the coding agent for enterprise computer automation. Our developer platform writes, tests, and maintains automation code on fully-managed infrastructure, cutting dev time by 90%. We're starting with healthcare, where legacy systems make reliable automation a genuinely hard problem. We just raised $5M and brought on angels including Zack Lipton (CTO, Abridge) and David Singleton (former CTO, Stripe).
We're looking for 10x engineers who are comfortable learning across domains and diving deep into unfamiliar territory: high-agency, ground-up builders who thrive with significant ownership from day one.
**The Role**
You'll own the intelligence layer that makes our agents actually work. We run a multi-agent system in production: a workflow builder agent for conversational automation creation and a maintenance agent for error recovery. Both rely on vision-language models to understand what's on screen. Your job is to make the whole system more reliable, faster, and cheaper.
**What You'll Work On**
* Design and improve our two-agent system (workflow builder and maintenance agents) that creates and repairs automations through tool calling with 15+ browser automation primitives
* Fine-tune VLMs to improve accuracy and reduce reliance on API calls
* Build evaluation frameworks that measure agent reliability and catch regressions before customers do
* Develop techniques for handling edge cases, recovering from failures, and generalizing across UI variations
* Research and implement approaches to make agents more predictable and debuggable
**You Might Be a Fit If**
* You've trained or fine-tuned models that run in production
* You're opinionated about agent design – when to use LLMs, when not to, how to structure tool use
* You believe evals are the path to reliability and have built systems to prove it
* You're comfortable reading papers and implementing ideas from scratch
* You've worked on problems where labeled data is scarce and you had to get creative
**Compensation**
Competitive salary and meaningful equity. We want you to have real ownership in what we're building.