Staff AI Engineer

City and County of San Francisco, California, United States | Onsite | Full Time | Staff | $250,000–$300,000 /yr | Posted 2 months ago | Visa sponsorship available

Role summary

We are seeking a Staff AI Engineer to lead the design, development, and hardening of core AI intelligence systems for an AI-native decision-support platform. This hands-on role requires deep expertise in applied AI/ML, particularly with LLM-powered agentic systems, hybrid reasoning pipelines, and robust RAG implementations. You will own systems end-to-end, from architecture to production, focusing on accuracy, trust, and reliability in high-stakes environments. The ideal candidate has 6+ years of software engineering experience, strong Python and backend system design skills, and a track record of delivering complex AI systems. You will also play a key role in defining AI engineering practices and mentoring others.

We're building an AI-native platform focused on helping professionals make complex, high-stakes decisions with greater clarity and confidence.

This is not an AI "feature." AI is the product.

As a Staff AI Engineer, you will serve as a technical leader responsible for designing, building, and hardening the core intelligence systems behind the platform: systems that directly support real-world decision-making in environments where accuracy and trust are critical.

This is a hands-on role for someone who wants to operate at the edge of what's reliable in applied AI and push those boundaries into production. You will own systems end-to-end: from architecture and modeling decisions through deployment, evaluation, and iteration. You'll also help define technical standards and influence how AI systems are built across the organization.

What You'll Work On

- Architecting LLM-powered, agentic systems for research, analysis, and decision support
- Designing hybrid reasoning pipelines that combine language models with retrieval systems, structured data, deterministic logic, and external tools
- Building robust RAG pipelines over unstructured, noisy, and proprietary datasets
- Developing evaluation frameworks to measure reasoning quality, faithfulness, latency, and cost
- Implementing observability, debugging, and failure handling for multi-step AI workflows
- Translating ambiguous user needs into reliable, production-grade intelligent behavior in collaboration with product and design
- Raising the bar for AI engineering practices through technical leadership and mentorship

Example Problem
Design and build an AI system capable of synthesizing diverse data sources (documents, structured datasets, and external signals) into actionable, well-supported insights, while explicitly surfacing uncertainty and tradeoffs.

Why This Is Challenging

- Product complexity: The goal is to deliver a system users rely on daily, not a demo or internal prototype
- High-stakes environment: Outputs must be accurate, explainable, and calibrated; "mostly correct" is insufficient
- Data ambiguity: Inputs are often incomplete, inconsistent, or contradictory, with no single source of truth
- Reasoning over generation: The focus is on systems that evaluate, compare, and justify, not just generate fluent responses
- Agent reliability: Multi-step, tool-using workflows must behave consistently in production environments
- Evaluation is evolving: You will help define how to measure quality when traditional ML metrics fall short
- Trust as a requirement: Explainability, traceability, and failure handling are core system properties, not afterthoughts

What We're Looking For

- 6+ years of software engineering experience with significant hands-on work in applied AI/ML systems
- Strong foundation in Python and backend system design
- Experience working with LLMs, including areas like prompting, fine-tuning, RAG, agentic workflows, or evaluation tooling
- Track record of owning ambiguous, high-impact systems from concept through production
- Ability to make thoughtful architectural tradeoffs in real-world environments
- Systems-level thinking combined with a bias toward shipping high-quality implementations
- Strong product intuition and a sense of responsibility for end-user outcomes

Bonus Experience

- Background in data-intensive products or regulated environments
- Exposure to domains where correctness, traceability, and trust are critical

Oscar Associates Limited (US) is acting as an Employment Agency in relation to this vacancy.
