Senior Product Manager

San Francisco, California, United States · Onsite · Full Time · Senior · $250,000 /yr · Posted 2 months ago

Role summary

We are seeking a Senior Product Manager to build a next-generation AI platform focused on enabling developers and enterprises to train, deploy, and operate large-scale machine learning systems. This high-impact role sits at the intersection of AI infrastructure, distributed systems, and developer platforms. You will define and lead product strategy for key areas including inference and model serving, developer experience, multi-cluster compute orchestration, observability, and managed AI services. The ideal candidate has experience building AI/ML platforms or developer-facing infrastructure, with a strong understanding of distributed systems, cloud infrastructure, and performance trade-offs. This role is crucial for closing the gap between complex infrastructure and usable AI systems, making cutting-edge AI accessible and efficient.

Senior Product Manager: AI Platform & Managed Services ($600M in funding)

$250K + Equity

San Francisco Bay Area - Onsite, 5 days a week

We are building a next-generation AI platform that enables developers and enterprises to train, deploy, and operate large-scale machine learning systems across heterogeneous compute environments.

Our focus is on making cutting-edge AI infrastructure usable, scalable, and efficient - abstracting away the complexity of distributed systems, multi-GPU orchestration, and model lifecycle management into a cohesive, developer-first platform.

This is a high-impact role at the intersection of AI infrastructure, distributed systems, and developer platforms, where you will define and build the product layer that sits between raw compute and real-world AI applications.

You will lead product across key areas of our AI platform and managed services stack, including:

Inference & Model Serving Platforms
Design systems for high-throughput, low-latency inference across LLMs, diffusion models, and multimodal workloads
Define abstractions for batching, scheduling, caching, and model optimization (quantization, compilation, etc.)
Balance performance, cost, and reliability across diverse workloads

AI Platform & Developer Experience
Build APIs, SDKs, and workflows that enable developers to go from model → production seamlessly
Define primitives for fine-tuning, evaluation, deployment, and observability
Simplify complex infrastructure into intuitive, composable building blocks

Multi-Cluster / Multi-Vendor Compute Orchestration
Work on scheduling and workload placement across heterogeneous environments (GPU/CPU, multi-region, multi-cloud)
Partner with engineering on resource allocation, queuing systems, and capacity-aware scheduling

Observability, Evaluation & Cost Governance
Define telemetry systems for model performance, latency, token usage, and failure modes
Build evaluation workflows for LLM quality, safety, and regression detection
Introduce cost controls and optimization strategies for large-scale inference and training

Managed AI Services
Package infrastructure into opinionated, production-ready services for enterprise customers
Define SLAs, reliability models, and deployment patterns for mission-critical workloads
Work closely with customers to understand real-world constraints and translate them into product capabilities

We’re looking for product leaders who can operate at depth across both systems and product, and who are excited about building the foundation for the next generation of AI applications.

You likely have:

- Experience building AI/ML platforms, inference systems, or developer-facing infrastructure
- Strong understanding of distributed systems, cloud infrastructure, and performance trade-offs
- Familiarity with modern AI stacks:
  - LLMs, transformers, diffusion models
  - Frameworks like PyTorch, TensorRT, ONNX, vLLM, Triton, etc.

AI is undergoing a platform shift. The gap between raw infrastructure and usable systems is still enormous. This role is about closing that gap: turning fragmented, complex infrastructure into a coherent platform that developers can rely on to build real products.

You’ll be working on problems like:

How to make LLM inference predictable and cost-efficient at scale
How to expose the right abstractions for agentic workflows
How to manage heterogeneous compute without leaking complexity to users
How to make AI systems observable, debuggable, and reliable

If you’re excited about building the systems that power the next generation of AI applications, apply now!

Ready to apply?

You'll be redirected to Realm Alliance's application page.
