Senior Product Manager
Role summary
We are seeking a Senior Product Manager to build a next-generation AI platform focused on enabling developers and enterprises to train, deploy, and operate large-scale machine learning systems. This high-impact role sits at the intersection of AI infrastructure, distributed systems, and developer platforms. You will define and lead product strategy for key areas including inference and model serving, developer experience, multi-cluster compute orchestration, observability, and managed AI services. The ideal candidate has experience building AI/ML platforms or developer-facing infrastructure, with a strong understanding of distributed systems, cloud infrastructure, and performance trade-offs. This role is crucial for closing the gap between complex infrastructure and usable AI systems, making cutting-edge AI accessible and efficient.
Senior Product Manager: AI Platform & Managed Services ($600M in funding)
$250K + Equity
San Francisco Bay Area - Onsite 5 days a week
We are building a next-generation AI platform that enables developers and enterprises to train, deploy, and operate large-scale machine learning systems across heterogeneous compute environments.
Our focus is on making cutting-edge AI infrastructure usable, scalable, and efficient - abstracting away the complexity of distributed systems, multi-GPU orchestration, and model lifecycle management into a cohesive, developer-first platform.
This is a high-impact role at the intersection of AI infrastructure, distributed systems, and developer platforms, where you will define and build the product layer that sits between raw compute and real-world AI applications.
You will lead product across key areas of our AI platform and managed services stack, including:
- Inference & Model Serving Platforms
  - Design systems for high-throughput, low-latency inference across LLMs, diffusion models, and multimodal workloads
  - Define abstractions for batching, scheduling, caching, and model optimization (quantization, compilation, etc.)
  - Balance performance, cost, and reliability across diverse workloads
- AI Platform & Developer Experience
  - Build APIs, SDKs, and workflows that enable developers to go from model → production seamlessly
  - Define primitives for fine-tuning, evaluation, deployment, and observability
  - Simplify complex infrastructure into intuitive, composable building blocks
- Multi-Cluster / Multi-Vendor Compute Orchestration
  - Work on scheduling and workload placement across heterogeneous environments (GPU/CPU, multi-region, multi-cloud)
  - Partner with engineering on resource allocation, queuing systems, and capacity-aware scheduling
- Observability, Evaluation & Cost Governance
  - Define telemetry systems for model performance, latency, token usage, and failure modes
  - Build evaluation workflows for LLM quality, safety, and regression detection
  - Introduce cost controls and optimization strategies for large-scale inference and training
- Managed AI Services
  - Package infrastructure into opinionated, production-ready services for enterprise customers
  - Define SLAs, reliability models, and deployment patterns for mission-critical workloads
  - Work closely with customers to understand real-world constraints and translate them into product capabilities
We’re looking for product leaders who can operate at depth across both systems and product, and who are excited about building the foundation for the next generation of AI applications.
You likely have:
- Experience building AI/ML platforms, inference systems, or developer-facing infrastructure
- Strong understanding of distributed systems, cloud infrastructure, and performance trade-offs
- Familiarity with modern AI stacks:
  - LLMs, transformers, diffusion models
  - Frameworks like PyTorch, TensorRT, ONNX, vLLM, Triton, etc.
AI is undergoing a platform shift. The gap between raw infrastructure and usable systems is still enormous. This role is about closing that gap: turning fragmented, complex infrastructure into a coherent platform that developers can rely on to build real products.
You’ll be working on problems like:
- How to make LLM inference predictable and cost-efficient at scale
- How to expose the right abstractions for agentic workflows
- How to manage heterogeneous compute without leaking complexity to users
- How to make AI systems observable, debuggable, and reliable
If you’re excited about building the systems that power the next generation of AI applications, apply now!