Senior Platform Engineer

San Francisco, California, United States | Onsite | Full Time | Senior | $200,000–$260,000 /yr | Posted 2 months ago

Role summary

Together AI is seeking a Senior Platform Engineer to build and own the API and infrastructure layer for its Voice AI platform. The role involves designing and implementing real-time streaming APIs (WebSocket and HTTP) and autoscaling for latency-sensitive voice workloads running on tens of thousands of GPUs. You will focus on developer experience, reliability, and multi-model normalization across providers. The ideal candidate has 5+ years of experience with distributed systems and real-time streaming infrastructure, plus expert-level proficiency in TypeScript and Python; Kubernetes experience is also essential. Familiarity with audio/media protocols and ML model serving is a plus.

### Who you are
- 5+ years of experience building large-scale, real-time distributed systems and API services
- Deep expertise in real-time streaming infrastructure — WebSocket server architecture, Server-Sent Events, bidirectional streaming, connection multiplexing, and stateful protocol design
- Expert-level programming in TypeScript and Python; experience with Rust is a plus
- Strong distributed systems fundamentals: load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads
- Experience with Kubernetes — including custom autoscalers, resource management, and health checking for stateful services
- Strong product sense — you care about API ergonomics and think about what developers building voice apps actually need
- Comfort working on a small, early-stage team where you'll wear multiple hats and move fast
- Experience with audio or media protocols (WebRTC, G.711, PCM encoding) is a strong plus
- Familiarity with ML model serving infrastructure and how inference engines work is a plus — you'll interface with the serving layer regularly
- Full-stack experience (React, Next.js) is a nice-to-have for contributing to developer-facing tooling
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience

### What the job involves
- Together AI is building the best inference infrastructure for voice applications. Our Voice AI platform powers production-grade, real-time voice agents and applications — serving speech-to-text and text-to-speech models with best-in-class latency and reliability
- We're looking for a Senior Platform Engineer to own the API and infrastructure layer for voice workloads. You'll build the real-time WebSocket and HTTP APIs that developers use to ship voice experiences, design autoscaling for latency-sensitive streaming workloads, and ensure our multi-provider voice platform is reliable enough for production voice agents handling millions of calls
- This is a foundational hire on a small, high-impact team. Voice APIs have fundamentally different infrastructure requirements than text-based inference — bidirectional audio streaming, stateful connections, tight latency SLOs, and complex multi-model routing. You'll define how developers interact with Together's voice platform as we grow from early customers to the default infrastructure for voice AI
- Own the real-time API layer (WebSocket + HTTP streaming) that powers Together's voice platform
- Design autoscaling and orchestration for voice workloads running on tens of thousands of GPUs
- Build the developer experience — APIs, observability, and tooling — for a fast-growing product area
- Work with production voice customers (contact centers, AI agents, communication platforms) to ship what they actually need
- Join a small, early-stage team with outsized impact on a new product line
- Build and harden real-time WebSocket and HTTP streaming APIs for STT and TTS — including connection lifecycle management, backpressure, error handling, and reconnection, at the reliability bar needed for production voice agents
- Design and ship autoscaling for voice model endpoints that handles bursty, real-time traffic patterns — accounting for concurrent connection limits, streaming state, and hard latency ceilings
- Implement voice-specific API features: word-level alignment, real-time speaker diarization, audio format flexibility (G.711/mu-law for telephony, PCM, WebRTC formats), pronunciation controls, and multi-context WebSocket support
- Build voice-specific observability — latency breakdowns, audio quality signals, and dashboards that help both the team and customers debug issues
- Own multi-model normalization across our model partners (Cartesia, Deepgram, Rime, and others), ensuring consistent API behavior regardless of the underlying provider
- Collaborate with the ML engineering side of the team on the interface between the API layer and the model serving stack, ensuring latency and reliability requirements are met end-to-end
- Contribute to developer experience: API design, documentation, integration cookbooks, a playground, and examples showcasing how best-in-class voice agents are built
- Lay the groundwork for multiple new products down the line

### Benefits
- Competitive health insurance plans
- Dental and vision insurance
- Pre-tax flexible spending accounts
- Mental health support and services
- Income protection & retirement
- 401(k) plan
- AD&D insurance
- Life insurance
- STD & LTD insurance
- Monthly team lunches
- Flexible time off policy
- Team-driven celebrations and events
- Monthly commuting stipend + pre-tax bene
