Senior Machine Learning Engineer

San Francisco, California, United States · Onsite · Full Time · Senior · $200,000–$260,000 /yr · Posted 2 months ago

### Role summary

Together AI is seeking a Senior Machine Learning Engineer to lead the model serving layer for its Voice AI platform. The role focuses on optimizing inference infrastructure for voice workloads, including speech-to-text and text-to-speech models, with an emphasis on low latency and high reliability. The engineer will work hands-on with LLM serving engines, profile GPU utilization, design batching strategies, and productionize new model architectures. This is a foundational role on a small, early-stage team. Strong Python, PyTorch, and GPU optimization skills are required, along with experience in ML model serving and production systems. Familiarity with speech/audio ML is a plus.

### Who you are
- 5+ years of experience in ML engineering, with a focus on model serving, inference optimization, or ML infrastructure
- Hands-on experience with LLM serving engines (vLLM, SGLang, TensorRT-LLM, or similar) — comfortable reading and modifying engine internals, not just using APIs
- Strong proficiency in Python and PyTorch; experience with GPU profiling and optimization (CUDA, memory management, kernel-level debugging)
- Track record of shipping ML systems to production with measurable performance improvements
- Strong product sense — you think about what developers building voice apps actually need, not just what's technically interesting
- Comfort working on a small, early-stage team where you'll wear multiple hats and move fast
- Experience with speech and audio ML (ASR, TTS architectures, audio signal processing) is a strong plus but not required — you can learn this quickly if you have strong ML engineering fundamentals
- Familiarity with audio codecs and tokenization schemes (SNAC, Encodec, DAC) is a plus
- Experience training or fine-tuning speech models is a plus
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field, or equivalent practical experience

### What the job involves
Together AI is building the best inference infrastructure for voice applications. Our Voice AI platform powers production-grade, real-time voice agents and applications, serving speech-to-text and text-to-speech models with best-in-class latency and reliability. We're looking for a Senior ML Engineer to drive the model serving layer for voice workloads. You'll work hands-on with inference engines like TRT-LLM and SGLang to optimize how we serve models like Whisper, Parakeet, Orpheus, and Kokoro, pushing latency and throughput to the frontier. You'll profile GPU utilization, design batching strategies for streaming audio, and ensure new model architectures can go from research to production quickly.

This is a foundational hire on a small, high-impact team. Voice inference has unique challenges (streaming audio, tokenization, real-time latency budgets) that require dedicated ML engineering focus. You'll shape how Together serves voice models as the industry moves from pipeline architectures (ASR → LLM → TTS) toward end-to-end speech-to-speech.

- Own the model serving stack that powers Together's voice platform across STT, TTS, and speech-to-speech
- Work directly with state-of-the-art accelerators (H100s, H200s, B200s) to optimize voice model inference
- Collaborate with model partners (Cartesia, Deepgram, Rime, and others) to bring their models to production on Together's infrastructure
- Build quality evaluation frameworks that guide model selection for customers and inform the roadmap
- Join a small, early-stage team with outsized impact on a fast-growing product area
- Optimize inference performance for voice models (STT, TTS, speech-to-speech) — targeting best-in-class TTFB, throughput, and GPU utilization across our curated model set
- Productionize voice models on serverless and dedicated endpoints, including batching strategies, streaming inference, and memory management tailored to audio workloads
- Build and maintain a voice model evaluation framework — measuring WER across accents, languages, and noise conditions for STT; naturalness, latency, and pronunciation accuracy for TTS
- Enable new model architectures in our serving stack as the field evolves, including audio-native LLMs, codec-based models (SNAC), and speech-to-speech systems
- Collaborate with model partners (Cartesia, Deepgram, Rime, and others) to integrate and optimize their models running on Together's infrastructure
- Profile and debug performance across the full inference stack — from GPU kernels to framework-level bottlenecks — and ship measurable improvements
- Work with the platform engineering side of the team to ensure the serving layer meets the latency and reliability requirements of real-time voice APIs
- Contribute to voice model fine-tuning capabilities (STT and TTS) as we enable customers to build differentiated voice experiences on Together
- Lay the groundwork for multiple new products down the line
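
For candidates less familiar with speech evaluation, the WER (word error rate) metric named in the evaluation framework item above is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The snippet below is a minimal sketch of that standard definition, not Together AI's evaluation code. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn on the lights", "turn on the light"))
```

A production evaluation would typically add text normalization (punctuation, numerals, casing rules per language) before scoring, since those choices can change WER noticeably across accents and languages.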

### Benefits
- Competitive health insurance plans
- Dental and vision insurance
- Pre-tax flexible spending accounts
- Mental health support and services
- Income protection & retirement
- 401(k) plan
- AD&D insurance
- Life insurance
- STD & LTD insurance
- Monthly team lunches
- Flexible time off policy
- Team-driven celebrations and events
- Monthly commuting stipend + pre-tax benefits
