Staff/Principal Platform Engineer
### Who you are
- 8-10 years of experience in software engineering
- 3+ years of experience with infrastructure-as-code
- Proficiency in managing Kubernetes clusters and applications, including creating Kustomize manifests/Helm charts for new applications
- Experience creating and maintaining CI/CD pipelines for both application and infrastructure deployments (using tools such as Terraform/Terragrunt, ArgoCD, GitHub Actions, and Ansible)
- Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud)
- Proficiency in at least one backend programming or scripting language, such as Golang, Python, or Bash
- Candidates must be based in the SF Bay Area or willing to relocate (you will be working on-site in our South Bay office a few days a week)
### What the job involves
- Join our team as a Staff / Principal Platform Engineer and take end-to-end ownership of building, securing, and scaling our AI products
- You'll be the driving force behind our cloud infrastructure, partnering with engineers across the organization to deploy and evolve services across major cloud providers using Terraform, ArgoCD, and other tooling
- In this high-impact role, you'll identify what needs to be done and move it forward, directly shaping how we operate and innovate
- Work closely with engineers to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our TTS and LLM Router
- Drive engineering velocity by identifying and building AI-powered tooling and workflows that improve how our teams develop and deploy software
- Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services
- Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment
- Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence