Senior Site Reliability Engineer
### Role summary
Satsuma is seeking a Senior Site Reliability Engineer to manage its multi-cloud infrastructure (AWS, GCP, Azure) with a focus on reliability, scalability, and operations. The role involves building and maintaining CI/CD pipelines, observability stacks, and incident response workflows, defining SLOs/SLIs, and authoring IaC with Terraform. A key aspect is leveraging AI-assisted development for tooling and automation. The ideal candidate has 5-8 years of experience in SRE/DevOps, strong Kubernetes and observability tooling skills, and experience in high-growth SaaS environments. Familiarity with API gateways or commerce tech stacks is preferred.
### About Satsuma
Satsuma is a commerce iPaaS that builds merchant-specific APIs, MCP Servers, and MCP Apps, enabling retailers to connect their full commerce stack once and deploy branded shopping experiences across every AI channel. We work with enterprise retailers and move fast. Our infra has to match.
### The role
We're looking for a Senior SRE to own the reliability, scalability, and operational posture of Satsuma's multi-cloud infrastructure. You'll be the person who keeps things running, builds the systems that prevent fires, and makes on-call not terrible.
This is an infra-first role. But we're an AI-native company, and we expect you to use AI-assisted development (for example, Claude Code) as a core part of your workflow: writing tooling, automating runbooks, and building internal utilities.
### What you'll do
- Own infrastructure across AWS, GCP, and Azure environments
- Build and maintain CI/CD pipelines, observability stacks, and incident response workflows
- Define and enforce SLOs/SLIs; lead postmortems
- Author and maintain IaC (Terraform preferred)
- Write internal tooling and automation using AI-assisted development workflows
- Partner closely with engineering on reliability reviews and architecture decisions
### Requirements
- 5-8 years in SRE, DevOps, or infrastructure engineering
- Hands-on experience across at least two major cloud providers
- Strong skills in Kubernetes, Terraform, and observability tooling (Datadog, Grafana, or equivalent)
- Comfortable reading and editing code; able to ship scripts and internal tools
- Experience with AI-assisted development (Copilot, Cursor, Claude Code)
- On-call maturity: you've owned incidents end-to-end and made systems better afterward
- Prior experience at a startup or high-growth SaaS company
- Familiarity with API gateway infrastructure or commerce tech stacks (preferred)
- Hands-on experience with MCP or agentic AI infrastructure
### Benefits
- Unlimited PTO
- 401(k)
- Healthcare stipend
- Gym stipend