Senior Site Reliability Engineer

United States · Onsite · Full Time · Senior · Posted 2 months ago

Role summary

Satsuma is seeking a Senior Site Reliability Engineer to manage its multi-cloud infrastructure (AWS, GCP, Azure) with a focus on reliability, scalability, and operations. The role involves building and maintaining CI/CD pipelines, observability stacks, and incident response workflows, defining SLOs/SLIs, and authoring IaC with Terraform. A key aspect is leveraging AI-assisted development for tooling and automation. The ideal candidate has 5-8 years of experience in SRE/DevOps, strong Kubernetes and observability tooling skills, and experience in high-growth SaaS environments. Familiarity with API gateways or commerce tech stacks is preferred.

About Satsuma

Satsuma is a commerce iPaaS that builds merchant-specific APIs, MCP Servers, and MCP Apps, enabling retailers to connect their full commerce stack once and deploy branded shopping experiences across every AI channel. We work with enterprise retailers and move fast. Our infra has to match.

The role

We're looking for a Senior SRE to own the reliability, scalability, and operational posture of Satsuma's multi-cloud infrastructure. You'll be the person who keeps things running, builds the systems that prevent fires, and makes on-call not terrible.

This is an infra-first role. But we're an AI-native company, and we expect you to use AI-assisted development (Claude Code) as a core part of your workflow — writing tooling, automating runbooks, building internal utilities.

What you'll do

Own infrastructure across AWS, GCP, and Azure environments
Build and maintain CI/CD pipelines, observability stacks, and incident response workflows
Define and enforce SLOs/SLIs; lead postmortems
Author and maintain IaC (Terraform preferred)
Write internal tooling and automation using AI-assisted development workflows
Partner closely with engineering on reliability reviews and architecture decisions

### Requirements

5-8 years in SRE, DevOps, or infrastructure engineering
Hands-on experience across at least two major cloud providers
Strong skills in Kubernetes, Terraform, and observability tooling (Datadog, Grafana, or equivalent)
Comfortable reading and editing code; able to ship scripts and internal tools
Experience with AI-assisted development (Copilot, Cursor, Claude Code)
On-call maturity -- you've owned incidents end-to-end and made systems better afterward
Prior experience at a startup or high-growth SaaS company
Familiarity with API gateway infrastructure or commerce tech stacks (preferred)
Hands-on experience with MCP or agentic AI infrastructure

### Benefits

Unlimited PTO
401(k)
Healthcare stipend
Gym stipend
