Site Reliability Engineer – AWS / Kubernetes
We’re working with a SaaS company that’s investing heavily in reliability engineering as the platform continues to scale.
The product runs fully in AWS and supports a growing set of services running in Kubernetes. As usage increases, the focus is on improving system reliability, observability and operational maturity across production environments.
This role sits close to the live platform. The work centres on keeping production systems stable, improving monitoring and automation, and helping engineering teams run services safely at scale.
Responsibilities
• Improve reliability and performance across production services
• Operate and scale Kubernetes environments running live workloads
• Build monitoring, alerting and observability across the platform
• Improve incident response processes and operational tooling
• Automate operational workflows using Infrastructure as Code
• Work with engineering teams to improve resilience and system design
• Investigate and resolve production performance issues
Tech stack
• AWS
• Kubernetes
• Terraform
• Docker
• Monitoring and observability tooling such as Datadog, Prometheus or Grafana
• CI/CD tooling
Background
Engineers moving into this role typically come from Site Reliability Engineering, DevOps or Cloud Infrastructure backgrounds and have experience supporting production systems in AWS environments.
Experience running Kubernetes infrastructure and working with monitoring and observability tooling is important.