Site Reliability Engineer (SRE)
About Your Role
As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of
mission-critical platforms by building scalable systems, robust automation, and data-driven
operations.
You will partner closely with development, cloud, infrastructure, and security teams to deliver
resilient, high-performing services that support the way people live and work today.
What You’ll Do
● Design and implement solutions that enhance application reliability, performance,
scalability, and resilience.
● Build and maintain monitoring, alerting, observability, and telemetry to drive
proactive detection and rapid incident response.
● Lead incident management efforts, perform root cause analysis, and implement action-
oriented post-mortem improvements.
● Automate operational workflows using scripting, IaC, and configuration management
tools.
● Analyze capacity, performance, and usage trends to forecast demand and optimize
cloud costs.
● Collaborate with engineering teams to embed operability, resilience, and security into
application and architecture designs.
● Support safe, reliable deployments through CI/CD pipelines, release governance, and
change control.
● Maintain clear runbooks, architecture diagrams, and operational documentation
that enable efficient production support.
Experience Required
● Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including
scaling, networking, upgrades, and orchestration.
● Working with public cloud platforms (AWS, Azure, or GCP) across compute, storage,
networking, IAM, and cost governance.
● Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana,
Datadog, or ExtraHop.
● Implementing security and compliance controls in regulated environments (e.g., PCI
DSS, SOC 2), including secrets management and vulnerability remediation.
● Writing Infrastructure as Code using Terraform, CloudFormation, Ansible, or
similar tools.
● Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or
Azure DevOps.
● Scripting and automation using Bash, PowerShell, or Python.
● An equivalent combination of education, experience, and/or military background.
● Key requirement: experience on high-volume transaction projects where zero data loss
is mandatory, primarily in banking and payments. Please avoid candidates whose
background is in insurance projects.
Good to Have
● Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google
Cloud DevOps Engineer, or CKA.
● Experience with Premier applications, IBM iSeries, and/or Unisys systems.
● Hands-on database operations and performance tuning (Oracle, SQL Server,
PostgreSQL).
● Proven experience in major incident command, stakeholder communication, and
cross-team coordination.
● Experience with ITIL and ServiceNow (change, problem, and configuration
management).