SRE-7
Role summary
We are seeking a Senior Site Reliability Engineer (SRE) with extensive experience in AWS infrastructure, automation, observability, and production support. The role involves keeping cloud-native systems resilient, scalable, and efficient by driving reliability through code. Key responsibilities include designing and maintaining AWS infrastructure, developing CI/CD pipelines and IaC with tools such as Terraform and Harness, implementing monitoring and alerting with tools such as Dynatrace or Datadog, troubleshooting incidents, optimizing systems, and promoting SRE/DevOps culture. The position requires deep knowledge of AWS core services, Kubernetes, Linux, and scripting and programming languages such as Python or Go.
Chicago, Illinois 60018
Posted March 29th, 2026
Job Type: Full Time
Job Category: IT
Job Description
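Role: SRE
Location: Chicago, IL (onsite)
Employment type: FTE (full-time employee)
Work authorization: US citizens (USC) or green card holders (GC)
Experience: 12+ years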
We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support. As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.
Qualifications
5+ years of experience in SRE, DevOps, or Cloud Engineering
Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
Hands-on experience with Terraform, Ansible, or other IaC tools
Strong scripting/coding skills (Python, Go, Shell, etc.)
Experience with Kubernetes, containerization, and orchestration
Deep knowledge of Linux systems and networking
Experience with service meshes (e.g., Istio, AWS App Mesh)
Familiarity with the AWS Well-Architected Framework
Experience building self-healing systems and automated remediation
Background in security, compliance, or multi-account/multi-region AWS architectures
Roles & Responsibilities
Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
Own and implement monitoring, alerting, logging, and distributed tracing with tools such as Dynatrace or Datadog
Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
Optimize systems for cost, performance, and reliability
Drive chaos engineering and resilience testing
Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
Mentor junior SREs and promote DevOps/SRE culture across the org
Required Skills: SRE Engineer (Site Reliability/Resiliency)