Site Reliability Engineer (SRE) - Remote
Hi,
We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of our production systems and services.
Reliability & Performance
Ensure high availability, scalability, and performance of production systems.
Implement and maintain SLIs, SLOs, and SLAs for critical services.
Conduct capacity planning and performance tuning.
Automation & Tooling
Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
Develop automation to minimize manual operations and improve deployment workflows.
Build CI/CD pipelines to support rapid and reliable deployments.
Monitoring & Incident Response
Design and maintain monitoring, logging, and alerting systems (Datadog).
Participate in on-call rotations and lead incident response efforts.
Perform root-cause analysis and develop postmortems to prevent recurring issues.
Systems Engineering
Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
Optimize system architecture for reliability and fault tolerance.
Implement best practices for security, networking, and service resilience.
Collaboration & Leadership
Work closely with development teams to design reliable microservices and distributed systems.
Advocate for SRE principles and drive operational excellence across engineering teams.
Mentor engineers on reliability practices, tooling, and automation strategies.
Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
Strong proficiency with Linux systems and shell scripting.
Experience with cloud platforms (AWS, Azure).
Hands-on experience with Kubernetes/ECS and container technologies (Docker).
Proficiency in at least one programming language (Python or Java).
Experience with CI/CD pipelines and DevOps tooling.
Strong understanding of distributed systems, networking, and security fundamentals.
Preferred Qualifications
Experience with observability stacks (OpenTelemetry).
Knowledge of database management (PostgreSQL).
Experience with configuration management tools (Ansible, Chef, Puppet).
Familiarity with zero-downtime deployments and chaos engineering practices.
If you are interested, please send your resume to nazeera@radixlink.com.