Site Reliability Engineer (SRE) - Remote

United States | Remote | Contract | Posted 2 months ago | Visa sponsorship available


Hi,

We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of our production systems and services.

Reliability & Performance

Ensure high availability, scalability, and performance of production systems.

Implement and maintain SLIs, SLOs, and SLAs for critical services.

Conduct capacity planning and performance tuning.

Automation & Tooling

Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.

Develop automation to minimize manual operations and improve deployment workflows.

Build CI/CD pipelines to support rapid and reliable deployments.

Monitoring & Incident Response

Design and maintain monitoring, logging, and alerting systems (Datadog).

Participate in on-call rotations and lead incident response efforts.

Perform root-cause analysis and develop postmortems to prevent recurring issues.

Systems Engineering

Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).

Optimize system architecture for reliability and fault tolerance.

Implement best practices for security, networking, and service resilience.

Collaboration & Leadership

Work closely with development teams to design reliable microservices and distributed systems.

Advocate for SRE principles and drive operational excellence across engineering teams.

Mentor engineers on reliability practices, tooling, and automation strategies.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

3–7 years of experience in SRE, DevOps, or Systems Engineering roles.

Strong proficiency with Linux systems and shell scripting.

Experience with cloud platforms (AWS, Azure).

Hands-on experience with Kubernetes/ECS and container technologies (Docker).

Proficiency in at least one programming language (Python or Java).

Experience with CI/CD pipelines and DevOps tooling.

Strong understanding of distributed systems, networking, and security fundamentals.

Preferred Qualifications

Experience with observability stacks (OpenTelemetry).

Knowledge of database management (PostgreSQL).

Experience with configuration management tools (Ansible, Chef, Puppet).

Familiarity with zero-downtime deployments and chaos engineering practices.

If you are interested, please send your resume to nazeera@radixlink.com.
