Site Reliability Engineer
Role summary
We are seeking a Site Reliability Engineer (SRE) to ensure the stability, performance, and security of our production platforms. The role involves driving automation, implementing observability solutions, and collaborating with development teams to improve system reliability. You will also identify technical improvements, modernize legacy systems, and contribute to technical planning and best practices. The ideal candidate has a Bachelor's degree in Computer Science or a related field with 5 years of experience (or a Master's with 4 years, or equivalent experience), strong expertise in SRE practices, hands-on experience with Docker, Kubernetes, Git, Terraform, and Ansible, experience with tools like Splunk, Datadog, or SonarQube, and a background in Java development or distributed systems administration. An AWS Certification is required.
🚀 We're Hiring: Site Reliability Engineer (SRE)
Are you passionate about building resilient systems and driving automation at scale? We're looking for a Site Reliability Engineer to help ensure the stability, performance, and security of our critical production platforms.
💡 What You'll Do
- Ensure the stability and resilience of mission-critical production systems
- Drive end-to-end automation (infrastructure as code, CI/CD pipelines, automated testing)
- Design and implement observability solutions (logs, metrics, alerts) to meet service level objectives
- Collaborate with development teams to improve reliability, performance, and security from design to operations
- Identify and prioritize technical improvements, modernizing legacy systems with scalable, future-ready solutions
- Contribute to technical planning and help define standards, tools, and best practices across teams
✅ What You Bring
- Bachelor's in Computer Science, Software Engineering, or a related field, plus 5 years' experience *(or Master's + 4 years, or equivalent experience)*
- Strong expertise in SRE practices, automation, CI/CD, and infrastructure as code
- Hands-on experience with Docker, Kubernetes, Git, Terraform, and Ansible
- Experience with observability and code-quality tools like Splunk, Datadog, or SonarQube
- Background in Java development or distributed systems administration
- AWS Certification (required)
🌟 Why Join Us?
- Work on high-impact, large-scale systems
- Be part of a collaborative and forward-thinking team
- Shape the future of our platform reliability and performance
👉 Ready to make an impact? Apply now or reach out to learn more!