Site Reliability Engineer
Job Title: Site Reliability Engineer (SRE)
Role Overview:
We are seeking a skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems. The ideal candidate will work closely with development and operations teams to automate processes, monitor system health, and resolve production issues quickly, while continuously improving system reliability.
Key Responsibilities:
Operations & Incident Management
- Monitor production systems and applications to ensure reliability and performance.
- Respond to emergency incidents and perform root cause analysis.
- Manage system changes through established change management processes.
- Support IT infrastructure operations and ensure system stability.
- Implement automation tools to streamline operational tasks and improve efficiency.
System Support & Collaboration
- Work closely with development teams to support the deployment of new features.
- Assist in stabilizing production environments and resolving escalated issues.
- Develop and maintain SRE processes for the engineering team.
- Provide documentation and procedures for customer support teams to help resolve technical issues.
Process Improvement
- Conduct post-incident reviews and implement improvements to prevent recurring issues.
- Maintain a knowledge base documenting system problems, resolutions, and best practices.
- Continuously improve the software development lifecycle and operational processes.
Required Skills & Technologies
- Cloud Platforms: GCP and AWS
- Infrastructure as Code: Terraform
- Version Control: GitHub
- Scripting/Programming: Python
- Project & Documentation Tools: JIRA and Confluence
- Experience with automation and monitoring tools
- Strong troubleshooting and problem-solving skills
Preferred Background
- Experience as a System Administrator, DevOps Engineer, or Operations Engineer
- Strong understanding of production systems, infrastructure management, and automation
Similar roles
- Site Reliability Engineer · Pacer Group · Montreal, Quebec, Canada · Hybrid
- Senior Site Reliability Engineer · Basis Theory · United States · Remote
- Senior Site Reliability Engineer · Block Inc · New York, New York, United States · Remote
- Senior Site Reliability Engineer · Block Inc · Bay, California, United States · Remote
- Senior Site Reliability Engineer · Uplink · United States · Hybrid