Senior Site Reliability Engineer
### Who you are
- A BA/BS degree in a technical discipline (or equivalent practical experience)
- 4+ years of experience deploying and managing infrastructure-as-code and configuration management tools such as Puppet, Ansible, or Terraform
- Proficiency with Git and modern version control practices, especially as they relate to continuous integration and delivery workflows
- Strong understanding of core Internet protocols and networking concepts, including TCP, DNS, and HTTP
- In-depth experience designing, deploying, and maintaining Kubernetes infrastructure in distributed, high-availability production environments
- Hands-on experience with both cloud-based and on-premises infrastructure, with a solid understanding of hybrid environments
- Familiarity with hypervisors and virtualization platforms such as KVM and VMware
- Experience architecting, implementing, and scaling CI/CD pipelines using tools such as Argo CD, GitHub Actions, and Jenkins, with a focus on reliability, security, and developer experience
- Working knowledge of managing artifact repositories such as AWS ECR, Nexus, or Yum repositories
- Demonstrated ability to lead cross-functional technical initiatives and drive alignment on infrastructure and reliability best practices
- Strong written and verbal communication skills, with a proven ability to collaborate effectively across globally distributed teams
- A continuous improvement mindset with a bias for action and accountability
### What the job involves
- The Site Reliability Engineering (SRE) team is a globally distributed group of approximately 25 engineers, dedicated to ensuring the reliability, scalability, and performance of our production systems. Spanning multiple time zones and regions, we operate as a cohesive unit that supports mission-critical infrastructure 24/7
- We sit at the intersection of software and systems engineering, partnering closely with application developers, infrastructure, security, and product teams
- In addition to owning core reliability initiatives, we support the broader engineering organization by building shared tooling, defining best practices, and providing guidance on operational excellence
- Our work focuses on areas such as observability, automation, platform reliability, capacity planning, and incident response, with a strong emphasis on reducing toil and improving system resilience at scale
- We embrace and promote modern DevOps principles to deliver robust, scalable platforms. As a Principal SRE, you’ll join a senior group of peers who value ownership, technical leadership, collaboration, and a commitment to engineering excellence
- Be a key member of the Technical Infrastructure, Engineering, and Operations (TIEO) organization
- Provide senior-level technical leadership, guidance, and mentorship across the broader Engineering organization
- Champion automation and standardization efforts to accelerate system deployment and reduce operational overhead
- Lead architectural reviews and contribute to the design of scalable, resilient, and observable systems
- Ensure the operational integrity, reliability, and performance of our global production infrastructure
- Design and implement scalable solutions to manage and monitor thousands of hosts and services efficiently
- Build and operate multi-tenant Kubernetes environments, with a focus on reliability, security, and performance
- Collaborate with platform teams that manage data infrastructure such as Hadoop and Kafka to ensure seamless integration and operational reliability
- Participate in a shared 24/7 on-call rotation to support system availability and incident response
### Benefits
- Equity and Employee Stock Purchase Plan
- Pension and Retirement Savings Plan in Several Countries
- Comprehensive Healthcare Benefits for You and Your Family
- Generous Time Off, Holiday Breaks and Summer Fridays
- Family-Focused Leave Benefits
- Cell Phone Subsidy
- Performance Management and Investment in Diversity Initiatives
- Bonusly Peer-to-Peer Recognition Program, Turning Recognition into Tangible Perks and Magnite Swag
- Community Service Events
- Wellness Coach—Meditate and Recharge with an Unlimited User Account for You and a Plus One