Senior Manager / Director, Site Reliability Engineer
Location: Remote
Compensation: $150,000-$220,000 (Base)
About the Role
We are seeking a Senior Manager / Director of Site Reliability Engineering to lead and scale our SRE function, ensuring the reliability, availability, performance, and efficiency of our critical systems. This role blends deep technical expertise with strategic leadership, partnering closely with Engineering, Product, Security, and Infrastructure teams to build resilient, scalable platforms that support business growth.
As a Senior Manager of SRE, you will define reliability standards, establish operational excellence, and foster a culture of automation, observability, and continuous improvement.
Key Responsibilities
Leadership & Strategy
- Define and execute the SRE vision, strategy, and roadmap aligned with business objectives
- Build, mentor, and lead a high-performing team of SRE managers and engineers
- Establish best practices for reliability, incident management, change management, and capacity planning
- Serve as a senior technical leader and trusted advisor across the organization
Reliability & Operations
- Own system reliability metrics, including SLIs, SLOs, and error budgets
- Lead major incident response, post-incident reviews, and long-term remediation efforts
- Drive improvements in uptime, latency, scalability, and fault tolerance across
Architecture & Engineering Excellence
- Influence system architecture to improve resilience, scalability, and operability
- Champion automation, Infrastructure as Code, and self-service platforms
- Oversee observability strategy (monitoring, logging, tracing, alerting)
- Ensure systems are designed for high availability, disaster recovery, and business continuity
Collaboration & Governance
- Partner with Product, Platform, Security, and Compliance teams to meet operational and regulatory requirements
- Define operational standards, runbooks, and on-call practices
- Communicate reliability risks, tradeoffs, and performance to executive leadership
Required Qualifications
- 8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering
- 3+ years in engineering leadership roles
- Strong background in distributed systems, cloud platforms (AWS, GCP, Azure), and container orchestration (Kubernetes)
- Hands-on experience with CI/CD, Infrastructure as Code (e.g., Terraform, CloudFormation), and automation
- Proven experience defining and operating SLOs, SLIs, and error budgets
- Excellent incident management and root cause analysis skills
- Strong communication skills with the ability to influence technical and non-technical stakeholders
Preferred Qualifications
- Experience supporting large-scale, high-traffic, or mission-critical systems
- Background in software engineering or systems engineering
- Experience scaling SRE practices in a fast-growing organization
- Familiarity with security, compliance, and regulatory requirements
- Bachelor's or Master's degree in Computer Science or a related field (or equivalent experience)
About Eltropy (www.eltropy.com)
Eltropy is on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing and chatbot technology — all integrated in a single platform bolstered by AI, skill-based routing and other contact center capabilities.
Eltropy Values:
- Customers are our North Star
- No Fear - Tell the truth
- Team of Owners
Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.