Flywire
FinTech, Payments, Software

Site Reliability Engineering Manager II

United States | Hybrid | Full Time | Manager / Head | $160,000–$200,000 /yr | Posted 2 months ago | Visa sponsorship available

Role summary

Flywire is seeking an experienced Manager II, Site Reliability Engineering to enhance reliability, automation, and performance within their cloud-based infrastructure. This role involves managing and developing SRE teams, driving production excellence, and embedding within engineering teams to advocate for best practices. The manager will coordinate daily activities, track team success, mentor team members, and drive initiatives. Responsibilities include debugging production issues, practicing incident response, identifying process and tool improvements, and contributing to talent acquisition. The role requires a strong understanding of SRE principles, software development, cloud infrastructure, CI/CD, and incident management, with a focus on enabling engineering teams to ship reliable and operable systems.

### Who you are
- 5 years of experience within the SRE space
- 2-5 years of experience leading or managing and developing SRE teams
- Comfortable with being or becoming a generalizing specialist, as we aim to build a multidisciplinary and balanced team of "T-shaped" individuals
- Experience with at least one programming language is required, as software engineering is an important part of our work and we actively use and support many different platforms and languages
- Proficiency with testing techniques such as TDD or BDD is highly valued
- Familiarity with the container ecosystem, cloud infrastructure, build systems and CI/CD tools is key to success in this role
- Comfortable taking ownership of complex systems challenges and helping uncover opportunities for improvement
- Strong communication and collaboration skills and, most importantly, empathy, as we enable, empower and encourage our colleagues

### What the job involves
- We at Flywire are looking for an experienced Manager II, Site Reliability Engineering to join our team. In this role, you’ll help drive reliability, automation and performance within our cloud-based infrastructure
- At Flywire, the SRE team is responsible for the lifecycle of production systems. Our team is embedded within Software Engineering teams, enabling and empowering them to ship reliable and operable systems at full speed
- The team also works at a global scale, driving initiatives to achieve production excellence
- Coordinate and support daily activities for SREs on the team and partner with their managers to determine the approach for managing daily tasks
- Track success on the team based on established goals and objectives
- Work on issues of limited scope with the ability to find and execute solutions to routine problems
- Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices
- Mentor team members and drive initiatives
- Drive a design for a feature while understanding system-wide and architectural concerns
- Understand the basic day-to-day tasks and traits of a production environment and participate in on-call support
- Engage and collaborate with other disciplines within the design, deployment, operation and optimization of services
- Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems
- Identify opportunities in both processes and tools to improve the overall productivity of the team
- Identify great talent and excite them to join our team
- Provide estimations, track progress and manage risk as well as team members' time
- Participate in an on-call shift along with other disciplines to respond to incidents
- Become involved in tech communities and add contributions to enhance them
- Lean into our business domain and needs as well as our company vision, mission and strategy to deliver on our short and long term goals
- Some technologies we use:
  - Ruby, Java, Kotlin, Go, Node, Python
  - AWS: EC2, ECS, Lambda, CloudWatch, SQS, RDS, Kinesis, S3, Elasticsearch, DocumentDB
  - Linux, Docker, Terraform, Make, Chef
  - GitLab, Jenkins
  - Sentry, Sumo Logic, Honeycomb
