Senior Site Reliability Engineer (GCP)

Sugar House, United States · Onsite · Full Time · Senior · $130,000–$180,000/yr · Posted 26 days ago · Visa sponsorship available

Role summary

Filevine is seeking a Senior Site Reliability Engineer (SRE) to join their Reliability team, reporting to the Director of Reliability. This role focuses on engineering autonomous systems to ensure Filevine products are reliable, scalable, performant, cost-effective, and secure. The SRE will be embedded in a cross-functional team, driving reliability improvements, reducing toil through automation, and contributing to high-availability, scalable production systems. Key responsibilities include designing and maintaining autonomous systems for the full SDLC, monitoring, CI/CD enhancement, incident response, and mentoring junior engineers. The role requires strong proficiency in GCP, Python, Bash, and PowerShell, with preferred experience in AWS.

This role reports directly to the Director of Reliability.
To achieve the dream of allowing professionals to focus on what they love, Filevine products need key features: they must be reliable, scalable, performant, cost-effective, and secure, and they need a way to recover in the event of a disaster.
The Reliability team is responsible for thinking through these problems and engineering solutions to them. We hire excellent engineers who apply software engineering to these problems to create autonomous systems that take care of these details for us.
We use the principle of continuous improvement to make each iteration of these autonomous systems better than the last.
Our autonomous systems are still nascent: the foundational pieces have either recently been completed or are currently under development.
As a Site Reliability Engineer, you will be embedded with a cross-functional team that owns key portions of our systems. Over the course of your first year, you will gain the context needed to be truly effective and move at speed in the Filevine environment.
In subsequent years, you will be given specific mission-critical objectives that help build out and improve our autonomous systems while building your personal brand as an exceptional engineer who has built and maintained amazing systems that can grow to internet scale.
Provide strong leadership, mentoring, and sound judgment as the Reliability Engineering lead on your team
Design and maintain autonomous systems for building, deploying, testing, and operating all Filevine products
Act as the authoritative voice of reliability across the full software development lifecycle (SDLC)
Monitor, aggregate, dashboard, and alert on software/infrastructure events to ensure visibility and fast response
Continuously enhance CI/CD pipelines, automation scripts, playbooks, and tools to streamline processes and reduce resolution time
Proactively identify and resolve gaps in system availability, performance, and security while defending overall security posture
Document processes, architecture, procedures, and best practices; research, adopt, or build reliable tools to boost engineer productivity
Collaborate within your team (or independently), mentor junior engineers, participate in 24/7 on-call rotation for production support and emergency response, and communicate clearly with technical and management stakeholders

### Benefits

Medical, Dental, & Vision Insurance
Competitive & Fair Pay
Short & long-term disability
Maternity & Paternity leave
Centrally located open office in Sugar House
Company swag that will make your friends envious
Opportunity to make an impact on Day 1

### Qualifications

8+ years of hands-on technical experience in software engineering, infrastructure, or operations roles, including a minimum of 4 years dedicated to Site Reliability Engineering (SRE)
Proficiency in all core skills expected of an SRE II, including monitoring/alerting, incident response, capacity planning, performance optimization, CI/CD pipeline enhancement, and reliability engineering best practices
Bachelor’s degree in Computer Science, Information Systems, or a related field; equivalent certifications (e.g., Google Cloud Professional certifications, AWS certifications); or substantial comparable direct work experience
Strong proficiency in Python, Bash, PowerShell, and other common SRE tooling and scripting technologies
Proven track record of independently driving reliability improvements, reducing toil through automation, and contributing to high-availability, scalable production systems in a fast-paced environment
Proficient hands-on experience with Google Cloud Platform (GCP) (e.g., Compute Engine, Kubernetes Engine/GKE, Cloud Monitoring, Cloud Logging, Pub/Sub, Cloud Functions, IAM) and preferred experience with AWS (e.g., EC2, EKS, CloudWatch, Lambda, S3, IAM)
Expert-level experience designing, building, and maintaining autonomous systems that handle software build, deployment, testing, monitoring, and operations with minimal human intervention
Demonstrated curiosity, self-motivation, continuous learning mindset, passion for improvement, and proactive enthusiasm to enhance systems and processes daily without needing direction
M.S. in computer science, information systems, or a related field; comparable certifications; or equivalent direct work experience
Experience developing, deploying, and maintaining internet scale applications
Experience incorporating Artificial Intelligence or Machine Learning into internet scale applications
