Filevine Verified
Software Development

Senior Site Reliability Engineer (AWS)

Sugar House, United States | Onsite | Full Time | Senior | $160,000–$190,000 /yr | Posted 26 days ago | Visa sponsorship available

Role summary

The Senior Site Reliability Engineer (SRE) will be embedded within a cross-functional team and take ownership of system reliability. The role involves designing, building, and maintaining autonomous systems for build, deployment, testing, and operations across all products. The SRE will be the authority on reliability throughout the SDLC, focusing on monitoring, alerting, incident response, and CI/CD pipeline improvements. Key responsibilities include proactively identifying and resolving system gaps, documenting best practices, mentoring junior engineers, and participating in a 24/7 on-call rotation. The position requires strong proficiency in AWS, Python, Bash, PowerShell, and Kubernetes, with a minimum of 4 years of dedicated SRE experience.

  • As a Site Reliability Engineer, you will be embedded with a cross-functional team that is responsible for specific portions of our systems
  • Over the course of your first year, you will gain the context needed to be effective and move quickly in the Filevine environment
  • In subsequent years, you will be given specific mission-critical objectives that build out and improve our autonomous systems, while building your reputation as an engineer who has built and maintained systems that can grow to internet scale
  • Provide strong leadership, mentoring, and sound judgment as the Reliability Engineering lead on your team
  • Design and maintain autonomous systems for building, deploying, testing, and operating all Filevine products
  • Act as the authoritative voice of reliability across the full software development lifecycle (SDLC)
  • Monitor, aggregate, dashboard, and alert on software/infrastructure events to ensure visibility and fast response
  • Continuously enhance CI/CD pipelines, automation scripts, playbooks, and tools to streamline processes and reduce resolution time
  • Proactively identify and resolve gaps in system availability, performance, and security while defending overall security posture
  • Document processes, architecture, procedures, and best practices; research, adopt, or build reliable tools to boost engineer productivity
  • Collaborate within your team or work independently, mentor junior engineers, and communicate clearly with technical and management stakeholders
  • Participate in a 24/7 on-call rotation for production support and emergency response

### Benefits

  • Medical, Dental, & Vision Insurance
  • Competitive & Fair Pay
  • Short & long-term disability
  • Maternity & Paternity leave
  • Centrally located open office in Sugar House
  • Company swag that will make your friends envious
  • Opportunity to make an impact on Day 1

### Requirements

  • 8+ years of hands-on technical experience in software engineering, infrastructure, or operations roles, including a minimum of 4 years dedicated to Site Reliability Engineering (SRE)
  • Proficient hands-on experience with AWS (e.g., EC2, Kubernetes/EKS, CloudWatch, Lambda, S3, IAM)
  • Bachelor’s degree in Computer Science, Information Systems, or a related field; equivalent certifications (e.g., Google Cloud Professional certifications, AWS certifications); or substantial comparable direct work experience
  • Strong proficiency in Python, Bash, PowerShell, and other common SRE tooling and scripting technologies
  • Proven track record of independently driving reliability improvements, reducing toil through automation, and contributing to high-availability, scalable production systems in a fast-paced environment
  • Expert-level experience designing, building, and maintaining autonomous systems that handle software build, deployment, testing, monitoring, and operations with minimal human intervention
  • Demonstrated curiosity, self-motivation, continuous learning mindset, passion for improvement, and proactive enthusiasm to enhance systems and processes daily without needing direction
  • Proficiency in all core skills expected of an SRE II, including monitoring/alerting, incident response, capacity planning, performance optimization, CI/CD pipeline enhancement, and reliability engineering best practices