RemoteHunter
Human Resources, Job Board, SaaS, Technology

Site Reliability Engineer II

United States · Hybrid · Full Time · $95,000–$171,000/yr · Posted 1 month ago

  • About Our Client:

This organization operates in the cloud computing and AI infrastructure space. It designs, implements, deploys, and operates AI platforms that let customers run AI inference models and let developers build AI applications. The organization supports scalable, serverless inference workloads and integrates GPU infrastructure with Kubernetes to deliver reliable AI services at scale.

  • About the Opportunity:

The Site Reliability Engineer II will focus on automating, monitoring, and maintaining the reliability of AI inference workloads on the organization's cloud platform. The role centers on reducing operational toil, improving system stability, and supporting continuous deployment. It directly supports the availability and performance of AI applications through collaboration with engineering teams and response to production incidents.

  • Responsibilities:

• Build and maintain dashboards, alerts, and monitoring for inference workloads using the existing observability platform

• Develop automation and tooling in Python or Go to enhance system reliability and reduce manual work

• Create and improve runbooks for inference-specific operational procedures

• Support SLO tracking and reporting to identify trends and improvement areas

• Maintain CI/CD pipelines, deployment safety checks, and rollback processes

• Collaborate with product engineering teams to troubleshoot complex issues across the stack

• Participate in on-call rotations, respond to production incidents, and conduct blameless post-mortems

  • Requirements:

• 2+ years of Site Reliability Engineering experience and a bachelor's degree or equivalent

• Proficiency in Python or Go with experience in automation scripting

• Experience in Linux systems administration and troubleshooting infrastructure issues

• Familiarity with Kubernetes and containerization concepts

• Experience with monitoring and observability tools such as Prometheus or Grafana

• Exposure to CI/CD pipelines and infrastructure-as-code tools like Terraform or SaltStack

• Willingness to learn and curiosity about AI infrastructure and distributed systems

  • Pay Range and Compensation Package:

• For US-based candidates, the base salary ranges from $95,000 to $171,000 per year, determined by factors including experience, skills, certifications, and location

• Compensation for candidates outside the US will vary

• Additional incentives may include annual bonuses, equity awards, and an Employee Stock Purchase Plan (ESPP)

  • Benefits & Perks:

• Healthcare coverage

• 401(k) savings plan

• Company holidays and paid time off (PTO)

• Sick leave

• Family-friendly benefits including parental leave

• Employee assistance program focused on mental and financial wellness

• Flexible work arrangements allowing a choice of remote or office work within the advertised country

Equal Opportunity Statement: Our client is an equal opportunity employer. They celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, or national origin.

Note:

RemoteHunter is not the Employer of Record (EOR) for this role. Our role here is to connect exceptional candidates with leading employers. We help job seekers worldwide discover roles that match their goals and guide them to complete their application directly through the hiring company's career page or ATS.
