Site Reliability Engineer

Terryville, Connecticut, United States | Onsite | Full Time | ₹10,000,000–₹30,000,000 /yr | Posted 2 months ago | Visa sponsorship available

Role summary

A Weekday client is seeking an experienced Site Reliability Engineer (SRE) in Poland for a full-time, onsite role. The position involves designing, building, and maintaining scalable, resilient, and high-performance infrastructure systems, focusing on reliability, availability, and efficiency through software engineering principles. Responsibilities include IaC, observability, automation, incident management, and CI/CD pipelines. The ideal candidate has 9+ years of experience in SRE/DevOps, expertise in cloud platforms (AWS, GCP, Azure), IaC tools (Terraform, CloudFormation), monitoring tools (Prometheus, Grafana, ELK), scripting (Python, Go, Bash), and a strong understanding of distributed systems, networking, and security. The role offers the chance to influence reliability practices and infrastructure strategy for complex, mission-critical applications.

This role is for one of Weekday's clients.

Salary range: ₹10,000,000 - ₹30,000,000 per year (i.e., INR 100-300 LPA)

Min Experience: 9 years

Location: Poland

Job type: Full-time

We are seeking a highly experienced Site Reliability Engineer (SRE) to design, build, and maintain scalable, resilient, and high-performance infrastructure systems. This role focuses on ensuring system reliability, availability, and efficiency by applying software engineering principles to infrastructure and operations. You will work at the intersection of development and operations, driving automation, improving system observability, and minimizing downtime across critical services. The ideal candidate brings deep expertise in infrastructure management, a strong SRE mindset, and a passion for building fault-tolerant systems that support large-scale, mission-critical applications. This position offers the opportunity to work on complex distributed systems while shaping reliability practices and infrastructure strategy at scale.

### Key Responsibilities

- Design, implement, and maintain highly available, scalable, and secure infrastructure systems
- Apply SRE principles to improve system reliability, performance, and operational efficiency
- Develop and manage infrastructure using Infrastructure as Code (IaC) tools
- Build and maintain monitoring, alerting, and observability systems to ensure proactive issue detection
- Define and manage SLAs, SLOs, and SLIs to measure and improve service reliability
- Automate operational tasks, deployments, and incident response processes
- Lead incident management, root cause analysis, and postmortem processes to prevent recurrence
- Optimize system performance, capacity planning, and cost efficiency across infrastructure environments
- Collaborate with engineering teams to improve system design, scalability, and resilience
- Implement robust CI/CD pipelines to ensure smooth and reliable software delivery
- Ensure security, compliance, and best practices across infrastructure and operations
- Continuously evaluate and adopt new tools, technologies, and practices to enhance reliability

### What Makes You a Great Fit

- 9+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles
- Strong expertise in designing and managing large-scale distributed systems and cloud infrastructure
- Hands-on experience with Infrastructure as Code tools such as Terraform, CloudFormation, or similar
- Deep understanding of monitoring, logging, and observability tools (e.g., Prometheus, Grafana, ELK stack)
- Experience with CI/CD pipelines, automation tools, and modern deployment practices
- Strong knowledge of cloud platforms such as AWS, GCP, or Azure
- Proficiency in scripting or programming languages such as Python, Go, or Bash
- Solid understanding of networking, security, and system architecture principles
- Experience in defining and managing SLAs, SLOs, and incident response processes
- Strong analytical and problem-solving skills with a focus on reliability and performance
- Excellent collaboration and communication skills to work across cross-functional teams
- Ability to operate in high-pressure environments and manage critical production systems
- Proactive mindset with a focus on continuous improvement and automation
