Site Reliability Engineer
Role summary
A Weekday client is seeking an experienced Site Reliability Engineer (SRE) in Poland for a full-time, onsite role. The position involves designing, building, and maintaining scalable, resilient, high-performance infrastructure, with a focus on reliability, availability, and efficiency achieved through software engineering principles. Responsibilities include Infrastructure as Code (IaC), observability, automation, incident management, and CI/CD pipelines. The ideal candidate has 9+ years of experience in SRE or DevOps; expertise in cloud platforms (AWS, GCP, Azure), IaC tools (Terraform, CloudFormation), monitoring tools (Prometheus, Grafana, ELK), and scripting (Python, Go, Bash); and a strong understanding of distributed systems, networking, and security. The role offers the chance to influence reliability practices and infrastructure strategy for complex, mission-critical applications.
This role is for one of Weekday's clients.
Salary range: Rs 10000000 - Rs 30000000 (i.e., INR 100-300 LPA)
Min Experience: 9 years
Location: Poland
Job Type: Full-time
We are seeking a highly experienced Site Reliability Engineer (SRE) to design, build, and maintain scalable, resilient, and high-performance infrastructure systems. This role focuses on ensuring system reliability, availability, and efficiency by applying software engineering principles to infrastructure and operations. You will work at the intersection of development and operations, driving automation, improving system observability, and minimizing downtime across critical services. The ideal candidate brings deep expertise in infrastructure management, a strong SRE mindset, and a passion for building fault-tolerant systems that support large-scale, mission-critical applications. This position offers the opportunity to work on complex distributed systems while shaping reliability practices and infrastructure strategy at scale.
### Requirements
### Key Responsibilities
- Design, implement, and maintain highly available, scalable, and secure infrastructure systems
- Apply SRE principles to improve system reliability, performance, and operational efficiency
- Develop and manage infrastructure using Infrastructure as Code (IaC) tools
- Build and maintain monitoring, alerting, and observability systems to ensure proactive issue detection
- Define and manage SLAs, SLOs, and SLIs to measure and improve service reliability
- Automate operational tasks, deployments, and incident response processes
- Lead incident management, root cause analysis, and postmortem processes to prevent recurrence
- Optimize system performance, capacity planning, and cost efficiency across infrastructure environments
- Collaborate with engineering teams to improve system design, scalability, and resilience
- Implement robust CI/CD pipelines to ensure smooth and reliable software delivery
- Ensure security, compliance, and best practices across infrastructure and operations
- Continuously evaluate and adopt new tools, technologies, and practices to enhance reliability
### What Makes You a Great Fit
- 9+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles
- Strong expertise in designing and managing large-scale distributed systems and cloud infrastructure
- Hands-on experience with Infrastructure as Code tools such as Terraform, CloudFormation, or similar
- Deep understanding of monitoring, logging, and observability tools (e.g., Prometheus, Grafana, ELK stack)
- Experience with CI/CD pipelines, automation tools, and modern deployment practices
- Strong knowledge of cloud platforms such as AWS, GCP, or Azure
- Proficiency in scripting or programming languages such as Python, Go, or Bash
- Solid understanding of networking, security, and system architecture principles
- Experience in defining and managing SLAs, SLOs, and incident response processes
- Strong analytical and problem-solving skills with a focus on reliability and performance
- Excellent collaboration and communication skills for working with cross-functional teams
- Ability to operate in high-pressure environments and manage critical production systems
- Proactive mindset with a focus on continuous improvement and automation