Site Reliability Engineer

Dallas, Texas, United States | Onsite | Full Time | $150,000–$170,000/yr | Posted 2 months ago

Role summary

We are seeking a Site Reliability Engineer to ensure the high availability, performance, and reliability of a cutting-edge cloud-native platform. In this role, you will collaborate closely with engineering and infrastructure teams to build automation, enhance incident response, and bolster system resilience using modern SRE practices. Key responsibilities include building Kubernetes clusters from scratch, supporting large-scale distributed systems, managing AWS cloud infrastructure hands-on, and using monitoring tools such as Splunk and Datadog. Proficiency in Java, Python, Bash, or Go, along with experience in container orchestration and CI/CD tools, is essential.

We’re looking for a Site Reliability Engineer to support the availability, performance, and reliability of a next‑generation cloud‑native platform. You’ll collaborate across engineering and infrastructure teams, build automation to reduce toil, improve incident response, and strengthen system resilience through monitoring, metrics, and modern SRE practices.

What You’ll Do

• Partner with development, operations, and infrastructure teams to ensure service availability

• Build automation to improve incident response and prevent recurring issues

• Create and enhance runbooks for outages and service degradations

• Assess production readiness and reliability of new and existing services

• Define and track operational metrics for performance, scalability, and availability

• Architect and maintain shared tools that improve reliability across teams

• Contribute to continuous improvement through research, retrospectives, and code reviews

• Influence timelines, expectations, and technical direction within the team

• Mentor junior engineers and help shape sprint planning

Required Qualifications:

• Expertise in building Kubernetes clusters from scratch

• Experience supporting and troubleshooting large‑scale distributed systems

• Strong documentation, communication, and analytical problem‑solving skills

• Comfortable working in fast‑paced, rapidly changing environments

Technical Skills:

• Hands‑on experience managing cloud infrastructure (AWS)

• Monitoring and analysis using tools such as Splunk, AppDynamics, Datadog, Prometheus, and Grafana

• Programming/scripting in Java, Python, Bash, or Go

• Experience with distributed messaging (Kafka, RabbitMQ, ActiveMQ)

• Containerization and orchestration (Docker, Kubernetes, Rancher)

• CI/CD tools such as Jenkins, Travis CI, and Harness

Benefits:

• 15% bonus

• 20+ days PTO

• Health, vision, and dental coverage

• 401(k) with 6% match

• Technology stipend

• Tuition/training reimbursement program

Ready to apply?

You'll be redirected to Qorali's application page.