Devopie Inc. (Verified)
IT Services, DevOps Consulting, Cloud Services

Senior Site Reliability Engineer

Hamilton, Ontario, Canada · Onsite · Full Time · Senior · Posted 1 month ago

💡 What You’ll Do

You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.

🔎 Reliability Engineering

- Define and manage SLIs, SLOs, and error budgets
- Reduce MTTD, MTTA, and MTTR through structured incident response
- Conduct blameless postmortems and drive preventative improvements
- Champion reliability in architectural reviews and production readiness

📊 Observability & Monitoring

- Design actionable, symptom-based alerts (not noise)
- Build dashboards and tracing systems using tools like CloudWatch, Prometheus, Grafana, New Relic, X-Ray, ADOT
- Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
- Ensure full observability coverage across critical paths

☁️ Cloud & Infrastructure

- Operate and optimize AWS environments (EC2, EKS/ECS, Lambda, VPC, RDS, IAM, S3, ALB/NLB, CloudTrail)
- Build resilient, multi-AZ and regionally replicated systems
- Implement autoscaling and fault-tolerant architecture
- Leverage Infrastructure as Code (Terraform, CDK, CloudFormation)

🤖 Automation & Toil Reduction

  • Eliminate manual processes through automation
  • Build self-healing infrastructure
  • Improve CI/CD pipelines with safe deployment strategies (canary releases, feature flags)
  • Write production-quality code (not just scripts) in Python, Go, Ruby, Bash, or Java

📈 Performance & Capacity Planning

  • Analyze system metrics and traffic patterns
  • Conduct load testing, chaos testing, and capacity modeling
  • Identify bottlenecks and proactively optimize systems

🤝 Cross-Functional Collaboration

You’ll work closely with:

  • Engineering & Platform teams on scalable system design
  • Security teams on IAM, KMS, GuardDuty, secrets management
  • Product leaders to align reliability with roadmap priorities
  • Cloud vendors and SaaS providers during critical incidents

🧠 What You Bring

Must-Have Experience

  • Bachelor’s degree in Computer Science, Software Engineering, or related field
  • Strong Linux/Unix systems knowledge
  • Deep AWS experience
  • Hands-on experience with Docker and container orchestration (Kubernetes, AWS EKS/ECS)
  • Infrastructure as Code (Terraform, CDK, CloudFormation)
  • Production on-call and incident management experience
  • Strong understanding of MTTx metrics (MTTD, MTTR, MTBF, etc.)
  • Experience with MongoDB, PostgreSQL, Redis, RabbitMQ
  • Experience with observability and monitoring platforms
  • CI/CD pipeline experience (GitHub, Kubernetes, etc.)

Nice-to-Have

  • Performance engineering and chaos testing
  • Experience in fintech or regulated environments
  • Knowledge of distributed storage systems (NFS, HDFS, Ceph, S3)
  • Familiarity with dynamic resource frameworks (Kubernetes, Mesos, YARN)