SRE/Observability Engineer-7
Role summary
We are seeking a Mid-Level Observability Engineer to join our team in Toronto. This onsite, full-time role focuses on implementing, operating, and improving observability capabilities across applications and platforms. You will be responsible for hands-on onboarding, instrumentation, dashboarding, and alerting, collaborating with SRE, application, and operations teams to ensure systems are observable and production-ready. Key responsibilities include implementing and maintaining metrics, logs, and traces, assisting with application onboarding into observability platforms like Dynatrace, ELK, or Datadog, and configuring dashboards and alerts. You will also work with development teams on logging and tracing, validate observability requirements, and support incident response and root cause analysis using observability tools. Automation of onboarding tasks and adherence to best practices are also expected.
Toronto, Ontario M5V 3L9 Posted March 29th, 2026
Job Type: Full Time
Job Category: IT
Job Description
SRE/Observability Engineer
Toronto, ON - Onsite
Total Experience: 8-10 years
Required Skill Set:
- We are looking for a Mid-Level Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms.
- This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.
- You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.
Observability Implementation:
- Implement and maintain metrics, logs, and traces for applications and infrastructure
- Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
- Configure dashboards, alerts, and basic anomaly detection
Application Support and Instrumentation:
- Work with development teams to enable structured logging, basic distributed tracing, and core metrics
- Validate observability requirements during Production Readiness Reviews (PRR)
- Troubleshoot missing or low-quality telemetry
Monitoring and Alerting:
- Configure alerts based on golden signals (latency, errors, traffic, saturation)
- Help reduce alert noise by tuning thresholds and alert logic
- Support incident response by gathering logs, metrics, and traces
Operations and Reliability:
- Support root cause analysis using observability tools
- Maintain dashboards and documentation used by on-call and support teams
- Participate in on-call rotations (as applicable)
Automation and Continuous Improvement:
- Assist in automating observability onboarding and validation tasks
- Create and maintain reusable dashboards and alert templates
- Follow established observability standards and best practices
Required Qualifications:
- 2-4 years of experience in Observability or SRE
- Working knowledge of metrics, logs, and basic tracing concepts
- Hands-on experience with at least one observability platform (Dynatrace, Elastic ELK, Datadog, New Relic, etc.)
- Basic understanding of SLIs/SLOs and service health indicators
- Experience with cloud platforms or hybrid environments
- Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting
Preferred Qualifications:
- Experience with OpenTelemetry or APM agents
- Familiarity with Kubernetes or containerized workloads
- Experience working with incident management tools (PagerDuty, ServiceNow)
- Exposure to Dynatrace, Kibana/ELK, or similar cloud-native monitoring tools
- Experience in regulated or enterprise environments
Required Skills
TECHNICAL PROJECT MANAGER