Director ITSM, Service Delivery and SRE

Toronto, Ontario, Canada | Onsite | Full Time | Director | Posted 2 months ago

Role summary

This build-and-transform leadership role is for a Director of ITSM, Service Delivery, and SRE. The individual will design and mature the IT Service Management framework from the ground up, shaping how technology services are delivered, stabilized, and improved across the enterprise. Responsibilities include overseeing the full lifecycle of IT service operations with a focus on stability, responsiveness, and continuous improvement, ensuring end-to-end ownership of incident resolution, application stabilization, and infrastructure reliability through strategic ITSM practices. The role also involves leading and incorporating SRE principles into the function, driving accountability, and shaping a high-performing service organization.

This is a build-and-transform leadership role: an opportunity to design and mature our client’s IT Service Management framework from the ground up. The Director will shape how technology services are delivered, stabilized, and continuously improved across the enterprise, bringing structure, governance, and accountability to a fast-evolving environment. They will also lead SRE and incorporate it into this function. This is a chance to be at the forefront of modernizing IT operations, driving accountability, and shaping a high-performing service organization that makes a tangible, direct impact on employees and customers every day.

The Director is responsible for overseeing the full lifecycle of IT service operations with a focus on stability, responsiveness, and continuous improvement. This leader ensures end-to-end ownership of incident resolution, application stabilization, and infrastructure reliability through strategic ITSM practices.

Key Responsibilities:

Oversee the execution of production fixes, ensuring swift and accurate resolution of operational issues.

Responsible for building and overseeing the IT Service Management (ITSM) lifecycle, this leader owns escalated (Tier 2) service operations and ensures timely, effective resolution of complex incidents and service issues for end-users. They partner closely with application and infrastructure teams to strengthen service stability, responsiveness, and continuous improvement through robust ITSM governance and disciplined process execution.

Provide leadership and governance for the production support lifecycle, from incident detection through verification and resolution.
Establish clear SLAs, response times, and escalation criteria to ensure that all production fixes are delivered with minimal business disruption.
Implement standardized post-fix validation and communication processes to confirm issue resolution and prevent recurrence.
Partner with application owners, infrastructure leads, and business stakeholders to prioritize production fixes based on customer impact and business criticality.
Partner with Help Desk leadership and team to ensure seamless handoff of tickets, consistent escalation workflows, and unified service reporting across tiers.
Own the resolution of all production issues end to end.

Drive accountability for escalations to Subject Matter Experts (SMEs), ensuring efficient triage and root cause identification.

Define escalation protocols and ensure SMEs across applications, infrastructure, and network domains are engaged promptly and effectively.
Champion a “one-team” approach to problem-solving, breaking down silos between support tiers and functional areas to accelerate resolution.
Foster a culture of ownership and continuous learning by ensuring SMEs conduct comprehensive root cause analyses (RCAs) and implement lasting preventive measures.

Own and manage the backlog of application stabilization and infrastructure improvement initiatives, driving long-term reliability and resilience.

Establish and maintain a strategic backlog focused on stabilizing production environments, reducing technical debt, and addressing chronic issues.
Partner with Product, Engineering, and Infrastructure teams to assess, prioritize, and execute stabilization work in alignment with business objectives and risk tolerance.
Define clear success metrics (e.g., reduction in incident recurrence, performance improvements, service availability) to measure progress against stabilization goals.
Ensure transparency and visibility into backlog prioritization decisions through governance forums and regular executive reporting.

IT Service Management (ITSM) Leadership
Lead all aspects of ITSM functions, ensuring mature, efficient, and metrics-driven service management across the organization:

Major Incident Management (P1/P2):

Direct the end-to-end management of critical incidents, ensuring rapid restoration of service and clear executive communication.
Establish robust on-call, escalation, and incident response frameworks.

Root Cause Analysis (RCA) & Problem Management:

Ensure post-incident reviews and RCAs are completed, published, and result in actionable improvements.
Drive accountability for eliminating recurring issues through a structured problem-management process.

Change & Release Management:

Oversee the governance and coordination of change/release activities to minimize production risk.
Establish release readiness criteria, communication protocols, and rollback planning processes.

ITSM Reporting & Metrics:

Develop and manage ITSM dashboards, KPIs, and SLA/OLA tracking for transparency and continuous improvement.

ITSM Maturity and Continuous Improvement:

Lead the design and execution of ITSM maturity improvement initiatives, progressing toward standardized, measured, and automated service management practices.
Benchmark ITSM processes against ITIL v4 standards and internal performance metrics to identify and prioritize improvement opportunities.

CMDB & Asset Management:

Oversee configuration and asset management to maintain accurate visibility of the IT landscape.
Ensure CMDB data integrity and integration with incident, change, and problem workflows.

SRE:

Make ITSM more modern, measurable, automated, and business-relevant.

Requirements:

12+ years of experience in IT operations, service delivery, or ITSM leadership, with at least 5 years in a senior leadership capacity.
Proven experience leading major incident response, problem management, and change/release governance in complex enterprise environments.
Strong knowledge of ITIL or equivalent ITSM frameworks; ITIL v4 certification preferred.
Experience with SRE.
Demonstrated success in driving application stability, infrastructure reliability, and process automation.
Excellent leadership, stakeholder management, and communication skills, with the ability to influence across technical and business domains.
Strategic thinker with a hands-on operational mindset.
Data-driven decision-maker with strong analytical and reporting skills.
Bachelor’s degree in Information Technology, Computer Science, or related field (Master’s preferred).
