MarkiTech.AI Verified
Artificial Intelligence, IT Services, Software Development, Management Consulting

Dev Ops AWS (or Senior Cloud / Platform Engineer)

Toronto, Ontario, Canada · Hybrid · Full Time · Senior · Posted 1 day ago

Company Description

MarkiTech.AI, a Canadian-based company, specializes in developing innovative digital healthcare solutions, AI agents, and automation systems for healthcare and telecommunications. Over the past decade, the company has successfully delivered 50+ global projects and introduced a range of advanced platforms, such as CliniScripts, YourDoctors.Online, SenSights.AI, and others aimed at improving care delivery and enhancing user experiences. With a focus on intelligent, workflow-integrated systems, MarkiTech.AI is poised to shape the future of AI in healthcare and telecommunications through cutting-edge automation and digital transformation. By prioritizing smarter and more efficient decision-making, MarkiTech.AI strives to create positive change across industries.

Job title (MUST BE IN CANADA)

*Senior DevOps Engineer (alternate: Senior Cloud / Platform Engineer)*

*About the role*

*We are hiring a senior DevOps engineer to own and evolve our cloud platform on AWS, grounded in infrastructure as code, secure multi-account patterns, and reliable delivery. You will shape the DevOps roadmap (standards, tooling, automation, and operational excellence), support application releases, and provide production support for critical workloads.*

*Amazon EKS is central to how we run workloads. We need someone with deep, production-grade EKS expertise who has built and owned Kubernetes on AWS end-to-end, not only deployed apps to a cluster someone else runs.*

*You will also lead how we adopt AI for infrastructure and platform work, not as a buzzword but as a practical force multiplier: safe use of AI-assisted authoring and review for IaC and automation, clearer runbooks and incident workflows, and evaluation of tools and patterns that improve speed without weakening security, compliance, or change control. This role suits someone who combines deep AWS practice with leadership: you can define “how we build and run” while still being hands-on in pipelines, clusters, and incidents.*

*What you will do*

*Roadmap & standards: Define and socialize DevOps priorities (security, reliability, cost, velocity). Align teams on AWS Well-Architected practices, tagging, guardrails, and repeatable patterns for networking, identity, secrets, and data.*

*AI adoption for infra & platform: Drive a pragmatic AI strategy for the team, e.g. standards for AI-assisted IaC and pipeline changes (review gates, testing, drift detection), documentation and runbook quality, incident summarization and triage workflows where appropriate, and guardrails so AI tooling fits regulated or high-stakes environments. Stay current on vendor and open-source options; pilot, measure, and roll out what actually reduces toil.*

*Infrastructure as code: Design, review, and implement changes using Terraform and Terragrunt, with clear module boundaries, environment-specific config, and safe promotion across dev → non-prod → production.*

*EKS (critical): Build, operate, and own the Kubernetes platform on AWS: cluster lifecycle (creation, upgrades, patching), node groups / capacity, networking (CNI, service mesh or ingress as used), security (RBAC, admission controls, pod security, secrets and IRSA), add-ons, and cost/reliability tuning. Partner with app teams on standards for workloads, namespaces, and safe rollouts; be the escalation point for cluster-level incidents.*

*Broader AWS platform: Operate and improve adjacent services, e.g. RDS/Aurora, DynamoDB, object storage and CDN, KMS, Secrets Manager, SNS (alerting), Lambda, EventBridge, and CI/CD (CodePipeline / CodeBuild, connections to source control), plus IAM, VPC, and multi-tenant or multi-namespace patterns where applicable.*

*Release engineering: Partner with development teams on release processes, deployment strategies, change management, rollbacks, and post-release verification in regulated or high-stakes environments (e.g. healthcare-adjacent workloads).*

*Production support: Participate in on-call or escalation rotation as defined by the team; troubleshoot incidents, drive root-cause analysis, and implement preventive fixes (runbooks, dashboards, alarms, automation).*

*Observability & operations: Improve monitoring, logging, tracing, and alerting; tune thresholds; reduce noise; document operational procedures.*

*Collaboration: Work with security, architecture, and engineering leads to implement least-privilege access, encryption, backup/DR posture, and audit-friendly operations, including how AI-assisted workflows meet security and audit expectations.*

*What we are looking for*

*Required*

*AWS*

*6+ years in software/systems / DevOps / SRE roles, including 4+ years focused on AWS in production.*

*Strong command of infrastructure as code (Terraform) and modular, environment-driven layouts (experience with Terragrunt or similar composition patterns is a plus).*

*Deep, mandatory expertise in Amazon EKS: You have prior experience building and owning Kubernetes on AWS, not only deploying applications to a shared cluster. We expect fluency across the stack: cluster design and lifecycle, upgrades and patching, networking (VPC/CNI, DNS, ingress), identity and security (RBAC, IRSA, secrets, guardrails), observability, capacity and performance, and production troubleshooting. Surface-level or “I’ve used kubectl” experience is not sufficient.*

*Solid grasp of CI/CD, artifact promotion, secrets injection, and safe change practices in multi-environment pipelines.*

*Experience with production incidents: triage, communication, RCAs, and durable remediation.*

*Demonstrated interest or experience in applying AI to DevOps/platform work (e.g. AI-assisted coding and review workflows for IaC, internal tooling, or operational documentation), with judgment about limits, verification, and risk in production systems.*

*Ability to influence without authority: written standards, design reviews, and roadmap proposals that engineering teams actually adopt.*

*Excellent communication skills; comfortable working with distributed teams and stakeholders outside pure engineering.*

*Preferred*

*AWS certifications (e.g. Solutions Architect Professional, DevOps Engineer) or equivalent demonstrated depth.*

*Kubernetes certifications (e.g. CKA, CKS) or equivalent evidence of advanced Kubernetes/EKS depth.*

*Experience with Helm / Helmfile, policy-as-code, or cluster baseline tooling.*

*Familiarity with PostgreSQL/RDS, multi-tenant data patterns, or regulated-industry constraints.*

*Experience shaping SLOs, error budgets, or platform KPIs.*

*Exposure to cost optimization (rightsizing, scheduling non-prod, storage lifecycle) and FinOps collaboration.*

*Hands-on experimentation with AI coding assistants, internal LLM or RAG patterns for ops knowledge, or evaluating vendor tools for the platform team.*
