
DevOps Engineer, AWS (or Senior Cloud / Platform Engineer)
Company Description
MarkiTech.AI, a Canadian-based company, specializes in developing innovative digital healthcare solutions, AI agents, and automation systems for healthcare and telecommunications. Over the past decade, the company has successfully delivered 50+ global projects and introduced a range of advanced platforms, such as CliniScripts, YourDoctors.Online, SenSights.AI, and others aimed at improving care delivery and enhancing user experiences. With a focus on intelligent, workflow-integrated systems, MarkiTech.AI is poised to shape the future of AI in healthcare and telecommunications through cutting-edge automation and digital transformation. By prioritizing smarter and more efficient decision-making, MarkiTech.AI strives to create positive change across industries.
Job title (candidates must be based in Canada)
*Senior DevOps Engineer (alternate: Senior Cloud / Platform Engineer)*
*About the role*
*We are hiring a senior DevOps engineer to own and evolve our cloud platform on AWS, grounded in infrastructure as code, secure multi-account patterns, and reliable delivery. You will shape the DevOps roadmap (standards, tooling, automation, and operational excellence), support application releases, and provide production support for critical workloads.*
*Amazon EKS is central to how we run workloads. We need someone with deep, production-grade EKS expertise who has built and owned Kubernetes on AWS end-to-end, not someone who has only deployed apps to a cluster someone else runs.*
*You will also lead how we adopt AI for infrastructure and platform work, not as a buzzword but as a practical force multiplier: safe use of AI-assisted authoring and review for IaC and automation, clearer runbooks and incident workflows, and evaluation of tools and patterns that improve speed without weakening security, compliance, or change control. This role suits someone who combines deep AWS practice with leadership: you can define “how we build and run” while staying hands-on in pipelines, clusters, and incidents.*
*What you will do*
*Roadmap & standards: Define and socialize DevOps priorities (security, reliability, cost, velocity). Align teams on AWS Well-Architected practices, tagging, guardrails, and repeatable patterns for networking, identity, secrets, and data.*
*AI adoption for infra & platform: Drive a pragmatic AI strategy for the team, including standards for AI-assisted IaC and pipeline changes (review gates, testing, drift detection), documentation and runbook quality, incident summarization and triage workflows where appropriate, and guardrails so AI tooling fits regulated or high-stakes environments. Stay current on vendor and open-source options; pilot, measure, and roll out what actually reduces toil.*
*Infrastructure as code: Design, review, and implement changes using Terraform and Terragrunt, with clear module boundaries, environment-specific config, and safe promotion across dev → non-prod → production.*
*EKS (critical): Build, operate, and own the Kubernetes platform on AWS, including cluster lifecycle (creation, upgrades, patching), node groups and capacity, networking (CNI, plus service mesh or ingress as used), security (RBAC, admission controls, pod security, secrets, and IRSA), add-ons, and cost/reliability tuning. Partner with app teams on standards for workloads, namespaces, and safe rollouts; be the escalation point for cluster-level incidents.*
*Broader AWS platform: Operate and improve adjacent services such as RDS/Aurora, DynamoDB, object storage and CDN, KMS, Secrets Manager, SNS (alerting), Lambda, EventBridge, and CI/CD (CodePipeline/CodeBuild, connections to source control), plus IAM, VPC, and multi-tenant or multi-namespace patterns where applicable.*
*Release engineering: Partner with development teams on release processes, deployment strategies, change management, rollbacks, and post-release verification in regulated or high-stakes environments (e.g. healthcare-adjacent workloads).*
*Production support: Participate in the on-call or escalation rotation as defined by the team; troubleshoot incidents, drive root-cause analysis, and implement preventive fixes (runbooks, dashboards, alarms, automation).*
*Observability & operations: Improve monitoring, logging, tracing, and alerting; tune thresholds; reduce noise; document operational procedures.*
*Collaboration: Work with security, architecture, and engineering leads to implement least-privilege access, encryption, backup/DR posture, and audit-friendly operations, including ensuring that AI-assisted workflows meet security and audit expectations.*
*What we are looking for*
*Required*
*AWS*
*6+ years in software, systems, DevOps, or SRE roles, including 4+ years focused on AWS in production.*
*Strong command of infrastructure as code (Terraform) and modular, environment-driven layouts (experience with Terragrunt or similar composition patterns is a plus).*
*Deep, mandatory expertise in Amazon EKS: You have built and owned Kubernetes on AWS, not only deployed applications to a shared cluster. We expect fluency across the stack: cluster design and lifecycle, upgrades and patching, networking (VPC/CNI, DNS, ingress), identity and security (RBAC, IRSA, secrets, guardrails), observability, capacity and performance, and production troubleshooting. Surface-level or “I’ve used kubectl” experience is not sufficient.*
*Solid grasp of CI/CD, artifact promotion, secrets injection, and safe change practices in multi-environment pipelines.*
*Experience with production incidents: triage, communication, RCAs, and durable remediation.*
*Demonstrated interest or experience in applying AI to DevOps/platform work (e.g. AI-assisted coding and review workflows for IaC, internal tooling, or operational documentation), with sound judgment about limits, verification, and risk in production systems.*
*Ability to influence without authority: written standards, design reviews, and roadmap proposals that engineering teams actually adopt.*
*Excellent communication skills; comfortable working with distributed teams and stakeholders outside pure engineering.*
*Preferred*
*AWS certifications (e.g. Solutions Architect Professional, DevOps Engineer) or equivalent demonstrated depth.*
*Kubernetes certifications (e.g. CKA, CKS) or equivalent evidence of advanced Kubernetes/EKS depth.*
*Experience with Helm/Helmfile, policy-as-code, or cluster baseline tooling.*
*Familiarity with PostgreSQL/RDS, multi-tenant data patterns, or regulated-industry constraints.*
*Experience shaping SLOs, error budgets, or platform KPIs.*
*Exposure to cost optimization (rightsizing, scheduling non-prod, storage lifecycle) and FinOps collaboration.*
*Hands-on experimentation with AI coding assistants or internal LLM/RAG patterns for ops knowledge, or experience evaluating vendor tools for the platform team.*