Senior Cloud Platform Engineer
Role summary
The Senior Cloud Platform Engineer will be responsible for designing, operating, and governing resilient, scalable Azure cloud-native platforms. This role involves implementing DevSecOps and GitOps pipelines, managing Azure Landing Zones with robust governance, and building automation frameworks. Key responsibilities include ensuring platform reliability through AIOps and observability, embedding FinOps practices for cost governance, and provisioning infrastructure using Infrastructure as Code (IaC) with Terraform. The engineer will also support container orchestration with Kubernetes and secure API integrations, acting as a senior escalation point for platform incidents and driving continuous service improvement through automation and AI-driven insights.
Required Skills & Experience
Cloud-Native Architecture:
Design and operate resilient, scalable Azure cloud-native platforms aligned to enterprise standards and RUN SLAs
DevSecOps & GitOps:
Implement secure CI/CD and GitOps pipelines with built-in security, policy enforcement, and automated controls
Cloud Landing Zone & Policy Management:
Operate and govern Azure Landing Zones using Azure Policy, RBAC, guardrails, and compliance automation
Platform & COE Tooling:
Build and support reusable COE accelerators, golden paths, templates, and automation frameworks
AIOps & Observability:
Enable proactive monitoring, logging, alerting, and AIOps-driven insights for platform reliability and incident reduction
FinOps:
Embed cost governance, tagging, budgets, and optimization practices into platform operations
Cloud Architecture (RUN focused):
Translate client-approved architectures into operable, supportable, and compliant Azure platforms
Containers & Kubernetes:
Design, deploy, and operate container platforms using Kubernetes, AKS, Docker, and Helm
Infrastructure as Code:
Provision and manage Azure infrastructure using Terraform and automated pipelines
API & Integration Platforms:
Design and support secure APIs and integrations using Azure API Management (APIM)
Event & Streaming Platforms:
Support cloud-native messaging and streaming solutions using Kafka and managed services
Scripting & Automation:
Develop operational automation using Python and platform SDKs
Agile & ITSM Alignment:
Operate within Agile delivery models while supporting ITSM incident, change, and problem management processes
Certifications:
Microsoft Certified: Azure Solutions Architect Expert (AZ-305) - Required
Microsoft Certified: Azure Administrator Associate (AZ-104) - Required
AZ-400 (DevOps Engineer Expert)
AZ-500 (Azure Security Engineer Associate)
ITIL 4 Foundation
Terraform Associate
Responsibilities:
Azure Platform RUN Ownership: Act as the L3/L4 escalation point for Azure platform incidents across IaaS, PaaS, landing zones, and Terraform-based deployments. Lead root cause analysis (RCA) for P1/P2 incidents and drive permanent fixes through automation and design improvements. Ensure platform services meet availability, reliability, and performance SLAs.
Landing Zone & Governance Operations: Operate and govern Azure Landing Zones, including RBAC models, Azure Policy, network/security baselines, and compliance monitoring. Detect and remediate configuration drift using policy-as-code and IaC controls. Maintain operational RACI alignment across Platform, Security, FinOps, and Network teams.
Infrastructure as Code & Automation: Design, maintain, and review Terraform modules, CI/CD pipelines, and reusable golden paths used in RUN operations. Ensure provisioning, changes, and decommissioning follow approved automated pipelines. Perform senior-level IaC and pipeline conformance reviews.
Service Requests & Change Governance: Provide architectural oversight for service requests, enhancements, and onboarding of new Azure services. Support cloud change governance processes and validate Low Level Designs (LLDs) for operational readiness. Ensure changes are safe, auditable, and compliant within the managed services model.
Security & Compliance Support: Implement and operate Azure security controls (Azure Policy, RBAC, Conditional Access, Key Vault). Support security incidents, audit evidence requests, and remediation of compliance findings in coordination with Security teams.
FinOps & Continuous Improvement: Partner with FinOps teams to enforce cost guardrails, tagging standards, and optimization actions. Drive continuous service improvement through automation, reliability engineering, and cost efficiency initiatives.
Continuous Service Improvement (CSI):
Automation-Led Optimization: Continuously reduce manual operational effort by automating Azure platform tasks using Python, Azure SDKs, and REST APIs
Self-Healing Operations: Implement agentic AI-driven remediation workflows that automatically detect, diagnose, and resolve recurring platform issues
Proactive Incident Reduction: Leverage AIOps and AI-assisted analytics to identify patterns, predict failures, and prevent incidents before they cause impact
IaC Drift & Compliance Improvement: Use automation to detect and remediate Terraform drift, configuration non-compliance, and policy violations
Operational Observability Enhancement: Improve platform reliability through continuous tuning of logging, metrics, alerts, and telemetry across Azure services
Agentic Runbook Automation: Convert manual runbooks into agent-driven workflows for repeatable, zero-touch execution of common operational tasks
Cost & Performance Optimization: Drive CSI through FinOps automation, including rightsizing, scheduling, and cost anomaly detection
API-First Improvements: Enhance service responsiveness by integrating Azure services using SDK-based and event-driven automation
Intelligent Change Execution: Apply AI-assisted impact analysis and guardrails to reduce change-related incidents and improve change success rates
Continuous Feedback Loop: Use operational data, AI insights, and platform KPIs to prioritize CSI backlog and deliver measurable improvements sprint over sprint
For applications and inquiries, contact: hirings@openkyber.com