Openkyber (Cybersecurity, Software Development, Blockchain)

Senior Cloud Platform Engineer

Georgia, United States | Onsite | Full Time | Senior | Posted 2 months ago | Visa sponsorship available

Role summary

The Senior Cloud Platform Engineer will be responsible for designing, operating, and governing resilient, scalable Azure cloud-native platforms. This role involves implementing DevSecOps and GitOps pipelines, managing Azure Landing Zones with robust governance, and building automation frameworks. Key responsibilities include ensuring platform reliability through AIOps and observability, embedding FinOps practices for cost governance, and provisioning infrastructure using Infrastructure as Code (IaC) with Terraform. The engineer will also support container orchestration with Kubernetes and secure API integrations, acting as a senior escalation point for platform incidents and driving continuous service improvement through automation and AI-driven insights.

Required Skills & Experience

Cloud-Native Architecture:

Design and operate resilient, scalable Azure cloud-native platforms aligned to enterprise standards and RUN SLAs

DevSecOps & GitOps:

Implement secure CI/CD and GitOps pipelines with built-in security, policy enforcement, and automated controls

Cloud Landing Zone & Policy Management:

Operate and govern Azure Landing Zones using Azure Policy, RBAC, guardrails, and compliance automation

Platform & COE Tooling:

Build and support reusable COE accelerators, golden paths, templates, and automation frameworks

AIOps & Observability:

Enable proactive monitoring, logging, alerting, and AIOps-driven insights for platform reliability and incident reduction

FinOps:

Embed cost governance, tagging, budgets, and optimization practices into platform operations

Cloud Architecture (RUN-focused):

Translate client-approved architectures into operable, supportable, and compliant Azure platforms

Containers & Kubernetes:

Design, deploy, and operate container platforms using Kubernetes, AKS, Docker, and Helm

Infrastructure as Code:

Provision and manage Azure infrastructure using Terraform and automated pipelines

API & Integration Platforms:

Design and support secure APIs and integrations using Azure API Management (APIM)

Event & Streaming Platforms:

Support cloud-native messaging and streaming solutions using Kafka and managed services

Scripting & Automation:

Develop operational automation using Python and platform SDKs

Agile & ITSM Alignment:

Operate within Agile delivery models while supporting ITSM, incident, change, and problem management processes

Certifications:

Microsoft Certified: Azure Solutions Architect Expert (AZ-305) - Required
Microsoft Certified: Azure Administrator Associate (AZ-104) - Required
Microsoft Certified: DevOps Engineer Expert (AZ-400)
Microsoft Certified: Azure Security Engineer Associate (AZ-500)
ITIL 4 Foundation
HashiCorp Certified: Terraform Associate

Responsibilities:

Azure Platform RUN Ownership: Act as the L3/L4 escalation point for Azure platform incidents across IaaS, PaaS, landing zones, and Terraform-based deployments. Lead root cause analysis (RCA) for P1/P2 incidents and drive permanent fixes through automation and design improvements. Ensure platform services meet availability, reliability, and performance SLAs.

Landing Zone & Governance Operations: Operate and govern Azure Landing Zones, including RBAC models, Azure Policy, network and security baselines, and compliance monitoring. Detect and remediate configuration drift using policy-as-code and IaC controls. Maintain operational RACI alignment across the Platform, Security, FinOps, and Network teams.

Infrastructure as Code & Automation: Design, maintain, and review the Terraform modules, CI/CD pipelines, and reusable golden paths used in RUN operations. Ensure that provisioning, changes, and decommissioning follow approved automated pipelines. Perform senior-level IaC and pipeline conformance reviews.

Service Requests & Change Governance: Provide architectural oversight for service requests, enhancements, and the onboarding of new Azure services. Support cloud change governance processes and validate Low-Level Designs (LLDs) for operational readiness. Ensure changes are safe, auditable, and compliant within the managed services model.

Security & Compliance Support: Implement and operate Azure security controls (Azure Policy, RBAC, Conditional Access, Key Vault). Support security incidents, audit evidence requests, and the remediation of compliance findings in coordination with Security teams.

FinOps & Continuous Improvement: Partner with FinOps teams to enforce cost guardrails, tagging standards, and optimization actions. Drive continuous service improvement through automation, reliability engineering, and cost-efficiency initiatives.

Continuous Service Improvement (CSI):

Automation-Led Optimization: Continuously reduce manual operational effort by automating Azure platform tasks using Python, Azure SDKs, and REST APIs

Self-Healing Operations: Implement agentic AI-driven remediation workflows that automatically detect, diagnose, and resolve recurring platform issues

Proactive Incident Reduction: Leverage AIOps and AI-assisted analytics to identify patterns, predict failures, and prevent incidents before they have an impact

IaC Drift & Compliance Improvement: Use automation to detect and remediate Terraform drift, configuration non-compliance, and policy violations

Operational Observability Enhancement: Improve platform reliability through continuous tuning of logging, metrics, alerts, and telemetry across Azure services

Agentic Runbook Automation: Convert manual runbooks into agent-driven workflows for repeatable, zero-touch execution of common operational tasks

Cost & Performance Optimization: Drive CSI through FinOps automation, including rightsizing, scheduling, and cost anomaly detection

API-First Improvements: Enhance service responsiveness by integrating Azure services through SDK-based and event-driven automation

Intelligent Change Execution: Apply AI-assisted impact analysis and guardrails to reduce change-related incidents and improve change success rates

Continuous Feedback Loop: Use operational data, AI insights, and platform KPIs to prioritize the CSI backlog and deliver measurable improvements from sprint to sprint

For applications and inquiries, contact: hirings@openkyber.com
