Openkyber
Cybersecurity, Software Development, Blockchain.

Lead Platform Engineer

Washington, Washington, United States | Onsite | Full Time | Lead | Posted 2 months ago | Visa sponsorship available

Role summary

The MLOps Platform Engineer will design, build, and support enterprise-grade machine learning operations capabilities, focusing on scalable, reliable, and secure ML model development and deployment. This hands-on role requires strong expertise in AWS, Kubernetes (EKS), CI/CD automation, containerization, and ML platform operations. Key responsibilities include managing MLOps platform components, overseeing ML infrastructure deployments, implementing robust CI/CD pipelines for model packaging and deployment, and designing/managing EKS workloads. The engineer will also focus on monitoring, observability, cost optimization, and collaborating with Data Scientists and ML Engineers to operationalize ML solutions. Experience with Python, Bash, and infrastructure-as-code is essential.

Job title: MLOps Platform Engineer

Location: Reston, VA (interviews are in person, so candidates must be local; East Coast only)

The Data Modeling Analytics & AI Engineering team is seeking an experienced MLOps Platform Engineer to design, build, and support enterprise-grade machine learning operations capabilities. This role will play a key part in enabling scalable, reliable, and secure ML model development and deployment across our cloud and container platforms.

This is a hands-on engineering role requiring strong expertise in AWS, Kubernetes (EKS), CI/CD automation, containerization, and ML platform operations. The ideal candidate will have solid engineering fundamentals combined with practical knowledge of ML workflows, deployment patterns, and platform reliability.

Key Responsibilities

Platform Engineering & Operations

  • Engineer, manage, and support MLOps platform components across AWS and EKS-based environments.
  • Oversee deployment, configuration, and operation of infrastructure used for ML training, batch inference, and real-time model serving.
  • Ensure platform availability, resilience, and performance across dev, test, and production environments.
  • Implement role-based access controls (RBAC), network policies, and scalable namespace designs within EKS.

Model Deployment & CI/CD Automation

  • Build and support CI/CD pipelines (GitLab) for model packaging, container image builds, vulnerability scanning, and automated deployment flows.
  • Enable standardized model release processes, including environment promotion, versioning, and rollback workflows.
  • Integrate CI/CD with ML frameworks, model repositories, artifacts, and runtime environments.

Container & Kubernetes Workloads

  • Design and manage EKS workloads supporting containerized ML jobs and microservices.
  • Implement auto-scaling, resource quotas, cluster optimization, and multi-tenant workload isolation.
  • Support GPU and CPU-based training/inference workloads.

Monitoring, Observability & Optimization

  • Implement logging, monitoring, and alerting for ML pipelines, model endpoints, batch jobs, and platform components.
  • Analyze compute, storage, and data transfer usage to optimize cost efficiency across ML workloads.
  • Perform incident response, root cause analysis, and long-term remediation planning.

Collaboration & Enablement

  • Partner with Data Scientists, ML Engineers, and application teams to operationalize end-to-end machine learning solutions.
  • Provide technical guidance on best practices for ML model lifecycle management, deployment patterns, and scalable architectures.
  • Contribute to documentation, runbooks, onboarding materials, and internal knowledge bases.

Required Qualifications

  • 3+ years of hands-on experience with AWS services, including EKS, EC2, S3, IAM, CloudWatch, and ECR.
  • Strong experience operating and troubleshooting Kubernetes (preferably AWS EKS).
  • Proficiency in containerization (Docker) and orchestration concepts.
  • Strong programming/scripting experience in Python and Bash.
  • Experience building and managing CI/CD pipelines (GitLab or equivalent).
  • Familiarity with machine learning workflows, including training, inference, and model monitoring.
  • Experience with infrastructure-as-code (Terraform or CloudFormation).
  • Experience supporting production platforms, including incident management and root cause analysis.

For applications and inquiries, contact: hirings@openkyber.com
