
AI/ML Infrastructure Engineer
Role Summary
The MLOps / AI Platform Engineer owns the operational lifecycle of AI models, prompts, and agents supporting the AI-Enabled Platform—ensuring reliable deployments, safe rollbacks, observability, and cost-effective performance at scale.
Key Responsibilities
- Implement and manage AI/ML CI/CD:
  - Pipelines for models, prompts, and configuration changes
  - Canary deployments, rollbacks, and environment management
- Operate the AI platform:
  - Model registry, feature stores, and inference infrastructure
  - SLOs and SLAs for AI endpoints used by Jira/Confluence apps and services
- Monitor and optimize:
  - Model performance, drift, and data quality signals
  - Cost-to-serve, latency, and scalability for inference workloads
  - Adoption metrics, override rates, and false positives/negatives
- Provide experimentation and evaluation frameworks:
  - A/B testing harnesses for new models and prompts
  - Dashboards for time saved, risk detection quality, and user engagement
- Partner with AI Engineers, Backend, and Governance:
  - Enforce responsible AI and governance constraints in deployments
  - Support auditability and traceability of AI decisions and releases
- Document and standardize:
  - Runbooks, playbooks, and incident management procedures
  - Platform guidelines for AI feature teams building on the platform
Qualifications
- Strong experience in MLOps, ML platform engineering, or related DevOps roles
- Hands-on experience with model registries, CI/CD tools, and monitoring stacks
- Familiarity with serving ML/GenAI workloads in production
- Solid skills in infrastructure-as-code, containerization, and cloud-native services
- Understanding of responsible AI, observability, and cost optimization for ML systems