ML Engineer / AI Operations
Role summary
This role is for an ML Engineer focused on AI Operations, based in Morristown, NJ. The primary responsibility is to own and operate the CI/CD pipelines for existing ML services, ensuring robust deployment strategies like blue/green and canary releases with automated rollbacks. Key duties include implementing and managing model/data drift monitoring, setting up alerts and retraining triggers, and building production dashboards and incident workflows. The engineer will provide L2/L3 support for production ML systems, conduct root-cause analysis, and maintain operational documentation. Additionally, they will productionize research notebooks into maintainable services, manage model-serving APIs and batch jobs, and ensure models are integrated with CI/CD, observability, and monitoring stacks, enforcing traceability.
Morristown, New Jersey 07960 Posted April 2nd, 2026
Job Type: Full Time
Job Category: IT
Job Description
Job Title : ML Engineer - AI Operations
Location : Morristown, NJ (Onsite)
Full-time
Skill: ML Engineer - AI Operations
Key responsibilities:
Own and operate CI/CD for existing ML services across dev/test/prod; standardize blue/green and canary releases with automated rollbacks.
Run model/data drift and performance monitoring with SLAs; define alerts, thresholds, and retraining triggers.
Build and maintain production dashboards, alerts, and incident workflows; codify on-call runbooks and escalation paths.
Partner with onshore model owners to diagnose metric degradation and land mitigations aligned to governance and controls.
Provide day-to-day L2/L3 support for production ML: triage, root-cause analysis, permanent fixes, and post-incident reviews.
Own operational documentation: runbooks, standard operating procedures, and recurring health checks.
Coordinate hotfixes and safe rollbacks with onshore teams; verify recovery via automated smoke tests.
Harden and productionize research notebooks into maintainable, testable services with CI, unit/integration tests, and linting.
Operate and evolve model-serving APIs and batch scoring jobs; integrate with enterprise schedulers and data platforms.
Ensure models are fully integrated into CI/CD, observability, and monitoring stacks; enforce traceability with experiment and model registries.
Validate successful delivery of model outputs to apps, chatbots, reports, and downstream systems with contract tests and data quality checks.
Required Skills:
Git/GitLab, Python, SQL, MLflow, Power BI, Snowflake.
OLAP/OLTP data modeling and architecture.
API frameworks (FastAPI/Flask), and
Nice to have:
Modern ELT tools (Fivetran/Airbyte).
Streaming/real-time data pipelines (e.g., Kafka, Kinesis, Redpanda).
Production ML service operations experience (experience in broader full-stack environments is a plus).