Machine Learning Engineer

Toronto, Ontario, Canada | Onsite | Contract | Posted 2 months ago

Role summary

We are seeking a Machine Learning Engineer to design, build, and deploy end-to-end ML solutions that drive business impact. This hands-on role involves developing ML pipelines from data preparation to production deployment using Python and an open-source AI/ML stack. You will focus on creating production-grade models, building repeatable ML engineering patterns, and improving model quality and reliability through robust testing and monitoring. Collaboration with data engineering and platform teams is key to leveraging scalable platforms like Databricks and Spark, while adhering to security and responsible AI practices. The role requires strong software engineering skills in Python, proficiency with ML libraries, MLOps experience, and a solid understanding of ML concepts.

Role Overview

We are seeking a Machine Learning Engineer to design, build, and deploy ML solutions that turn data into measurable business impact. This is a hands-on engineering role focused on developing end-to-end ML pipelines (data preparation, feature engineering, model training, evaluation, and production deployment) using Python and an open-source AI/ML stack. You will collaborate with data engineering and platform teams and work in environments that may include Databricks and Spark for scalable data processing and model operations.

Key Objectives

Deliver production-grade ML models and data products from discovery through deployment.
Build repeatable, maintainable ML engineering patterns for training, evaluation, and inference.
Improve model quality, reliability, and performance through robust testing, monitoring, and iteration.
Partner with data and platform teams to leverage scalable compute and data platforms (including Databricks/Spark) while meeting security and governance requirements.

Primary Responsibilities

Design, develop, and iterate on machine learning models for classification, regression, clustering, recommendation, forecasting, and/or NLP use cases as needed.
Build end-to-end ML pipelines in Python: data ingestion and preparation, feature engineering, training, evaluation, and batch/real-time inference.
Apply sound experimentation practices: baselines, ablation studies, cross-validation (as applicable), and clear success metrics aligned to business outcomes.
Develop and maintain reusable ML code (packages, utilities, pipelines) with strong software engineering practices (tests, code review, documentation, CI/CD).
Implement model evaluation and testing: offline benchmarks, data/label quality checks, reproducible training runs, and regression tests to prevent performance degradation.
Operationalize MLOps: model versioning, experiment tracking, model registry, automated deployments, and monitoring for drift, bias, latency, and cost.
Integrate ML services with product systems via APIs and event-driven patterns; collaborate on feature stores, data contracts, and production SLAs.
Leverage open-source AI/ML components (e.g., scikit-learn, PyTorch/TensorFlow, XGBoost/LightGBM, Hugging Face ecosystem) and choose the right tool for accuracy, latency, and maintainability.
Collaborate with data engineering and platform teams to use Databricks/Spark for large-scale ETL, feature computation, distributed training (where relevant), and scheduled jobs.
Ensure solutions follow security, privacy, and responsible AI practices, including safe handling of sensitive data and auditability of model decisions.

Required Skills & Experience

Strong software engineering experience in Python (clean architecture, API design, testing, packaging, performance tuning).
Hands-on experience building and deploying machine learning models in production environments.
Proficiency with common ML libraries and frameworks (e.g., scikit-learn, PyTorch or TensorFlow; XGBoost/LightGBM as applicable).
Experience with data processing in Python (e.g., pandas, NumPy) and strong SQL fundamentals.
Understanding of ML concepts (bias/variance, regularization, feature leakage, evaluation metrics, calibration) and ability to select appropriate metrics for the use case.
Experience with MLOps practices and tooling (e.g., MLflow or equivalent), including experiment tracking, model versioning, and reproducible training.
Experience deploying services (Docker, CI/CD) and operating them with monitoring/observability practices.
Ability to communicate tradeoffs clearly—balancing accuracy, latency, cost, reliability, and risk.

Preferred / Nice to Have

Awareness of Databricks concepts (workspaces, notebooks, jobs, clusters) and practical experience with Spark for large-scale data processing.
Experience with Databricks MLflow Model Registry and/or Unity Catalog (or similar governance) for managing models, features, and controlled data access.
Experience with feature stores, data versioning, and data quality frameworks.
Experience with model serving and optimization (e.g., FastAPI, TorchServe, ONNX, quantization, batching, caching).
Familiarity with the modern open-source LLM and embeddings ecosystem (e.g., Hugging Face Transformers, sentence-transformers) and experience applying it to NLP tasks when relevant.
Experience with cloud ML services and distributed training patterns (Ray, Spark ML, Horovod, or similar).
Experience implementing responsible AI practices (privacy, explainability, robustness, and security controls).
