AI / ML Platform Engineer - Toronto, ON (Onsite)

Toronto, Ontario, Canada | Onsite | Contract | Posted 2 months ago

Role summary

We are seeking an AI/ML Platform Engineer to construct and manage scalable infrastructure specifically for Large Language Model (LLM) workloads. The role involves overseeing model serving, GPU orchestration, inference optimization, and the complete ML lifecycle within a production setting. Key responsibilities include deploying and managing LLM serving frameworks like vLLM and Triton, building GPU infrastructure, implementing MLflow for model management, optimizing inference performance, and ensuring the reliability and observability of AI systems. Experience with Kubernetes and inference tuning is essential.

Job Title: AI / ML Platform Engineer

Job Location: Toronto, ON (Onsite)

Job Type: Contract, 12+ months

Please share your resume at pranay@burgeonits.com

Job Description

We are looking for an AI/ML Platform Engineer to build and operate scalable infrastructure for Large Language Model (LLM) workloads. You will be responsible for model serving, GPU orchestration, inference optimization, and managing the end-to-end ML lifecycle in a production environment.

Key Responsibilities

Deploy and manage LLM serving frameworks (vLLM, Triton)
Build and maintain GPU-based infrastructure for AI workloads
Implement and manage MLflow for model registry and versioning
Optimize model inference performance and latency
Monitor and scale AI/ML workloads in production
Manage model lifecycle (deployment, updates, rollback)
Integrate vector search and retrieval systems
Ensure reliability, observability, and performance of AI systems

Required Skills

Strong experience with LLM/AI model serving (vLLM, Triton)
Hands-on experience with GPU infrastructure and CUDA
Experience with MLflow or similar model management tools
Knowledge of Kubernetes and containerized environments
Understanding of AI/ML workload scaling and optimization
Experience with inference performance tuning

Good to Have

Experience with LiteLLM or a similar API gateway for LLMs
Knowledge of vector databases (Qdrant, Pinecone, Weaviate)
Experience with distributed systems and microservices
Familiarity with cloud platforms (Azure/AWS/GCP)

Ready to apply?

You'll be redirected to BURGEON IT SERVICES's application page.