BURGEON IT SERVICES
IT Services

AI / ML Platform Engineer Toronto, ON (Onsite)

Toronto, Ontario, Canada · Onsite · Contract · Posted 2 months ago

Role summary

We are seeking an AI/ML Platform Engineer to construct and manage scalable infrastructure specifically for Large Language Model (LLM) workloads. The role involves overseeing model serving, GPU orchestration, inference optimization, and the complete ML lifecycle within a production setting. Key responsibilities include deploying and managing LLM serving frameworks like vLLM and Triton, building GPU infrastructure, implementing MLflow for model management, optimizing inference performance, and ensuring the reliability and observability of AI systems. Experience with Kubernetes and inference tuning is essential.

Job Title: AI / ML Platform Engineer

Job Location: Toronto, ON (Onsite)

Job Type: Contract (12+ months)

Please share your resume at
pranay@burgeonits.com

Job Description

We are looking for an AI/ML Platform Engineer to build and operate scalable infrastructure for Large Language Model (LLM) workloads. You will be responsible for model serving, GPU orchestration, inference optimization, and managing the end-to-end ML lifecycle in a production environment.

Key Responsibilities

  • Deploy and manage LLM serving frameworks (vLLM, Triton)
  • Build and maintain GPU-based infrastructure for AI workloads
  • Implement and manage MLflow for model registry and versioning
  • Optimize model inference performance and latency
  • Monitor and scale AI/ML workloads in production
  • Manage model lifecycle (deployment, updates, rollback)
  • Integrate vector search and retrieval systems
  • Ensure reliability, observability, and performance of AI systems

Required Skills

  • Strong experience with LLM/AI model serving (vLLM, Triton)
  • Hands-on experience with GPU infrastructure and CUDA
  • Experience with MLflow or similar model management tools
  • Knowledge of Kubernetes and containerized environments
  • Understanding of AI/ML workload scaling and optimization
  • Experience with inference performance tuning

Good to Have

  • Experience with LiteLLM or another API gateway for LLMs
  • Knowledge of vector databases (Qdrant, Pinecone, Weaviate)
  • Experience with distributed systems and microservices
  • Familiarity with cloud platforms (Azure/AWS/GCP)