AI / ML Platform Engineer - Toronto, ON (Onsite)
Role summary
We are seeking an AI/ML Platform Engineer to build and operate scalable infrastructure for Large Language Model (LLM) workloads. The role covers model serving, GPU orchestration, inference optimization, and the complete ML lifecycle in production. Key responsibilities include deploying and managing LLM serving frameworks such as vLLM and Triton, building GPU infrastructure, implementing MLflow for model management, optimizing inference performance, and ensuring the reliability and observability of AI systems. Experience with Kubernetes and inference tuning is essential.
Job Title: AI / ML Platform Engineer
Job Location: Toronto, ON (Onsite)
Job Type: 12 months plus
Please share your resume at
pranay@burgeonits.com
Job Description
We are looking for an AI/ML Platform Engineer to build and operate scalable infrastructure for Large Language Model (LLM) workloads. You will be responsible for model serving, GPU orchestration, inference optimization, and managing the end-to-end ML lifecycle in a production environment.
Key Responsibilities
- Deploy and manage LLM serving frameworks (vLLM, Triton)
- Build and maintain GPU-based infrastructure for AI workloads
- Implement and manage MLflow for model registry and versioning
- Optimize model inference performance and latency
- Monitor and scale AI/ML workloads in production
- Manage model lifecycle (deployment, updates, rollback)
- Integrate vector search and retrieval systems
- Ensure reliability, observability, and performance of AI systems
Required Skills
- Strong experience with LLM/AI model serving (vLLM, Triton)
- Hands-on experience with GPU infrastructure and CUDA
- Experience with MLflow or similar model management tools
- Knowledge of Kubernetes and containerized environments
- Understanding of AI/ML workload scaling and optimization
- Experience with inference performance tuning
Good to Have
- Experience with LiteLLM or another API gateway for LLMs
- Knowledge of vector databases (Qdrant, Pinecone, Weaviate)
- Experience with distributed systems and microservices
- Familiarity with cloud platforms (Azure/AWS/GCP)