AI Engineer

Tampa, Florida, United States | Remote | Full Time | Posted 2 months ago

Role summary

Seeking a deeply technical AI Engineer to build and scale production-grade AI infrastructure from bare metal to the browser. This role involves optimizing transformers, managing GPU clusters, and deploying LLM systems using vLLM, SGLang, and NVIDIA Triton. Responsibilities include designing and operating LLM inference clusters, building AI-powered products with React frontends and Python/FastAPI backends, managing Kubernetes clusters, establishing CI/CD pipelines, and implementing RAG pipelines. Requires expertise in AI inference, ML frameworks, Python, TypeScript, React, DevOps with Kubernetes and Terraform, model quantization, fine-tuning, Linux, networking, and compliance-sensitive environments (CMMC/ITAR). The position offers remote-friendly flexibility and the opportunity to work on mission-critical defense and manufacturing projects.

AI Software Engineer | Tampa, FL | Remote-Friendly

We're working with a mission-critical advanced manufacturing and defense innovator on this exciting opportunity.

We are seeking a deeply technical engineer to build and scale production-grade AI infrastructure from bare metal to the browser. This isn't just about prompt engineering; you will be optimizing transformers, managing high-availability GPU clusters, and deploying sophisticated LLM systems using vLLM, SGLang, and NVIDIA Triton.

The Role

• Design and operate high-availability LLM inference clusters using vLLM, SGLang, and NVIDIA Triton Inference Server to power real-world applications.

• Build and maintain robust AI-powered products featuring React frontends and Python/FastAPI backends for a seamless user experience.

• Manage the full lifecycle of Kubernetes clusters (k8s, k3s, RKE2), including GPU operator configuration, networking, and end-to-end security.

• Establish advanced CI/CD pipelines using GitHub Actions or GitLab CI for automated model packaging, container builds, and seamless deployments.

• Implement RAG pipelines and agentic workflows using vector databases like Milvus or Qdrant and advanced tool-calling frameworks.

What You'll Need

• Deep expertise in AI inference and ML frameworks, including Hugging Face Transformers, LangChain, LlamaIndex, and TensorRT.

• Strong proficiency in Python, TypeScript, and React, with experience building high-performance REST and WebSocket APIs.

• Strong DevOps skills with Kubernetes, Helm, Terraform, and observability tools like Prometheus, Grafana, and the ELK Stack.

• Experience with model quantization (GGUF, AWQ, GPTQ) and fine-tuning techniques such as LoRA and QLoRA on NVIDIA hardware.

• Knowledge of Linux systems (Ubuntu/RHEL), networking fundamentals (BGP, VLAN), and deploying in compliance-sensitive environments (CMMC/ITAR).

What's On Offer

• Remote-friendly flexibility with the backing of a stable, long-standing industry leader.

• Opportunity to work at the cutting edge of AI infrastructure, deploying real models into high-stakes environments.

• Collaborative environment where you own the full technology stack, from CUDA kernels to the UI.

• Chance to work on mission-critical projects that impact defense and high-tech manufacturing sectors.

Apply via Haystack today!
