AI Engineer

San Diego, California, United States | Onsite | Full Time | $200,800–$301,200 /yr | Posted 2 months ago

Role summary

A global leader in semiconductor innovation is seeking a Principal Software Engineer to revolutionize Cloud AI. This on-site role in San Diego, California, focuses on high-performance LLM inference acceleration and next-generation silicon software. The engineer will lead the architecture and deployment of large-scale commercial software solutions, optimizing LLM serving frameworks like vLLM and PyTorch for carrier-grade machine learning workloads on multi-core SoC architectures. This is a high-impact position driving AI infrastructure from R&D to global commercial deployment, requiring 8+ years of experience in high-performance computing environments and advanced proficiency in C++, Python, and Linux.

Principal SW Engineer - LLM Serving (Cloud AI) | San Diego, California | On-site | $200,800 - $301,200

We're working with a global leader in semiconductor innovation and wireless technology on this exciting opportunity. Join a powerhouse engineering team dedicated to revolutionizing Cloud AI through high-performance LLM inference acceleration and next-generation silicon software.

As a Principal Engineer, you will lead the architecture and deployment of large-scale commercial software solutions. You’ll dive deep into LLM serving frameworks like vLLM and PyTorch to optimize carrier-grade machine learning workloads on multi-core SoC architectures. This is a high-impact role driving the future of AI infrastructure from R&D to global commercial deployment.

The Role

• Lead the design and development of high-performance software for LLM serving, utilizing frameworks like vLLM to maximize inference throughput.

• Architect and optimize neural networks across the full product lifecycle, focusing on multimodal and reasoning models for cloud-scale AI environments.

• Perform deep-dive bottleneck analysis and performance modeling on multicore architectures, including NoCs, caches, memory subsystems, and PCIe interfaces.

• Collaborate cross-functionally to bridge the gap between AI compiler technology and hardware acceleration, ensuring seamless integration with machine learning accelerators.

• Write and maintain high-performance, low-latency code in C++ and Python for sophisticated SoC architectures and math libraries.

What You'll Need

• 8+ years of professional software or systems engineering experience (or 6+ years with a PhD) in high-performance computing environments.

• Proven expertise in LLM serving frameworks (vLLM) and strong development skills in PyTorch for optimizing neural networks.

• Advanced proficiency in C++, Python, and Linux systems programming, with a focus on multicore architecture fundamentals (memory, bus, SoC).

• Deep understanding of linear algebra, math libraries, and neural network operators essential for machine learning acceleration.

• Master's or PhD in Computer Science or Computer Engineering with a track record of delivering complex commercial software projects at scale.

What's On Offer

• Competitive base salary of $200,800 - $301,200 plus a significant discretionary annual bonus program.

• Generous annual RSU grants, providing true ownership in a global tech pioneer.

• Comprehensive benefits package designed to support health, wealth, and work-life balance.

• Opportunity to work at the forefront of the AI revolution, influencing how the world’s largest models are served and scaled.

Apply via Haystack today!
