Hyperbolic Labs (verified)
Video Game Development, Entertainment Software

Senior GPU Infrastructure Engineer

San Francisco, California, United States | Onsite | Full Time | Senior | Posted 2 months ago | Visa sponsorship available

Role summary

Hyperbolic is seeking a Senior Infrastructure Engineer to build and scale its GPU Cloud Marketplace. This foundational role involves creating a multi-tenancy provisioning and virtualization solution to transform diverse GPUs into a programmable, orchestrated pool for AI developers and researchers. You will work on the core orchestration layer of cutting-edge cloud infrastructure, aiming to provide significant cost savings compared to traditional cloud providers. The ideal candidate possesses deep expertise in bare-metal provisioning, GPU scheduling, infrastructure automation, and storage solutions for AI/ML workloads.

### Who you are
- Deep understanding of bare-metal provisioning and lifecycle management, including IPMI/Redfish, BMC-based remote management, PXE boot, and automated OS deployment workflows
- Deep understanding of GPU scheduling and orchestration, including GPU type awareness, memory management, topology considerations, placement strategies for multi-GPU jobs, and fragmentation minimization
- Strong infrastructure and DevOps engineering skills with proficiency in Terraform or Pulumi, CI/CD for infrastructure, secrets management, configuration management, and observability stack implementation
- Experience with storage and data infrastructure for AI/ML workloads, including object storage, high-IOPS block storage, and distributed file systems for training data and checkpoints
- Proficiency with API design and cloud-init for automated provisioning and configuration
- Solid understanding of GPU architecture, CUDA, and GPU compute optimization
- Highly collaborative team player with excellent communication skills across technical and non-technical stakeholders
- Proven ability to work effectively with hardware vendors and vendor engineering teams to troubleshoot issues and optimize integrations
- Experience building and scaling cloud infrastructure or distributed systems in production environments
- Familiarity with high-performance networking technologies such as InfiniBand and RoCE (RDMA over Converged Ethernet)
- Experience with distributed storage systems such as Ceph, Weka, or VAST Data

### What the job involves
- We're seeking a Senior Infrastructure Engineer to help build and scale Hyperbolic's GPU Cloud Marketplace by developing a multi-tenant provisioning and virtualization solution
- This is a foundational role where you'll be responsible for transforming raw GPUs from diverse global suppliers into a programmable, orchestrated pool that serves thousands of AI developers and researchers
- You'll work at the cutting edge of cloud infrastructure, building the core orchestration layer that enables our platform to deliver up to 75% cost savings compared to traditional cloud providers
