Richtech Creative Displays
Retail Technology, Digital Signage, Manufacturing, Interactive Displays

AI Infrastructure Engineer

Las Vegas, Nevada, United States | Onsite | Full Time | $80,000–$120,000/yr | Posted 28 days ago | Visa sponsorship available

Role summary

Richtech Creative Displays is seeking an AI Infrastructure Engineer to deploy and manage high-performance GPU hardware and AI software stacks. Responsibilities include provisioning NVIDIA GPU servers, installing and optimizing the AI software stack (NVIDIA drivers, CUDA, cuDNN), managing containerized environments with Docker and GPU partitioning, and configuring high-performance networks (InfiniBand, RoCE). The role also involves Linux administration, network engineering (firewalls, switches, routing), and integrating telemetry with billing systems using scripting languages such as Python, Go, or Bash. Requires 3-5+ years of experience in network engineering, Linux systems administration, or DevOps, plus expertise in the NVIDIA AI Enterprise ecosystem.

Responsibilities
1. NVIDIA GPU & Hardware Infrastructure Deployment

  • Hardware Provisioning: Rack, stack, configure, and maintain high-performance bare-metal GPU servers (e.g., NVIDIA H200, B300 or equivalent Supermicro/Dell/HGX architectures).
  • AI Software Stack: Install, update, and optimize NVIDIA Drivers, CUDA Toolkit, cuDNN, and NVIDIA Container Toolkit on physical host machines.
  • Containerization & Orchestration: Manage GPU-accelerated environments using Docker, including configuring GPU partitioning (MIG/vGPU) for optimal resource allocation.

2. Network & Systems Engineering

  • High-Performance Networks: Configure and optimize InfiniBand (IB) switches and RoCE (RDMA over Converged Ethernet) to ensure ultra-low latency and maximum throughput for multi-GPU training workloads.
  • Core Infrastructure: Manage enterprise firewalls, core switches, VLANs, and local network routing to ensure high security and stability of the data center network.
  • Linux Administration: Oversee Linux server administration (Ubuntu, RHEL, or Rocky Linux), including automated OS provisioning and local storage clusters.

3. Metering & Billing System Integration

  • Resource Metering: Implement and configure telemetry tools to accurately monitor and log GPU time, CPU utilization, storage usage, and network traffic.
  • Billing System Management: Maintain and integrate usage-based billing/metering engines to track infrastructure costs or client usage.
  • Automation: Write robust scripts (Python, Go, or Bash) to link data center resource telemetry with the billing platform for precise invoicing and automated usage reporting.

Qualifications & Skills
Required Qualifications:

  • Experience: 3-5+ years in Network Engineering, Linux Systems Administration, or DevOps, including hands-on GPU infrastructure deployment.
  • Linux & Automation: Expert-level knowledge of Linux environments and infrastructure-as-code/automation tools (Ansible, Terraform, or SaltStack).
  • NVIDIA Ecosystem: Deep technical understanding of the NVIDIA AI Enterprise stack (CUDA, NCCL, NVLink).
  • Billing/Metering Awareness: Practical experience working with usage-based tracking, billing APIs, or internal chargeback tools.

Pay: $80,000.00 - $120,000.00 per year

Benefits:

  • Dental insurance
  • Health insurance
  • Paid time off
  • Vision insurance

Work Location: In person
