FarmGPU
Cloud Computing, AI Infrastructure, Deep Learning

Principal Software Engineer - Storage Systems

California, United States | Remote | Full Time | Principal | $120,000–$200,000 /yr | Posted 2 months ago

Role summary

FarmGPU is seeking a Principal Software Engineer to lead the design and implementation of next-generation storage and data services for its GPU-powered cloud computing platform. This role involves architecting and building scalable storage systems (block, object, file) optimized for AI workloads, deploying and managing ISV storage solutions, and driving the evolution of storage integration within the cloud platform. The engineer will also focus on data management capabilities, performance tooling, and ensuring security and compliance. The position requires 5+ years of experience in storage software engineering or distributed systems, deep expertise in storage protocols and systems like NVMe and RDMA, and proficiency in Linux environments and automation. This is a remote, full-time position with a salary range of $120,000 - $200,000.

About FarmGPU

FarmGPU is redefining the future of GPU-powered cloud computing, delivering cost-effective, scalable, high-performance GPU infrastructure tailored for AI developers, startups, and enterprises globally. Our vertically integrated platform transforms data centers into AI-optimized facilities, accelerates storage-intensive training and inference workflows with GPU-direct architectures, and delivers on-demand compute via strategic partnerships such as with RunPod Secure Cloud. With sustainability, performance, and innovation at our core, we challenge the status quo of traditional cloud providers.

As we scale FarmGPU’s storage systems to support high-bandwidth, low-latency AI workloads across on-demand clusters and long-running enterprise deployments, we’re seeking a Principal Software Engineer to lead the design and implementation of our next-generation storage and data services.

What You’ll Do

As a senior technical leader on the infrastructure team, you will:

- Architect and build scalable storage systems (block, object, and file) optimized for GPU-centric AI workloads, including GPU-direct storage pipelines, NVMe high-bandwidth fabrics, and distributed storage integration.
- Deploy and manage leading storage ISV platforms such as VAST Data, WEKA, and MinIO.
- Design and deploy high-performance, durable distributed storage systems such as Ceph.
- Drive the design and evolution of storage integration in our cloud platform, ensuring seamless performance for training, inference, and data-intensive workflows across global data centers.
- Collaborate with cross-functional teams (SRE, DevOps, Product, and partners like RunPod) to define storage tiering strategies, replication/topology plans, and workload-aware data placement.
- Implement and optimize data management capabilities, including lifecycle management, tiering, compression, and encryption with customer-managed keys.
- Build and maintain performance tooling and telemetry, enabling real-time insights into storage throughput, latency, and utilization to support both internal operations and customer SLAs.
- Ensure adherence to security, compliance, and governance practices, including encryption, access controls, and data isolation for multi-tenant and enterprise workloads.

What You Bring

- BS/MS in Computer Science or related technical field, or equivalent practical experience.
- 5+ years of experience in storage software engineering, distributed systems, or cloud infrastructure with hands-on implementation experience.
- Deep expertise in storage protocols and systems, such as NVMe, RDMA, distributed object storage, erasure coding, storage networking, or comparable technologies.
- Experience working with storage ISV platforms (e.g., VAST Data, MinIO).
- Strong background in Linux environments, systems programming, and performance optimization.
- Proficiency with automation programming (Python, Ansible, etc.).
- Experience building high-throughput, low-latency data paths for demanding workloads (AI/ML, HPC, analytics).
- Familiarity with cloud-native storage integrations, container orchestration (Kubernetes), and CSI drivers.
- Excellent communication and collaboration skills, capable of influencing across engineering and product teams.

Preferred Qualifications

- Experience with GPU-direct storage architectures and integration with DPUs (e.g., NVIDIA BlueField).
- Background in data-center-grade storage deployments, including performance benchmarking (e.g., MLPerf Storage) and telemetry analysis.
- Familiarity with security and compliance standards such as SOC 2, HIPAA, GDPR.
- Experience contributing to or leading open source storage or cloud infrastructure projects.
- Prior work on backup, disaster recovery, or replication technologies in distributed systems.
- Expertise in storage cost optimization and capacity planning for large-scale environments.

Why FarmGPU?

- Mission-driven infrastructure innovation: build the foundation for the next generation of AI compute platforms.
- Technical depth and impact: lead work at the intersection of storage, networking, and high-performance compute.
- Remote-first, collaborative environment with opportunities to influence product strategy and engineering culture.
- Join us as we expand our platform capabilities and help customers deploy high-throughput AI workloads globally.

Compensation

- $120,000 - $200,000 per year