Voxel51
Artificial Intelligence, Machine Learning, Computer Vision, Open Source Software

Principal Infrastructure Engineer

United States · Remote · Full Time · Principal · $250,000–$280,000/yr · Posted 2 months ago · Visa sponsorship available

### Role summary

Voxel51 is seeking a Principal Infrastructure Engineer to shape the architecture and strategy of its platform's infrastructure. This role involves leading the design of containerized systems, CI/CD pipelines, and deployment solutions across cloud and on-premises environments, with a focus on serving unstructured data at scale. The engineer will partner with enterprise customers on production deployments, troubleshoot complex issues, and mentor peers. Key responsibilities include driving best practices in CI/CD, developing robust internal tooling, and ensuring the reliability, security, and scalability of the Voxel51 platform. This is a remote-first position.

### Who you are
- Deep experience with containerized environments:
  - Building, packaging, and debugging container images
  - Kubernetes (and Docker Compose) for orchestration
  - Building, maintaining, and deploying Helm charts
- Infrastructure as Code expertise (Terraform, Ansible, or equivalent)
- Scripting and automation skills (Bash or similar)
- Python expertise, including build and environment management, packaging/distribution, release management, and dependency debugging
- CI/CD systems experience, ideally GitHub Actions (we use this today)
- Cloud infrastructure knowledge, especially GCP (IAM, VPC, load balancing, ingress/egress routing, proxies, firewall rules)
- Database fundamentals, ideally MongoDB or similar NoSQL systems
- Observability skills, including designing meaningful monitors, logging, tracing, and alerting
- Security best practices, including certificates, service accounts, least privilege, and role assumptions
- Troubleshooting ability across complex, distributed systems (including with customers in the loop)
- Testing mindset: comfortable with designing and applying different types of tests to validate functionality
- Strong communication skills, with the ability to work directly with enterprise customers and across teams in a remote-first, collaborative environment
- Adaptability and curiosity, with the ability to ramp quickly on unfamiliar concepts and technologies

### What the job involves
- As a Principal Infrastructure Engineer at Voxel51, you will shape the architecture and strategy of the systems that power our platform, serving users that range from individual researchers to enterprise-scale deployments
- You’ll lead the design of containerized systems, CI/CD pipelines, and deployment solutions across cloud and on-premises environments, while solving the unique challenges of serving unstructured data (images and video) at scale
- You’ll partner with enterprise customers, guiding and troubleshooting their production deployments. You’ll collaborate across engineering teams to improve developer productivity, and mentor peers while setting infrastructure best practices
- Your work will directly shape the reliability, security, and scalability of Voxel51’s platform — and accelerate our mission to democratize data-centric ML
- Shape the architecture and evolution of Voxel51’s infrastructure to support deployments ranging from individual researchers to Fortune 500 enterprises
- Design, build, and scale deployment systems across cloud (GCP, AWS, Azure) and on-premises environments, ensuring reliability, security, and repeatability
- Partner with enterprise customers (and our Customer Success Machine Learning Engineers) to deliver and support production-grade deployments in their environments, guiding them through installation, troubleshooting, and scaling
- Lead infrastructure initiatives across engineering teams, enabling peers to develop, test, and ship features faster with robust internal tooling and automation
- Drive best practices in CI/CD, evolving our pipelines (currently GitHub Actions + Google Cloud Build) and introducing new approaches where they add value
- Develop and maintain deployment solutions for Voxel51-hosted environments (GKE) as well as customer on-prem installations (K8s or Docker Compose)
- Champion developer productivity, improving workflows for development and automated cloud deployments
- Troubleshoot and resolve complex infrastructure issues, spanning build failures, runtime failures, and customer deployment challenges
- Anticipate and prevent failures by designing monitoring, alerting, and predictive solutions for both internal and customer environments
- Mentor engineers and set technical direction, ensuring Voxel51’s infrastructure remains ahead of customer needs and industry trends

### Benefits
- Medical, Dental & Vision
- Flexible work schedules
- Generous PTO
- Family-friendly policies
- Stock options
- 401k matching
- Flexible set-up
- Rich team building
- Healthy remote playbook
