Infrastructure Engineer

San Mateo, California, United States | Hybrid | Full Time | Posted 2 months ago

Role summary

Archetype AI is seeking an Infrastructure Engineer to own the backend services and cloud infrastructure for their AI platform. This role involves architecting, implementing, and maintaining distributed systems for AI model inference and data services, as well as provisioning and managing cloud infrastructure (AWS, GCP, Azure) using IaC tools like Terraform. The engineer will also build and operate Kubernetes-based platforms for production workloads. The position requires 7+ years of experience in backend or distributed systems, a deep understanding of distributed systems fundamentals, and hands-on experience with cloud environments, Kubernetes, and IaC. This is a hybrid role based in San Mateo, California, focused on driving system reliability and scalability for a rapidly growing AI company.

About Archetype AI
Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it.
Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently at the Series A stage and is progressing rapidly to develop technology for their next stage. This presents a unique and once-in-a-lifetime opportunity to be part of an exciting AI team at the beginning of their journey, located in the heart of Silicon Valley.
Our team is headquartered in San Mateo, California, with team members throughout the US and Europe.
We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don’t see a role below that exactly fits you, you can contact us directly with your resume via jobs@archetypeai.io.
About the Job
This role will own the backend services and cloud infrastructure that power Archetype AI’s production platform—driving system reliability, scalability, and operational excellence as the company scales to meet growing customer and research demands. The engineer will work across the full stack of distributed systems and cloud platform concerns, from designing high-throughput services to provisioning and automating the infrastructure they run on.
Core Responsibilities

Architect, implement, and maintain distributed systems that support high-throughput, low-latency AI model inference and data services.
Design, provision, and manage cloud infrastructure (AWS, GCP, and/or Azure) including compute, networking, storage, and IAM—using infrastructure-as-code tools such as Terraform, Pulumi, or CloudFormation.
Build and operate Kubernetes-based platforms for deploying and scaling production workloads, including GPU-accelerated inference services.

Minimum Qualifications

7+ years of professional software engineering experience, with a focus on backend or distributed systems.
Deep understanding of distributed systems fundamentals—concurrency, consistency, replication, fault tolerance, networking.
Hands-on experience building and operating production infrastructure in cloud environments (AWS, GCP, and/or Azure), including compute, networking, and storage services.
Working knowledge of container orchestration (Kubernetes) and infrastructure-as-code (Terraform, Pulumi, or similar).
Strong debugging, instrumentation, and observability skills across distributed systems and cloud infrastructure.
Demonstrated ownership of complex technical problems and ability to learn and adapt quickly.

Preferred / Nice-to-Have Skills

Proven track record of scaling systems through rapid growth and rebuilding or refactoring for new demands.
Experience designing and operating multi-region or multi-cloud deployments with high availability and disaster recovery.
Proficiency in systems programming languages (e.g., Rust, C++) and scripting environments (e.g., Python).
Experience with Kubernetes ecosystem tooling—Karpenter, Kueue, Helm, ArgoCD, or similar—for workload scheduling, autoscaling, and GitOps.
Familiarity with CI/CD systems, service mesh architectures, and secrets/config management at scale.
Experience with FIPS compliance, container hardening, or government cloud environments (C2S/SC2S, GovCloud).
Familiarity with modern ML stacks and hardware acceleration (e.g., PyTorch, CUDA).
