Hamming AI Verified
Artificial Intelligence / Machine Learning / Software

Backend / Infra Engineer

San Francisco, California, United States · Onsite · Full Time · $140,000–$200,000 /yr · Posted 2 months ago · Hidden Gem · YC Startup

Role summary

Hamming AI is seeking a Backend / Infra Engineer to own reliability and scale for its LLM-enabled platform, which automates QA for voice AI agents. The role involves working in TypeScript/Node.js and Python on core services, scaling systems to handle 10,000+ parallel calls at 99.99% uptime, and deepening observability with OpenTelemetry/SigNoz. Responsibilities include hardening pipelines, prototyping LLM features, and managing CI/CD and incident response. The ideal candidate has experience with distributed backends under real-time/streaming constraints, workflow engines like Temporal, cloud-native AWS infrastructure with Terraform, and LLM application development. This is a full-time role, remote (North America) or based in Austin, TX.

**Location: Remote (North America) or Austin, TX**

**Employment Type: Full-time (no contractors)**

**Department: Engineering**

### **About Hamming AI**

Hamming automates QA for voice AI agents. Everyone is building voice agents. We secure them. In fact, we invented this category. With one click, **thousands of our agents call our customers’ agents** across accents, background noise, and personalities—then we generate **crisp bug reports** and production-grade analytics. Reliability is the moat in voice AI, and that’s our whole job.

We are one of the fastest engineering teams in the world. We deploy to prod 4× a day.

I’m looking for someone who can **own reliability and scale** across our LLM-enabled platform, shipping precise, outcome-driven improvements to high-availability systems.

— Sumanyu (CEO)\
Previously: grew Citizen 4× and scaled an AI sales program to $100Ms/yr at Tesla.

[Devin Case Study](https://devin.ai/customers/hamming)

[Ranked #1 Eng team](https://www.linkedin.com/posts/andrew-d-churchill_sumanyu-ceo-at-hamming-ai-had-20x-the-output-activity-7354564854029901824-jSDP?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAh-eI0Bl3EbQhXGlnjcZ4TbyeByvoh32hg)

[OpenAI Dev Day 100 billion token list](https://www.linkedin.com/posts/sumanyusharma_openaidevday-activity-7381718141455859712-OPhA?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAh-eI0Bl3EbQhXGlnjcZ4TbyeByvoh32hg)

### **What you’ll do**

* **Own core services** in **TypeScript/Node.js** and **Python** that orchestrate **LiveKit**, **Temporal**, STT/TTS, and LLM tooling for real-time voice agents.
* **Scale 1 → N → 100×**: take what works today and harden it for 10K parallel calls with **99.99%** uptime. Turn human playbooks into productized systems.
* **Harden pipelines** for ingestion, evaluation, and analytics so telephony events, recordings, and outcomes propagate reliably across services.
* **Level up observability**: deepen **OpenTelemetry/SigNoz** and trace-first practices to shrink mean-time-to-truth in prod.
* **Prototype → test → prod**: partner with product to ship new LLM-driven behaviors with clear success metrics, guardrails, and regressions blocked in CI.
* **Infrastructure readiness**: CI/CD, environment automation, and incident response playbooks, so customer conversations stay online.
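
To put the **99.99%** target above in concrete terms, here is a minimal sketch of the downtime budget it implies. The function name `downtimeBudgetMinutes` and the 30-day and 365-day windows are illustrative assumptions, not something the posting defines.

```typescript
// Minutes of downtime an availability target allows over a window of days.
// `availability` is a fraction, e.g. 0.9999 for "four nines".
function downtimeBudgetMinutes(availability: number, windowDays: number): number {
  if (availability < 0 || availability > 1) {
    throw new RangeError("availability must be a fraction between 0 and 1");
  }
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - availability);
}

const perYear = downtimeBudgetMinutes(0.9999, 365);
const perMonth = downtimeBudgetMinutes(0.9999, 30);

console.log(perYear.toFixed(2));  // 52.56
console.log(perMonth.toFixed(2)); // 4.32
```

In other words, four nines leaves roughly 53 minutes of total downtime per year, or a little over 4 minutes in a 30-day month.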

### **You might be a fit if you**

* Have **senior/staff** experience running distributed backends with **real-time/streaming** constraints.
* Are fluent in **TypeScript/Node.js** and comfortable jumping into **Python** for ML/audio jobs.
* Know **Temporal** (or similar workflow engines), queues, Redis, and **PostgreSQL**.
* Have **shipped production LLM apps** and understand prompt/tool design, evals, and guardrail instrumentation.
* Operate cloud-native on **AWS** with **Terraform**; k8s doesn’t scare you.
* Are a **power user of Cursor/Zed/Devin** and were using code-gen before it was cool.
* Have intuition for what current-gen LLMs can/can’t do—and what tomorrow’s models will unlock.
* Think independently, **grind with customers**, and do whatever it takes—without dropping the quality bar.
* Bonus: built 0→1 **real-time systems** in Telecom/Networking, Autonomous Vehicles, or HFT; founded something; built **AI voice** apps.

### **Interesting problems you’ll touch**

* **Voice simulations that feel real**: accents, overlapping speech, crosstalk, background noise, barge-ins.
* **Massive concurrency**: **10,000+ parallel calls** with deterministic behavior and graceful degradation.
* **Temporal-driven orchestration** for long-running, interruptible call flows.
* **Closed-loop reliability**: turn prod failures into auto-generated tests and blocked deploys.
* **Trace-everything** culture: make “what happened?” a 30-second question, not a war room.
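
As one illustration of what "graceful degradation" under massive concurrency can mean, here is a minimal sketch of an admission gate that caps in-flight calls and sheds the excess instead of queuing without bound. `CallGate`, `fakeCall`, and the limit of 2 are hypothetical names and values chosen for the example; this is not a description of how Hamming's platform works.

```typescript
type GateResult<T> = { ok: true; value: T } | { ok: false; reason: "shed" };

// Hypothetical admission gate: runs a task only while capacity remains.
class CallGate {
  private inFlight = 0;

  constructor(private readonly maxInFlight: number) {}

  // Runs `task` if under the cap; otherwise returns a "shed" marker immediately.
  async run<T>(task: () => Promise<T>): Promise<GateResult<T>> {
    if (this.inFlight >= this.maxInFlight) {
      return { ok: false, reason: "shed" };
    }
    this.inFlight++;
    try {
      return { ok: true, value: await task() };
    } finally {
      // Release the slot whether the task succeeded or threw.
      this.inFlight--;
    }
  }
}

// Stand-in for a simulated call; resolves on a later microtask.
const fakeCall = async (): Promise<string> => {
  await Promise.resolve();
  return "done";
};

// Starts three calls against a gate that admits two; returns how many were shed.
async function demo(): Promise<number> {
  const gate = new CallGate(2);
  const results = await Promise.all([1, 2, 3].map(() => gate.run(fakeCall)));
  return results.filter((r) => !r.ok).length;
}
```

Rejecting early keeps behavior predictable at the limit: callers get an explicit "shed" result they can retry or report, rather than an ever-growing queue.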

### **How we work**

* **Outcomes over output**: we adjust roadmaps when new data lands.
* **Demo early** and document decisions so context moves fast.
* **Own incidents**: lead the investigation, write crisp notes, land durable fixes.
* **Direct, candid, respectful** communication keeps remote teammates in lockstep with Austin HQ.

### **Our stack**

* **App**: Next.js, TypeScript, Tailwind
* **AI**: OpenAI, Anthropic, STT/TTS providers
* **Realtime/Orchestration**: LiveKit, Pipecat/Daily, Temporal
* **Infra/DB**: AWS, k8s, PostgreSQL, Redis, Terraform
* **Observability**: OpenTelemetry, SigNoz

### **Apply**

If you want to make **AI voice agents reliable at scale**, let’s talk.