High Trail
Financial Services, Investment Management, Hedge Fund
AI Infrastructure Engineer
San Francisco, California, United States · Onsite · Full Time · $180,000–$220,000/yr · Posted 19 days ago
Overview
Build and own the observability and diagnostics layer for a real-time AI assistant platform. You’ll make complex AI systems transparent, debuggable, and reliable by enabling end-to-end tracing, rapid root-cause analysis, and real-time monitoring.
Responsibilities
- Design event tracing across AI decisioning, workflows, and real-time communication systems
- Build automated pipelines to detect, classify, and analyze system failures
- Create dashboards for real-time and post-session visibility (timelines, decision paths, errors)
- Monitor live sessions and surface alerts for anomalies (latency, loops, failed actions)
- Build human-intervention tools for handling issues during live sessions
- Identify recurring failure patterns and drive system improvements
- Implement automated triage and alerting to route issues to the right teams
Requirements
- Strong backend experience with distributed systems and observability
- Proficiency in Python and event-driven architectures
- Experience debugging complex systems
- Familiarity with AI/LLM systems, workflow/state machines, and telemetry tools
Nice to Have
- Experience with real-time/voice systems
- Experience with observability tools (e.g., Grafana, OpenTelemetry)
- Exposure to human-in-the-loop systems or operational tooling