Diligent Robotics
Robotics, Healthcare Technology, Artificial Intelligence

Software Engineer - Observability & Debugging

Austin, Texas, United States | Full Time | Posted 2 months ago | Visa sponsorship available

Role summary

Diligent Robotics is seeking a Software Engineer specializing in Observability & Debugging to strengthen its production robotics systems. The role involves building and maintaining tools for debugging, diagnosing, and improving the performance of service robots operating in real-world environments. You will collaborate with robotics engineers and operations teams to develop systems for logging, event collection, replay, and metrics generation, so that bugs are reproducible and development scales. The ideal candidate has strong C++ and Python skills, some robotics experience, and a background in building observability and debugging systems for distributed or real-time applications.

What we’re doing isn’t easy, but nothing worth doing ever is.

We envision a future powered by robots that work seamlessly with human teams. At Diligent Robotics, we build artificial intelligence that enables service robots to collaborate with people and adapt to dynamic, human-filled environments.

Diligent is one of the few companies in the world operating a production fleet of mobile manipulation robots in real environments. Every day, our robots work alongside hospital staff, generating the real-world data needed to advance the next generation of Physical AI. Debugging autonomy in the real world is fundamentally different from debugging in the lab, and solving that challenge requires exceptional tooling and infrastructure.

As a Software Engineer – Observability & Debugging, you will strengthen our team’s ability to understand, diagnose, and improve the performance of our robotics applications in production. You will work closely with robotics engineers and operations teams to build the tools, systems, and standards that allow us to debug, triage, and root-cause robot performance issues quickly and reliably.

Our goal is that every bug should be reproducible. You will help us get there by building the observability, replay, and debugging systems that make real-world robotics development scalable.

Responsibilities

  • Build and maintain observability tooling that supports debugging and root-cause analysis of robot performance in real-world deployments
  • Define and standardize triage workflows and instrumentation practices across the robotics stack
  • Develop reliable mechanisms for collecting, curating, and replaying robot logs, events, and telemetry (“debug + replay” systems)
  • Own critical incident tooling foundations, such as our structured logging and application replay systems, and evolve them into scalable, easy-to-use systems
  • Improve and expand on-robot metrics generation: what we measure, how we measure it, and how quickly we can interpret it
  • Integrate and extend visualization and introspection tools (e.g., Foxglove) for fast iteration and effective triage
  • Partner with robotics platform and applications teams to add instrumentation to key subsystems (behavior, planning, localization, controls, etc.)
  • Drive improvements in data management pipelines: upload flows, retention policies, indexing/search, and developer ergonomics
  • Mentor others on best practices for instrumentation, debugging, reproducibility, and operational excellence

Basic Qualifications

  • Undergraduate or graduate degree in Robotics, Computer Science, Electrical Engineering, or related field (or equivalent experience)
  • Strong proficiency in C++ and Python
  • Some robotics experience (comfortable reading autonomy logs, reasoning about robot state, and debugging cross-system behaviors)
  • Experience building observability/debugging systems (structured logging, metrics, tracing, event pipelines, replay tooling, dashboards)
  • Familiarity with developer workflows for diagnosing distributed or real-time systems (profiling, postmortems, regression analysis)

Nice to have

  • Foxglove (or similar robotics visualization/telemetry tooling)
  • Log replay / bag replay systems (ROS bags or equivalent)
  • Data pipeline experience (capture → upload → storage → indexing → retrieval)