
Sr Software Engineer, AV Sensor Observability
Role summary
Uber is seeking a Senior Software Engineer to focus on Sensor Reliability Engineering, ensuring the high availability and data yield of in-vehicle sensor data collection systems. This role involves architecting and implementing observability, alerting, and automation solutions for a large, distributed fleet. The engineer will design systems to detect and respond to various failure modes, including sensor degradation and software regressions, with a strong emphasis on scaling through automation and driving operational efficiency. The position requires proficiency in languages like Go, Python, or C++, experience with production systems, Linux internals, and observability tools such as Prometheus and Grafana. The role emphasizes deep systems thinking and the ability to influence technical direction across teams.
About the Role
We are looking for a Senior Software Engineer to focus on Sensor Reliability Engineering, owning the observability, alerting, and automation that ensures Uber's in-vehicle sensor data collection systems operate reliably at scale.
This role is centered on maximizing sensor uptime, data yield, and supply hours across a large, geographically distributed fleet. You will design systems that determine when to react to issues impacting data recording capability, whether caused by failing sensors, degraded onboard computers, software regressions, or systemic environmental factors.
As the technical owner for sensor reliability and observability, you will build the infrastructure that converts low-level signals into actionable intelligence and automated responses. This is a senior role requiring strong software engineering fundamentals, deep systems thinking, and the ability to drive cross-team technical direction without direct authority.
- What the Candidate Will Do -
- Architect Observability Systems: Design and implement monitoring infrastructure for in-vehicle sensor packages and recording pipelines, covering signal ingestion, storage, and correlation.
- Build for Edge Constraints: Develop systems that remain performant despite hardware diversity, intermittent connectivity, and rapid fleet scaling.
- Define Criticality Models: Establish alerting strategies that distinguish transient anomalies from systemic issues impacting sensor uptime and data yield.
- Detect Complex Failure Modes: Design detection logic for "silent" failures, such as sensor degradation, compute saturation, or recording pipeline stalls.
- Scale Through Automation: Design automated detection, triage, and mitigation mechanisms to eliminate manual intervention as the fleet grows.
- Partner on Mitigation: Collaborate with Operations and Engineering to build safe, automated responses to recurring hardware and software failure scenarios.
- Enable Observability by Design: Partner with hardware and platform teams to define the signals and data contracts required for deep-stack visibility.
- Drive Operational Efficiency: Build technical interfaces to help Operations surface issues and Engineering diagnose and deploy mitigations rapidly (TTD/TTM).
- Own Modern Infrastructure: Lead the deployment and evolution of fleet-wide reporting systems using Infrastructure as Code (IaC) best practices.
- Lead Technical Strategy: Drive reliability-focused design reviews and translate operational pain points into concrete technical requirements and high-priority roadmaps.
- Basic Qualifications -
- Proficiency in one or more of Go, Python, or C++, with experience building and operating production systems.
- Proficiency in Linux internals and shell scripting for triaging and debugging edge devices or hardware-adjacent systems.
- Strong software engineering fundamentals with the ability to debug across services, containers (Docker), and networking stacks.
- Proven experience owning reliability, infrastructure, or platform systems for large-scale production workloads.
- Experience designing and operating observability systems, including metrics, logging, alerting, and dashboarding (e.g., Prometheus, Grafana).
- Experience defining and implementing Service Level Indicators (SLIs) and Objectives (SLOs) for system availability or data yield.
- Deep understanding of networking protocols (TCP/IP, gRPC, or MQTT) and data handling in bandwidth-constrained or intermittent environments.
- Track record of driving complex technical projects and architectural reviews across multiple teams from design through production.
- Preferred Qualifications -
- Experience leading large-scope reliability or infrastructure initiatives consistent with a Senior/Staff role.
- Deep experience with modern observability platforms (e.g., Prometheus, Grafana, ELK), especially in edge, IoT, or hardware-integrated environments.
- Experience designing alerting strategies and criticality models that balance signal quality, noise reduction, and operational impact.
- Strong automation mindset, including building self-healing systems for automated detection, triage, or mitigation of hardware-related failures.
- Experience operating systems where uptime, data yield, or hardware availability are core business KPIs.
- Proven ability to design reliability systems that remain effective as hardware platforms, software stacks, and data collection workflows evolve.
- Knowledge of sensor data protocols (e.g., Camera, LiDAR, Radar) or hardware-to-cloud data ingestion pipelines.
- Experience with "Grey Failure" detection and management in complex, distributed systems.
- Background in analyzing "Fleet-level" performance metrics to identify systemic regressions across software versions or hardware revisions.
For Sunnyvale, CA-based roles: The base salary range for this role is USD$202,000 per year - USD$224,000 per year. You will be eligible to participate in Uber's bonus program, and may be offered an equity award & other types of comp. All full-time employees are eligible to participate in a 401(k) plan. You will also be eligible for various benefits. More details can be found at the following link: https://jobs.uber.com/en/benefits.
Sample Uber interview questions
- 1. (system design, medium) Design a truck tracking system that supports filtering by truck number and includes an interface
- 2. (system design, medium) Design Uber Eats
- 3. (coding, medium) Given a sorted array of integers (which may include negatives), return the squares of the numbers
- 4. (coding, medium) Find the minimum characters to insert to make a string a palindrome
- 5. (coding, medium) Given an array of integers and a number N, find the length of the longest contiguous subarray such