Site Reliability Engineer (SRE)
Role summary
Xona is seeking a Site Reliability Engineer (SRE) to manage the critical ground infrastructure for its satellite constellation. This role focuses on ensuring the high availability, scalability, and seamless integration of software controlling orbital assets with Mission Operations. Responsibilities include owning the production environment lifecycle, automating deployments using Infrastructure as Code (IaC), managing core data systems, integrating with Mission Operations, architecting identity providers and databases, developing CI/CD pipelines, implementing comprehensive monitoring and alerting for 99.99% uptime, and performing capacity planning. The role requires 4+ years of cloud operations experience (AWS, GCP, Azure), expertise in Kubernetes (EKS), strong automation skills (Terraform, Ansible, Helm), deep database knowledge, and proficiency in Python and C++.
Xona is the navigational intelligence company bringing real-time, centimeter-level certainty to any device, anywhere on Earth.
With Pulsar – the world’s most advanced PNT satellite infrastructure in Low Earth Orbit – Xona will offer a future-proof, backwards-compatible global positioning system optimized for absolute precision, superior power, and robust protection.
We are seeking a Site Reliability Engineer (SRE) to architect and manage the critical ground infrastructure for our satellite constellation. This role is responsible for the "last mile" of mission success: ensuring that the software controlling our orbital assets is highly available, scalable, and seamlessly integrated with Mission Operations.
You will own the lifecycle of our production environments, from automating deployments via Infrastructure as Code (IaC) to managing the core data systems that track constellation health and user activity.
## Responsibilities
- Infrastructure as Code (IaC): Design and maintain scalable, repeatable cloud infrastructure (AWS) using tools like Terraform or CloudFormation.
- Mission Ops Integration: Build and optimize the interfaces between core data management systems and Mission Operations software, ensuring reliable telemetry and command flows.
- User & Data Management: Architect and maintain high-availability identity providers (IdP) and distributed databases to support global user access and real-time data processing.
- Automated Deployment Pipelines: Create and manage robust CI/CD pipelines to deploy containerized applications into production, with a focus on zero-downtime releases and rollback capabilities.
- Observability & Reliability: Implement comprehensive monitoring, alerting, and logging (e.g., Prometheus, Grafana, ELK) to ensure 99.99% uptime for ground segment services.
- Scalability Engineering: Perform capacity planning and performance tuning to handle the high-throughput data requirements of a growing satellite constellation.
## Technical Qualifications
- Cloud Operations: 4+ years of experience managing production-grade environments in AWS, GCP, or Azure.
- Orchestration: Expert-level proficiency with Kubernetes (EKS), including networking, ingress controllers, and service mesh management.
- Automation: Strong experience with configuration management and IaC (e.g., Terraform, Ansible, Helm).
- Data Systems: Deep knowledge of SQL and NoSQL database administration, focusing on replication, backup, and disaster recovery.
- Programming: Proficiency in Python and C++ for developing internal tooling and automating complex operational workflows.
- Systems Internals: Strong understanding of Linux networking, storage, and kernel tuning.
## Preferred Qualifications
- Prior experience in Aerospace, Defense, or high-reliability sectors.
- Familiarity with CCSDS standards or satellite ground station software.
- Experience with secure, air-gapped, or hybrid-cloud deployments.
For U.S. Roles: To comply with U.S. Government space technology export regulations, applicants must be a U.S. citizen, a lawful permanent resident of the United States (i.e., a Green Card holder), or another protected individual as defined by 8 U.S.C. 1324b(a)(3).
For U.K. Roles: To comply with U.K. regulations, this role requires Baseline Personnel Security Standard (BPSS) checks, and successful candidates must be eligible to obtain UK Security Clearance (SC).
For Canada Roles: Successful candidates must obtain and hold a security clearance at the reliability status level, and pass a security assessment for the Canadian Controlled Goods Program (CGP) and ITAR.
We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.
Compensation Range: $170K - $197K
Sample Xona Space Systems interview questions
1. Design a Tic Tac Toe game that allows remote play (system design, medium)
2. Design distributed compute systems. (system design, average)
3. Design a system for managing a distributed CI/CD pipeline. (system design, medium)
4. Design a trending topic system like Twitter's (system design, hard)
5. Design the Twitter timeline and search (system design, hard)