
Site Reliability Engineer (SRE) / Operations Engineer
Role summary
ECS is seeking a Site Reliability Engineer (SRE) / Operations Engineer in Arlington, VA, or remotely. This role is responsible for ensuring the reliability, availability, performance, and operational efficiency of enterprise applications and supporting infrastructure. The SRE/Ops Engineer will bridge software engineering and IT operations by applying engineering practices, automation, and monitoring to maintain stable systems and rapidly resolve operational issues. Key responsibilities include maintaining production systems, monitoring health, responding to incidents, implementing automation and IaC, supporting deployment pipelines, collaborating with development teams, managing capacity and performance, and ensuring compliance with security and operational standards. A Bachelor's degree in a related field and a minimum of seven years of experience are required.
ECS is seeking a *Site Reliability Engineer (SRE) / Operations Engineer* to work in our *Arlington, VA* office or *remotely*.
The Site Reliability Engineer (SRE) / Operations Engineer is responsible for ensuring the reliability, availability, performance, and operational efficiency of enterprise applications and supporting infrastructure. This role bridges software engineering and IT operations by applying engineering practices, automation, and monitoring to maintain stable systems and rapidly resolve operational issues. The SRE/Ops Engineer works closely with development, security, and platform teams to support system deployments, manage incidents, improve observability, and implement resilient architectures that support continuous delivery and mission-critical operations.
Responsibilities
- Maintain the reliability, availability, and performance of production systems and cloud-based services.
- Monitor system health using observability tools (metrics, logs, and tracing) and respond to alerts and incidents.
- Participate in incident response, troubleshooting, and root cause analysis to restore service and prevent recurrence.
- Implement automation and infrastructure-as-code to improve operational efficiency and reduce manual intervention.
- Support deployment pipelines and release management processes to enable reliable and repeatable software delivery.
- Collaborate with development teams to improve application resiliency, scalability, and operational readiness.
- Develop and maintain operational runbooks, standard operating procedures, and system documentation.
- Manage system capacity planning, performance tuning, and scaling strategies.
- Ensure systems comply with security, compliance, and organizational operational standards.
- Contribute to continuous improvement initiatives by identifying opportunities to reduce operational risk and technical debt.
*Salary Range: $145,000 - $180,000*
*General Description of Benefits*
Requirements:
- *U.S. Citizenship*
- *Ability to obtain at minimum a Public Trust suitability designation.*
- *Bachelor's degree in Computer Science, Engineering, Information Technology, Information Systems, or a related field*
- Minimum of seven (7) years of related experience
Req Benefits:
https://ecstech.com/careers/benefits/