ECS
Government Contracting, IT Services, Cybersecurity, Cloud Services

Site Reliability Engineer (SRE) / Operations Engineer

Arlington, Virginia, United States · Hybrid · Temporary · $145,000–$180,000 /yr · Posted 2 months ago · Visa sponsorship available

Role summary

ECS is seeking a Site Reliability Engineer (SRE) / Operations Engineer in Arlington, VA, or remotely. This role is responsible for ensuring the reliability, availability, performance, and operational efficiency of enterprise applications and supporting infrastructure. The SRE/Ops Engineer will bridge software engineering and IT operations by applying engineering practices, automation, and monitoring to maintain stable systems and rapidly resolve operational issues. Key responsibilities include maintaining production systems, monitoring health, responding to incidents, implementing automation and IaC, supporting deployment pipelines, collaborating with development teams, managing capacity and performance, and ensuring compliance with security and operational standards. A Bachelor's degree in a related field and a minimum of seven years of experience are required.

ECS is seeking a *Site Reliability Engineer (SRE) / Operations Engineer* to work in our *Arlington, VA* office or *remotely*.

The Site Reliability Engineer (SRE) / Operations Engineer is responsible for ensuring the reliability, availability, performance, and operational efficiency of enterprise applications and supporting infrastructure. This role bridges software engineering and IT operations by applying engineering practices, automation, and monitoring to maintain stable systems and rapidly resolve operational issues. The SRE/Ops Engineer works closely with development, security, and platform teams to support system deployments, manage incidents, improve observability, and implement resilient architectures that support continuous delivery and mission-critical operations.

Responsibilities

  • Maintain the reliability, availability, and performance of production systems and cloud-based services.
  • Monitor system health using observability tools (metrics, logs, and tracing) and respond to alerts and incidents.
  • Participate in incident response, troubleshooting, and root cause analysis to restore service and prevent recurrence.
  • Implement automation and infrastructure-as-code to improve operational efficiency and reduce manual intervention.
  • Support deployment pipelines and release management processes to enable reliable and repeatable software delivery.
  • Collaborate with development teams to improve application resiliency, scalability, and operational readiness.
  • Develop and maintain operational runbooks, standard operating procedures, and system documentation.
  • Manage system capacity planning, performance tuning, and scaling strategies.
  • Ensure systems meet security, compliance, and organizational operational standards.
  • Contribute to continuous improvement initiatives by identifying opportunities to reduce operational risk and technical debt.

*Salary Range: $145,000 - $180,000*

*General Description of Benefits*

Requirements:

  • *U.S. Citizenship*
  • *Ability to obtain at minimum a Public Trust suitability designation.*
  • *Bachelor's degree in Computer Science, Engineering, Information Technology, Information Systems, or a related field*
  • Minimum of seven (7) years of related experience

Benefits:

https://ecstech.com/careers/benefits/
