Site Reliability Engineer

Austin, Texas, United States | Hybrid | Contract | Posted 2 months ago | Visa sponsorship available

Role summary

The Texas Health and Human Services Commission is seeking a Site Reliability Engineer for a 4+ month contract position in Austin, TX. This hybrid role focuses on ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering practices to infrastructure and operations. The engineer will partner with development teams to build resilient, observable, and automated platforms that meet defined service level objectives (SLOs). The role requires 8+ years of experience in systems engineering, DevOps, or SRE, with strong proficiency in Linux/Unix, programming/scripting languages (Python, Go, Java, Bash), distributed systems, cloud platforms (AWS, GCP), containerization (Docker, Kubernetes), and monitoring/alerting concepts. Familiarity with incident management, security integration, and observability tools is also expected.

Direct End Client: Texas Health and Human Services Commission

Job Title: Site Reliability Engineer

Location: 4601 W. Guadalupe Street, Austin, TX 78701 (Hybrid)

Duration: 4+ Months

Position Type: Contract

Hours Per Week: 40 Hr

Interview Mode: Webcam or In Person

Ceipal ID: STX_SRE671_MA

Requirement ID: 529601671

Texas Health and Human Services Commission requires the services of 1 Systems Analyst 3, hereafter referred to as Candidate(s), who meets the general qualifications of Systems Analyst 3, Applications/Software Development and the specifications outlined in this document for the Texas Health and Human Services Commission.

Requires 8 or more years of experience. Relies on experience and judgment to plan and accomplish goals, and independently performs a variety of complicated tasks. A wide degree of creativity and latitude is expected.

Understands business objectives and problems, identifies alternative solutions, and performs studies and cost/benefit analyses of alternatives. Analyzes user requirements, procedures, and problems to automate processing or to improve existing computer systems. Confers with personnel of the organizational units involved to analyze current operational procedures, identify problems, and learn specific input and output requirements, such as forms of data input, how data is to be summarized, and formats for reports. Writes detailed descriptions of user needs, program functions, and the steps required to develop or modify computer programs. Reviews computer system capabilities, specifications, and scheduling limitations to determine whether a requested program or program change is possible within the existing system.

The Site Reliability Engineer will be responsible for ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering practices to infrastructure and operations. The engineer partners with development teams to build resilient, observable, and automated platforms that meet defined service level objectives (SLOs).

Skills (years of experience, required or preferred, skill):

8 years, Required: Experience in systems engineering, DevOps, or site reliability engineering roles

8 years, Required: Strong experience with Linux/Unix systems and system internals

8 years, Required: Proficiency in one or more programming/scripting languages (Python, Go, Java, Bash)

8 years, Required: Experience designing and operating highly available, distributed systems

8 years, Required: Strong knowledge of cloud platforms (AWS or GCP) and cloud-native services

8 years, Required: Experience with containerization and orchestration (Docker, Kubernetes)

8 years, Required: Strong understanding of monitoring, alerting, and logging concepts

8 years, Required: Experience defining and managing SLIs, SLOs, and error budgets

8 years, Required: Familiarity with incident management, root cause analysis (RCA), and postmortems

8 years, Required: Experience integrating security and compliance into operational workflows

4 years, Preferred: Familiarity with observability tools (Prometheus, Grafana, Application Insights, Datadog, Splunk)

4 years, Preferred: Experience operating 24x7 production environments with on-call rotations

4 years, Preferred: Experience with chaos engineering and resiliency testing

4 years, Preferred: Experience with feature flags, canary deployments, and progressive delivery

4 years, Preferred: Strong documentation skills for runbooks, dashboards, and operational standards

V Group Inc. is an IT services company that supplies IT staffing, project management, and delivery services in software, network, help desk, and all other IT areas. Our primary focus is the public sector, including state and federal contracts. We have multiple awards/contracts with the following states: AR, CA, DE, FL, GA, IL, KY, MD, ME, MI, NC, NJ, NY, OH, OR, PA, SC, TX, VA, and WA. If you are considering applying for a position with V Group, or partnering with us on a position, please feel free to contact me with any questions you may have regarding our services and the advantages we can offer you as a consultant.

Please share my contact information with others working in Information Technology.

Website: www.vgroupinc.com

LinkedIn: www.linkedin.com/company/v-group/

Facebook: www.facebook.com/VGroupIT

Twitter: www.twitter.com/vgroupinc
