Openkyber Verified
Cybersecurity, Software Development, Blockchain.

Principal SRE

Texas, Texas, United States | Hybrid | Full Time | Principal | Posted 2 months ago | Visa sponsorship available

Role summary

The Site Reliability Engineering (SRE) team is responsible for building and operating large-scale software systems, ensuring high availability and resiliency through automation. As an SRE Engineer, you will leverage your expertise in software development, complexity analysis, and scalable system design to identify and implement automation solutions. This role requires strong collaboration with other engineering teams to maintain service stability and performance, meeting business and end-user expectations. The position emphasizes expertise in Azure, SLO/SLI definition, software development in multiple languages, database optimization, automated pipeline design, and root cause analysis.

Role: SRE Engineer (strong Azure)
Location: Arlington, Texas - Hybrid (in-person interview required)

JOB SUMMARY

The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for building and running large-scale software systems. As a Site Reliability Engineer, you will identify and deliver automation solutions designed to ensure high availability and resiliency using your expertise in software development, complexity analysis, and scalable system design. Strong collaboration skills will be required to work closely with other engineering teams to ensure services/systems are highly stable and performant, meeting the expectations of our business partners and end users.

JOB DUTIES

- Partner with the architecture and development teams on how to make applications highly available, reliable, and performant at global scale
- Collaborate with the architecture team to ensure reliability factors are accounted for in business features and enablers
- Guide development teams in understanding established service level objectives and consequences, and implementing appropriate SLIs to support the objectives
- Collaborate with development team members to swarm, troubleshoot, and resolve problems
- Guide ad-hoc teams to brainstorm solutions and build implementation plans based on the Root Cause Analysis of production issues
- Design and build automated solutions to optimize application/service/platform uptime with minimal human intervention
- Be available for an on-call rotation to participate in troubleshooting and communication efforts outside of normal business hours
- Implement and help create standards and best practices, and mentor other team members in order to drive adoption across development teams
- Perform other duties as assigned
- Conform with all company policies and procedures

JOB SPECIFICATION

Knowledge

- Expert in defining, implementing, and evaluating Service Level Objectives (SLO) and Service Level Indicators (SLI), and associated consequences
- Software development expertise in two or more high-level programming and scripting languages
- Experience in evolutionary database design, query performance analysis, and indexing as a cornerstone for delivering scalable, performant products and services
- Experience in designing, building, and optimizing automated pipelines with automated testing and automated security controls
- Experience in performing Root Cause Analysis and Problem Management
- Experience working in Agile Scrum teams with demonstrated success leading improvements (getting better/faster/happier)

SKILLS

- Help establish and maintain a culture of learning through the development and sharing of skills, knowledge, process and tools; combat traditional silos that create "us and them" environments
- A driving passion for finding solutions to hard problems at scale and operationalizing them
- Exceptional critical thinking and communication skills, with a passion for leveraging documentation as a tool for constant improvement

Additional Knowledge, Skills and Abilities

- Pipeline Automation: Azure DevOps (YAML, ARM), Terraform, Jenkins, Chef, Octopus Deploy
- Code Scanning: SonarQube, Checkmarx
- Source Code repos: Git
- Containerization: Azure Kubernetes Service, Kubernetes (open source), Docker
- High level programming languages: Java, C# (.NET MVC and .NET Core), Go
- Scripting: PowerShell, Bash
- Database: Oracle, Microsoft SQL Server, NoSQL (e.g. CosmosDB)
- Test Automation: Xamarin.UITest, SpecFlow, DevTest, Selenium, Test Data Manager, Postman, Maven, TestNG, JMeter
- Operating systems: Windows, Linux
- Cloud Platforms: Azure
- Metrics and Monitoring: Splunk

For applications and inquiries, contact: hirings@openkyber.com
