
Azure Site Reliability & Automation Engineer
Role summary
Innovative Systems is seeking an Azure Site Reliability & Automation Engineer to architect and maintain the reliability of their Azure SaaS environment. This role focuses on applying software engineering principles to system administration, automating processes, and ensuring a scalable, secure, and self-healing infrastructure. Key responsibilities include designing and managing environments using Infrastructure as Code (IaC) tools like Terraform, optimizing CI/CD pipelines in Azure DevOps, defining and monitoring SRE metrics (SLIs, SLOs, SLAs), and automating manual tasks with scripting languages. The engineer will also participate in incident response, cloud governance, and troubleshooting within the Azure ecosystem. Required skills include deep Azure knowledge (AKS, Functions, SQL DBs), proficiency in C#, Python, Go, or PowerShell, observability tools (Azure Monitor), containerization (Docker, Kubernetes), networking, and IaC tools (Ansible, Pulumi).
The Azure Site Reliability & Automation Engineer will be the architect of our infrastructure’s reliability. You will bridge the gap between operations and development by applying a software engineering mindset to system administration. Your mission is to "automate everything," reducing manual toil and ensuring our Azure environment is scalable, secure, and self-healing.
About Innovative Systems
Innovative Systems is one of the world's leading global providers of SaaS and on-premises data management and compliance solutions. Processing over 90 billion transactions annually and climbing means we are successful, growing, and growing fast!
We have chosen Microsoft Azure as the vehicle to support us on this rapid and continuous journey. Having the right people to ride with us on this journey is just as critical. This is where you and your career come in. Do you want to continue to build your career and be in a place where you can influence change? Would you like an environment where you can work with bright, dynamic people at all levels of the organization who strive for mutual success and excellence? If so, we would love to hear from you!
Key Responsibilities
- Infrastructure as Code (IaC): Design and maintain production environments using Terraform, Bicep, or ARM templates.
- CI/CD Pipeline Mastery: Build and optimize deployment pipelines in Azure DevOps or GitHub Actions to ensure seamless, automated code delivery.
- Reliability Engineering: Define and monitor SLIs, SLOs, and SLAs to ensure system health and performance.
- Automation of Toil: Identify repetitive manual tasks and eliminate them through scripting (PowerShell, C#, or Bash).
- Incident Response & Post-Mortems: Participate in on-call rotations and lead blameless post-mortems to turn outages into learning opportunities.
- Cloud Governance: Implement Azure Policy and Cost Management tools to keep the environment secure and within budget.
Additional Responsibilities
- Cloud Support & Troubleshooting: Provide direct technical assistance within the Azure ecosystem, identifying root causes and implementing rapid issue resolution.
- Incident Response: Respond to and resolve system issues in a timely manner, triaging severity and engaging cross-functional team members as appropriate to maintain SLAs.
- Cross-Functional Collaboration: Partner with developers, architects, and stakeholders to align infrastructure reliability with business goals and application requirements.
- Adaptive Operations: Perform other technical responsibilities and special projects as assigned to support the evolving needs of the Azure environment.
Required Technical Skills
- Azure Expert: Deep knowledge of Azure App Services, Kubernetes (AKS), Azure Functions, and SQL Databases.
- Coding/Scripting: Proficiency in C#, Python, or Go, and advanced PowerShell for automation tasks.
- Observability: Hands-on experience with Azure Monitor, Log Analytics, and Application Insights.
- Containerization: Strong experience with Docker and orchestrating workloads in Kubernetes.
- Networking: Understanding of VNets, ExpressRoute, Azure Front Door, and Private Link.
- Advanced Orchestration: Possess hands-on experience with configuration management and modern IaC tools like Ansible or Pulumi.
- Bachelor’s degree in Computer Science, Computer Engineering, Mathematics, Information Technology, or a related field.
Preferred Qualifications
- Professional Certification: Hold a high-level Microsoft credential, such as Azure Solutions Architect Expert or Azure DevOps Engineer Expert.
- Resiliency Testing: Maintain a background in Chaos Engineering principles to proactively identify and fix system weaknesses.
- Analytical Mindset: Demonstrate high intellectual curiosity and elite problem-solving skills to navigate complex cloud obstacles.
- Communication Excellence: Exhibit strong written and verbal communication skills, with the ability to translate technical jargon into actionable business insights.
- Professional Drive: Self-motivated with a robust work ethic and a "bias for action" when improving system performance.
- Collaborative Autonomy: Proven ability to work independently on specialized tasks while remaining a cohesive, contributing member of a high-performing team.
Locations:
This position is hybrid, based at our World Headquarters in Pittsburgh, Pennsylvania, OR there may be remote work opportunities for individuals residing in the following locations: Florida, Georgia, Illinois, Maryland, Minnesota, New Jersey, New York, Ohio, Texas, and Washington, DC.
Innovative’s world headquarters is in Pittsburgh, Pennsylvania. Our regional offices are in London, UK; Mexico City, Mexico; Dubai, UAE; and São Paulo, Brazil.
*Innovative is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, and other protected characteristics.*
NOTE: Innovative Systems, Inc. is unable to provide visa sponsorship for this position. Applicants requiring sponsorship now or in the future cannot be considered.