
Principal Site Reliability Engineer
Role summary
ACI Worldwide is seeking a Principal Site Reliability Engineer to join their team in Norcross, GA or Omaha, NE. This role is embedded within product teams, focusing on designing, coding, testing, running, and evolving systems to enhance product reliability and organizational efficiency. The engineer will drive modern reliability practices such as SLOs, error budgets, actionable alerts, on-call rotations, incident retrospectives, chaos testing, and end-to-end ownership. Responsibilities include guiding reliability practices throughout the SDLC, maintaining service health via monitoring and alerting, improving reliability through post-incident reviews, and evolving the overall resilience strategy. The role also involves ensuring redundancy mechanisms, setting standards, contributing to capacity planning, and potentially interfacing with clients and sales teams.
Powering the world’s payments ecosystem
ACI powers the payments ecosystem – globally, and you power ACI. You’ll innovate, collaborate, and grow – in an energetic technology culture with decades of proven success. ACIers – in all roles and levels – are truly your colleagues and many are your friends. Our size and reach allow you to see the global impact of your work. You are visible, your talents are valued, and you are empowered to shape the future of payments.
As a Principal Site Reliability Engineer in Norcross, GA or Omaha, NE, you will join a diverse, passionate team, dedicated to powering the world’s payments ecosystem!
Job Summary:
The Principal Site Reliability Engineer is embedded directly with our product teams, working closely with them to design, code, test, run, and evolve the systems that help people around the world make payments. We work closely with ACI teams to drive adoption of modern reliability practices like SLOs, error budget policies, actionable alerts, follow-the-sun on-call, incident retrospectives, chaos testing, and end-to-end ownership.
Job Responsibilities:
- Design, develop, and deploy software and systems, and drive their creation across teams, to increase product reliability and organizational efficiency.
- Guide reliability practices across the entire software development lifecycle through activities such as architecture reviews, code reviews, creating platforms and frameworks, capacity planning, and chaos testing.
- Maintain service health by implementing and evolving monitoring, alerting, self-healing and follow-the-sun incident response.
- Improve service reliability through blameless post-incident reviews and by using code to prevent or respond to problem recurrence. Function as a key technical and culture leader throughout your assigned line of business.
- Drive and evolve the overall resilience strategy of your line of business, leveraging industry and internal tools.
- Ensure that local and cross-site redundancy mechanisms meet requirements, work as designed, and continue to evolve.
- Set, maintain, and enforce standards across deployment practices, operations, and related areas.
- Participate in change review as a key member.
- Function as a key contributor to overall capacity, peak-season, and business continuity methodologies and testing for your space.
- Interface directly with key clients as needed.
- Support and help standardize sales responses for your space by working with business and DevOps teams to craft go-forward offers that align costs, SLAs, and technology.
- Perform other duties as assigned.
- Understand and adhere to all corporate policies, including but not limited to the ACI Code of Business Conduct and Ethics.
Knowledge, Skills and Experience required for the job:
- BS degree in Computer Science, a related technical field, or equivalent practical experience.
- Experience in data structures, database systems, algorithms, and software design.
- Experience writing code in Java, Go, Shell, Python, or a similar language.
- Ability to debug, optimize code, and automate routine tasks.
- Practical skills with relational databases (such as PostgreSQL and Oracle), NoSQL key-value stores (such as Cassandra), and messaging systems (such as Kafka, RabbitMQ, and MQ), or equivalents.
- Proven ability to drive organizational adherence to SRE topics such as SLOs, resilience, scaling, performance, and more.
- 15+ years of experience
Preferred Knowledge, Skills and Experience for the job:
- Experience in an SRE or Production Engineering role
- Experience with a globally distributed team
- Initiative in solving problems using a scientific approach
- Ability to apply appropriate new technologies and processes
- Skilled in providing substantive feedback on distributed system designs
- Strong collaboration skills
Work Environment:
- Office work environment
- Collaborative team
- Prolonged periods of sitting at a desk and working on a computer
- Up to 20% travel, which may be domestic or international
- Weekend and off-hours support may be required
Applicants must be currently authorized to work in the United States on a full-time basis. This position does not offer sponsorship for employment visa status or work permit now or in the future.
In return for your expertise, we offer opportunities for growth, career development, and a competitive compensation and benefits package, all within an innovative and collaborative work environment.
Are you ready to help us transform the payments ecosystem? To learn more about ACI Worldwide, visit our website at www.aciworldwide.com. Job ID: Requisition #18669
ACI Worldwide is an AA/EEO employer in the United States, which includes providing equal opportunity for protected veterans and individuals with disabilities, and an EEO employer globally.
Important Notice About Recruitment Scams
Job seekers should be aware of ongoing recruitment scams in which individuals or organizations impersonate legitimate companies to offer fake job opportunities. These scams often involve requests for personal information, payments, or interviews through unofficial channels. Please be cautious and verify any communications claiming to be from our company (www.aciworldwide.com / @aciworldwide.com). The ACI Worldwide recruitment team will always follow official channels and will never request payment.
Similar roles
- Senior Site Reliability Engineer · Parallel Domain · Madrid, Comunidad de Madrid, Spain · Remote
- Site Reliability Engineer · Pacer Group · Montreal, Quebec, Canada · Hybrid
- Senior Site Reliability Engineer · Block Inc · New York, New York, United States · Remote
- Senior Site Reliability Engineer · Block Inc · Bay, California, United States · Remote
- Senior Site Reliability Engineer · Uplink · United States · Hybrid