
OMS Support SRE
Role summary
The OMS Support SRE ensures the stability, observability, and performance of IBM Sterling OMS platforms. Primary responsibilities include resolving production issues within SLAs, enhancing monitoring and alerting systems, and supporting business operations, especially during peak sale events. The SRE will automate manual processes, build operational utilities using Java, Python, or shell scripting, and debug complex issues across Java, Spring Boot, IBM Sterling OMS, and external integrations. Proficiency in SQL and NoSQL databases, along with experience using monitoring tools such as Splunk and Dynatrace, is essential. The role also involves performance testing, system optimization, and effective stakeholder engagement.
About the Role
Provide production support and reliability engineering for IBM Sterling OMS platforms, ensuring system stability, observability, and performance.
Focus on incident resolution, automation, and supporting business operations including peak sale events.
Roles & Responsibilities
- Resolve production issues within defined SLAs.
- Create and improve observability and alerts for end-to-end systems.
- Support the business during BAU operations and sale events.
- Prepare systems for peak sale events.
- Automate manual processes to reduce toil and improve incident resolution.
- Build utilities to support BAU needs (order creation scripts, order status updates, fulfillment updates, tlog validation, system validation).
- Debug issues across Java, Spring Boot, IBM Sterling OMS, and external integrations (Yantriks, Listrak, payment systems).
- Analyze issues across environments and identify solutions.
- Work with monitoring tools (Splunk, Dynatrace, SolarWinds, Grafana) to analyze issues.
- Write and execute SQL queries (DB2, MySQL, BigQuery) and work with NoSQL databases (Firestore).
- Monitor systems during performance testing and propose fixes.
- Build utilities using Java, Python, or shell scripting for operational tasks.
- Optimize system infrastructure.
- Use JIRA and ServiceNow for incident management.
- Understand operational challenges, resolve issues, and drive continuous improvement.
- Engage with stakeholders through daily, weekly, and leadership meetings.
- Plan and execute peak readiness activities and reporting.
- Coordinate with third-party service providers.
- Drive incident bridges with internal and external stakeholders to resolve issues.
Benefits (W2)
- Health insurance
- Health savings account
- Dental insurance
- Vision insurance
- Flexible spending accounts
- Life insurance
- Retirement plan
EEO Statement
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.