SRE Application Support Analyst
Location: San Francisco, CA (Onsite)
Job Overview
We are seeking an experienced SRE Application Support Analyst with strong experience supporting Retail / eCommerce applications in a production environment. The ideal candidate should have hands-on experience with SRE practices, incident management, observability tools, and leading L2/L3 support teams supporting mission-critical applications.
Key Responsibilities
- Provide Site Reliability Engineering (SRE) support for retail and eCommerce applications in a production environment.
- Monitor application health using logs, metrics, and availability indicators to ensure uptime and reliability.
- Manage alerts, incidents, change management, CAB approvals, and production deployments following the ITIL framework.
- Lead P1/P2 incident bridge calls, coordinate with stakeholders, and drive RCA (Root Cause Analysis) and PIR (Post Incident Review).
- Work with observability tools for monitoring, logging, alerting, and dashboards (Dynatrace, Splunk, Datadog, ELK, Grafana).
- Collaborate with Dev, infrastructure, and cross-functional teams to troubleshoot issues and improve system reliability.
- Support microservices-based eCommerce applications and retail platforms such as Sterling OMS and XStore.
- Define SRE roadmaps, identify monitoring requirements, and establish service health metrics.
- Manage L2/L3 support teams providing 24x7 production support.
- Generate weekly/monthly service reports (WSR/MSR) using ITSM ticket data and present insights to leadership and customers.
- Develop and maintain SOPs, runbooks, and operational documentation.
Required Skills
- Experience working as an SRE / Production Support / Application Support Engineer in Retail or eCommerce environments.
- Strong knowledge of SRE principles including:
  - Logs and metrics monitoring
  - Availability monitoring
  - Uptime tracking
  - SLA, SLI, and SLO management
- Hands-on experience with observability and monitoring tools such as:
  - Dynatrace
  - Splunk
  - Datadog
  - ELK Stack
  - Grafana
  - PagerDuty
- Experience with incident management, change management, CAB processes, and production deployments.
- Strong experience working with ITSM platforms such as ServiceNow, JIRA, or BMC Remedy.
- Knowledge of microservices architecture and retail platforms like Sterling OMS and XStore.
- Experience leading L2/L3 application support teams providing 24x7 support.
Preferred Skills
- Strong client-facing communication skills.
- Experience collaborating with global/offshore teams.
- Ability to work in a fast-paced production support environment.
- Experience working with cloud, infrastructure, and Dev teams.
- Strong documentation and reporting skills.
Key Traits
- Customer-focused mindset
- Strong problem-solving skills
- Ability to manage critical incidents
- Collaborative team player
- Proactive and solution-driven attitude