Blankfactor
IT Services

Site Reliability Engineer (SRE)

Three Rivers, Michigan, United States · Onsite · Full Time · Posted 2 months ago · Visa sponsorship available

About Your Role

As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of mission-critical platforms by building scalable systems, robust automation, and data-driven operations.

You will partner closely with development, cloud, infrastructure, and security teams to deliver resilient, high-performing services that support the way people live and work today.

What You’ll Do

● Design and implement solutions that enhance application reliability, performance, scalability, and resilience.

● Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and rapid incident response.

● Lead incident management efforts, perform root cause analysis, and implement action-oriented post-mortem improvements.

● Automate operational workflows using scripting, IaC, and configuration management tools.

● Analyze capacity, performance, and usage trends to forecast demand and optimize cloud costs.

● Collaborate with engineering teams to embed operability, resilience, and security into application and architecture designs.

● Support safe, reliable deployments through CI/CD pipelines, release governance, and change control.

● Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.

Experience Required

● Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and orchestration.

● Experience with public cloud platforms (AWS, Azure, or GCP) across compute, storage, networking, IAM, and cost governance.

● Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, ExtraHop, etc.

● Implementing security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secrets management and vulnerability remediation.

● Infrastructure as Code experience using Terraform, CloudFormation, Ansible, or similar tools.

● Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.

● Scripting and automation using Bash, PowerShell, or Python.

● Equivalent combination of education, experience, and/or military background.

● Experience on projects with high-volume transactions and zero data loss requirements is a must, primarily in banking and payments. An insurance-project background is not a fit for this role.

Good to Have

● Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.

● Experience with Premier applications, IBM iSeries, and/or Unisys systems.

● Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).

● Proven experience in major incident command, stakeholder communication, and cross-team coordination.

● Experience with ITIL and ServiceNow (change, problem, and configuration management).
