Willow Laboratories
Biotechnology, Pharmaceuticals, Manufacturing, Cannabis

IT Site Reliability Engineer

Irvine, California, United States · Onsite · Full Time · $120,000–$145,000/yr · Posted 1 month ago · Visa sponsorship available

This position is located onsite in Irvine, CA.

Job Summary:

Willow Laboratories is a fast-growing and forward-thinking medical technology company focused on delivering innovative solutions that improve lives. With a strong foundation in software development and an expanding footprint in regulated medical environments, we are building the infrastructure and systems necessary to support our continued growth.

We are seeking an experienced Site Reliability Engineer to take ownership of operations for our cloud-based applications and infrastructure. As we scale our production systems to support growing user demand, this role will be instrumental in maturing our operational practices, strengthening our reliability posture, and establishing the monitoring and automation foundations critical for long-term success.

The ideal candidate will bring hands-on expertise with AWS infrastructure (EKS, DynamoDB, S3, and related services) and will thrive in an environment where they can make an immediate impact. You'll lead incident response, implement comprehensive observability solutions, develop infrastructure as code, and partner with our development team of 15+ engineers to build reliability into our mobile backend and microservices architecture from the ground up. This is an opportunity to define SRE practices and operational standards that will scale with the company. You'll report to the Vice President of Information Technology.

Duties & Responsibilities:

Reliability & Operations Management

  • Own the operational reliability and availability of production applications and infrastructure on AWS, with readiness to support future multi-cloud initiatives (Azure, Google Cloud, Akamai)
  • Respond to and lead incident management efforts, including root cause analysis, post-incident reviews, and implementation of preventive measures
  • Participate in on-call rotation and provide timely incident response
  • Establish and track SLOs, SLIs, and error budgets to drive reliability improvements
  • Document operational procedures, runbooks, and architectural decisions in Confluence

Infrastructure & Cloud Management

  • Manage and optimize AWS services including EKS (Kubernetes), DynamoDB, AppSync, Amplify, S3, and managed streaming services
  • Develop and maintain infrastructure as code using tools like Terraform, CloudFormation, or similar technologies
  • Manage multi-zone and multi-region database deployments and ensure data integrity and availability
  • Manage application load balancing and WAF configurations through Cloudflare
  • Integrate and monitor external services including Firebase, HealthKit, Health Connect, Branch.io, Landbot, Twilio, and OpenAI
  • Optimize cloud resource utilization while maintaining performance standards

Monitoring & Observability

  • Design, implement, and maintain comprehensive monitoring and alerting solutions using tools such as Grafana, CloudWatch, and application performance monitoring platforms
  • Produce regular performance reports for stakeholders that highlight trends and areas for improvement
  • Monitor cloud spending and identify optimization opportunities

DevOps & Automation

  • Implement and maintain CI/CD pipelines using Jenkins, Bitbucket, and related DevOps tooling

Minimum Qualifications and Experience:

  • Bachelor's degree in Computer Science, Information Technology, or related technical field, or equivalent practical experience
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Production Operations roles
  • Strong hands-on experience with AWS services (EC2, EKS, S3, RDS/DynamoDB, VPC, IAM, CloudWatch)
  • Proven experience with Kubernetes/EKS in production environments
  • Proficiency with at least one scripting/programming language (Python, Go, Bash, or similar)
  • Experience with monitoring and observability tools (Grafana, Prometheus, CloudWatch, or equivalent)
  • Solid understanding of networking concepts, DNS, load balancing, and web application security
  • Experience with CI/CD tools and practices (Jenkins, GitLab CI, GitHub Actions, or similar)
  • Knowledge of incident management processes and post-mortem analysis
  • Familiarity with Atlassian suite (Jira, Confluence, Bitbucket)
  • Understanding of database administration and optimization for both SQL and NoSQL systems
  • Experience managing mobile backend infrastructure
  • Strong problem-solving and troubleshooting skills
  • Excellent communication skills and ability to collaborate with cross-functional teams

Desired Qualifications:

Cloud & Infrastructure Expertise

  • AWS certifications (Solutions Architect, SysOps Administrator, or DevOps Engineer)
  • Experience with multi-cloud environments (Azure, Google Cloud Platform, Akamai)
  • Experience with infrastructure as code tools (Terraform, CloudFormation, Pulumi)
  • Understanding of serverless architectures and AWS Lambda
  • Experience with spot/preemptible instance management and cost optimization strategies

Architecture & Scalability

  • Knowledge of service mesh technologies and microservices architecture patterns
  • Background in capacity planning and performance engineering
  • Background in mobile application infrastructure and APIs
  • Experience with real-time data streaming platforms (Kafka, Kinesis)

Monitoring, Security & Reliability

  • Familiarity with APM tools and distributed tracing (New Relic, Datadog, Dynatrace)
  • Experience with container security and compliance frameworks
  • Experience with WAF configuration and DDoS mitigation strategies
  • Experience with chaos engineering and reliability testing practices

Domain-Specific Knowledge

  • Knowledge of healthcare or nutrition-related application compliance requirements

Willingness to work extended hours and weekends when needed to meet critical deadlines.

Compensation Range:

This salary range ($120,000–$145,000 per year) represents the full compensation band for this role. Most new hires are placed toward the middle of the range. Compensation at the upper end of the range is reserved for candidates with exceptional experience or those who significantly exceed the role's core requirements. Actual compensation within this range will be determined based on experience, skills, education, job-related qualifications, geographic location, and internal equity.

Physical Requirements & Work Environment:

This is an onsite position located at our Irvine, CA office (121 Theory). The role primarily works in an office environment and requires frequent sitting, standing, and walking. Daily use of a computer and other digital devices is required. The employee may stand for extended periods when facilitating meetings or walking through the facilities. Some local travel may be necessary; therefore, the ability to operate a motor vehicle and a valid driver's license are required.

The physical demands of the position described herein are essential functions of the job and employees must be able to successfully perform these tasks for extended periods. Reasonable accommodations may be made for those individuals with real or perceived disabilities to perform the essential functions of the job described.