Senior Director of Platform Engineering & Reliability
### Who you are
- 10+ years in infrastructure, platform engineering, SRE, or DevOps — with at least 4–5 years in a senior leadership role (Director or VP level)
- Proven experience owning production reliability at a SaaS company with significant scale and uptime requirements
- Demonstrated track record of building and evolving DevOps/platform engineering organizations from the ground up or through significant transformation
- Experience leading cloud-native infrastructure on GCP, AWS, or Azure — with deep familiarity with Kubernetes, container orchestration, and managed cloud services
- Prior ownership of security and compliance programs (SOC 2, ISO 27001, or similar); able to fluently translate security requirements into engineering work
- Experience at companies ranging from growth-stage to mid-scale SaaS ($50M–$500M ARR), where you've had to build structure while maintaining velocity
- Strong working knowledge of modern observability stacks (OpenTelemetry, Prometheus, Grafana, Datadog, or equivalent)
- Fluency in infrastructure-as-code tooling (Terraform, Pulumi, or equivalent)
- Solid understanding of CI/CD architectures, deployment strategies (canary, blue/green, progressive rollout), and release engineering
- Comfortable reviewing architectural decisions at the system design level and contributing to technical roadmaps alongside principal engineers
- Familiarity with data infrastructure considerations (managed databases at scale, streaming platforms like NATS/Kafka, ClickHouse-style OLAP)
- Exceptional communicator who can contextualize technical risk and infrastructure investment in business terms for a C-suite audience
- Track record of building inclusive, remote-first teams with strong async communication habits and high accountability
- Able to operate in ambiguity — defining the problem, structuring the org, and making progress before the perfect playbook exists
- Sound business judgment: able to sequence investments, make trade-offs between speed and stability, and push back when priorities don't make sense
### What the job involves
We're looking for a Senior Director of Platform Engineering & Reliability to own the reliability, scalability, security, and operational excellence of Shopmonkey's platform. This is a senior leadership role sitting at the intersection of engineering and infrastructure, responsible for the teams and practices that keep our product running at the highest levels of performance and trust.

This is not a title for someone who manages from a distance. You will be deeply embedded in the technical and organizational fabric of the company. You'll bring strong opinions about how infrastructure should be built, how incidents should be managed, how security posture should evolve, and how DevOps culture creates faster, safer product delivery. You'll own multi-year roadmaps, manage and grow a team of strong individual contributors, and partner closely with Product, Engineering, and Security to translate business priorities into operational reality.

- Define and enforce SLOs, SLAs, and error budgets across Shopmonkey's production systems
- Lead incident management end-to-end: on-call frameworks, blameless postmortems, and systemic remediation
- Drive observability maturity — OpenTelemetry, Prometheus, and the dashboards and alerts that give every team a clear signal
- Ensure 99.9%+ availability for a platform processing critical business operations for thousands of shops
- Own Shopmonkey's GCP-based cloud infrastructure: provisioning, scaling, cost optimization, and architectural evolution
- Lead the internal platform team that builds developer tooling, CI/CD pipelines, and deployment infrastructure that accelerates every engineering squad
- Champion infrastructure-as-code practices, making our environments reproducible, auditable, and fast to provision
- Partner with senior engineers on scalability decisions as we grow our customer base and data volumes
- Own the engineering infrastructure and software budget as well as vendor relationships
- Own the technical security posture of the platform: vulnerability management, secrets management, network security, and hardening
- Lead compliance programs and certifications relevant to Shopmonkey's enterprise customers and growth trajectory (SOC 2 Type II, and beyond)
- Build a culture of security that is embedded into the development lifecycle — not bolted on at the end
- Treat compliance not as a checkbox but as a multi-pillar investment that compounds trust with customers over time
- Evangelize and operationalize DevOps principles across engineering: CI/CD, feature flags, progressive deployment, automated testing infrastructure
- Drive down time-to-deploy, mean time to recovery (MTTR), and toil across the org
- Partner with engineering leadership to ensure platform investments directly accelerate squad velocity
- Build, retain, and grow a high-performing team spanning DevOps, SRE, Platform Engineering, and Infrastructure
- Define org structure, hiring plans, and career ladders for TechOps disciplines
- Develop managers and senior ICs; establish a culture of ownership, psychological safety, and continuous learning
- Lead with a servant-leadership mindset — remove blockers, amplify great work, and hold the bar high
### Benefits
- Flexible Scheduling: We promote work/life balance and flexible scheduling.
- Health, Dental, Vision & Wellness: You'll have access to premium health coverage and what you need to lead a healthy lifestyle.
- Compensation: Competitive pay and localized benefits.
- Time Off: Enjoy paid holidays and flexible time off to vacation, rest, and recharge.
- Career Growth: Work on cutting-edge technology to develop & grow your career.
- Community: Social and charity events, regional team building & meetups, happy hours & more.