Menlo Security Inc.
Cybersecurity

Principal Platform Infrastructure Engineer

Canada | Remote | Full Time | Principal | $141,000–$249,000 /yr | Posted 1 month ago

Role summary

The Principal Platform Infrastructure Engineer will be responsible for building and operating the company's core infrastructure platform, focusing on cloud-native technologies like Google Kubernetes Engine (GKE) and VMs on GCP and AWS. This role involves extensive use of Infrastructure as Code (IaC) with Terraform and Spacelift, implementing robust observability solutions with Grafana and Prometheus, and ensuring security-first design principles. The engineer will automate operational tasks, manage networking components like Cilium, and collaborate with cross-functional teams to design scalable and resilient systems. Participation in a 24x7 on-call rotation is expected.

### Who you are
- Bachelor's degree in Computer Science, similar technical field of study, or equivalent practical experience
- Proficiency in common programming and scripting languages. We use a lot of Python, Bash, and Go
- Understanding of network topologies, communication protocols (e.g., TCP/IP, HTTP/S, UDP, TLS), and enterprise-grade connectivity solutions
- Kubernetes expertise including cluster administration, RBAC, networking, workload management, and troubleshooting across production environments
- Proven experience with Terraform for infrastructure provisioning and management
- Knowledge of Google Cloud Platform services including GKE, VPC networking, Cloud DNS, Artifact Registry, Secret Manager, IAM, Gemini Code Assist, and Workload Identity
- Experience with GitOps methodologies and tools
- Clear understanding of how to use LLM code assist tools to effectively build software

### What the job involves
- Platform Infrastructure Engineering is responsible for building and operating Menlo Security's Infrastructure Platform. Together with the rest of our engineering teams, we enable our customers to connect to the Internet without compromise. Our environment provides services globally
- We expect failure, build security in by design, create evolvable systems, and enable multi-tenancy across the infrastructure. Automation is an absolute for us
- We are committed to getting it done properly, the first time
- As a Platform Infrastructure Engineer, you'll join a group of experienced engineers who are part of a globally distributed team responsible for building and managing the company's core infrastructure services and maintaining our constantly growing platform
- The team operates a sophisticated cloud-native infrastructure built on Google Kubernetes Engine and VMs spanning multiple environments globally from development to production. We manage infrastructure as code with Terraform and Spacelift orchestration, and deploy services using Helm charts
- Our platform emphasizes security-first design, comprehensive observability, and multi-region resilience. Success in this role requires working with a vast VM fleet in AWS and GCP as well as Kubernetes, writing Infrastructure as Code, and a passion for automation and reliability engineering
- Design, deploy, and maintain VM and Kubernetes infrastructure on GCP and AWS across dozens of clusters spanning development, staging, and production environments in multiple regions
- Coordinate with peers on your direct team and across other teams to ensure that your work solves the problems we need solved
- Build and maintain Infrastructure as Code (IaC) using Terraform modules, managing resources through Spacelift or equivalent Terraform Automation and Collaboration Software (TACOS). Provision cloud infrastructure including networking, compute, storage, and security components primarily on GCP, with secondary AWS support
- Implement and manage workflows with sophisticated multi-layer configuration management
- Build and maintain comprehensive observability solutions using Grafana Cloud, Prometheus/Mimir, and OTel collectors. Design Grafana dashboards, configure alerting rules, and ensure visibility across all platform components
- Manage certificate lifecycle, DNS automation, ingress controllers, and service mesh networking with Cilium
- Partner with Engineering, Product, Compliance, and Security teams to design resilient, scalable systems. Consult on capacity planning, disaster recovery, and architectural decisions for cloud-native applications
- Identify and eliminate toil through automation. Write scripts, develop tools, and build CI/CD pipelines to improve operational efficiency and reduce manual work
- Participate in a 24x7 on-call rotation as part of a globally distributed team, responding to incidents and driving post-incident reviews
