Senior Software Engineer (DevOps)

Boston, Massachusetts, United States | Hybrid | Full Time | Senior | $119,000–$221,000/yr | Posted 8 days ago | Visa sponsorship available

Role summary

We are seeking a Senior Software Engineer II (DevOps) to build and scale the infrastructure powering our core platform and agentic AI services. This role is at the intersection of cloud infrastructure, AI operations, and platform engineering, focusing on reliability, scalability, and cost efficiency on AWS. Responsibilities include designing and operating AI/ML infrastructure, establishing infrastructure-as-code standards with Terraform, implementing advanced observability, and supporting big data pipelines. The ideal candidate will have 6+ years of DevOps/SRE/Platform Engineering experience, 2+ years in AI/ML infrastructure, strong proficiency in Python, Go, or TypeScript, and deep experience with AWS services. Experience in regulated industries and with data infrastructure is a plus. This role requires technical leadership and the ability to mentor engineers.

We are looking for a Sr. Software Engineer II (DevOps) to help us build and scale the infrastructure that powers both our core platform and our rapidly growing agentic AI services.
You will work at the intersection of cloud infrastructure, AI operations, and platform engineering, building the foundation that enables Hi Marley to operate reliably at enterprise scale while deploying autonomous AI agents in regulated insurance workflows.
You’ll also be expected to raise the bar for the teams around you: setting infrastructure standards, driving technical decisions in ambiguous situations, and helping less experienced engineers grow their operational instincts.
Design and operate cloud infrastructure on AWS that supports both our core SaaS platform and our agentic AI services, ensuring reliability, scalability, and cost efficiency
Build and maintain AI/ML infrastructure and monitoring for LLM-powered agentic services
Establish and enforce infrastructure-as-code standards using Terraform, defining the patterns other engineers follow for environment parity, drift detection, and automated compliance validation
Implement observability beyond availability — data integrity monitoring, SLO frameworks with error budgets, and automated regression detection for both platform and AI services
Build deployment automation, including pre-deployment verification, migration script validation, and codified rollback procedures, so that releases no longer depend on engineers remembering manual steps
Support big data infrastructure: data pipelines, warehousing (Redshift), and analytics tooling that enables reporting, BI, and AI training workflows
Implement security and compliance controls for AI workloads operating in regulated carrier environments — including audit logging, access governance, and configuration management
Drive environment parity across all infrastructure with automated drift detection and remediation
Improve disaster recovery capabilities: documented and rehearsed DR procedures, defined RTO/RPO by service tier, and tested recovery runbooks
Lead architecture reviews for new services, integrations, and AI agent deployments — partnering with engineering, product, and security to ensure infrastructure decisions are sound before they ship
Innovate on developer experience: reduce friction in testing environments, CI/CD pipelines, and local development workflows
Act as a technical anchor for infrastructure decisions across teams — providing clarity when requirements are ambiguous and helping the organization converge on consistent, scalable approaches

### Benefits

A fun, lively startup culture
Ample opportunities to learn and take on new responsibilities in a fast-paced, growth-mode startup
Open vacation policy - we all work hard and take time for ourselves when we need it
A culture of employee engagement, diversity and inclusion
Core values-based leadership
Full benefits package including parental leave, a matching 401k program, and medical, dental, vision, disability, and life insurance
Generous stock options - we all get to own a piece of what we’re building

### Qualifications

6+ years of DevOps/SRE/Platform Engineering experience
2+ years of experience building or operating AI/ML infrastructure (model serving, inference, LLM orchestration, or agentic systems)
You have built and operated infrastructure for traditional and AI/ML workloads at a SaaS company
You have deep experience with AWS cloud services (ECS, Lambda, SageMaker, Bedrock, S3, DynamoDB, Redshift, or equivalent)
Experience with container orchestration (ECS, EKS) and with monitoring and observability platforms (Datadog, CloudWatch)
You have strong infrastructure-as-code skills with Terraform and understand how to manage state, modules, and multi-environment configurations
Strong proficiency in at least one programming language (Python, Go, TypeScript, or similar)
You think about observability as more than dashboards: you care about data integrity, SLOs, error budgets, and catching silent failures
You understand data infrastructure: pipelines, warehousing, ETL/ELT, and how to support analytics at scale
You have experience with compliance-sensitive environments and understand why audit trails, access governance, and change management matter
You naturally step up to lead technical conversations, and people across teams seek you out when infrastructure decisions get complicated
Track record of leading cross-team technical initiatives and mentoring engineers on infrastructure and operational best practices
You communicate well with both engineering and non-technical stakeholders
You are comfortable operating in a fast-moving environment where AI capabilities are evolving rapidly and infrastructure decisions have regulatory implications
A genuine curiosity about AI and emerging technologies, paired with the judgment to apply them thoughtfully and responsibly
Bachelor’s degree in Computer Science, Engineering, or equivalent experience
Experience in regulated industries (insurance, financial services, healthcare) is a strong plus
Data infrastructure experience (Redshift or similar data warehousing; Airflow, dbt, Dagster, or similar pipeline tools) is a strong plus