PTC
Enterprise Software, Industrial Internet of Things (IIoT), Product Lifecycle Management (PLM), Compu

Principal Site Reliability Engineer

Boston, Massachusetts, United States · Hybrid · Full Time · Principal · $131,000–$185,000 /yr · Posted 2 months ago

Principal Site Reliability Engineer - Hybrid, Boston, MA

Our world is transforming, and PTC is leading the way. Our software brings the physical and digital worlds together, enabling companies to improve operations, create better products, and empower people in all aspects of their business.

Our people make all the difference in our success. Today, we are a global team of nearly 7,000, and our main objective is to create opportunities for our team members to explore, learn, and grow – all while seeing their ideas come to life and celebrating the differences that make us who we are and make the work we do possible.

About the Role

We are looking for a Principal Site Reliability Engineer (SRE) to play a critical role in ensuring the long‑term reliability, scalability, and operational excellence of the Onshape by PTC platform.

As a Principal SRE, you will operate with a high degree of autonomy and influence. You will lead complex, cross‑organization reliability initiatives, shape reliability strategy, and serve as a technical authority and trusted advisor across engineering.

Your work will directly shape the experience of our customers by ensuring the platform is fast, resilient, and dependable. As a Principal SRE, you will help protect customer trust by driving reliability across the entire system lifecycle.

This role is ideal for engineers who enjoy solving ambiguous, high‑impact problems at scale, influencing system design across teams, and raising the reliability bar for an entire organization.

What You’ll Do:

Own Reliability at Scale

  • Lead design, implementation, and evolution of reliability, availability, and resiliency strategies for large‑scale distributed systems written primarily in Java
  • Apply deep experience operating complex, distributed systems to guide architectural decisions, reliability strategies, and long‑term system evolution
  • Identify systemic risks in application architecture, data flows, and infrastructure, and drive architectural improvements that measurably improve availability, performance, and scalability
  • Set and evolve reliability standards, best practices, and operational principles across R&D

Drive Operational Excellence

  • Lead efforts to prevent, detect, and mitigate incidents through technical improvements and operational maturity
  • Serve as a senior coordination point during major incidents, helping manage response and guide long‑term remediation
  • Champion blameless post-incident reviews and ensure learnings translate into durable system improvements

Reduce Toil Through Engineering

  • Apply advanced software engineering practices to eliminate manual work, reduce operational load, and improve system observability
  • Design and build internal platforms, automation, and tooling that support Java‑based services and their operational needs
  • Raise the bar on monitoring, alerting, and SLO/SLI adoption across systems

Lead Through Influence and Collaboration

  • Partner deeply with product engineers, architects, and engineering leadership to ensure reliability and operability are first‑class concerns in system design
  • Review and influence designs for complex systems involving technologies such as datastores, messaging systems, and coordination services
  • Serve as a technical mentor and coach for SREs and other engineers, raising overall engineering and operational maturity

Shape Strategy and Direction

  • Contribute to longer‑term reliability and infrastructure strategy aligned with business growth
  • Stay current with industry trends in SRE, distributed systems, and the Java ecosystem, turning insights into practical improvements
  • Help define what “great reliability” looks like for the organization and how we measure it

What We’re Looking For:

Required Experience & Expertise

  • Must be a US citizen or Green Card holder; this is required for the role due to ITAR requirements
  • Ability to commute to the Seaport Boston office 2-3 days a week
  • 7+ years of experience in software engineering, site reliability engineering, or systems engineering roles
  • Extremely strong proficiency with the Java programming language and its ecosystem, including building, debugging, and operating production Java services
  • Deep experience operating complex, distributed systems in production environments
  • Strong software engineering background, with a track record of delivering high‑quality, maintainable code

Technical Strength

  • Expert understanding of incident management, service reliability, and performance engineering
  • Strong hands‑on experience with observability (metrics, logs, traces), capacity planning, and SLO‑driven reliability
  • Deep familiarity with modern cloud‑based infrastructure, CI/CD pipelines, and infrastructure‑as‑code practices
  • Ability to reason about failure modes across application, data, and infrastructure layers

Leadership & Influence

  • Demonstrated ability to lead complex initiatives that span teams and organizational boundaries
  • Comfortable making high‑impact technical decisions in ambiguous environments
  • Strong communicator who can influence design and operational decisions across a wide range of stakeholders

Mindset

  • Systems thinker focused on root‑cause analysis and durable fixes
  • Calm and effective under pressure, especially during high‑severity incidents
  • Curious, data‑driven, and committed to continuous improvement

Nice to Have

  • Experience operating or supporting systems using technologies such as MongoDB, ZooKeeper, and RabbitMQ
  • Background in performance tuning and scalability optimization of Java services
  • Experience setting or influencing engineering standards at the organization level
  • Prior involvement in evolving SRE or platform practices in a growing engineering organization
  • Experience designing, operating, or scaling systems in cloud environments such as AWS (preferred), including familiarity with core services, networking models, and reliability features