Site Reliability Engineer

San Francisco, California, United States | Onsite | Full Time | Posted 2 months ago

Role summary

Velia Multiservices seeks a top-tier Site Reliability Engineer for a fast-growing, early-stage startup. This role is critical for scaling and strengthening a high-performance platform used by enterprise clients. The engineer will own platform reliability, availability, and performance, working directly with customers to resolve complex production issues. Key responsibilities include identifying and debugging system-level problems, implementing proactive improvements, and conducting performance tuning. Qualifications include experience with Kubernetes, distributed systems, modern backend frameworks like FastAPI, large-scale production environments, and strong programming skills in Python, C, or Rust. Deep understanding of system performance, observability, and debugging techniques is essential. The role offers direct impact, influence on technical direction, and high ownership in a dynamic startup environment.

Velia Multiservices is proud to partner with a fast-growing, early-stage startup to identify a top-tier Site Reliability Engineer who will play a critical role in scaling and strengthening a high-performance platform used by enterprise clients such as Nvidia, Samsara, Zapier, and PwC.
This is a unique opportunity to work closely with founding leadership and take ownership of reliability and performance at scale.
Key Responsibilities
Own the reliability, availability, and performance of the platform, ensuring a seamless experience for enterprise customers
Work directly with customers to investigate, troubleshoot, and resolve complex production issues
Identify and debug system-level problems, including memory leaks, connection pool inefficiencies, and other critical failures
Proactively implement improvements to enhance system stability and prevent recurring issues
Conduct system profiling, benchmarking, and performance tuning to optimize latency and throughput
Collaborate cross-functionally in a fast-paced startup environment to deliver scalable and resilient solutions
Qualifications
Experience working with Kubernetes and building/maintaining distributed systems
Familiarity with modern backend frameworks such as FastAPI
Proven experience working with large-scale production environments
Strong programming expertise in Python, C, or Rust
Deep understanding of system performance, observability, and debugging techniques
Experience identifying and resolving issues related to memory management, networking, and system reliability
Ability to work directly with customers and communicate technical issues effectively
Comfortable operating in a fast-paced, early-stage startup environment with a high level of ownership
Nice to Have
Experience with PostgreSQL and Redis
Familiarity with monitoring and observability tools such as Prometheus and Grafana
Additional hands-on experience scaling systems in cloud-native environments
Why Join
Direct impact on a rapidly growing product serving leading enterprise organizations
Opportunity to work alongside founders and influence technical direction
High ownership role with visibility and career growth potential
