Site Reliability Engineer

Chicago, Illinois, United States · Onsite · Full Time · $250,000–$450,000/yr · Posted 2 months ago · Visa sponsorship available

Role summary

We are seeking a Site Reliability Engineer to join our Strategy Development team, focusing on the infrastructure, reliability, and performance of trading systems. This role is crucial for ensuring the resilience, speed, and scalability of our production environment. You will be responsible for monitoring and maintaining system availability, responding to incidents, automating operational tasks, and optimizing infrastructure to handle increasing throughput with minimal latency. The ideal candidate has a strong SRE or production engineering background, hands-on experience with Python, Bash, Linux systems, and monitoring tools, and thrives in high-pressure, real-time environments.

We’re looking for a hands-on, entrepreneurial Site Reliability Engineer to join our Strategy Development team. This role sits at the intersection of infrastructure, reliability, and trading performance, ensuring that the systems behind our strategies are resilient, fast, and scalable. You’ll own the stability of our production environment, build automation that reduces operational overhead, and play a critical role in keeping our trading systems running at peak performance.

This role is ideal for someone with a strong SRE or production engineering background who thrives in high-stakes environments and wants to work close to real-time systems where reliability and speed are paramount.

What You’ll Do

Own production reliability: Monitor, maintain, and improve the availability and performance of trading systems in real time
Respond to incidents: Troubleshoot issues quickly, lead root cause analysis, and implement preventative fixes
Automate relentlessly: Build tools and workflows that eliminate manual intervention and improve system resilience
Scale infrastructure: Design and optimize systems to handle increasing throughput with minimal latency
Partner across teams: Work closely with traders, quants, and engineers to ensure systems meet performance and reliability needs
Improve observability: Enhance monitoring, alerting, and logging to provide clear visibility into system health

What You Bring

Bachelor’s degree in Computer Science, Engineering, or related field
3+ years of experience in site reliability engineering, systems engineering, or production operations
Strong experience with Python and Bash for scripting and automation
Deep understanding of Linux systems and containerized environments (Kubernetes preferred)
Experience with monitoring and observability tools (e.g., Prometheus, Grafana)
Familiarity with distributed systems, messaging (Kafka), and databases (SQL/NoSQL)
Ability to debug complex issues in high-pressure, real-time environments
Strong communication skills and a collaborative mindset

Why You’ll Love It

You’ll sit at the core of a high-performance trading environment where uptime, speed, and precision are critical. Your work will directly impact system stability and trading outcomes, and you’ll have the autonomy to build, automate, and improve systems alongside a team of exceptional engineers and traders in a fast-paced, high-impact setting.

Applications are submitted through Fintal Partners' application page.