
Infrastructure Software Engineer (Python, Distributed Systems)
Role summary
This Infrastructure Software Engineer role focuses on building and maintaining large-scale distributed systems, including database systems, distributed queues, and deployment platforms. The engineer owns the end-to-end lifecycle of changes in systems on the order of 10,000 servers, using monitoring, logging, and metrics dashboards for proactive maintenance and issue resolution. Proficiency in Python is required. The role also calls for familiarity with data systems, ML pipelines, binary packaging and distribution, and CI/CD processes, with a focus on keeping infrastructure services and data pipelines healthy and efficient. On-call support during business hours is part of the typical workload.
Job Description:
Infra/Service (70%)
- Build and maintain components based on large-scale infrastructure (e.g., database systems, distributed queues, deployment platforms)
- Comfortable making changes in mid- to large-scale distributed systems (on the order of 10k servers) and handling the end-to-end lifecycle of those changes
- Proactive maintenance through extensive use of monitoring, logging, and metrics dashboards; work with the core infra team to diagnose and resolve issues
- Proficiency in Python
Infra/Data (30%)
(Matching roles: Software/Data Engineering Profile)
- Familiarity with data systems, ML pipelines, and distributed databases
- Binary packaging and distribution
- Build and CI/CD
- Miscellaneous operational tasks related to build and third-party modules
- Proficiency in Python
TYPICAL WORKLOAD
- Monitor and maintain the health of various infra services, data pipelines, and build streams
- Provide on-call support during business hours
- Assess level of urgency and escalate to core team, if necessary
- Work on infra tasks/bugs
- Work on items related to infra health and efficiency; these are most likely follow-ups from monitoring/alerts