
Infrastructure Software Engineer (Python, Distributed Systems)
Role summary
This role focuses on building and maintaining large-scale infrastructure components, including database systems, distributed queues, and deployment platforms. The engineer owns the end-to-end lifecycle of changes in distributed systems and uses monitoring, logging, and metrics for proactive maintenance and issue resolution. Key responsibilities include supporting the health of infrastructure services, data pipelines, and build streams, with on-call duties and escalation as needed. Proficiency in Python is essential; familiarity with data systems, ML pipelines, and CI/CD is preferred.
Job Description:
Infra/Service (70%)
- Build and maintain components based on large-scale infrastructure (e.g., database systems, distributed queues, deployment platforms)
- Comfortable making changes in mid- to large-scale distributed systems (on the order of 10k servers), handling the end-to-end lifecycle of each change
- Proactive maintenance through extensive use of monitoring, logging, and metrics dashboards, working with core infra to diagnose and resolve issues
- Proficiency in Python
Infra/Data (30%)
(Matching roles: Software/Data Engineering Profile)
- Familiarity with data systems, ML pipelines, and distributed databases
- Binary packaging and distribution
- Build and CI/CD
- Miscellaneous operational tasks related to build and third-party modules
- Proficiency in Python
TYPICAL WORKLOAD
- Monitor and maintain the health of various infra services, data pipelines, and build streams
- Provide on-call support during business hours
- Assess level of urgency and escalate to core team, if necessary
- Work on infra tasks/bugs
- Work on items related to infra health and efficiency (most likely follow-ups from monitoring/alerts)