
Founding Data Engineer
Role summary
A rapidly growing, early-stage cybersecurity company is seeking a Founding Data Engineer to build its next-generation analytics infrastructure. The role calls for a seasoned professional with 5+ years of relevant industry experience, particularly in designing and managing large-scale (multi-petabyte) data lakehouse environments. The engineer will be responsible for end-to-end data pipelines, from intake to processing to access, using open-source technologies such as Iceberg, Kafka, Spark, and Python. Bonus points for cybersecurity industry experience and for knowledge of cloud data ecosystems and of governance, data quality, and lineage tools. This is a key role with significant growth potential.
An early-stage cybersecurity investment (valued at over $100M at Seed), founded by a successful serial entrepreneur, is looking to hire a Founding Data Engineer. Bonus points for prior industry experience in cybersecurity.
Our ideal candidate will be a seasoned data engineering specialist (with 5+ years of relevant industry experience) who can help this company create its next generation of analytics infrastructure. Working closely with a highly experienced founding team, you will play a central role in shaping the data architecture from the ground up and will have significant room for professional growth as the organization scales.
What the Role Involves
The position centers on designing and operating a modern, open-source data lakehouse environment built to handle extremely large datasets. The engineer will be responsible for constructing robust, end-to-end pipelines (from data intake to processing to end-user access) while ensuring the platform is performant, dependable, and accurate at massive scale.
Qualifications:
- Demonstrated expertise in building and managing very large data platforms (multi-petabyte range)
- Background in both streaming and batch data processing
- Hands-on familiarity with open-source technologies commonly used in lakehouse setups (e.g., Iceberg, PostgreSQL, Parquet, graph databases such as Neo4j)
- Strong experience with streaming and analytics frameworks like Kafka, Spark, or Flink
- Solid understanding of data transformation practices
- Advanced proficiency in Python
- Clear communication skills and the ability to produce strong technical documentation
- Bonus: experience with cloud service provider data ecosystems
- Bonus: knowledge of tools related to governance, data quality, and lineage
Please note:
There are no fees associated with any of the support we provide our investments. Greylock Talent provides free candidate referrals/introductions to all of our active investments (one of the many services we provide).
Due to the volume of applicants we typically receive, a follow-up email will not be sent unless a match is identified.