Founding Data Engineer

San Jose, California, United States | Onsite | Full Time | Posted 2 months ago

Role summary

A rapidly growing, early-stage cybersecurity company is seeking a Founding Data Engineer to build its next-generation analytics infrastructure. The role calls for a seasoned professional with 5+ years of relevant industry experience, particularly in designing and managing multi-petabyte data lakehouse environments. The engineer will own end-to-end data pipelines, from intake to processing to access, using open-source technologies such as Iceberg, Kafka, Spark, and Python. Bonus points for cybersecurity industry experience and for knowledge of cloud data ecosystems and of governance, data quality, and lineage tools. This is a key role with significant growth potential.

One of our early-stage cybersecurity investments (valued at over $100M at Seed), founded by a successful serial entrepreneur, is looking to hire a Founding Data Engineer. Bonus points for prior industry experience in cybersecurity.

Our ideal candidate will be a seasoned data engineering specialist (with 5+ years of relevant industry experience) who can help this company create its next generation of analytics infrastructure. Working closely with a highly experienced founding team, you will play a central role in shaping data architecture from the ground up and will have significant room for professional growth as the organization scales.

What the Role Involves

The position centers on designing and operating a modern, open-source data lakehouse environment built to handle extremely large datasets. The engineer will be responsible for constructing robust, end-to-end pipelines—from data intake to processing to end-user access—while ensuring the platform is performant, dependable, and accurate at massive scale.

Qualifications:

Demonstrated expertise in building and managing very large data platforms (multi-petabyte range)
Background in both streaming and batch data processing
Hands-on familiarity with open-source technologies commonly used in lakehouse setups (e.g., Iceberg, PostgreSQL, Parquet, graph databases such as Neo4j)
Strong experience with streaming and analytics frameworks like Kafka, Spark, or Flink
Solid understanding of data transformation practices
Advanced proficiency in Python
Clear communication skills and the ability to produce strong technical documentation
Bonus: experience with cloud service provider data ecosystems
Bonus: knowledge of tools related to governance, data quality, and lineage

Please note:

There are no fees associated with any of the support we provide our investments. Greylock Talent provides free candidate referrals/introductions to all of our active investments (one of the many services we provide).

Due to the volume of applicants we typically receive, a follow-up email will not be sent unless a match is identified.

