Founding Data Engineer (ML Pipelines)

Redwood City, California, United States | Onsite | Full Time | Posted 2 months ago

Role summary

This early-stage, well-funded cybersecurity startup is seeking a Founding Data Engineer to build and maintain the real-time data pipelines essential for their machine learning operations. The role focuses on ingesting, transforming, and organizing large data streams to ensure reliable, high-quality inputs for training and inference systems. Key responsibilities include designing scalable, low-latency pipelines, optimizing data infrastructure, and implementing data quality and governance practices. The ideal candidate will have a CS degree and 4+ years of data engineering experience, with a strong preference for prior experience in a zero-to-one environment and the cybersecurity industry.

An early-stage cybersecurity investment (valued at over $100M at Seed), founded by a successful serial entrepreneur, is looking to hire a Founding Data Engineer with a strong background in supporting full ML pipelines. Bonus points for prior industry experience in cybersecurity.

Summary:

You will support machine learning efforts by building and maintaining the real-time data pipelines that feed models with reliable, high-quality information. Your work will center on ingesting, transforming, and organizing massive data streams so that training and inference systems have consistent, accurate inputs. In this role, you will own the backbone of ML workflows, ensuring scalability, low latency, and data governance. Your success will be measured by how efficiently and reliably data flows through the ecosystem, from raw sources to feature stores and production endpoints.

Key Qualifications:

Degree in CS (or a related field) with 4+ years of related industry experience in data engineering
Prior experience in a zero-to-one environment is a strong plus
Proven ability to collaborate across different teams and adapt to new fields

Responsibilities:

Design and maintain real-time data pipelines to support large-scale machine learning workflows, ensuring low-latency ingestion and high reliability.
Build and optimize data infrastructure for feature extraction, model training, and online inference using modern streaming and orchestration frameworks.
Collaborate with ML researchers and platform engineers to integrate models into production systems and enable continuous data feedback loops.
Implement robust data quality, observability, and governance practices to ensure scalability, compliance, and reproducibility across enterprise environments.

Please note:

There are no fees associated with any of the support we provide to our investments. Greylock Talent provides free candidate referrals and introductions to all of our active investments (one of the many services we provide).

Due to the volume of applicants we typically receive, a follow-up email will not be sent unless a match is identified.
