Crustdata · Verified
Software, Artificial Intelligence, Data Management
Senior Data Platform Engineer
San Francisco, California, United States · Onsite · Full Time · Senior · $140,000–$200,000 /yr · Posted 2 months ago · Hidden Gem · YC Startup
Role summary
We are seeking a Senior Data Platform Engineer to design, build, and maintain our core data infrastructure, including data warehouses and data lakes, using cloud technologies (AWS, GCP, Azure). The role involves developing scalable ETL/ELT pipelines, supporting data science and ML initiatives, and managing workflow orchestration with tools like Airflow or Dagster. Experience with real-time streaming (Kafka) and big data technologies (Spark, Flink) is crucial. The ideal candidate has 3+ years of software engineering experience with a focus on data engineering, strong programming skills in Python or similar, and a pragmatic, startup-oriented mindset.
### **The Role**
We are looking for a foundational member of our engineering team: a highly motivated Software Engineer to own the design, creation, and evolution of our data platform. You will be part of the team that owns the data ingestion and management infrastructure that powers Crustdata’s capabilities.
If you are passionate about building robust, scalable data systems and want to see your work directly influence customers, this is the role for you.
### **What You'll Do**
* **Architect & Build:** Design, build, and maintain our core data infrastructure, including our data warehouse and data lake, using modern cloud technologies (AWS, GCP, or Azure).
* **Pipeline Development:** Develop and scale robust, fault-tolerant data pipelines (ETL/ELT) to ingest and process massive volumes of structured and unstructured data from diverse sources.
* **Enable Data Science & ML:** Create the foundational platform to support our data scientists and ML engineers. This includes building systems for feature engineering, model training, and deploying ML models into production.
* **Orchestration at Scale:** Implement and manage workflow orchestration for hundreds of daily data jobs, ensuring reliability, monitorability, and efficiency using tools like Airflow, Dagster, or Prefect.
* **Real-time Infrastructure:** Build and manage real-time data streaming pipelines using technologies like Kafka or Flink to power live dashboards and time-sensitive product features.
* **Data Quality & Governance:** Champion data quality and reliability. Implement frameworks for data validation, testing, and monitoring to ensure our data is accurate and trustworthy.
### **Who You Are**
* **Experience:** You have 3+ years of professional software engineering experience, with a significant focus on data engineering or building backend systems at scale.
* **Strong Coder:** You possess strong programming skills in Python or another modern language (e.g., Java, Go).
* **Big Data Expertise:** You have hands-on experience with modern big data technologies such as Spark, Flink, or Dask.
* **Pipeline Orchestration:** You have practical experience with workflow management tools like Temporal, Airflow, Dagster, or Prefect.
* **Problem Solver:** You are a pragmatic problem-solver who can navigate ambiguity, manage complexity, and take ownership of projects from inception to completion.
* **Startup Mentality:** You are excited to work in a fast-paced, collaborative environment and wear multiple hats.
### **Nice to Haves**
* Experience with real-time streaming technologies (Kafka, Pulsar, Kinesis).
* Familiarity with containerization and orchestration (Docker, Kubernetes).
* Knowledge of modern data warehousing and lakehouse architectures (e.g., Delta Lake, Iceberg).