Senior Data Engineer
Role summary
VaxCare is seeking a Senior Data Engineer to join their Product Group's Data Engineering team. This role focuses on designing, developing, and managing the company's data processing and analytics infrastructure. The Senior Data Engineer will be responsible for building and optimizing Delta Lake-based data pipelines, architecting lakehouse solutions, and implementing CI/CD and data quality frameworks. Key responsibilities include leveraging Databricks, Apache Spark, and modern data engineering principles to ensure efficient, scalable, and governed data operations. The role also involves mentoring junior engineers, leading technical designs, and driving cost optimization initiatives within the data platform.
THE POSITION
You’ll be a key member of VaxCare’s Product Group, joining our Data Engineering team and reporting to our Data Engineering Lead. As a Senior Data Engineer, you will play a critical role in the design, development, and management of our data processing and analytics infrastructure. The ideal candidate will have extensive hands-on experience with Spark and Databricks, as well as a strong background in data engineering principles and best practices.
RESPONSIBILITIES
- Design and implement Delta Lake-based data pipelines using Databricks Workflows, Delta Live Tables (DLT), and Unity Catalog for enterprise data governance
- Build ELT/ETL pipelines using medallion architecture (bronze/silver/gold layers) supporting both batch and streaming workloads with Auto Loader and Structured Streaming
- Architect lakehouse solutions leveraging Delta Lake ACID transactions, Z-ordering, liquid clustering, and partitioning strategies
- Implement CI/CD pipelines for data workflows using Git integration and Databricks Asset Bundles
- Design data quality frameworks using Delta Live Tables expectations and custom PySpark validation with automated alerting and SLA monitoring
- Create materialized views and incremental refresh strategies for optimized query performance
- Collaborate with data scientists, ML engineers, and analysts to implement feature engineering pipelines and support MLOps workflows
- Mentor junior engineers, conduct code reviews, and lead technical design discussions
- Implement data observability and monitoring using Databricks SQL, Lakeview dashboards, and custom alerting frameworks
- Drive cost optimization initiatives leveraging Photon engine, serverless compute, and FinOps best practices
- Troubleshoot and resolve complex issues related to distributed computing, data skew, and performance bottlenecks
- Create comprehensive technical documentation including data contracts, runbooks, and data catalog metadata in Unity Catalog
- Champion DataOps best practices including testing strategies, performance tuning, and data platform engineering principles
- Stay current with lakehouse architecture trends and emerging technologies to continuously improve our data infrastructure
EXPERIENCE AND QUALIFICATIONS
Education:
- Bachelor's degree in Computer Science, Data Engineering, Engineering, or related technical field OR equivalent practical experience
- Master's degree or relevant industry certifications (Databricks Certified Data Engineer Professional, AWS/Azure Data certifications) preferred
Experience:
- 7+ years of data engineering experience, including 3+ years of hands-on production experience building data pipelines on Databricks and Apache Spark
- Proven track record of designing and implementing lakehouse architectures at scale
Technical Skills:
*Programming & Languages:*
- Expert-level proficiency in Python (PySpark, pandas, NumPy) and SQL (complex queries, window functions, CTEs, query optimization)
- Experience with Spark SQL, Delta Lake SQL, and Databricks SQL
*Apache Spark Expertise:*
- Deep expertise in Apache Spark, including:
  - Performance optimization (partition tuning, broadcast joins, data skew handling, caching strategies)
  - Delta Lake features (ACID transactions, time travel, MERGE operations, CDC, liquid clustering)
  - Understanding of Spark internals (DAG execution, Catalyst optimizer, Tungsten execution engine)
*Databricks Platform:*
- Production experience with Databricks, including:
  - Delta Live Tables (DLT) for declarative pipeline development
  - Unity Catalog for data governance, access control, and lineage tracking
  - Databricks Workflows and orchestration
  - Cluster optimization and cost management (spot instances, autoscaling, serverless compute)
  - Databricks Asset Bundles for CI/CD
  - Databricks SQL and Lakeview dashboards
*Data Architecture & Modeling:*
- Strong understanding of data modeling techniques:
  - Dimensional modeling (star schema, fact/dimension tables)
  - Medallion architecture (bronze/silver/gold layers)
  - Slowly Changing Dimensions (SCD) implementations
- Expert-level SQL skills including query optimization, execution plan analysis, and performance tuning for billion-row datasets
- Experience with modern lakehouse patterns and understanding of lakehouse vs. traditional data warehouse trade-offs
- Familiarity with legacy systems (Oracle, SQL Server, DB2) and migration strategies to cloud platforms
*DevOps & DataOps:*
- Strong DevOps/DataOps experience:
  - Git workflows (branching strategies, pull requests, code reviews)
  - CI/CD pipelines for data workflows (GitHub Actions, Azure DevOps, Jenkins)
  - Testing strategies (unit tests, integration tests, data quality tests)
  - Monitoring and observability (logging, alerting, SLA tracking)
Leadership & Soft Skills:
- Proven ability to mentor junior engineers and conduct technical code reviews
- Experience leading technical design discussions
- Strong stakeholder management skills with ability to translate technical concepts to non-technical audiences
- Systematic approach to debugging complex distributed systems and performance troubleshooting
- Excellent problem-solving abilities with focus on pragmatic trade-offs between speed, cost, and quality
- Strong communication and collaboration skills in cross-functional team environments