Data Engineer
Role summary
A Data Engineer is sought to join a Data & AI team supporting clients in the automotive marketing sector. The role involves building and maintaining scalable data pipelines using Azure Databricks, transforming automotive and marketing data into analytics-ready Delta tables. Key responsibilities include developing PySpark ETL workflows, implementing data quality frameworks, modeling data for various use cases, and automating pipeline executions. The ideal candidate will have 3–6 years of experience, advanced PySpark and SQL skills, and a strong understanding of the Azure data ecosystem. Experience with marketing or automotive data is highly desirable. This is an onsite position in Farmington Hills, MI.
About the Company
We’re seeking a Data Engineer to join our growing Data & AI team supporting multiple clients in the automotive marketing ecosystem.
About the Role
This role will focus on building and maintaining scalable data pipelines in Azure Databricks, transforming large volumes of automotive and marketing data into governed, analytics-ready Delta tables. The ideal candidate is highly skilled in PySpark, SQL, and Azure data services, with strong attention to detail and a passion for clean, reliable data. This position plays a key role in powering our master data management (MDM) platform and in building and maintaining the pipelines that feed the CRM application, AI initiatives, and business intelligence solutions across Latcha’s enterprise data environment.
Responsibilities
- Design, build, and maintain scalable data pipelines in Azure Databricks to process structured and unstructured marketing and automotive data across Bronze, Silver, and Gold layers.
- Develop and optimize PySpark ETL workflows for ingesting data from external vendors (Experian, OEM, Dealer Tire, Meta, Basis, etc.) using Azure Blob Storage, Databricks Volumes, and Delta tables.
- Implement robust data quality frameworks using Great Expectations and custom validation scripts to ensure data completeness, consistency, and accuracy.
- Collaborate with data architects and analysts to model dealer-centric and customer-centric data for reporting, analytics, and machine learning use cases.
- Automate and monitor pipeline executions via Databricks Jobs and Azure Data Factory; manage schema evolution and partitioning, and tune pipeline performance.
- Contribute to the development of internal Python utilities and libraries for schema alignment, transformations, and reusable ETL logic.
- Work closely with the integrations and AI/ML engineering teams to operationalize Gold-layer datasets for APIs, dashboards, and machine learning models.
Qualifications
- 3–6 years of experience in data engineering or analytics engineering, ideally within a Databricks + Azure environment.
- Bachelor’s degree in Computer Science, Information Systems, Data Engineering, or related field.
- Prior experience in marketing data, CRM, or automotive datasets is highly desirable.
- Strong communication skills and ability to collaborate in cross-functional teams.
Required Skills
- Advanced proficiency in PySpark and SQL (Databricks SQL, Delta Lake).
- Strong understanding of the Azure data ecosystem: Databricks, Data Factory, Blob Storage, Volumes, Key Vault, and Unity Catalog.
- Hands-on experience building ETL pipelines using Delta architecture.
- Proficiency with Git, CI/CD pipelines, and version control best practices.
- Ability to design efficient data models with partitioning, clustering, and schema enforcement.
- Experience working with JSON, Parquet, CSV, and other structured and semi-structured file formats.
- Strong understanding of data governance, schema alignment, and error handling in distributed systems.
Preferred Skills
- Experience with Great Expectations, Soda, or similar data quality frameworks.
- Familiarity with FastAPI and exposing Delta tables via REST APIs.
- Knowledge of MLflow, feature stores, and model lifecycle management in Databricks.
- Experience with Power BI and Fabric Mirroring for analytics layer integration.
- Exposure to AI/LLM-based automation and RAG pipelines.
- Understanding of Delta MERGE logic, schema evolution, and table optimization.
- Experience with Azure DevOps or GitHub Actions for CI/CD automation.
- Working knowledge of Docker and containerized deployments.
Work Location: In person at our office in Farmington Hills, MI
- Must be authorized to work in the U.S. without the need for employment-based visa sponsorship now or in the future. Latcha+Associates will not sponsor applicants for U.S. work visa status for this opportunity (no sponsorship is available for H-1B, L-1, TN, O-1, E-3, H-1B1, F-1, J-1, OPT, CPT, or any other employment-based visa).
Equal Opportunity Statement
All qualified applicants are considered for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, citizenship status, disability, protected veteran status, or any other category protected by applicable federal, state, or local laws.