Performance Test Data Engineer

Roseville, California, United States | Onsite | Contract | Posted 2 months ago | Visa sponsorship available

Role summary

We are seeking a Data Platform Engineer with a strong focus on Quality Assurance and Data Storage. This role involves designing, developing, and executing data validation and QA test strategies for large-scale data platforms, including ETL/ELT pipelines and data lakes. Responsibilities include performing end-to-end data validation, validating large datasets using SQL and Python, and ensuring data quality, accuracy, and performance across distributed environments. The ideal candidate will have hands-on experience with data lakes (S3, ADLS, HDFS), various data formats (Parquet, ORC, Delta Lake), and automated testing frameworks. Experience with cloud platforms like AWS or Azure is required.

Job Description: Data Platform Engineer (QA + Storage Focus)

Role Overview

We are looking for a Data Platform Engineer with strong QA and Data Validation experience to support large-scale data platforms. The ideal candidate will have hands-on experience in testing data pipelines, validating data lakes/storage systems, and ensuring data quality, accuracy, and performance across distributed environments.

Key Responsibilities

- Design, develop, and execute data validation and QA test strategies for ETL/ELT pipelines
- Perform end-to-end data validation between source systems and target data platforms (Data Lake / Data Warehouse)
- Validate large-scale datasets (millions/billions of records) using SQL, Python, and PySpark
- Perform file-level and storage validation across data lakes (S3 / ADLS / HDFS):
  - File count validation
  - Schema validation
  - Partition validation
  - Data completeness checks
- Test and validate data ingestion pipelines (batch & streaming)
- Validate data across Bronze / Silver / Gold layers (Medallion architecture)
- Perform data reconciliation and consistency checks across multiple systems
- Develop and maintain automated data validation frameworks using Python (PyTest or similar)
- Implement and monitor data quality checks (nulls, duplicates, referential integrity)
- Validate data formats such as Parquet, ORC, Delta Lake
- Conduct performance testing of data pipelines and queries (Spark / SQL)
- Analyze and validate data processing performance, latency, and throughput
- Collaborate with Data Engineers to identify and fix data issues and optimize pipelines

Required Skills

Data QA / Testing

- Strong experience in ETL/ELT testing and data validation
- Expertise in SQL for data validation and reconciliation
- Experience with test case design, execution, and defect tracking
- Knowledge of data quality frameworks and validation techniques

Data Engineering Knowledge

- Understanding of data pipelines (ADF / Airflow / Glue / Databricks)
- Experience with PySpark / Apache Spark (basic to intermediate)
- Familiarity with data modeling and transformations

Storage / Data Lake Validation (MANDATORY)

- Hands-on experience with Data Lakes (AWS S3 / Azure ADLS / HDFS)
- Strong knowledge of:
  - File-based validation
  - Partitioning strategies
  - Schema evolution
- Experience validating Parquet / ORC / Delta Lake datasets

Programming & Tools

- Python (for automation/testing)
- SQL (strong)
- Experience with PyTest / automation frameworks
- Git / CI-CD basics

Cloud Platforms (Any One)

AWS (S3, Glue, Athena) or Azure (ADLS, ADF, Databricks)

Nice to Have

- Experience with Great Expectations / Deequ (data quality tools)
- Knowledge of Kafka / streaming validation
- Experience with Delta Lake features (time travel, versioning)
- Exposure to data governance tools (Glue Catalog, Unity Catalog)

Ideal Candidate Profile

- Strong Data Engineer with QA/testing experience
- Hands-on with data validation + storage systems
- Comfortable working with large-scale distributed data platforms
- Detail-oriented with a focus on data accuracy, quality, and performance

Ready to apply?

You'll be redirected to Rezolve Ai's application page.