Data Engineer (AWS / Azure / GCP | Python | PySpark | ETL Pipelines)
Role summary
Seeking a Data Engineer with expertise in Python, PySpark, and cloud platforms (AWS, Azure, or GCP) to design, build, and maintain scalable data pipelines and ETL solutions. The role involves developing cloud-native data platforms for analytics and data-driven applications, focusing on backend data engineering and large-scale data pipelines. Responsibilities include data pipeline development, cloud architecture implementation, data modeling, querying, data quality checks, monitoring, and collaboration within CI/CD environments.
Location: Remote (U.S.)
Employment Type: Contract / Full-Time
Job Overview
We are seeking a Data Engineer with strong experience in Python, PySpark, and cloud-based data platforms (AWS, Azure, or GCP) to design, build, and maintain scalable data pipelines and ETL solutions.
This role focuses on developing cloud-native data platforms that support analytics, reporting, and data-driven applications.
Note: Experience with any one cloud platform (AWS, Azure, or GCP) is sufficient. Experience across multiple platforms is a plus.
Alternative Titles (for search visibility)
This role may also be relevant for candidates with experience as a Big Data Engineer, ETL Developer, Cloud Data Engineer, Data Platform Engineer, or Spark Engineer.
Candidates must not require sponsorship to work in the U.S., now or in the future.
Key Responsibilities
Data Pipeline Development
- Design and build scalable ETL pipelines using Python and PySpark
- Process large datasets using distributed data processing frameworks
Cloud & Data Architecture
- Develop solutions using cloud-native data services, including:
  - AWS: Glue, Redshift, Lambda, SNS/SQS, Step Functions
  - Azure: Data Factory, Synapse Analytics, Event Hub
  - GCP: Dataflow, BigQuery, Pub/Sub
- Implement event-driven and serverless architectures
Data Modeling & Querying
- Design and optimize data models in cloud data warehouses such as:
  - Redshift (AWS)
  - Synapse (Azure)
  - BigQuery (GCP)
- Write efficient SQL for data transformation, validation, and reporting
Data Quality & Operations
- Implement data quality checks, monitoring, and alerting
- Handle pipeline failures, logging, and operational support
Collaboration & DevOps
- Collaborate with data analysts, engineers, and stakeholders
- Contribute to CI/CD pipelines and infrastructure-as-code
- Maintain documentation for reproducible deployments
Required Skills
- Strong programming experience in Python and PySpark
- Hands-on experience with at least one cloud platform: AWS, Azure, or GCP
- Experience with cloud-native data services (examples across platforms listed above)
- Strong SQL skills and experience with data modeling and query optimization
- Experience building and maintaining ETL pipelines and data workflows
- Familiarity with data quality frameworks, monitoring, and alerting
- Experience with Git and CI/CD workflows
Nice to Have
- Experience working across multiple cloud platforms (AWS, Azure, GCP)
- Experience with data lake architectures (S3, ADLS, GCS)
- Exposure to Spark optimization and distributed systems
- Experience with streaming data pipelines
Important (Candidate Fit)
This role focuses on backend data engineering and large-scale data pipelines.
Candidates whose experience is primarily in the following areas may not be a good fit:
- Business Intelligence (BI) or reporting-only roles
- Tableau / Power BI development without backend data engineering
- Pure data analysis without ETL or pipeline development experience
Keywords
Data Engineer, AWS, Azure, GCP, PySpark, ETL Developer, Big Data Engineer, Data Pipelines, Redshift, BigQuery, Synapse, Cloud Data Engineer, Spark
Suggested Pay Range: $60 – $70 / hour