Programmers.io
IT Services, Software Development, Staff Augmentation, Consulting
Data Engineer - Python/PySpark
Irving, Texas, United States · Hybrid · Full Time · Posted 1 month ago · Visa sponsorship available
Job Role: Data Engineer - Python/PySpark
Location: Irving, TX (3 days onsite per week)
Employment Type: Full-Time
Job Description:
Requirements:
- Strong hands-on development experience in Python, PySpark, and SQL.
- Experience building large-scale ETL/ELT pipelines for structured and unstructured data.
- Deep understanding of Spark and distributed computing fundamentals (transformations, shuffles, optimization).
- Experience with the broader Hadoop big data ecosystem.
- Proficiency with Git-based repositories (Bitbucket / GitHub).
- Experience working with AWS, Azure, or GCP environments.
- Strong understanding of database design, data modeling, and warehouse schemas (star/snowflake).
- Experience with CI/CD automation and pipeline development.
- Strong analytical and troubleshooting skills for resolving complex data issues.
- Ability to collaborate with cross-functional teams and convert business requirements into technical solutions.
Responsibilities:
- Design, develop, and maintain robust, scalable ETL/ELT pipelines.
- Write efficient, reusable, and scalable code in Python and PySpark for distributed data processing.
- Review existing data engineering code and identify opportunities for refactoring or performance improvement.
- Implement data validation, cleansing, reconciliation, and quality checks across the data lifecycle.
- Collaborate with IT and business stakeholders to understand data requirements and translate them into solutions.
- Monitor pipeline performance, troubleshoot failures, and optimize for latency, throughput, and cost.
- Participate in code reviews, enforce coding standards, and contribute to engineering best practices.
- Build and maintain CI/CD pipelines for testing, packaging, and deployment of data pipelines.
- Ensure data reliability, security, and consistency across environments.
- Work with cloud services and big data platforms to support modern data architecture.
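The data validation and reconciliation duties listed above can be sketched roughly as follows. This is a minimal stdlib-Python illustration, not code from the employer; the function names, check types, and the 1% null-rate threshold are all assumptions:

```python
# Sketch of post-load data quality checks: row-count reconciliation
# between source and target, plus a per-column null-rate check.
# All names and thresholds here are illustrative assumptions.

def reconcile_counts(source_rows, target_rows):
    """Return True when the target row count matches the source."""
    return len(source_rows) == len(target_rows)

def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

def run_quality_checks(source_rows, target_rows, required_columns,
                       max_null_rate=0.01):
    """Collect human-readable failures; an empty list means all checks pass."""
    failures = []
    if not reconcile_counts(source_rows, target_rows):
        failures.append(
            f"row count mismatch: {len(source_rows)} source vs "
            f"{len(target_rows)} target")
    for col in required_columns:
        rate = null_rate(target_rows, col)
        if rate > max_null_rate:
            failures.append(
                f"column {col!r} null rate {rate:.1%} "
                f"exceeds {max_null_rate:.1%}")
    return failures
```

In a real PySpark pipeline the same idea would be expressed with DataFrame aggregations (for example, `count()` and `isNull()` filters) running distributed on the cluster rather than with Python loops.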