Advanced Data Engineer

Canada · Onsite · Full Time · Posted 2 months ago

Role summary

We are seeking an Advanced Data Engineer to enhance our Databricks architecture. You will develop and maintain scalable data processing scripts using Python and PySpark, ensuring code quality and reusability. This role requires a deep understanding of datasets, proficiency in advanced SQL, and strong data engineering principles for designing and optimizing ETL/ELT processes. You will manage data workflows with Apache Airflow, utilize Git/GitHub for version control and CI/CD automation, and work within the Databricks environment. Experience with AI coding assistants and data governance best practices is essential. Collaboration with cross-functional teams is key to success.

Job Responsibilities

- Understand, analyze, and contribute to the current Databricks architecture and design principles, ensuring scalability and performance.
- Develop and maintain efficient data processing scripts using Python and PySpark, ensuring clean, reusable, and scalable code.
- Demonstrate a deep understanding of datasets, including structure, lineage, semantics, and business context.
- Use GitHub for version control and collaboration, and GitHub Actions to automate workflows and CI/CD pipelines.
- Configure and maintain CI/CD pipelines in a DevOps environment for seamless code integration and deployment.
- Leverage AI coding assistants like GitHub Copilot and Databricks Assistant to improve development efficiency and code quality.
- Collaborate with cross‑functional teams including data scientists, analysts, and platform engineers.
- Utilize advanced SQL for data transformation, analysis, and troubleshooting across large-scale datasets.
- Apply strong data engineering principles to design, optimize, and maintain scalable ETL/ELT processes.
- Build and manage data workflows using Apache Airflow or similar orchestration tools to ensure reliable automation and scheduling.
- Work extensively within the DBX (Databricks) environment to develop scalable pipelines and enforce best practices across the platform.

Required Qualifications

- 5+ years of experience in data engineering or related roles.
- Proficient in Python and PySpark, with a strong foundation in distributed data processing.
- Hands-on experience working with Databricks (DBX), including workspace administration and Unity Catalog integration.
- Strong understanding of data security and governance best practices.
- Proficiency in SQL, including complex queries, optimization, and performance tuning.
- Experience with monitoring tools such as Datadog for data system observability.
- Proficiency in Git/GitHub, including pull requests, branching strategies, and GitHub Actions.
- Experience with DevOps practices related to CI/CD, especially in data pipeline deployments.
- Familiarity with AI-powered coding tools such as GitHub Copilot and Databricks Assistant.
- Strong problem‑solving skills and ability to work in a fast‑paced, collaborative environment.
- Experience in workflow orchestration, preferably with Apache Airflow.

Preferred Skillset

- Databricks or Azure certifications are a plus.
- Experience with cloud platforms (Azure) in a data engineering context.
- Familiarity with modern data stack tools and frameworks.
- Excellent communication and documentation skills.

Ready to apply?

You'll be redirected to TechDoQuest's application page.
