
Data & Software Engineer
Role summary
We are seeking a Data & Software Engineer to join a small team focused on building complex data flows for a custom application. The role requires advanced Python programming, familiarity with Java, and a strong understanding of data security, privacy, governance, and compliance principles. You will be responsible for building production data pipelines and ETL workflows at scale, leveraging tools like Python, Spark, Docker, AWS services, SQL databases (MySQL, PostgreSQL), and orchestration tools such as Airflow. Experience with data catalogs, lineage tracking, geospatial data, and AI/ML integration is also essential. The ideal candidate will work with stakeholders to design solutions with minimal oversight and contribute to documentation and best practices.
Overview:
We are seeking a Data & Software Engineer to work with a small team building complex data flows for a custom application. The successful candidate will have advanced Python programming skills, familiarity with Java, an understanding of data security, privacy, governance, and compliance principles, and a demonstrated history of building production data pipelines and ETL workflows at scale. The candidate must have experience:
- Building end-to-end data pipelines leveraging Python
- Using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs
- Containerizing and deploying applications in cloud environments like AWS
- Working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads
- Leveraging industry-standard tools for code control (Git, IaC, etc.)
- Working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial
- Using Bash scripting for automation and data processing tasks
- Integrating AI/ML services and models
Responsibilities:
- Work with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight
- Leverage strong problem-solving and debugging skills for data quality issues, pipeline failures, and performance bottlenecks
- Leverage a background in large-scale data migration or platform modernization efforts
- Contribute to data engineering documentation, best practices, and design patterns
Qualifications:
- Active TS/SCI with polygraph required.
- Bachelor's degree in Computer Science, Engineering, Finance, or a related technical field, or equivalent practical experience.
- Minimum of 5 years' experience with:
  - Apache Spark & PySpark
  - Advanced Python (including Pandas & NumPy)
  - Docker, Podman
  - AWS S3, Lambda & Step Functions
  - Apache Iceberg, Airflow, etc.
  - SQL (with Trino)
  - NoSQL, DynamoDB
  - Unity Catalog OSS, Apache Polaris
  - Apache Superset
  - Terraform or CloudFormation
  - OpenLineage
  - H3, PostGIS

