Principal Data Platform Engineer
## Role summary
Audiience is seeking a Principal Data Platform Engineer to design and build the data infrastructure powering its AI models in the publishing industry. This founding engineering role focuses on creating data pipelines, quality systems, and feedback loops from scratch, without legacy technical debt. The ideal candidate will have strong experience with production data pipelines (Spark, dbt, Airflow), ML data quality, annotation workflows, vector databases, and cloud data stacks. They will define data strategy, ensure data health through observability, and partner with ML research. This is a fully remote position requiring Pacific Time zone hours.
## About Audiience
We're transforming how content is created and trusted in publishing. We deliver technology that is accurate, scalable, and creative – built to elevate both craft and integrity. We attract the best in the business not through traditional methods, but through the solutions we create and the culture we've built.
## Our Culture
- Low ego, high confidence – We sharpen each other through continuous improvement
- Open communication – Even when it creates necessary conflict
- Systems thinking – We solve complex problems through collaboration
- Human-centered – We work because we love what we do, but we are human first
- Integrity and creativity – We win together or not at all
## The Role
The dirtiest secret in AI is this: your models are only as good as your data. Everyone knows it. Very few can actually do something about it. At Audiience, we're entering a niche market in publishing that has never had an AI-native solution - which means we get to define what good data looks like, from scratch, without the legacy technical debt that bogs down every other team working on similar problems.
We're building a founding engineering team of rare individuals who can think from first principles and build with precision. This is a seat reserved for someone who is obsessed with the data flywheel - who understands that in AI, data infrastructure isn't a support function, it's the competitive moat. You'll design and own the pipelines, quality systems, and feedback loops that make our models measurably better over time.
This role is not about maintaining someone else's stack. It's about architecting the system that powers a category-defining product. If you've ever wanted to build the data foundation for something important - from the very first line - this is that opportunity.
## What You'll Do
- Design and build the data ingestion, transformation, and labeling pipelines that power our AI models
- Define and enforce data quality standards, annotation schemas, and governance frameworks from the ground up
- Build feedback loops that capture real-world model performance and translate it back into training signal
- Partner closely with ML research and infrastructure to ensure data formats, volumes, and quality match training needs
- Create observability into data health - drift detection, quality degradation, and coverage gaps
- Shape the data strategy for a domain where ground truth doesn't exist yet - you'll help invent it
## What We're Looking For
### Core Technical Expertise
- Strong experience building production-grade data pipelines (Spark, dbt, Airflow, or equivalent)
- Deep understanding of data quality, schema evolution, and versioning in ML contexts
- Familiarity with annotation and labeling workflows and tooling (Label Studio, Scale AI, or equivalent)
- Experience with vector databases, embedding pipelines, and retrieval infrastructure
- Proficiency with cloud data stacks (S3/GCS, Snowflake, BigQuery, or equivalent)
- Solid understanding of how data decisions directly impact model performance and training dynamics
### Communication
- Communication excellence – Can write clear data contracts, schema documentation, and governance policies that engineers and non-technical stakeholders can actually use
- Demonstrated ability to explain data quality tradeoffs and their downstream model consequences in writing and in conversation
### Background
- Degree not required
- Prior experience as a data engineer, ML data platform engineer, or data infrastructure lead
- Startup or fast-moving environment experience is a plus
## Your Mindset
- Problem-solving prowess – You see problems others don't and solve them in ways others can't
- Tenacious learner – Self-taught capabilities and continuous improvement are in your DNA
- Systems thinker – You understand how complex systems interact and create elegant solutions
- Results-oriented – Bias toward flexibility, impact, and getting it done
- Collaborative by nature – You believe we can only win if we do it together
## Nice to Have
- Experience with RLHF data pipelines or human preference data collection
- Prior work in media, publishing, or content-heavy domains
- Contributions to open data tooling or data-centric AI initiatives
- Experience with synthetic data generation or augmentation strategies
- Previous startup or early-stage engineering experience
- Volunteer work
## Why Join Us
- Build the data foundation for something that has never existed - in a market that has never been touched
- Join a founding core of technical builders who treat data as the strategic asset it actually is
- Solve extraordinary problems with fewer resources than competitors - your impact is magnified
- Work with brilliant misfits who value craft, integrity, and creativity over politics
- Own the data flywheel - your work directly determines how fast and how well our models improve
- Continuous learning - work at the bleeding edge of AI data infrastructure with teammates who challenge and sharpen you
## Location
This role is fully remote; however, you must be willing to work Pacific Time zone hours. Occasional travel will be required for team workshops.
## Come Work With Us!
We offer competitive compensation and benefits, equity, generous time off to recharge, and flexible working hours.
We are an equal opportunity employer committed to building a diverse team. We welcome applications from all backgrounds, especially those who might not check every box but possess the savant-level problem-solving abilities we seek.