Staff Data Engineer

Toronto, Ontario, Canada | Onsite | Full Time | Staff | Posted 2 months ago

Role summary

Katalyze AI is seeking a Staff or Senior Data Engineer to lead its data infrastructure. The role involves making key architectural decisions, designing integrations with customer systems (MES, LIMS, ERP, historians), and building scalable real-time and batch data pipelines. You will establish data quality and observability frameworks, collaborate with ML and Data Science teams, and design data storage solutions optimized for AI/ML. The position requires 7+ years of data engineering experience, deep expertise in data streaming technologies, proficiency in Python and SQL, and experience with cloud data platforms. A background in industrial data systems is a plus.

About Katalyze AI

Katalyze AI is a fast-growing AI-driven biotech platform company on a mission to make life-saving drugs accessible and affordable for everyone. Our AI Agents help pharmaceutical and biotech companies increase production efficiency, reduce costs, and minimize waste. We're a team of humble, fast-moving, and curious craftspeople working at the intersection of science and AI.

About the Role

We're looking for a Staff or Senior Data Engineer to own the data infrastructure that powers Katalyze AI's platform. You'll make architecture decisions, design integrations with customer data systems, and build the streaming pipelines that give our AI models and agents access to clean, reliable, real-time data — setting the patterns the team builds on as we scale.

What You'll Do

Own the data infrastructure architecture — define standards, patterns, and tooling decisions for the data layer
Design and build data integration pipelines connecting customer systems (MES, LIMS, ERP, historians) to the Katalyze AI platform
Develop and operate real-time and batch data streaming infrastructure (Kafka, Kinesis, or similar) at scale
Build and maintain ETL/ELT pipelines for structured and unstructured scientific data
Establish data quality, reliability, and observability frameworks across all pipelines
Collaborate with ML and Data Science teams to deliver clean, well-structured data for model training and inference
Design data schemas and storage solutions (data lakes, warehouses) optimized for AI/ML workloads
Work directly with customer IT teams during deployments to establish secure data connections and meet compliance requirements
Set technical direction for the data engineering function as the team grows

What We're Looking For

7+ years of data engineering experience, with a track record of owning systems — not just building within them
Demonstrated experience making architecture decisions: choosing tools, designing schemas, defining standards
Deep expertise in data streaming (Kafka, Kinesis, Flink, or Spark Streaming), with systems you have designed and operated in production at scale
Strong proficiency in building data integrations with external enterprise systems (REST APIs, OPC-UA, proprietary connectors)
Experience with cloud data platforms (AWS Glue, Databricks, Snowflake, or similar)
Proficiency in Python and SQL; experience with dbt or similar transformation tooling
Strong data quality and observability instincts — you've built frameworks, not just used existing tools
Background in industrial data systems (OSIsoft PI, Ignition, MES/LIMS integrations) is a strong plus
Comfortable communicating technical tradeoffs to non-technical stakeholders
