Senior Data Scientist
Brief Description
Senior Data Scientist
At Zywave, we believe in building AI systems that are reliable, measurable, and continuously improving. The Senior Data Scientist will partner directly with our VP of AI Engineering and Data Science to implement our evaluation framework for agentic AI models. This role is deeply technical, requiring statistical rigor and expertise in production ML workflows. You'll establish the methodologies and infrastructure that ensure our AI systems meet quality standards and drive measurable business impact.
What you will do:
Evaluation Framework Design & Implementation
- Design and build comprehensive evaluation frameworks for agentic AI models, including benchmark creation, metric definition, and success criteria.
- Establish evaluation pipelines that integrate seamlessly into our ML development lifecycle.
- Create automated testing suites that assess model performance across multiple dimensions (accuracy, latency, cost, safety, user experience).
- Develop methodologies for evaluating complex agent behaviors, multi-turn interactions, and reasoning capabilities.
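To make the multi-dimensional scoring idea above concrete, here is a minimal sketch of an evaluation harness. This is illustrative only, not Zywave's actual framework; the metric names, the 2-second latency budget, and the $0.01 cost ceiling are hypothetical thresholds.

```python
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    case_id: str
    scores: dict = field(default_factory=dict)


def evaluate_case(case_id, answer, expected, latency_s, cost_usd):
    """Score one agent response on several dimensions (hypothetical thresholds)."""
    return EvalResult(case_id, {
        "accuracy": 1.0 if answer.strip() == expected.strip() else 0.0,
        "latency_ok": 1.0 if latency_s <= 2.0 else 0.0,   # 2 s budget (assumed)
        "cost_ok": 1.0 if cost_usd <= 0.01 else 0.0,      # $0.01 ceiling (assumed)
    })


def aggregate(results):
    """Mean score per dimension across the benchmark suite."""
    dims = results[0].scores.keys()
    return {d: sum(r.scores[d] for r in results) / len(results) for d in dims}


results = [
    evaluate_case("c1", "42", "42", latency_s=1.1, cost_usd=0.004),
    evaluate_case("c2", "41", "42", latency_s=2.8, cost_usd=0.004),
]
agg = aggregate(results)  # one averaged score per dimension
```

Real suites would add richer metrics (semantic similarity, safety classifiers, multi-turn checks), but the shape — per-case scoring followed by per-dimension aggregation — stays the same.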
Experimentation & A/B Testing
- Design and execute rigorous A/B tests and multivariate experiments to measure model performance in production.
- Build statistical frameworks for experiment analysis, including power analysis, significance testing, and causal inference.
- Partner with engineering teams to implement experimentation infrastructure and ensure proper randomization and isolation.
- Communicate experiment results and recommendations to technical and non-technical stakeholders.
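As a sketch of the power-analysis and significance-testing work described above, the following implements the standard normal-approximation formulas for a two-proportion experiment (e.g. task-success rate of model A vs model B). The 70% → 75% lift is a made-up example; the z quantiles are hardcoded for α = 0.05 and 80% power.

```python
import math


def sample_size_two_proportions(p1, p2):
    """Per-arm sample size for a two-sided two-proportion z-test,
    normal approximation, alpha=0.05, power=0.80 (quantiles hardcoded)."""
    z_alpha = 1.959964  # Phi^-1(1 - 0.05/2)
    z_beta = 0.841621   # Phi^-1(0.80)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for a difference in proportions; returns (z, p)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# How many users per arm to detect a lift from 70% to 75% success?
n_per_arm = sample_size_two_proportions(0.70, 0.75)
```

In practice a library such as statsmodels would handle this, but the hand-rolled version makes the inputs (baseline rate, minimum detectable effect, α, power) explicit.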
Data Science & ML Operations
- Apply classical machine learning techniques to support model evaluation, feature engineering, and performance prediction.
- Build data pipelines that support evaluation workflows, from data collection through metric computation.
- Develop monitoring systems to detect model degradation, drift, and anomalies in production.
- Create dashboards and reporting tools that provide visibility into model performance and experimentation results.
Strategic Partnership & Leadership
- Collaborate closely with the VP of AI Engineering and Data Science to shape our AI quality strategy.
- Influence technical decisions through data-driven insights and rigorous analysis.
- Establish best practices and standards for evaluation across the organization.
- Mentor other data scientists and engineers on experimentation methodologies and statistical principles.
Generative AI & LLM Evaluation (Bonus)
- Design evaluation approaches specific to generative AI systems (prompt quality, hallucination detection, output consistency).
- Develop human-in-the-loop evaluation workflows and LLM-as-judge frameworks.
- Stay current with emerging evaluation methodologies in the rapidly evolving GenAI landscape.
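The LLM-as-judge workflow mentioned above can be sketched as a rubric plus a pluggable judge callable. Everything here is hypothetical scaffolding: the rubric text is invented, and `judge` stands in for whatever model call a real deployment would make.

```python
from typing import Callable

# Hypothetical grading rubric; a real one would be task-specific and iterated on.
RUBRIC = ("Rate the answer from 1 to 5 for factual grounding in the source "
          "text. Reply with a single digit.")


def llm_judge(question: str, answer: str, source: str,
              judge: Callable[[str], str]) -> int:
    """LLM-as-judge: ask a pluggable judge model to grade an answer.
    `judge` is any function mapping a prompt string to a model reply."""
    prompt = f"{RUBRIC}\n\nSource:\n{source}\n\nQ: {question}\nA: {answer}"
    reply = judge(prompt)
    digits = [c for c in reply if c.isdigit()]
    if not digits:
        raise ValueError("judge reply contained no score")
    return max(1, min(5, int(digits[0])))  # clamp to the rubric's 1-5 scale


# Stub judge to exercise the plumbing; swap in a real LLM API call in production.
score = llm_judge("Capital of France?", "Paris",
                  "France's capital is Paris.", judge=lambda prompt: "5")
```

Keeping the judge a plain callable makes the grader testable offline and lets human-in-the-loop review reuse the same prompt and parsing path.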
What you should bring:
- Strong foundation in statistical methods, experimental design, and causal inference—you understand the math behind A/B testing and can design experiments that answer complex questions.
- Proven experience building evaluation frameworks or quality measurement systems for ML/AI products in production environments.
- Deep expertise in data workflows, including ETL/ELT, data modeling, and pipeline orchestration tools.
- Solid machine learning fundamentals across supervised/unsupervised learning, model evaluation, and feature engineering.
- Proficiency in Python and the data science stack (pandas, scikit-learn, SQL, visualization libraries).
- Experience with modern data platforms and tools (data warehouses, workflow orchestration, version control).
- Strong communication skills—you can translate complex statistical concepts for diverse audiences and drive alignment on technical decisions.
- Passion for quality and a methodical approach to measuring what matters.
- (Preferred) Experience with generative AI evaluation, LLM observability tools, or prompt engineering.