Data Scientist

St. Louis, Missouri, United States | Onsite | Contract | Posted 2 months ago

Role summary

This role is for a Data Scientist who will combine advanced Data Science, Machine Learning, and Generative AI with Data Engineering skills in Spark and AWS to build production-grade analytical products. Responsibilities include leading the full ML lifecycle from problem framing to deployment and monitoring, developing a range of ML models (supervised, unsupervised, NLP, Generative AI, graph analytics), and building anomaly detection, predictive maintenance, and risk scoring models. The role also involves designing and maintaining scalable ETL pipelines with Spark, conducting large-scale EDA, and communicating complex findings to technical and non-technical audiences. Expertise in Python, Spark, SQL, and statistical modeling is required; AWS experience is preferred.

Job Description

This role blends advanced Data Science, Machine Learning, and Generative AI with robust Data Engineering, particularly using Spark and AWS, to deliver scalable, production-grade analytical products. You will design end-to-end data solutions—from data ingestion to model deployment—while partnering with engineering and product teams to translate insights into actionable outcomes.

Roles & Responsibilities

Lead the full ML development lifecycle: problem framing, hypothesis formulation, feature engineering, model development, validation, deployment, and monitoring.
Develop, test, and optimize machine learning models, including: supervised and unsupervised learning, statistical modeling and forecasting, Natural Language Processing (NLP), Generative AI techniques for automation and insight extraction, and graph/network analytics for analyzing network behaviors and relationships.
Build advanced anomaly detection, predictive maintenance, and risk scoring models for network security and operational efficiency.
Conduct large-scale exploratory data analysis (EDA) to identify trends, data quality issues, and opportunities for automation.
Define and implement model evaluation and A/B testing strategies.
Collaborate with ML engineering teams to operationalize models using MLOps best practices.
Communicate complex analytical findings through clear narratives, visualizations, and presentations tailored to technical and non-technical audiences.

Data Engineering & ETL

Design, develop, and maintain scalable, fault-tolerant ETL pipelines using Spark to support analytics and machine learning workloads.
Implement monitoring, alerting, and automated recovery mechanisms to ensure data pipeline reliability.
Build robust feature pipelines that enable real-time and batch ML processing.
Integrate data from a wide range of sources: APIs, flat files, relational databases, and distributed file systems (HDFS/S3).
Support continuous integration and continuous delivery (CI/CD) workflows for data and ML components.

Required Qualifications

Strong communication and presentation skills, and the ability to translate analytics into business value.
Expertise in programming languages commonly used in data science: Python (primary), Scala or Java (preferred for ETL/engineering).
Proven experience with Spark and large-scale distributed data processing.
Deep understanding of statistical modeling, hypothesis testing, experimental design, causality, and multicollinearity.
Strong SQL skills and experience with relational and NoSQL databases.
Expertise across a wide range of ML methodologies: regression, classification, clustering, time-series forecasting, Bayesian methods, NLP and text analytics, and graph analytics.
Experience with data preprocessing, feature engineering, and EDA.
Familiarity with data architectures such as data lakes, warehouses, and marts.
Demonstrated ability to continuously learn, adapt, and share knowledge.

Preferred Qualifications

Experience with AWS services (S3, EMR, Lambda, Glue, SageMaker).
Prior exposure to Generative AI, LLMs, prompt engineering, or building AI-driven automation systems.
Experience with Linux-based systems.
Background in text mining, document classification, or large-scale unstructured data processing.
Bachelor’s degree in Computer Science, Data Science, Statistics, Mathematics, Physics, Engineering, Operations Research, or a related field.
Master’s degree with 6+ years or Bachelor’s degree with 8+ years of relevant work experience.
