Senior AI Data Engineer

Menlo Park, California, United States | Onsite | Contract | Senior | Posted 14 days ago | Visa sponsorship available


Job Title: Senior AI Data Engineer (Contract)

Location: Menlo Park, CA

Duration: 7 months (with potential for extension)

As a Senior AI Data Engineer, you will design and operate end‑to‑end pipelines that not only move and transform data but also enrich it using ML models such as classifiers, embedding models, and large language models. The role sits at the intersection of data engineering and ML systems and requires strong systems thinking around throughput, retries, async execution, and capacity management.

You will work closely with engineers and researchers to support image generation and evaluation workflows, contributing directly to data quality, model performance, and scalability.

Required Skills & Experience

Strong data engineering expertise, including advanced SQL, complex query optimization, and production pipeline orchestration (e.g., Airflow or equivalent).

Hands‑on experience integrating ML inference into data pipelines, including:

Calling inference endpoints
Managing batching and throughput
Handling failures and retries at scale

Experience operating large-scale production pipelines with high reliability and performance requirements.
Proficiency using AI‑assisted coding tools to accelerate development, debugging, and code reviews.
Strong communication skills and ability to collaborate with engineers, researchers, and cross‑functional teams.

Preferred Qualifications

Experience working with embeddings and vector search, including storage, indexing, and similarity queries.
Familiarity with content understanding models, such as image classification, OCR, and safety or quality scoring.
Experience using LLMs for data workflows, including automated annotation, data cleaning, or evaluation tasks.
Knowledge of generative AI systems, particularly image generation and corresponding evaluation metrics.
Background working in data engineering, ML engineering, or hybrid roles that support model training or evaluation.

Responsibilities

AI‑Augmented Data Pipelines: Design and maintain large‑scale data pipelines (up to billions of records/images) that combine SQL-based transformations with ML model inference for data cleaning, labeling, and enrichment.
Remote Inference Orchestration: Build and own systems that orchestrate remote model inference within pipelines, including batching, async execution, retries, fallback logic, and graceful degradation under load.
Feature & Embedding Pipelines: Develop scalable pipelines to generate, store, validate, and serve vector embeddings. Manage nearest‑neighbor indexes and ensure data quality at scale.
Data Curation at Scale: Source, filter, and curate training datasets using both structured queries and model‑derived signals (e.g., visual quality scores, content classification, safety filters). Own the end‑to‑end data lifecycle with a focus on quality, governance, and compliance.
LLM‑Assisted Annotation: Design pipelines that use large language models and vision models for automated data annotation. Create auditing workflows to evaluate and improve annotation quality.
Shared Tooling & Frameworks: Contribute reusable components and frameworks that simplify AI‑augmented data pipelines, such as standardized model‑invocation operators and async job orchestration patterns.
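
For candidates gauging fit, the pattern named under Remote Inference Orchestration (batching, async execution, retries, fallback logic) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a description of the team's actual stack: `call_model` is a hypothetical stand-in for a remote inference endpoint, and the batch size, concurrency cap, and backoff values are arbitrary.

```python
import asyncio
import random


# Hypothetical stand-in for a remote inference endpoint (an assumption for
# illustration); a real pipeline would make a network call here instead.
async def call_model(batch, fail_rate=0.0):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if random.random() < fail_rate:
        raise ConnectionError("simulated transient endpoint failure")
    return [f"label:{item}" for item in batch]


def make_batches(records, batch_size):
    """Split records into consecutive batches of at most batch_size."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


async def infer_with_retry(batch, model, max_attempts=3, base_delay=0.01):
    """Call model on one batch, retrying with exponential backoff.

    Falls back to a None label per record if every attempt fails, so one
    bad batch degrades output quality instead of failing the whole run.
    """
    for attempt in range(max_attempts):
        try:
            return await model(batch)
        except ConnectionError:
            if attempt < max_attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    return [None] * len(batch)


async def enrich(records, model=call_model, batch_size=4, max_concurrency=2):
    """Label all records via the model, preserving input order."""
    limiter = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def run(batch):
        async with limiter:
            return await infer_with_retry(batch, model)

    batches = make_batches(records, batch_size)
    results = await asyncio.gather(*(run(b) for b in batches))
    return [label for batch_labels in results for label in batch_labels]


if __name__ == "__main__":
    print(asyncio.run(enrich(list(range(10)))))
```

The semaphore bounds how many requests are in flight at once, which is one simple way to manage capacity; returning `None` for a batch that exhausts its retries is one possible form of graceful degradation, and a production system might instead route such batches to a fallback model or a dead-letter queue.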

Ready to apply?

You'll be redirected to Intelliswift - An LTTS Company's application page.
