AI Engineer – Clinical Data Science

United States | Onsite | Contract | Posted 2 months ago | Visa sponsorship available

Role summary

We are seeking an AI Engineer to join our Data Science team within a pharmaceutical organization. This hands-on role involves designing, developing, and deploying generative AI systems to automate clinical reporting, extract insights from documents, and enhance data-driven decision-making. Responsibilities include building LLM-powered tools, RAG pipelines, and multi-agent systems, with a strong emphasis on production code quality, software engineering best practices, and utilizing AI-assisted development tools. Experience with Python, LLM APIs, RAG systems, vector databases, and cloud platforms like GCP (Vertex AI) is required. Preferred qualifications include experience with agentic AI patterns, document processing, and LLM evaluation.

About the Role

We are looking for an AI Engineer to join our Data Science team, building AI-powered solutions for clinical data processing and analysis within a major pharmaceutical organization. You will design, develop and deploy generative AI systems that automate clinical reporting workflows, extract intelligence from documents, and accelerate data-driven decision making.

This is a hands-on engineering role — you’ll be writing production code, not just building prototypes.

Responsibilities

Generative AI & Automation:

Develop LLM-powered automation tools for clinical reporting and document generation workflows
Build AI-driven code generation pipelines and quality assessment frameworks
Design and implement human-in-the-loop review workflows with feedback loops to continuously improve output quality

Research & Evaluation:

Research and evaluate emerging AI methods, frameworks, and techniques for specific tasks — e.g. comparing fine-tuning vs zero-shot approaches, assessing new document extraction tools, or trialling new agentic frameworks
Prototype and benchmark new approaches before recommending adoption
Stay current with a rapidly evolving field and bring new ideas to the team

Agentic AI & Orchestration:

Design and build multi-agent systems for data workflows — agents that retrieve, generate, validate, and iterate autonomously
Implement agent orchestration using frameworks such as Google ADK, LangGraph, or LangChain
Deploy and manage agents on Google Vertex AI

Document Understanding & RAG:

Build document processing pipelines (PDFs, Word/DOCX) — extraction, parsing, table detection, structure recognition
Design and build RAG pipelines grounded in source documents
Process, extract and transform data from unstructured and semi-structured sources

Code Quality & Engineering Practices:

Write clean, well-tested, maintainable Python code following SOLID principles and recognised design patterns
Apply single responsibility, dependency inversion, and interface segregation in real codebases — not just theory
Write meaningful tests, and maintain high standards across the team
Refactor and improve existing code as part of normal development workflow

AI-Assisted Development:

Use AI coding tools (e.g. Gemini CLI, GitHub Copilot) as a core part of your development workflow
Critically review and validate AI-generated code — understanding what it produces, why, and when it’s wrong
Write effective prompts to direct AI tools toward correct, secure, well-structured output
Know when to use AI and when to write code manually — judgement over speed

Platform & Infrastructure:

Integrate and orchestrate LLM providers available through Google Vertex AI (Gemini, etc.)
Build internal tools and applications using Streamlit and FastAPI
Containerize and deploy services using Docker

Required Skills & Experience

MSc in Data Science, Computer Science, Bioinformatics, or related field (or equivalent practical experience)
Strong Python skills
Hands-on experience building RAG systems or LLM-powered applications (using LangChain, LlamaIndex, or similar frameworks)
Experience integrating LLM APIs (Google Gemini, OpenAI, or similar) — we work primarily through Google Vertex AI
Working knowledge of vector databases (ChromaDB, Weaviate, Qdrant, Pinecone, or similar)
Cloud platform experience (GCP preferred, especially Vertex AI)
Docker and containerized deployments
Strong software engineering fundamentals — SOLID principles, clean code practices, design patterns, testing, version control (Git), code review
Comfortable using AI-assisted development tools (e.g. Gemini CLI, GitHub Copilot) — and critically evaluating what they produce

Strongly Preferred

Experience with agentic AI patterns — multi-agent orchestration, tool use, autonomous workflows (LangGraph, Google ADK, or similar)
Document processing experience — extracting and parsing data from PDFs and Word/DOCX files programmatically
Understanding of LLM evaluation principles and output quality assessment (BLEU, ROUGE, code execution metrics, or similar)
Data science fundamentals — Pandas, NumPy, scikit-learn, statistical analysis, data visualization
Prompt engineering and optimisation techniques
Streamlit application development

Nice to Have

Domain Knowledge:

Clinical trials or pharmaceutical industry experience
Familiarity with clinical data standards
Awareness of regulatory and data privacy requirements in life sciences

Infrastructure & DevOps:

Terraform or infrastructure-as-code experience
CI/CD pipeline design (GitHub Actions or similar)

Knowledge Graphs:

Neo4j, Cypher query language
NetworkX for graph analytics
Graph-based RAG or knowledge extraction

AI/ML:

Experience with LLM-driven code generation
LLM fine-tuning experience (e.g. LoRA, PEFT, RLHF, Vertex AI model tuning, or similar approaches)
NLP and text processing (HuggingFace Transformers, Sentence-Transformers)
PyTorch or TensorFlow (for custom model work if needed)
Google ADK (Agent Development Kit) or Vertex AI Agent Builder
Model Context Protocol (MCP) for tool integration and interoperability

Other:

Frontend experience (React, TypeScript)
FastAPI or Flask REST API development
PostgreSQL or similar relational databases

What You’ll Work With

- Languages: Python (primary), SQL, some TypeScript/R
- AI/ML: LangChain, LlamaIndex, LangGraph, Google ADK, MCP, HuggingFace Transformers, Sentence-Transformers, Google Gemini (via Vertex AI)
- Document Processing: PyMuPDF, python-docx, pdfplumber, OCR tools
- Data: Pandas, NumPy, SciPy, scikit-learn, Plotly
- Databases: Vector databases, graph databases, relational databases
- Infrastructure: Docker, Google Cloud Platform (Vertex AI, GCS), Terraform, GitHub Actions
- Applications: Streamlit, FastAPI, Flask
- Tools: Python packaging, testing frameworks, linting, Git

About You

You care about code quality — not just making things work, but making them maintainable
You’re comfortable working across the full stack of an AI application, from data ingestion to user-facing tools
You can context-switch between multiple projects and work autonomously
You’re curious about the clinical/pharmaceutical domain and motivated to learn it
You see AI-assisted development as a force multiplier, not a replacement for engineering judgment
You’re a self-directed learner who researches new methods and tools, evaluates them critically, and knows when to adopt vs when to stick with what works

Ready to apply?

You'll be redirected to Princeps Technologies' application page.