# Founding Data Engineer (Healthcare)
## Role Summary
Clara is seeking a Founding Data Engineer to build the data infrastructure for its AI-first medical practice. This role involves ingesting, normalizing, and structuring messy healthcare data (FHIR, HL7, CCDAs, PDFs) into clean formats for AI and clinical use. Responsibilities include designing data models, building AWS-based storage and retrieval systems, implementing data sanitization, and ensuring compliance with healthcare data standards. The ideal candidate has 5+ years of software engineering experience, including significant experience processing healthcare data, strong Python and AWS skills, and experience with document extraction and PII handling. This is a critical role that affects patient health outcomes and offers opportunities for leadership and architectural decision-making. The role is based in San Francisco.
Clara is building the first AI-first medical practice from the ground up. We're reimagining primary care and providing patients with deep insights and immediate diagnosis and treatment in ways not previously possible.
Unlike most health AI startups, we deliver actual care — our clinical team writes prescriptions, orders labs, and manages patients. You'll build the data infrastructure that powers real medical decisions, not a demo.
Clara's founders previously built and scaled a telemedicine company to over $100M in revenue, and we're funded by top investors.
---
## What You'll Build
You'll own Clara's data foundation — the pipelines that ingest, normalize, and deliver patient health records to our AI and clinical team. Healthcare data is notoriously messy (FHIR bundles, CCDAs, scanned PDFs, lab feeds), and you'll turn that chaos into clean, structured data our AI can reason over.
You'll work directly with our clinical team — physicians and nurse practitioners who depend on what you build to treat real patients every day.
---
## Your Responsibilities
### Data Ingestion & ETL (50%)
* Build pipelines to ingest medical records from HIEs, EHRs, and patient uploads
* Process FHIR bundles, HL7 messages, and CCDAs into normalized data models
* Extract structured data from unstructured sources (PDFs, scanned documents, XML, HTML, JSON)
* Handle webhook processing for real-time data feeds
* Build robust error handling and data validation
### Data Infrastructure (30%)
* Design and maintain data models optimized for AI retrieval and clinical workflows
* Build efficient storage and retrieval systems on AWS (S3, DynamoDB)
* Implement data compression strategies for large medical records
* Create data sanitization and PII handling pipelines
* Monitor data quality and pipeline health
### Healthcare Data Standards (20%)
* Ensure compliance with FHIR/HL7 standards
* Map between different healthcare data formats
* Stay current on healthcare interoperability requirements
* Work with external data partners and APIs
---
## Tech Stack
* **Backend:** Python
* **Data Formats:** FHIR, HL7, CCDAs, PDF, XML, JSON
* **Cloud:** AWS
* **Database:** PostgreSQL
---
## What We're Looking For
### Required
* 5+ years of software engineering experience, including significant experience processing healthcare data
* Deep knowledge of FHIR bundles, HL7, and healthcare data standards
* Strong Python skills for data processing workflows
* Experience with document extraction from messy sources (PDF, XML, HTML, JSON)
* AWS experience (S3, DynamoDB)
* Understanding of data sanitization and PII handling in healthcare
* AI-native (uses Claude/Cursor daily)
### Ideal
* Built ETL pipelines at scale for healthcare organizations
* Experience with medical document parsing and text extraction
* Familiarity with HIE integrations and health data networks
* Built webhook-based real-time data systems
* Understanding of HIPAA technical requirements
---
## What Clara Offers
### Impact
* Ship code that affects real patient health outcomes
* Make architectural decisions for an AI-first medical practice
* Work directly with the CTO on technical strategy
### Team & Culture
* High-performance founding team at the forefront of healthcare AI
* Work alongside clinicians — our medical team delivers real care (prescriptions, lab orders, referrals), so you'll see your data pipelines impact patient outcomes directly
* In-person collaboration in San Francisco (5 days/week)
* No bureaucracy, ship fast
### Growth
* As one of the first engineering hires, you'll have the opportunity to become an engineering manager as the team scales
* Ownership of entire domains (data, AI, product, platform)
* Work with cutting-edge AI and healthcare technology
---
## What Your Day to Day Will Look Like
In your first months, you'll be building the data backbone that makes Clara's AI possible — integrating with health data networks to pull patient medical records in seconds, processing messy FHIR bundles and CCDAs into clean structured data, and building pipelines for lab results and clinical documents. You'll wrangle healthcare data in all its chaotic forms: scanned PDFs, XML feeds, JSON payloads, and webhook streams. Every pipeline you build feeds directly into our AI and clinical workflows, so you'll see the impact immediately when a doctor reviews a patient's complete medical history that your code assembled.
As Clara scales, your responsibilities will evolve. You'll have the opportunity to grow into engineering leadership, architect data systems for millions of patients, mentor new hires, and own increasingly complex integrations with health networks and wearables. The data challenges you solve today will look different from the scale problems you tackle in a year — and that's exactly what makes this a founding role.