Data Engineer

San Francisco, California, United States | Hybrid | Full Time | Entry-level (exp-based) | $122,000–$167,000 /yr | Posted 2 months ago | Visa sponsorship available

Role summary

Baselayer is seeking a Data Engineer to build and scale its data infrastructure, focusing on reliability, performance, and data quality. This hands-on, cross-functional role involves designing, building, and maintaining scalable data pipelines, owning data quality through monitoring and validation, and developing data models for analytics, reporting, and machine learning. The engineer will partner with Product and Engineering teams, optimize pipelines, and ensure data accessibility in a regulated environment. The ideal candidate has 1 to 3 years of experience in data engineering, strong Python and SQL skills, and experience with ETL/ELT pipelines, data warehouses, and workflow orchestration tools.

About Baselayer
Trusted by 2,200+ financial institutions, Baselayer is the intelligent business identity platform that helps verify any business, automate KYB, and monitor real-time risk. Baselayer’s B2B risk solutions and identity graph network leverage state and federal government filings and proprietary data sources to prevent fraud, accelerate onboarding, and lower credit losses.

About the Role
We are looking for a Data Engineer to build and scale Baselayer’s data infrastructure. You will own the pipelines and data systems that power analytics, reporting, and machine learning across the company, with a focus on reliability, performance, and data quality.

This role is hands-on and highly cross-functional. You will work closely with Product and Engineering to ensure data is accessible, trusted, and delivered in a way that supports product capabilities in a regulated environment.

What You’ll Do

Design, build, and maintain scalable data pipelines that ingest, clean, validate, and transform data from internal systems and external sources
Own data reliability and quality through monitoring, alerting, lineage, and validation frameworks
Build and maintain data models and curated datasets that support analytics, dashboards, customer reporting, and downstream ML use cases
Partner with Engineering to define best practices for data architecture, storage, access controls, and performance
Implement orchestration and scheduling for batch and near-real-time workflows as needed
Optimize pipeline performance, cost, and scalability as data volumes grow
Develop and maintain documentation and runbooks for pipelines, datasets, and operational procedures
Identify data gaps and instrumentation needs, and work with engineering teams to improve event capture and logging

About You
You want to learn fast, take ownership, and do work that matters. You are not just doing this for the win. You are doing it because you have something to prove and want to be great.

You thrive in the details, care about correctness, and take pride in building robust systems that other teams can rely on. You operate with urgency, handle ambiguity well, and consistently raise the bar on data quality and reliability.

Required Experience and Skills

1 to 3 years of experience in data engineering, analytics engineering, or backend engineering with significant data pipeline ownership
Strong Python skills and experience building production-grade data workflows
Strong SQL skills with experience designing data models and transforming large datasets
Experience building and maintaining ETL or ELT pipelines and working with data warehouses or analytics databases
Familiarity with orchestration tools and workflow scheduling (for example Airflow, Dagster, Prefect, or similar)
Strong understanding of data quality, testing, observability, and operational best practices
Comfort working with large-scale datasets and troubleshooting performance issues
Ability to communicate clearly with technical and non-technical stakeholders

What Sets You Apart

Experience working with identity, fraud, risk, compliance, or other regulated datasets
Experience integrating with external data sources, APIs, and government or registry data
Familiarity with streaming or near-real-time data patterns
Highly feedback-oriented with a desire for continuous improvement

Work Location

Hybrid in SF, in office 3 days per week

Compensation and Benefits

Salary range of $122,000 to $167,000
Equity package
Unlimited vacation
Comprehensive health coverage
401(k) with company match
