Data Automation Engineer

Washington, District of Columbia, United States | Remote | Full Time | Posted 2 days ago

Role summary

Seeking a Data Automation Engineer to design and implement AI-driven automation solutions on AWS and Azure. Responsibilities include building scalable data pipelines that integrate cloud services, enterprise tools, and Generative AI for analytics, reporting, and customer engagement. Key tasks involve developing ETL/ELT processes, using GenAI for data quality checks and LLM-assisted transformations, optimizing SQL, applying CI/CD best practices, and ensuring security and compliance. Experience with data engineering tools, cloud platforms, and GenAI frameworks is required, along with strong troubleshooting skills and the ability to obtain a Public Trust clearance.

Job Description
Location:
Washington, DC
Remote (fully remote, with potential quarterly travel to the Gaithersburg, MD / Washington, DC metro area)

Clearance:
Public Trust (or willingness to obtain; must be a U.S. Citizen)
Note:
NOT OPEN TO C2C OR W2 REFERRALS AT THIS TIME
Job Description
Seeking a Data Automation Engineer to design and implement innovative, AI-driven automation solutions across AWS and Azure hybrid environments. Responsible for building intelligent, scalable data pipelines and automations integrating cloud services, enterprise tools, and Generative AI for mission-critical analytics, reporting, and customer engagement platforms.
Key Responsibilities

Design and maintain data pipelines in AWS using S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB, and Step Functions
Develop ETL/ELT processes between DynamoDB, SQL Server (AWS), and AWS ↔ Azure SQL systems
Integrate AWS Connect CRM data into enterprise data pipelines for analytics and reporting
Engineer ingestion pipelines with Apache Spark, Flume, and Kafka for real-time and batch processing into Apache Solr and AWS OpenSearch
Leverage Generative AI services (AWS Bedrock, Amazon Q, Azure OpenAI, Hugging Face, LangChain) for:

Vector generation and embeddings from unstructured data
Automated data quality checks, metadata tagging, and lineage tracking
LLM-assisted transformation and anomaly detection in ETL
Conversational BI interfaces for natural language access to Solr and SQL data
AI-powered copilots for pipeline monitoring and troubleshooting

Implement SQL Server stored procedures, indexing, query optimization, and performance tuning
Apply CI/CD best practices using GitHub, Jenkins, or Azure DevOps
Ensure security and compliance via IAM, KMS encryption, VPC isolation, RBAC, firewalls
Support Agile DevOps processes with sprint-based delivery

Required Qualifications

BS in Computer Science or a related field with 2+ years of data engineering/automation experience
Hands-on experience with SQL, SSIS, Python, Spark, Bash, PowerShell, AWS/Azure CLIs
Experience with AWS services (S3, RDS/SQL Server, Glue, Lambda, EMR, DynamoDB)
Familiarity with Apache Flume, Kafka, Solr for large-scale data ingestion and search
Familiarity with LLM/Gen AI frameworks (AWS Bedrock, Azure OpenAI, or open-source platforms/tools)
Experience integrating REST API calls in data pipelines and workflows
Familiarity with JIRA, GitHub / Azure DevOps / Jenkins for SDLC and CI/CD automation
Strong troubleshooting and performance optimization skills in SQL, Spark, or other data engineering solutions
Experience operationalizing Generative AI (GenAI Ops) pipelines, including model deployment, monitoring, retraining, and lifecycle management
Good communication and presentation skills
Ability to obtain a federal government Public Trust clearance

Preferred Qualifications (Plus)

Certifications: AWS Data Engineer, AWS AI/ML Specialty, Azure AI Engineer, Databricks Certified Data Engineer
Experience implementing RAG pipelines, embeddings, and vector search with Solr, OpenSearch, FAISS, Pinecone, pgvector, or SQL Server vector types
Experience with GenAI-powered coding tools (Claude Code, OpenAI Codex, VS Code)
Experience with multi-cloud data integration (AWS ↔ Azure SQL)
Familiarity with Microsoft BizTalk and SSIS for SQL Server ETL workflows
Knowledge of data lineage/governance tools (Purview, Unity Catalog, AWS Glue Catalog)
Familiarity with Infrastructure-as-Code (Terraform, CloudFormation, Bicep) for automated deployments
Experience with compliance frameworks (FedRAMP, PCI-DSS, HIPAA)
