Staff Data Engineer (Audio/ML)
The Skywalker Sound Development Group is seeking an experienced Data Engineer with a focus on Audio/ML to specialize in the creation, management, and optimization of data pipelines to support cutting-edge AI/ML research. This is a critical role in preparing high-quality datasets for the training, retraining, and evaluation of machine learning models tailored to immersive and multichannel audio applications.
As a Data Engineer (Audio/ML), you will focus on developing robust pipelines for processing complex media datasets, enabling AI/ML researchers to build transformative solutions for speech processing, style transfer, and source separation. Your work will directly contribute to creating innovative soundtrack workflows for global media production.
*This role is considered Hybrid, which means the employee will work 2-3 days onsite at our Nicasio, CA office and occasionally from home.*
What You'll Do
- Design, implement, and maintain scalable, automated data pipelines for the ingestion, preprocessing, and transformation of large-scale audio datasets.
- Ensure pipelines support efficient model training and retraining workflows, enabling continuous improvement of AI/ML models.
- Collaborate with AI/ML researchers to define data requirements and integrate feedback to improve data pipeline functionality.
- Develop advanced preprocessing techniques for immersive and multichannel audio formats (e.g., Dolby Atmos, high-order ambisonics).
- Automate data cleaning, normalization, and augmentation processes to prepare datasets for various model architectures, including foundation models and transformers.
- Integrate external datasets and APIs while ensuring compliance with legal and ethical data usage standards.
- Monitor and optimize pipeline performance to handle complex and dynamic data structures effectively.
- Create tools and workflows for annotating, labeling, and curating datasets, including the use of active learning methods.
- Perform exploratory data analysis to uncover trends, validate dataset quality, and identify data gaps.
What We’re Looking For
- Master’s degree, with a preference for a PhD, in Data Engineering/Science, Computer Science, Signal Processing, or a related field.
- 8+ years of experience in data engineering or data science with a focus on building pipelines for AI/ML applications.
- Proficiency in Python, with expertise in data manipulation libraries such as Pandas, NumPy, and PyTorch’s data utilities.
- Hands-on experience with audio processing libraries and tools (e.g., Librosa, FFmpeg, SoX) for handling complex audio formats.
- Familiarity with scalable pipeline tools like GitLab, Apache Spark, Airflow, or Luigi, and experience with containerized workflows (Docker, Kubernetes).
- Strong understanding of data pipeline requirements for model training, retraining, and evaluation in iterative research workflows.
- Experience with immersive and multichannel audio formats.
- Knowledge of cloud-based platforms and tools for storage and processing, such as AWS S3, Redshift, or Google BigQuery.
- Strong problem-solving skills, with a proactive mindset for addressing evolving data challenges.
Preferred Qualifications
- Experience integrating data pipelines with AI/ML workflows, including active learning and model retraining.
- Familiarity with audio-specific datasets and metadata management strategies.
- Knowledge of machine learning principles and how data quality impacts model performance.
- Experience with distributed training pipelines and large-scale dataset processing.
- Contributions to open-source projects or published research in the fields of data science or audio processing.
- Experience with visualization tools (e.g., Tableau, Matplotlib) for quality assurance and exploratory data analysis.
- Expertise in designing systems to support AI/ML model monitoring and retraining over time.
The hiring range for this position in Nicasio, CA is $170,500 to $228,600 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.