The Walt Disney Company
Entertainment, Media, Hospitality, Retail, Broadcasting

Staff Data Engineer (Audio/ML)

San Geronimo, California, United States · Hybrid · Full Time · Staff · $170,500–$228,600/yr · Posted 1 month ago · Visa sponsorship available

The Skywalker Sound Development Group is seeking an experienced Data Engineer with a focus on Audio/ML to specialize in the creation, management, and optimization of data pipelines to support cutting-edge AI/ML research. This is a critical role in preparing high-quality datasets for the training, retraining, and evaluation of machine learning models tailored to immersive and multichannel audio applications.
As a Data Engineer (Audio/ML), you will focus on developing robust pipelines for processing complex media datasets, enabling AI/ML researchers to build transformative solutions for speech processing, style transfer, and source separation. Your work will directly contribute to creating innovative soundtrack workflows for global media production.
*This role is considered Hybrid, which means the employee will work 2-3 days onsite at our Nicasio, CA office and occasionally from home.*
What You'll Do

  • Design, implement, and maintain scalable, automated data pipelines for the ingestion, preprocessing, and transformation of large-scale audio datasets.
  • Ensure pipelines support efficient model training and retraining workflows, enabling continuous improvement of AI/ML models.
  • Collaborate with AI/ML researchers to define data requirements and integrate feedback to improve data pipeline functionality.
  • Develop advanced preprocessing techniques for immersive and multichannel audio formats (e.g., Dolby Atmos, high-order ambisonics).
  • Automate data cleaning, normalization, and augmentation processes to prepare datasets for various model architectures, including foundational models and transformers.
  • Integrate external datasets and APIs while ensuring compliance with legal and ethical data usage standards.
  • Monitor and optimize pipeline performance to handle complex and dynamic data structures effectively.
  • Create tools and workflows for annotating, labeling, and curating datasets, including the use of active learning methods.
  • Perform exploratory data analysis to uncover trends, validate dataset quality, and identify data gaps.

What We’re Looking For

  • Master’s degree, with a preference for a PhD, in Data Engineering/Science, Computer Science, Signal Processing, or a related field.
  • 8+ years of experience in data engineering or data science with a focus on building pipelines for AI/ML applications.
  • Proficiency in Python, with expertise in data manipulation libraries such as Pandas, NumPy, and PyTorch’s data utilities.
  • Hands-on experience with audio processing libraries and tools (e.g., Librosa, FFmpeg, SoX) for handling complex audio formats.
  • Familiarity with scalable pipeline tools like GitLab, Apache Spark, Airflow, or Luigi, and experience with containerized workflows (Docker, Kubernetes).
  • Strong understanding of data pipeline requirements for model training, retraining, and evaluation in iterative research workflows.
  • Experience with immersive and multichannel audio formats.
  • Knowledge of cloud-based platforms and tools for storage and processing, such as AWS S3, Redshift, or Google BigQuery.
  • Strong problem-solving skills, with a proactive mindset for addressing evolving data challenges.

Preferred Qualifications

  • Experience integrating data pipelines with AI/ML workflows, including active learning and model retraining.
  • Familiarity with audio-specific datasets and metadata management strategies.
  • Knowledge of machine learning principles and how data quality impacts model performance.
  • Experience with distributed training pipelines and large-scale dataset processing.
  • Contributions to open-source projects or published research in the fields of data science or audio processing.
  • Experience with visualization tools (e.g., Tableau, Matplotlib) for quality assurance and exploratory data analysis.
  • Expertise in designing systems to support AI/ML model monitoring and retraining over time.

The hiring range for this position in Nicasio, CA is $170,500 to $228,600 per year. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job-related knowledge, skills, and experience among other factors. A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.
