Dream3D
Manufacturing, 3D Printing, E-commerce

Member of the Technical Staff - Image / Video Data Engineer

New York, New York, United States · Remote · Full Time · Staff · $110,000–$150,000 /yr · Posted 2 months ago · Hidden Gem · YC Startup

Role summary

We are seeking an experienced engineer to lead large-scale data processing efforts, focusing on designing, building, and maintaining distributed systems for terabytes of image and video data used in training generative models. Responsibilities include implementing data pipelines, managing Kubernetes and Ray for distributed computing, deploying ML models for data preparation, and ensuring data quality and annotation for ML training readiness.

### About the Role

We are looking for an experienced engineer to lead our large-scale data processing efforts. In this role, you will design, build, and maintain robust distributed systems that process terabytes of image and video data used to train state-of-the-art generative models.

### Key Responsibilities

* Design, implement, and optimize complex data processing pipelines responsible for ingesting and transforming large media datasets.
* Manage containerized applications on Kubernetes; deploy and scale distributed systems leveraging Ray to process tasks and orchestrate compute workloads.
* Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation.
* Ensure data quality, diversity, and proper annotation (including captioning) for training readiness.
* Work closely within the model development loop to update data as the training trajectory requires.

### Ideal Experiences

* Deep understanding of Python and various file systems for data-intensive manipulation and analysis.
* Demonstrable experience deploying, managing, and scaling containerized applications on Kubernetes clusters.
* Hands-on experience with distributed computing engines such as Ray, including task scheduling, fault tolerance, and resource management.
* Experience with image and video processing libraries (e.g., OpenCV, FFmpeg).
* Experience working with large image/video datasets, including efficient data handling, transformation, and feature extraction.
* Familiarity with data annotation and captioning processes for ML training datasets.