AI/ML Engineer

San Francisco, California, United States · Remote · Full Time · $130,000–$170,000/yr · Posted 2 months ago · Hidden Gem · YC Startup

Role summary

DeepAware AI is seeking an AI/ML Engineer to develop and deploy machine learning models for its Data Center Infrastructure Management (DCIM) platform. The role covers reinforcement learning for workload scheduling, optimization algorithms for energy and cost savings, and anomaly detection for security and reliability. The engineer will be responsible for model development, implementation of detection pipelines, collaboration with data engineers, benchmarking, and contributing to system architecture and deployment strategies. Proficiency in Python and PyTorch or TensorFlow, along with experience in distributed training and production deployment, is required. Familiarity with energy systems, scheduling algorithms, or operations research is a plus.

**About the Role**\
DeepAware AI (YC S25) is building secure, efficient, and autonomous infrastructure for the AI era. As an AI/ML Engineer, you’ll design, build, and deploy machine learning models that power our next-generation Data Center Infrastructure Management (DCIM) platform. Your work will focus on **reinforcement learning** for intelligent workload scheduling, **optimization algorithms** for energy and cost savings, and **anomaly detection** to prevent downtime and enhance security.

You’ll be joining a fast-moving, technically ambitious team tackling some of the most complex real-world AI problems at the intersection of computing, energy, and robotics.

**Responsibilities**

* Develop and refine reinforcement learning models for GPU workload placement and power optimization
* Implement anomaly detection pipelines for real-time threat detection and failure alerts
* Collaborate with data engineers to ensure high-quality, production-ready datasets
* Benchmark models against industry baselines and integrate them into our production systems
* Contribute to overall architecture and deployment strategies for large-scale AI infrastructure

**Requirements**

* Strong background in machine learning; hands-on experience with reinforcement learning techniques
* Proficiency in Python and PyTorch or TensorFlow
* Experience with distributed training and deployment in production environments
* Familiarity with energy systems, scheduling algorithms, or operations research is a plus
* Ability to thrive in a startup environment — ownership mindset, adaptability, and collaborative spirit

**Nice-to-Have**

* Experience with NVIDIA CUDA/cuDNN, Triton Inference Server, or ROS2 for robotics integration
* Knowledge of data center operations or AI infrastructure optimization

**Location:**\
San Francisco Bay Area preferred; remote considered for exceptional candidates

**Why DeepAware?**\
You’ll be working on problems that directly impact the sustainability and reliability of the world’s AI infrastructure — with a team that values technical excellence, creativity, and impact.
