AI/ML Engineer
Role summary
DeepAware AI is seeking an AI/ML Engineer to develop and deploy machine learning models for its Data Center Infrastructure Management (DCIM) platform. The role covers reinforcement learning for workload scheduling, optimization algorithms for energy and cost savings, and anomaly detection for security and reliability. The engineer will develop models, implement detection pipelines, collaborate with data engineers, benchmark models, and contribute to system architecture and deployment strategies. Proficiency in Python and PyTorch or TensorFlow, along with experience in distributed training and production deployment, is required. Familiarity with energy systems, scheduling algorithms, or operations research is a plus.
DeepAware AI (YC S25) is building secure, efficient, and autonomous infrastructure for the AI era. As an AI/ML Engineer, you’ll design, build, and deploy machine learning models that power our next-generation Data Center Infrastructure Management (DCIM) platform. Your work will focus on **reinforcement learning** for intelligent workload scheduling, **optimization algorithms** for energy and cost savings, and **anomaly detection** to prevent downtime and enhance security.
You’ll be joining a fast-moving, technically ambitious team tackling some of the most complex real-world AI problems at the intersection of computing, energy, and robotics.
**Responsibilities**
* Develop and refine reinforcement learning models for GPU workload placement and power optimization
* Implement anomaly detection pipelines for real-time threat detection and failure alerts
* Collaborate with data engineers to ensure high-quality, production-ready datasets
* Benchmark models against industry baselines and integrate them into our production systems
* Contribute to overall architecture and deployment strategies for large-scale AI infrastructure
**Requirements**
* Strong background in machine learning; hands-on experience with reinforcement learning techniques
* Proficiency in Python and PyTorch or TensorFlow
* Experience with distributed training and deployment in production environments
* Familiarity with energy systems, scheduling algorithms, or operations research is a plus
* Ability to thrive in a startup environment — ownership mindset, adaptability, and collaborative spirit
**Nice-to-Have**
* Experience with NVIDIA CUDA/cuDNN, Triton Inference Server, or ROS2 for robotics integration
* Knowledge of data center operations or AI infrastructure optimization
**Location:**\
San Francisco Bay Area preferred; remote considered for exceptional candidates
**Why DeepAware?**\
You’ll be working on problems that directly impact the sustainability and reliability of the world’s AI infrastructure — with a team that values technical excellence, creativity, and impact.