Staff/Senior Staff AI Engineer

San Francisco, California, United States | Onsite | Full Time | Staff | $313,055–$450,000 /yr | Posted 1 month ago | Visa sponsorship available

### Who you are
- Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least 8 years of industry experience
- Strong hands-on experience across the full post-training pipeline for large models
- Deep familiarity with preference learning and alignment techniques, including DPO, GRPO, and RL-based post-training methodologies
- Proven experience designing domain-specific data strategies and training methodologies
- Experience training and post-training specialized small models from scratch
- Solid understanding of reinforcement learning fundamentals and their application to model alignment
- Experience deploying models in low-latency production environments using frameworks such as vLLM, SGLang, or similar

### What the job involves
- We are seeking a highly skilled and hands-on Machine Learning Engineer specializing in large model post-training and alignment. This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities
- You will work across the full lifecycle of post-training—from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment
- Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods
- Design and implement advanced training paradigms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization)
- Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance
- Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy
- Build and refine Reward Models to support alignment and downstream optimization
- Design and implement RLAIF (Reinforcement Learning from AI Feedback) closed-loop systems
- Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang
- Evaluate model performance using both automated benchmarks and human/AI feedback loops
- Collaborate with research and infrastructure teams to productionize training and deployment workflows

### Benefits
- Comprehensive insurance package, including medical, dental, vision, disability, and life insurance
- Paid Parental Leave
- Employee Referral Bonus Program paid in BTC
- More surprises when you join!

Ready to apply?

You'll be redirected to OKX's application page.