Machine Learning Engineer (Staff Level)

San Jose, California, United States | Hybrid | Full Time | Staff | Posted 2 months ago

Role summary

A leading healthcare Revenue Cycle Management (RCM) provider seeks a Staff-level Machine Learning Engineer to own training and serving ML models in production at scale. This hybrid role requires 3-5 years of experience in ML/AI engineering, focusing on delivering high-throughput, low-latency ML services with reliability and cost improvements. Key technical skills include deep learning frameworks (PyTorch, TensorFlow), distributed training techniques, inference optimization (quantization, pruning), scalable serving strategies, and data/storage solutions (SQL/NoSQL, vector stores). The role involves understanding the full ML lifecycle and writing performant code, with collaboration across Research, Platform/Infra, Data, and Product teams.

One of our clients, a leading provider of Revenue Cycle Management (RCM) for the healthcare industry, is looking to hire ML Engineers (various levels: Senior, Lead, and Staff) who have experience owning training and/or serving in production at scale.

Hybrid role (3 days onsite in either San Jose, CA or Austin, TX)

Educational Qualifications:

Bachelor's in Computer Science, Electrical/Computer Engineering, or a related field required; Master's preferred (or equivalent industry experience).

Strong systems/ML engineering background with exposure to distributed training and inference optimization.

Industry Experience:

3–5 years in ML/AI engineering roles owning training and/or serving in production at scale.
Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.
Experience collaborating across Research, Platform/Infra, Data, and Product functions.

Technical Skills:

Familiarity with deep learning frameworks: PyTorch (primary), TensorFlow.

Exposure to large model training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism); distributed training experience a plus.

Optimization: experience profiling and optimizing code execution and model inference: quantization (PTQ/QAT/AWQ/GPTQ), pruning, distillation, KV-cache optimization, FlashAttention.

Scalable serving: autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers.

Data & storage: SQL/NoSQL, vector stores (FAISS/Milvus/Pinecone/pgvector), Parquet/Delta, object stores.

Ability to write performant, maintainable code.

Understanding of the full ML lifecycle: data collection, model training, deployment, inference, optimization, and evaluation.
