Amazon Web Services
Cloud Computing, Information Technology, Software

EFA Network Sr. Software Engineer, EFA ML Software Team

Washington, United States | Hybrid | Full Time | Senior | $168,100–$227,400 /yr | Posted 2 months ago | Visa sponsorship available

Role summary

This role is for a Senior Software Engineer on the EFA ML Software Team at AWS, focusing on the user-space software for the Elastic Fabric Adapter (EFA) network card. The engineer will write high-performance C code for open-source projects like Libfabric and Open MPI, invent new networking APIs, and work with ML/HPC customers. Key responsibilities include technical leadership, system design, full software development lifecycle management, and providing expert support to AI companies. The role requires a strong focus on performance, reliability, and scaling for clustered workloads.

Description
Want to help make the next generation of Machine Learning in the cloud possible? Do you have a laser focus on performance in your team's code? We want to talk to you!
We own the user-space software that makes the Elastic Fabric Adapter (EFA) network card work for Machine Learning (ML) and High-Performance Computing (HPC) customers on AWS. Across multiple projects written in C, our team enables customers to network thousands of GPU and CPU instance types to handle the toughest clustered workloads. Lead a dynamic, fast-paced group that has a big impact every day on the hottest companies doing AI and HPC today.
Key job responsibilities
You will help lead a team of obsessed networking developers operating at the highest levels in networking. You will write the highest-performing code in C for multiple open source projects supporting EFA, such as Libfabric and Open MPI. You will work with multiple teams in the stack to invent new APIs for the latest concepts in networking in the cloud. Dive deep into how your customers are doing collectives and messaging at high bandwidth and low latency. Provide expert-level support to some of the biggest names in AI in the world.
A day in the life
Start from the needs of your customer and invent new ways of cutting the occupancy of the software stack for EFA. Drive your peers and leadership to accept your excellent written designs. Work with our ML Infrastructure team to see your products perform on hundreds and thousands of top-end machine clusters.
About The Team
We are a fast-paced team that owns the user-space software stack for EFA. As part of Annapurna Labs in AWS we are very nimble, paying careful attention to what the AI industry is going to try next, and having our products ready. We focus heavily on automation, confining operations to the most interesting problems as customers continuously experiment with what our network can do. Our team is a place of growth, concentrating on your career and goals and motivating you to achieve your highest potential.
Basic Qualifications

  • 5+ years of non-internship professional software development experience
  • 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Experience as a mentor, tech lead or leading an engineering team
  • 5+ years of low-level programming experience in C

Preferred Qualifications

  • Bachelor's degree in computer science or equivalent

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, WA, Seattle - 168,100.00 - 227,400.00 USD annually
Company
- Amazon Development Center U.S., Inc.
Job ID: A3113957

Sample Amazon Web Services interview questions

  1. Outline the architecture for a distributed ML system that ensures reproducibility and version control of models and data. (system design, medium)
  2. Design a music streaming service like Spotify. (system design, medium)
  3. What is a DDoS attack and how does AWS protect customers from it (WAF, Shield)? (technical, medium)
  4. Before obtaining an IP, what source address does a client use in a DHCP request, and what are the four DHCP packet types? (technical, medium)
  5. How do you troubleshoot frequent application crashes on a Mac? (technical, medium)
