Google
Internet Services, Software, AI, Cloud Computing, Hardware

Staff Software Engineer, Infrastructure

California, United States | Onsite | Full Time | Staff | $207,000–$300,000 /yr | Posted 2 months ago | Visa sponsorship available

Role summary

Google is seeking a Staff Software Engineer, Infrastructure to develop next-generation technologies for massive-scale information handling. This role focuses on building software for data-driven planning and provisioning capabilities to optimize ML accelerator fleet utilization and resource allocation. The engineer will collaborate with cross-functional teams to design and implement solutions that scale operations and manage the fleet of ML accelerators, contributing to Google Cloud's enterprise-grade solutions. The position requires a strong background in software development, distributed systems, and architecture, with preferred experience in ML infrastructure and technical leadership.

### Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • 8 years of experience testing and launching software products.
  • 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
  • 3 years of experience with software design and architecture.

### Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 8 years of experience with data structures and algorithms.
  • 3 years of experience in a technical leadership role leading project teams and setting technical direction.
  • 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
  • Experience with Machine Learning Infrastructure (accelerators, frameworks, etc.), or with capacity planning.

## About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

Google is an AI-first company. Every day, thousands of Machine Learning (ML) practitioners train thousands of models so that ML can be used in products across the company, generating hundreds of millions of queries per second.

In this role, you will build software for data-driven planning and provisioning capabilities that drive up utilization, unlock resources for reuse, enable agile rebalancing of ML resource allocations in response to business and strategic needs, and scale the management of the rapidly growing fleet of ML accelerator resources.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $207,000-$300,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

## Responsibilities

  • Build software to increase transparency and accountability to relevant owners for sources of inefficiency across the ML accelerator fleet and improve the planning and provisioning of the ML fleet to scale more efficiently.
  • Collaborate with cross-functional teams to scope, design, and implement solutions, and interact with stakeholders across SWE, SRE, and PARM organizations.
  • Deliver solutions that maximize the resources available to PAs, enable prompt and efficient resource allocation in response to changing compute demands, and operationalize new capacity and consumption models.
  • Scale the operation and manage the fleet of ML accelerators.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
