What you'll do
- Lead design and development of next-generation AWS platforms for AI/ML and HPC workloads.
- Collaborate across teams including SDEs, Hardware Engineers, TPMs, and Principals to deliver scalable, reliable server solutions.
- Own and proactively improve system reliability, testability, and diagnostics across the full stack, from bare-metal hardware to userland software.
- Drive continuous price-performance improvements for large-scale AI model training infrastructure.
- Operate within a fast-paced, growing team focused on launching hardware globally with direct impact on AWS business outcomes.
What you should know
- Ideal candidates are innovative self-starters with deep knowledge of the hardware and software stack and strong debugging skills.
- The role offers ownership and leadership opportunities in designing complex, scalable systems impacting millions of customers.
- Applicants should be prepared for a highly collaborative environment working with diverse technical roles and global teams.
- Candidates will face challenges in solving undefined, complex architectural problems requiring tactical coding and system design.
- The position is onsite in Seattle and demands experience in Linux/Unix environments, systems development, and cloud-scale operations.
About the company
- Amazon Web Services (AWS) is the world’s largest and most comprehensive cloud platform, trusted by customers ranging from startups to Global 500 companies.
- AWS emphasizes a culture of innovation, inclusion, and continuous learning with employee-led affinity groups and diversity initiatives.
- The company values work-life harmony and provides flexibility to support employee success both at work and home.
- AWS Hardware Engineering focuses on industry-leading server designs that are frugal, operationally excellent, and critical to AWS success.
- AWS is a global, large-scale enterprise with teams distributed across Seattle, Cupertino, and Austin, supporting worldwide datacenters.
Key required skills
- Proficiency in programming languages such as C++, Java, Python, or Golang with 3+ years experience
- 4+ years of professional software development and systems development in IT or data center environments
- Experience with designing scalable, reliable systems including architecture, automation, and process improvements
- Strong knowledge of Linux/Unix deployment and operations
- Ability to lead complex software or infrastructure projects from design through production deployment