What you'll do
- Lead design and development of next-generation AWS AI/ML server platforms with a focus on performance and scalability.
- Collaborate with cross-functional teams including SDEs, Hardware Engineers, TPMs, and managers across multiple locations.
- Own system architecture, proactively identify issues, and implement tactical solutions to ensure high reliability and testability.
- Drive improvements impacting AWS’s bottom line by delivering complex, scalable, and performant software solutions in production.
- Engage in full-stack systems development spanning bare-metal hardware to userland software within cloud-scale environments.
What you should know
- This role offers ownership and impact on high-scale cloud infrastructure supporting AI/ML workloads globally.
- Candidates should be prepared for a fast-paced, collaborative environment involving multiple engineering disciplines.
- Strong problem-solving skills are essential to address complex architectural challenges in hardware-software integration.
- Applicants with diverse or non-traditional backgrounds are encouraged to apply, reflecting AWS’s inclusive hiring philosophy.
- The position is onsite in Austin, TX, requiring hands-on engagement with hardware and software teams across locations.
About the company
- AWS is a global leader in cloud computing, pioneering innovations trusted by startups and Global 500 companies alike.
- The Hardware Engineering team emphasizes operational excellence and frugality in server design critical to AWS success.
- AWS fosters an inclusive culture with employee-led affinity groups and ongoing diversity and learning initiatives.
- The company values work-life harmony and offers flexibility to support employee well-being and productivity.
- AWS is committed to mentorship, career growth, and continuous learning, aiming to be Earth’s Best Employer.
Key required skills
- Proficiency in programming with modern languages such as C++, Python, Java, or Golang.
- Experience in systems design, architecture, and deployment of scalable, reliable software solutions.
- Strong background in Linux/Unix environments and data center operations.
- Ability to lead and deliver complex software/hardware infrastructure projects in production.
- Knowledge of hardware-software interaction, debugging, and system reliability practices.