What you'll do
- Lead design and development of next-generation AWS platforms for AI/ML and HPC workloads with a focus on server hardware and software integration.
- Collaborate cross-functionally with engineers, TPMs, and managers across multiple locations to deliver high-quality, scalable, and reliable server solutions.
- Own system architecture, proactively identify deficiencies, and implement tactical solutions to improve testability, reliability, and performance.
- Work hands-on across the full technical stack, from bare-metal hardware to userland software, including x86 architecture and cloud-scale systems.
- Drive continuous improvements that benefit AWS’s bottom line and customer experience through innovation and operational excellence.
What you should know
- This role offers ownership and direct impact on AWS’s AI/ML infrastructure and cloud service performance.
- Candidates should be prepared for a fast-paced, collaborative environment involving cross-disciplinary teams and global coordination.
- Strong problem-solving skills are essential for tackling complex, ill-defined architectural challenges.
- A background in both software and hardware systems development within data center environments is an advantage.
- AWS encourages candidates from diverse and non-traditional backgrounds to apply and emphasizes growth and mentorship.
About the company
- Amazon Web Services (AWS) is the world’s largest and most comprehensive cloud platform, trusted by organizations ranging from startups to Global 500 companies.
- AWS values a culture of inclusion, curiosity, and continuous learning, supported by employee-led affinity groups and diversity initiatives.
- The company emphasizes work-life harmony and flexibility to support employee well-being and productivity.
- AWS is a pioneer in cloud computing, continuously innovating with new services and infrastructure to maintain industry leadership.
- The Hardware Engineering team operates globally with a focus on frugality, operational excellence, and cutting-edge technology.
Key required skills
- Proficiency in programming languages such as C++, Python, Java, or Golang
- Experience with systems design, architecture, and reliability engineering in IT or data center environments
- Strong knowledge of Linux/Unix deployment and operations
- Ability to lead design and deployment of complex, scalable, and performant software solutions
- Familiarity with hardware-software integration, x86 architecture, and cloud-scale systems