What you'll do
- Lead design and development of next-generation AWS AI/ML server platforms with a focus on scalability and performance.
- Collaborate cross-functionally with SDEs, hardware engineers, TPMs, and managers across multiple AWS teams and locations.
- Own system reliability by proactively identifying issues, writing tactical code, and scaling solutions for server systems.
- Solve complex architectural problems involving hardware, software, x86 architecture, and system diagnostics.
- Drive continuous improvements that affect AWS's bottom line and the customer experience in cloud AI training and inference.
What you should know
- The role calls for an innovative self-starter with strong organizational and communication skills.
- Candidates should be comfortable working in a fast-paced, growing, and collaborative environment with global teams.
- Opportunity to have direct ownership and visible impact on AWS cloud infrastructure and product improvements.
- Applicants should expect to solve undefined, complex system problems spanning hardware and software layers.
- Experience with Agile methodologies and a passion for continuous learning and high standards are valued.
About the company
- Amazon Web Services (AWS) is the world’s largest and most comprehensive cloud platform, trusted by startups and Global 500 companies.
- AWS values innovation and operational excellence, pioneering cloud computing with continuous new service releases.
- The Hardware Engineering team focuses on frugal, high-quality server designs critical to AWS business success.
- AWS fosters an inclusive culture with employee-led affinity groups and events promoting diversity and belonging.
- The company supports work-life harmony and offers extensive mentorship and career growth resources.
Key required skills
- Strong programming skills in C++, Python, Java, or similar modern languages.
- Deep knowledge of systems engineering fundamentals including networking, storage, and operating systems.
- Experience with designing and architecting scalable, reliable systems.
- Hands-on experience with server hardware and system diagnostics.
- Familiarity with PowerShell, Agile Scrum methodology preferred.