What you'll do
- Lead design and development of next-generation AWS platforms for AI/ML and HPC workloads at cloud scale.
- Own and proactively improve server system reliability, testability, and diagnostics using hardware and software expertise.
- Collaborate cross-functionally with SDEs, hardware engineers, TPMs, and managers across multiple AWS teams globally.
- Drive complex architectural problem solving and deliver scalable, performant software solutions in production.
- Operate in a fast-paced, innovative environment with ownership of implementation and direct impact on AWS products.
What you should know
- Ideal candidates are innovative self-starters with deep knowledge across hardware, software, and system layers.
- The role demands strong systems debugging skills and the ability to develop tactical solutions before issues affect customers.
- Applicants should be comfortable working in a highly collaborative, fast-paced, and technically challenging environment.
- This position offers ownership of, and visibility into, impactful projects shaping the future of cloud AI infrastructure.
- Candidates should expect to engage with complex server designs and cloud-scale operational challenges.
About the company
- Amazon Web Services is the world’s largest and most broadly adopted cloud platform, pioneering cloud computing innovation.
- AWS values an inclusive culture that fosters diversity, employee-led affinity groups, and continuous learning.
- The company emphasizes work-life harmony and flexibility to support employee success both professionally and personally.
- AWS is a large, global enterprise with teams distributed across Seattle, Cupertino, Austin, and worldwide data centers.
- Amazon promotes mentorship and career growth, encouraging diverse experiences and non-traditional career paths.
Key required skills
C++, Python, Java, Linux, Systems design