What you'll do
- Lead design and development of high-performance accelerator servers for AI/ML workloads at cloud scale.
- Collaborate with a diverse cross-functional team of software, hardware, and network engineers, as well as operations staff.
- Solve complex system-level problems involving hardware-software integration, reliability, and diagnostics.
- Drive testability, reliability, and scalability improvements throughout server conception, design, and operations.
- Own end-to-end delivery of innovative infrastructure solutions powering next-generation AI and HPC cloud services.
What you should know
- Ideal candidates are innovative self-starters with a deep understanding of the full technical stack, from hardware to software.
- The role demands strong problem-solving and debugging skills in complex server and cloud environments.
- Applicants should be comfortable working onsite in Austin, TX, with a collaborative and multidisciplinary team.
- Expect to engage in highly technical leadership involving system design, automation, and operational excellence.
- This position offers the chance to work on cutting-edge AI infrastructure impacting the future of cloud computing.
About the company
- Amazon Web Services (AWS) is a global leader in cloud computing with a broad and innovative product portfolio.
- The company values diversity and inclusion, fostering an environment that welcomes bold ideas and unique perspectives.
- AWS emphasizes work-life harmony and offers flexibility to support employee well-being.
- AWS places a strong focus on mentorship and continuous career growth through knowledge sharing and performance development.
- Amazon is a large-scale, high-impact technology company pioneering cloud infrastructure for AI and enterprise customers.
Key required skills
- C++
- Python
- Linux
- Systems design
- Hardware-software integration