Amazon Web Services (AWS)·Seattle, United States·onsite

Sr. System Development Engineer, High-Performance Accelerator Servers for AI/ML

$136,100 - $235,200 Posted 20 days ago
Ruby optional · Backend

About the role

What you'll do

  • Lead design, development, and operation of next-generation infrastructure for AI/ML and HPC workloads at cloud scale.
  • Collaborate cross-functionally with software, hardware, and network engineers and with operations teams to ensure high reliability and performance of AWS accelerator servers.
  • Decompose complex server system issues into manageable tasks and deliver solutions using a combination of hardware, software, and system design expertise.
  • Drive quality and reliability improvements throughout server conception, design, testing, launch, and operations phases.
  • Act as a technical leader with strong organizational and communication skills to influence and deliver scalable, performant software solutions.

What you should know

  • Ideal candidates should be innovative self-starters with deep knowledge across the full technical stack from hardware to userland software.
  • The role involves working in a fast-paced, collaborative environment with diverse teams across hardware engineering and cloud services.
  • Applicants must be comfortable with complex debugging and diagnosing issues in large-scale server systems.
  • This position offers opportunities to influence the future of Generative AI infrastructure and cloud computing at massive scale.
  • Candidates should be prepared to lead and deliver high-impact, reliable, and scalable solutions in production environments.

About the company

  • Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform, serving startups to Global 500 companies.
  • AWS fosters an inclusive culture that values diversity, curiosity, and employee-led affinity groups promoting equity and belonging.
  • The company emphasizes work-life harmony and flexibility to support employees’ success both professionally and personally.
  • AWS is committed to mentorship and continuous career growth, providing resources for knowledge sharing and professional development.
  • Amazon is a large, global technology leader known for pioneering cloud computing and continuous innovation in AI and infrastructure.

Key required skills

  • Systems development and operations experience in Linux/Unix environments
  • Proficiency in programming with modern languages such as C++, Java, Python, or Golang
  • Experience in designing and architecting scalable and reliable systems
  • Strong knowledge of hardware/software integration and x86 architecture
  • Proven ability to lead complex software or infrastructure projects from design through deployment

Summary generated from the original posting.