What you'll do
- Build, maintain, and evolve Braze's internal infrastructure-as-a-service (IaaS) platform to support engineering teams.
- Develop and operate large-scale distributed processing frameworks handling trillions of jobs daily, using technologies such as Sidekiq.
- Collaborate with multiple engineering teams to define and implement enterprise-grade IaaS solutions with strong SLAs.
- Manage incident response through on-call rotations and continuously improve system reliability and automation.
- Improve observability, debuggability, and operational ergonomics for large-scale asynchronous and background processing systems.
What you should know
- Candidates should be prepared to work onsite in San Francisco and collaborate across global, often asynchronous teams.
- The role demands a high bar for autonomy, accountability, and rapid delivery in a fast-growing, high-scale environment.
- Applicants will engage in complex distributed systems challenges and have opportunities to impact infrastructure reliability at scale.
- Expect to participate in incident management and on-call rotations to maintain platform availability and improve systems.
- Braze offers a comprehensive benefits package including equity, flexible PTO, family services, and a supportive community culture.
About the company
- Braze is a leading customer engagement platform recognized for innovation in marketing technology and AI-powered personalization.
- The company values a collaborative, kind, and passionate culture with a strong emphasis on work-life harmony and equity.
- Braze is a global company with offices worldwide and a diverse, inclusive workforce, and it is certified as a Great Place to Work in multiple countries.
- Braze prioritizes continuous learning and professional development with formal career pathing and learning stipends.
- Braze has received multiple accolades including Best Companies to Work For and industry leadership awards.
Key required skills
- Ruby on Rails
- Sidekiq
- Kafka
- Kubernetes
- Distributed systems