What you'll do
- Build, maintain, and evolve Braze's internal Infrastructure-as-a-Service (IaaS) platform to support large-scale distributed systems.
- Collaborate with engineering teams to define and implement IaaS solutions that improve deployment speed and infrastructure reliability.
- Design and operate internal software frameworks for asynchronous and background processing at massive scale using technologies like Sidekiq.
- Manage incident response through PagerDuty rotations, focusing on prevention and continuous system improvement.
- Develop automation and tooling to reduce operational pain and improve workflow efficiency for engineering teams.
What you should know
- Candidates should be prepared to work in a fast-paced, high-impact environment with a bias toward action and autonomy.
- The role requires collaboration across global remote teams and strong documentation practices to avoid redundant work.
- Applicants will engage with large-scale API-driven systems and complex distributed processing challenges.
- The position includes on-call responsibilities for incident management and system reliability.
- Braze offers a comprehensive benefits package including equity, professional development, and family support.
About the company
- Braze is a leading customer engagement platform known for composable intelligence and AI-powered marketing technology.
- The company values kindness, teamwork, and work-life harmony while navigating rapid global growth.
- Braze is recognized as a Great Place to Work and has received multiple industry accolades for culture and technology.
- Braze operates at massive scale, with billions of monthly active users and offices on multiple continents.
- Braze emphasizes equity, inclusion, and diversity, committing to fair and accessible recruiting and employee support.
Key required skills
- Ruby on Rails
- Sidekiq
- Kafka
- Kubernetes
- Distributed systems