What you'll do
- Build, maintain, and evolve Braze's internal infrastructure as a service (IaaS) platform to support large-scale distributed systems.
- Collaborate with engineering teams to define and implement infrastructure solutions that accelerate deployment and improve reliability.
- Develop and operate asynchronous and background processing frameworks handling trillions of jobs daily using technologies like Sidekiq.
- Manage incident response through PagerDuty rotations, focusing on prevention and continuous system improvements.
- Automate infrastructure operations to reduce toil and enhance scalability, observability, and operational safety.
What you should know
- Candidates should be prepared to work in a fast-paced, high-scale environment with a focus on automation and reliability.
- The role requires strong collaboration skills across global and often asynchronous teams.
- Applicants will have the opportunity to shape infrastructure that supports billions of users and messages daily.
- There is a significant emphasis on documentation, operational discipline, and proactive incident management.
- The role offers exposure to cutting-edge distributed systems, Kubernetes automation, and large-scale API-driven platforms.
About the company
- Braze is a leading customer engagement platform recognized for innovation in marketing technology and AI-powered personalization.
- The company fosters a collaborative, transparent, and inclusive culture with strong emphasis on work-life harmony and equity.
- Braze is a global company with offices worldwide and a workforce passionate about high standards and teamwork.
- It has received multiple accolades internationally, including Best Companies to Work For recognition and Great Place to Work certifications.
- Braze offers comprehensive benefits including equity grants, flexible PTO, family services, and professional development support.
Key required skills
- Ruby on Rails
- Sidekiq
- Kafka
- Kubernetes
- Distributed systems