What you'll do
- Lead the development and evolution of the observability platform including logging, metrics, and tracing infrastructure.
- Build and integrate AI-native capabilities for anomaly detection, failure diagnosis, and root cause analysis.
- Create developer tools such as dashboards, notebooks, and interactive debugging experiences.
- Drive reliability automation with intelligent alerting, diagnostics, and incident response systems.
- Collaborate across teams to embed observability and reliability best practices and mentor engineers.
What you should know
- The role requires 8+ years of experience in building and operating large-scale observability or monitoring infrastructure.
- Candidates should be comfortable working in ambiguous environments and solving unscoped problems end to end.
- Strong communication skills are essential to align technical and non-technical stakeholders effectively.
- The role offers the opportunity to influence and shape engineering standards and reliability culture across the organization.
- Hybrid onsite work is expected 2-3 days per week in the San Francisco Bay Area office.
About the company
- Gusto is a mission-driven company focused on empowering the small business economy with payroll, HR, and benefits solutions.
- The company supports over 200,000 small businesses nationwide with offices in Denver, San Francisco, and New York.
- Gusto emphasizes a collaborative and inclusive workplace culture with hybrid work expectations.
- Gusto integrates AI tools as a fundamental part of its workflows and encourages employees to become fluent in AI.
- Gusto offers competitive compensation, benefits, and equity to all full-time employees, reflecting shared success.
Key required skills
- Ruby
- Python
- TypeScript
- AWS
- Kubernetes
- Terraform
- Datadog
- Distributed systems