Build and own the agent execution layer, including browser and computer automation, for multi-step task orchestration.
Design and implement a playbook system to structure tasks and enable agents to learn from human corrections.
Develop pipelines for agents to learn from user task demonstrations and generate reusable workflows.
Construct robust evaluation frameworks to measure agent reliability, cost efficiency, and consistency.
Collaborate with domain experts and integrate with external tools across multiple business domains.

Opportunity to work on cutting-edge AI agent systems impacting multiple business functions.
Requires experience with LLM orchestration frameworks and full-stack development in Python.
Role involves complex problem solving in automation and AI system reliability evaluation.
Work environment is highly collaborative with domain experts and technical leaders.
Exposure to regulated or sensitive domains like healthcare is a valuable plus.

A dynamic leader in the software industry focused on next-generation AI platforms for knowledge work.
Emphasizes innovation in AI to transform productivity and workflow automation at scale.
Operates in a hybrid work environment based in New York City with visionary founders and technical leaders.
Focuses on building reliable, scalable AI systems with continuous improvement baked in.
Likely a mid-sized or startup-stage company with a strong technical and collaborative culture.

Python
LLM orchestration
Browser automation
Full stack development
AI system evaluation

AI Engineer

About the role