What you'll do
- Build and own the agent execution layer, including browser and computer automation, for multi-step task orchestration.
- Design and implement a playbook system to structure tasks and enable agents to learn from human corrections.
- Develop pipelines for agents to learn from user task demonstrations and generate reusable workflows.
- Construct robust evaluation frameworks to measure agent reliability, cost efficiency, and consistency.
- Collaborate with domain experts and integrate with external tools across multiple business domains.
What you should know
- Opportunity to work on cutting-edge AI agent systems impacting multiple business functions.
- Requires experience with LLM orchestration frameworks and full stack development in Python.
- Role involves complex problem solving in automation and AI system reliability evaluation.
- Work environment is highly collaborative with domain experts and technical leaders.
- Exposure to regulated or sensitive domains like healthcare is a plus.
About the company
- A dynamic leader in the software industry focused on next-generation AI platforms for knowledge work.
- Emphasizes innovation in AI to transform productivity and workflow automation at scale.
- Operates in a hybrid work environment based in New York City with visionary founders and technical leaders.
- Focuses on building reliable, scalable AI systems with continuous improvement baked in.
- Likely a mid-sized or startup-stage company with a strong technical and collaborative culture.
Key required skills
- Python
- LLM orchestration
- Browser automation
- Full stack development
- AI system evaluation