@tkattkat (Collaborator) commented Jan 6, 2026

Why

Previously, agent executions could end without a structured final response, either by hitting maxSteps or when the LLM broke out of its loop without calling the close tool. This made it difficult to:

  1. Reliably determine if a task was completed successfully
  2. Extract structured data from the agent's execution

What Changed

Ensured Close Tool is Always Called

  • Added handleCloseToolCall utility that forces a close tool call via a separate generateText call when the main agent loop ends without explicitly closing
  • Integrated via new ensureClosed private method in v3AgentHandler.ts
  • Works for both execute() and stream() modes
  • Triggers when maxSteps is reached or the LLM stops (i.e., completes its task)
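
The forced-close flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `AgentState` shape, its field names, and the injected `forceClose` callback (standing in for the separate generateText call restricted to the close tool) are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; the real types live in the PR.
interface CloseResult {
  reasoning: string;
  taskComplete: boolean;
  output?: Record<string, unknown>;
}

interface AgentState {
  closed: boolean; // true once a close tool call has been recorded
  taskComplete: boolean;
  message: string;
  output?: Record<string, unknown>;
}

// Stand-in for the separate LLM call that only exposes the close tool.
type ForceClose = () => Promise<CloseResult>;

async function ensureClosed(
  state: AgentState,
  forceClose: ForceClose,
): Promise<AgentState> {
  // The main loop already produced a close call: nothing to force.
  if (state.closed) return state;

  // Otherwise (maxSteps reached, or the LLM simply stopped) force one.
  const closeResult = await forceClose();
  return {
    closed: true,
    taskComplete: closeResult.taskComplete,
    message: closeResult.reasoning,
    output: closeResult.output,
  };
}
```

Injecting the close call as a callback keeps the sketch independent of any particular LLM client, so the same helper could be shared by execute() and stream() style code paths.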

Added Output Schema Support (Experimental)

  • Users can now pass a Zod schema to agent.execute({ output: z.object({...}) }) to return structured data at the end of execution
  • The schema dynamically extends the close tool's input schema
  • Extracted data is returned in result.output
  • Added validation:
    • CUA mode: Throws StagehandInvalidArgumentError (not supported)
    • Non-CUA without experimental: true: Throws ExperimentalNotConfiguredError
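
The validation rules above could look something like the sketch below. The two error classes here are local stand-ins that only borrow the names mentioned in this PR, and `validateOutputOption` with its option fields is a hypothetical helper, not the handler's actual code.

```typescript
// Local stand-ins named after the errors mentioned in the PR description.
class StagehandInvalidArgumentError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "StagehandInvalidArgumentError";
  }
}

class ExperimentalNotConfiguredError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ExperimentalNotConfiguredError";
  }
}

// Hypothetical option bag for the example.
interface OutputValidationInput {
  hasOutputSchema: boolean; // did the caller pass `output`?
  isCua: boolean; // is the agent running in CUA mode?
  experimental: boolean; // was experimental: true configured?
}

function validateOutputOption(opts: OutputValidationInput): void {
  // No schema requested: nothing to validate.
  if (!opts.hasOutputSchema) return;

  // CUA mode does not support the output schema at all.
  if (opts.isCua) {
    throw new StagehandInvalidArgumentError(
      "output schema is not supported in CUA mode",
    );
  }

  // Non-CUA mode requires the experimental flag.
  if (!opts.experimental) {
    throw new ExperimentalNotConfiguredError(
      "output schema requires experimental: true",
    );
  }
}
```

Checking CUA mode before the experimental flag mirrors the order of the bullets above; whether the real handler checks in that order is an assumption.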

Example Usage

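import { z } from "zod";

// Assumes `agent` was created in non-CUA mode with experimental: true,
// which the validation rules above require for `output`.
const result = await agent.execute({
  instruction: "search for a shampoo on amazon and click into one of the results",
  maxSteps: 20,
  // The schema extends the close tool's input, so the model fills it in when closing.
  output: z.object({
    productName: z.string().describe("The name of the shampoo product"),
    price: z.string().describe("The price of the product"),
    rating: z.string().describe("The star rating of the product"),
  }),
});

// Structured data extracted according to the schema above.
console.log(result.output);
// { productName: "...", price: "$12.99", rating: "4.5 out of 5 stars" }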
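
A caller may want to guard against a missing or partial result.output before using it, for instance when the run was closed as failed. The helper below is a hedged sketch: `ProductInfo` and `toProductInfo` are hypothetical names, and it assumes the output may arrive as a loosely typed record or be absent entirely.

```typescript
// Hypothetical target shape matching the example schema above.
interface ProductInfo {
  productName: string;
  price: string;
  rating: string;
}

// Narrow a loosely typed output record into ProductInfo, or return null.
function toProductInfo(
  output: Record<string, unknown> | undefined,
): ProductInfo | null {
  if (!output) return null;

  const { productName, price, rating } = output;
  if (
    typeof productName !== "string" ||
    typeof price !== "string" ||
    typeof rating !== "string"
  ) {
    return null;
  }
  return { productName, price, rating };
}
```

With this in place, a caller could write `const product = toProductInfo(result.output)` and branch on `null` instead of assuming every field is present.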

Test Plan

  • Verify close tool is called when agent naturally completes (no change in behavior)
  • Verify close tool is forced when maxSteps is reached
  • Verify output schema extracts data correctly in execute() mode
  • Verify output schema extracts data correctly in stream() mode
  • Verify output throws StagehandInvalidArgumentError when used with CUA mode
  • Verify output throws ExperimentalNotConfiguredError when used without experimental: true

Summary by cubic

Ensures every agent run ends with a structured final response and adds optional structured output via a Zod schema. Improves reliability by always setting completion status and final reasoning.

  • New Features
    • Always triggers a "close" tool call at the end of a run (LLM stops or maxSteps), for both execute() and stream().
    • Optional output schema: pass output: z.object({...}) to return typed data in result.output.
    • Validation: output schema is not supported in CUA (throws StagehandInvalidArgumentError). In non-CUA, requires experimental: true (throws ExperimentalNotConfiguredError otherwise).
    • Removed "close" from the main tool list and system prompt; closing is handled automatically post-run.

Written for commit dfb703a. Summary will update on new commits.

changeset-bot bot commented Jan 6, 2026

🦋 Changeset detected

Latest commit: dfb703a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name | Type
@browserbasehq/stagehand | Patch
@browserbasehq/stagehand-evals | Patch
@browserbasehq/stagehand-server | Patch

@tkattkat changed the title from "Update close tool" to "Update close tool + add output to agent result" Jan 6, 2026

@cubic-dev-ai bot (Contributor) left a comment

No issues found across 8 files

greptile-apps bot (Contributor) commented Jan 6, 2026

Greptile Summary

  • Ensures agent executions always end with a structured close tool call and adds optional Zod schema-based output extraction to improve reliability and data extraction capabilities
  • Removes close tool from the manual agent toolkit and automatically handles closing via handleCloseToolCall utility when agents complete or reach maxSteps
  • Adds experimental output schema feature allowing users to extract structured data from agent results via result.output with proper validation for CUA mode restrictions

Important Files Changed

Filename | Overview
packages/core/lib/v3/agent/utils/handleCloseToolCall.ts | New utility that forces close tool calls via separate LLM inference to ensure structured agent completion
packages/core/lib/v3/handlers/v3AgentHandler.ts | Integrates forced close handling and output schema support into agent execution flow
packages/core/lib/v3/types/public/agent.ts | Adds output field to agent options and result types for structured data extraction

Confidence score: 4/5

  • This PR requires careful review due to significant architectural changes in agent completion handling
  • Score reflects potential for agent behavior changes and the introduction of experimental features that alter the fundamental execution flow
  • Pay close attention to handleCloseToolCall.ts and the close handling logic in v3AgentHandler.ts to ensure reliable forced closing

Sequence Diagram

sequenceDiagram
    participant User
    participant AgentHandler as V3AgentHandler
    participant LLMClient
    participant CloseHandler as handleCloseToolCall
    participant Model as LanguageModel
    
    User->>AgentHandler: "execute(instruction, options)"
    AgentHandler->>AgentHandler: "prepareAgent()"
    AgentHandler->>LLMClient: "generateText(systemPrompt, messages, tools)"
    
    loop Agent Steps (up to maxSteps)
        LLMClient->>Model: "Generate next step"
        Model-->>LLMClient: "Tool calls + reasoning"
        LLMClient->>AgentHandler: "onStepFinish(toolCalls, results)"
        AgentHandler->>AgentHandler: "Process tool results and update state"
        
        alt Tool call is "close"
            AgentHandler->>AgentHandler: "Mark state.completed = true"
        end
    end
    
    LLMClient-->>AgentHandler: "Generation result"
    
    alt state.completed == false
        Note over AgentHandler,CloseHandler: Force close tool call
        AgentHandler->>CloseHandler: "handleCloseToolCall(model, messages, instruction, outputSchema)"
        CloseHandler->>Model: "generateText() with close tool only"
        Model-->>CloseHandler: "Close tool call with reasoning + output"
        CloseHandler-->>AgentHandler: "closeResult(reasoning, taskComplete, output)"
        AgentHandler->>AgentHandler: "Update state with close result"
    end
    
    AgentHandler->>AgentHandler: "consolidateMetricsAndResult()"
    AgentHandler-->>User: "AgentResult(success, message, actions, output)"

{ name: "wait", description: "Wait for a specified time" },
{ name: "navback", description: "Navigate back in browser history" },
{ name: "scroll", description: "Scroll the page x pixels up or down" },
{ name: "close", description: "Mark the task as complete or failed" },

@pirate (Member) commented Jan 7, 2026

Suggested change
- { name: "close", description: "Mark the task as complete or failed" },
+ { name: "close", description: "Mark the task as complete or failed" }, // TODO: consider renaming this tool to "done"

import { StagehandZodObject } from "../../zodCompat";
interface CloseResult {
reasoning: string;
taskComplete: boolean;

Member comment:

I think success: true | false may be better

reasoning: string;
taskComplete: boolean;
messages: ModelMessage[];
output?: Record<string, unknown>;

Member comment:

I recommend making output required; LLMs are really good at inferring what the ideal output should be for a task.

For example, if the user is researching something, the model often nails it and puts the exact data they were looking for in output.

@tkattkat tkattkat merged commit 6fbf5fc into main Jan 8, 2026
19 checks passed
@github-actions github-actions bot mentioned this pull request Jan 8, 2026

3 participants