Skip to content

feat: replace subagent and skills with unified summon extension#6964

Open
tlongwell-block wants to merge 26 commits intomainfrom
summon_extension
Open

feat: replace subagent and skills with unified summon extension#6964
tlongwell-block wants to merge 26 commits intomainfrom
summon_extension

Conversation

@tlongwell-block
Copy link
Collaborator

@tlongwell-block tlongwell-block commented Feb 4, 2026

Introduces the summon extension to unify goose's delegation and knowledge-loading capabilities.

New Tools

Tool Purpose Example
load Discover sources or inject knowledge load() lists all, load(source: "rust-patterns") loads content
delegate Run task in isolated subagent delegate(instructions: "...") or delegate(source: "review", async: true)

Features

  • Source discovery: Recipes, skills, agents from local/global paths with priority resolution
  • Async execution: Background tasks with turn tracking and MOIM status reporting
  • Unprefixed tools: Cleaner tool names without extension__ prefix

Removed

  • subagent_tool.rs, skills_extension.rs, builtin_skills/ - replaced by summon

Closes #6202

Implements unified 'summon' platform extension with two intuitive tools:
- load: Inject knowledge into current context ("teach me this")
- delegate: Run tasks in isolated subagents ("do this for me")

Key features:
- Source discovery from recipes, skills, agents with priority ordering
- Skill/agent frontmatter parsing with Claude model shorthand translation
- Sync and async delegation with background task tracking
- MOIM status reporting for background tasks with rounded durations
- Nested delegation prevention (subagents cannot spawn subagents)
- 60s TTL caching for source discovery

Removes:
- subagent_tool.rs (523 LOC)
- skills_extension.rs (865 LOC)
- builtin_skills/ directory

Adds:
- summon_extension.rs (1834 LOC) - consolidated implementation

The summon extension provides a cleaner mental model while maintaining
all existing functionality. Builtin skills are now inlined in the
extension.

Ref: GitHub Discussion #6202
- Skill-only delegation now sets prompt: 'Apply the skill knowledge to produce a useful result.'
- Agent-only delegation now sets prompt: 'Proceed with your expertise to produce a useful result.'
- This prevents subagents from receiving the meaningless 'Begin.' prompt

Also includes:
- Agent model override now correctly applied from recipe.settings
- Model/provider/temperature precedence: params > recipe.settings > session
- DEFAULT_SUBAGENT_MAX_TURNS updated to 50 for consistency

Crossfire review scores:
- Default subagent: 8.5/10
- goose-gpt-5-2: 8/10

No blocking issues found. Ready to merge.
Implement meta-based tool ownership to enable platform extensions to
expose tools without the extension__ prefix.

Changes:
- Add unprefixed_tools flag to PlatformExtensionDef (enabled for summon)
- Embed goose_extension in Tool.meta for all tools during fetch
- Add resolve_tool() helper using meta-based ownership lookup
- Simplify dispatch_tool_call() to use unified resolution path
- Update filter_tools() and reply_parts.rs to use meta ownership
- Add collision detection for duplicate tool names
- Remove unused get_client_for_tool() method

This allows the summon extension's load and delegate tools to appear
without prefix while maintaining correct dispatch and filtering.
Clean up implementation by removing explanatory inline comments.
The code is self-documenting through clear naming and structure.

Removed ~55 comments including:
- Implementation detail explanations
- Precedence order comments
- Test section labels
- Inline explanatory notes

All doc comments (///) preserved for public API documentation.
Replace inline string literals with compile-time bundled .md files
using include_dir!, matching the pattern used for system prompts.

- Add builtin_skills module with include_dir! macro
- Restore goose_doc_guide.md skill file from main
- Update summon_extension to use builtin_skills::get_all()

Benefits:
- Easier to maintain with full editor support
- Syntax highlighting for markdown content
- Simple to add new skills (just create .md files)
- Consistent with prompts/ directory pattern
- Add prefix-based fallback in resolve_tool() to handle prefixed tool
  calls without triggering tools/list, fixing MCP replay test failures
- Set unprefixed_tools: true for code_execution extension to maintain
  backward compatibility with LLMs calling execute_code without prefix
- Refactor subagent_handler with OnMessageCallback type and
  run_subagent_task_with_callback() to eliminate code duplication
- Update summon_extension to use shared subagent infrastructure
- Fix CLI output.rs to recognize unprefixed tool names (execute_code,
  delegate) for proper rendering

All 548 lib tests and 4 MCP integration tests pass.
- Replace subagent references with delegate tool tests
- Add load tool testing (discovery, builtin skills, knowledge injection)
- Add async delegate test with MOIM monitoring
- Add nested delegation prevention test (critical security)
- Add source-based delegate test
- Rename test_phases parameter value from 'subagents' to 'delegation'
- Clean up trailing whitespace throughout file
When async delegates complete, their results are now persisted in
completed_tasks instead of being discarded. The agent can retrieve
completed task outputs by calling load(task_id).

Changes:
- Add CompletedTask struct to store task results with metadata
- Add completed_tasks field to SummonClient
- Modify cleanup_completed_tasks() to move finished tasks to completed_tasks
- Add handle_load_task_result() to retrieve and format task output
- Update handle_load() to check for task_ prefix and call cleanup first
- Update get_moim() to show completed tasks with retrieval hints
- Update handle_load_discovery() to list completed tasks awaiting retrieval
- Add comprehensive test_async_task_result_lifecycle test

MOIM now shows:
- Running tasks with sleep hint
- Completed tasks with 'use load("task_id") to get result' hint

Implements TASK-09A from #6202
Review feedback:
- Remove doc comments that restate what code does (extension.rs,
  extension_manager.rs, subagent_handler.rs, summon_extension.rs)
- Consolidate add_local_*, add_global_*, add_recipe_path_* functions
  into a single discover_filesystem_sources() method that builds
  directory lists declaratively and iterates through them

The source discovery consolidation reduces ~80 LOC by eliminating
8 separate add_* methods and replacing them with inline directory
list construction using iterator chains.
…mon_extension

Removed doc comments that merely restate what the code does:
- extension_manager.rs: get_tool_owner, is_unprefixed_extension
- summon_extension.rs: skill/agent directory comments, handle_load_source,
  get_task_description

These comments added no value beyond what the function names and code
already communicate clearly.
- Update delegate tool description: 'Parallel execution requires async: true'
- Expand goose-self-test.yaml parallel delegate test to explicitly validate:
  - Sync delegates run sequentially even when called in same message
  - Async delegates run in parallel when called in same message
  - Test includes timestamp comparison to verify behavior

This documents the expected behavior due to MCP protocol constraints where
sync tool calls are serialized per extension connection.
Model names should be specified in full by the agent/recipe.
The provider layer handles validation. Removes unnecessary
maintenance burden of keeping shorthand mappings up to date.
Replace custom task_id generation with session ID (YYYYMMDD_N format).
This simplifies the code and makes task IDs match session IDs for
easier debugging and correlation.
Resolved conflicts:
- Updated McpClientTrait::call_tool signature to include working_dir parameter
- Re-exported SUBAGENT_TOOL_REQUEST_TYPE from agents module for goose-cli
- Integrated origin/main's working_dir support with summon_extension's changes
@tlongwell-block

This comment was marked as resolved.

@github-actions

This comment was marked as resolved.

Add ability to cancel running background tasks via load(source: "task_id", cancel: true).

- Store CancellationToken in BackgroundTask instead of creating orphan tokens
- Add `cancel` parameter to load tool schema
- Cancel triggers token, waits up to 5s for graceful shutdown, then aborts
- Cancelled tasks return partial output with "⊘ Cancelled" status
- Update MOIM hint to show cancel option when tasks are running
- Add cancellation test to goose-self-test.yaml
@tlongwell-block

This comment was marked as resolved.

@github-actions

This comment was marked as resolved.

- Add Drop impl to cancel background tasks on shutdown (best-effort via try_lock)
- Propagate ModelConfig::new error instead of unwrap
- Document unprefixed_tools field in PlatformExtensionDef
@DOsinga
Copy link
Collaborator

DOsinga commented Feb 5, 2026

I haven't looked at the details here, but this feels like the right way to do this.

@block block deleted a comment from github-actions bot Feb 5, 2026
* origin/main:
  fix: detect context length errors in GCP Vertex AI provider (#6976)
  Added the ability to escape Jinja variables in recipes to include them in prompts (#6975)
  Bug Fix: bump pctx (#6967)
  fix(acp): fixtures now raise content mismatch errors (#6912)
  custom provider form minor improvements (#6966)
  Fix 'Edit In Place' feature appending instead of editing messages (#6955)
  docs: change RPI slash commands (#6963)
Parallel async delegates were causing 'database is locked' errors due to
concurrent write operations (create_session, add_message) competing for
SQLite's single writer lock.

Solution: Add a tokio::sync::Mutex to SessionStorage that serializes all
write operations. This gracefully handles intra-process contention while
SQLite's busy_timeout (5s) handles any cross-process scenarios.

Protected methods:
- create_session
- add_message
- apply_update
- replace_conversation
- delete_session
- truncate_conversation
- update_message_metadata
The load_callback_configs function was failing when encountering
unprefixed tools (like summon's load/delegate) because it expected
all tool names to contain '__'. Now uses get_tool_owner() to get
the namespace for unprefixed tools, making them available in code
execution mode as Summon.load() and Summon.delegate().
…ptions

Two fixes for subrecipes support:

1. CLI now stores recipe on session after creation, allowing the summon
   extension to discover sub_recipes defined in the parent recipe. This
   matches the behavior of goose-server and scheduler.

2. Subrecipe descriptions now show available parameters by loading the
   actual recipe file and extracting parameter keys. Example output:
   'Gather file statistics (params: directory, fast_mode)'
When a recipe defines sub_recipes, the summon extension is now automatically
added to the extensions list. This ensures the delegate tool is available
without requiring users to manually specify the summon extension.

Changes:
- Add ensure_summon_for_subrecipes() helper in Recipe that auto-injects
  the summon platform extension when sub_recipes are present
- Update test assertions to expect summon in extension list
- Update test script to look for 'delegate' instead of 'subagent'
…tion guidance

Replace redundant Options section (duplicated parameter descriptions) with
practical guidance on using delegates effectively:

- Clarify that delegates know only instructions + source content
- Warn that delegates cannot coordinate and same-file work causes conflicts
- Explain async/sync usage: parallel needs async + sleep, single task uses sync
- Distinguish research (read-only, parallelize freely) from work (partition files)
- Add orchestration pattern: decompose → async delegates → sleep → synthesize
- Update combined mode with concrete example: source + instructions pairing
@tlongwell-block
Copy link
Collaborator Author

/goose

@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2026

Summary: This PR unifies goose's delegation (subagent) and knowledge-loading (skills) capabilities into a single "summon" extension with load and delegate tools. The architectural approach is sound - consolidating related functionality and introducing unprefixed tools for cleaner ergonomics. However, there are some implementation concerns around test coverage and resource management.

🟡 Warnings

  1. Test coverage reduced for tool resolution (extension_manager.rs:1704-1770 deleted)

    The old test_get_client_for_tool tests covered important edge cases: unicode tool names, multiple underscores in names, and various prefixing scenarios. These tests were deleted but the new resolve_tool function has different (and more complex) resolution logic with no equivalent test coverage.

    // Old tests removed:
    // - test handling of "__client" prefixes
    // - test "__cli__ent__" (multiple underscores)
    // - unicode handling

    The new resolution has two paths: (1) prefix-based lookup and (2) metadata-based fallback. Both deserve test coverage.

  2. Completed tasks accumulate unbounded (summon_extension.rs)

    The completed_tasks HashMap only gets entries removed when a user calls load(task_id). Long-running sessions that spawn many background tasks but forget to retrieve results will accumulate memory:

    pub struct CompletedTask {
        pub id: String,
        pub description: String,
        pub result: Result<String, String>,  // Could be large
        pub turns_taken: u32,
        pub duration: Duration,
    }

    Consider adding a TTL or max capacity with LRU eviction.

  3. Tool name collision silently drops tools (extension_manager.rs:936-948)

    When two extensions expose unprefixed tools with the same name, the second one is silently skipped with only a warning:

    if seen_names.contains(&tool_name) {
        warn!(
            tool = %tool_name,
            extension = %ext_name,
            "Duplicate tool name - skipping"
        );
        continue;
    }

    This is reasonable behavior, but the warning may be missed. Consider surfacing this in the MOIM or system prompt if it happens, as it could cause user confusion when a tool doesn't behave as expected.

🟢 Suggestions

  1. Source discovery errors are silently ignored (various scan_* methods)

    All scan_recipes_dir, scan_skills_dir, scan_agents_dir methods silently ignore read_dir failures. While this is often fine (directories may not exist), it could mask permission issues:

    let entries = match std::fs::read_dir(dir) {
        Ok(e) => e,
        Err(_) => return,  // All errors ignored
    };

    Consider at least debug-level logging for non-NotFound errors.

  2. is_session_id validation is loose (summon_extension.rs)

    The function checks if a string looks like a task ID by matching a timestamp prefix pattern. This works but could theoretically match user-provided source names that happen to look like timestamps:

    fn is_session_id(s: &str) -> bool {
        let parts: Vec<&str> = s.split('-').collect();
        // checks for timestamp-like prefix
    }

    A more robust approach might be to track known task IDs explicitly.

✅ Highlights

  • Good deduplication strategy: Using HashSet<String> in discover_filesystem_sources ensures sources from higher-priority directories (local → global) take precedence cleanly.

  • Proper cancellation handling: Background tasks properly receive and respect cancellation tokens, and the last_activity tracking enables idle detection.

  • Clean tool ownership tracking: Storing goose_extension in tool metadata is a good pattern that allows resolution without relying on naming conventions.

  • Subagent recursion prevention: The check if !is_subagent { tools.push(self.create_delegate_tool()); } correctly prevents infinite delegation chains.

  • Helpful MOIM integration: The get_moim implementation provides clear status on background tasks with formatted duration and turn counts.


Review generated by goose

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants