Skip to content

human_in_the_loop: duplicate tool-call-name ambiguity guard silently skipped, causing wrong decisions to be applied #11756

Description

@immuhammadfurqan

Describe the bug

_apply_tool_execution_decisions (in haystack/human_in_the_loop/strategies.py) is supposed to refuse to proceed when it can't unambiguously match a ToolExecutionDecision to a tool call — specifically, when multiple tool calls share the same tool name but have no tool_call_id. There's already a test for this (test_two_teds_same_name_no_ids), and it works only when every tool call/decision in the batch lacks an id.

The guard is:

decision_by_id = {d.tool_call_id: d for d in tool_execution_decisions if d.tool_call_id}
decision_by_name = {d.tool_name: d for d in tool_execution_decisions if d.tool_name}

if not decision_by_id and len(decision_by_name) < len(tool_execution_decisions):
    raise ValueError(...)

The problem: not decision_by_id is checked across the entire batch. If any unrelated tool call elsewhere in the same batch happens to carry an id, decision_by_id becomes non-empty, and the whole guard is skipped — even though the other tool calls (the ones with duplicate names and no ids) are just as ambiguous as before. Once skipped, decision_by_name.get(tc.tool_name) returns whichever decision happens to be last in the dict, and that decision is silently applied to the wrong tool call. This is silent data corruption, not just a missed validation — there's no error, no warning, just a tool getting invoked with someone else's arguments.

Error message

No error is raised — that's the bug. Expected: ValueError: ToolExecutionDecisions are missing tool_call_id fields and there are multiple tool calls with the same name.... Actual: silent success with corrupted arguments.

Expected behavior

The ambiguity check should fire whenever there's a duplicate tool name among the calls/decisions that lack an id, regardless of whether other, unrelated tool calls in the same batch happen to have ids.

To Reproduce

from haystack.dataclasses import ChatMessage, ToolCall
from haystack.human_in_the_loop import ToolExecutionDecision
from haystack.human_in_the_loop.strategies import _apply_tool_execution_decisions

message_with_tool_calls = ChatMessage.from_assistant(
    tool_calls=[
        ToolCall(tool_name="add_database_tool", arguments={"name": "Malte"}),  # no id
        ToolCall(tool_name="add_database_tool", arguments={"name": "Milos"}),  # no id, duplicate name
        ToolCall(tool_name="other_tool", arguments={}, id="xyz"),  # has id, unrelated
    ]
)

rejection_messages, new_tool_call_messages = _apply_tool_execution_decisions(
    tool_call_messages=[message_with_tool_calls],
    tool_execution_decisions=[
        ToolExecutionDecision(tool_name="add_database_tool", execute=True, final_tool_params={"name": "Malte"}),
        ToolExecutionDecision(tool_name="add_database_tool", execute=True, final_tool_params={"name": "Milos"}),
        ToolExecutionDecision(tool_name="other_tool", execute=True, tool_call_id="xyz", final_tool_params={}),
    ],
)

for m in new_tool_call_messages:
    print(m.tool_calls)
# Both `add_database_tool` calls end up with arguments {'name': 'Milos'}.
# No ValueError is raised. The "Malte" call is silently lost/overwritten.

Additional context

This is the same validation already covered by test_two_teds_same_name_no_ids in test/human_in_the_loop/test_strategies.py, but that test only has tool calls without ids at all. The bug only shows up with a mix of id-bearing and id-less calls in the same batch, which isn't currently tested.

I'd like to fix the guard to check ambiguity only among the id-less subset, and add a regression test for the mixed case, then submit a PR.

System:

  • OS: Windows
  • Haystack version: main

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium priority, add to the next sprint if no P1 available

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions