fix: resilient DSML tool-call parsing and repair in long contexts by jackygurui · Pull Request #192 · antirez/ds4

jackygurui · 2026-05-18T15:40:16Z

Summary

Long-context and long-generation scenarios expose several failure modes in DSML tool-call parsing. The model's attention degrades over thousands of tokens, producing malformed DSML that causes finish=error and breaks the agent loop. This PR adds a robust repair layer (try_repair_dsml) that handles all observed failure modes, plus targeted fixes for false positives and visibility improvements.

Problem

Three failure modes were observed in stress testing (256K context, q4-imatrix):

Mode	Description	Symptom
Unterminated	Model stops mid-DSML, missing closing tags	`finish=error`, "unterminated tool call"
Malformed closed	Outer tags balanced but inner tags broken	`finish=error`, "invalid tool call"
Hallucinated	`<tool_calls>...</tool_calls>` wrapping plain reasoning text	False tool call detection

Before this fix, all three modes would land in finish=error, silently aborting the agent loop. Additionally, DSML tags mentioned inside <thinking> text (e.g. the model explaining tool syntax in its reasoning) were being counted by the tag scanner in try_repair_dsml, causing false positive repairs.

Changes

try_repair_dsml() — single-pass DSML repair (2 commits)
- Rewritten from 6-pass O(6n) to single-pass O(n) tag counting for all 6 tag types
- Unterminated repair: appends missing closing tags in correct nesting order (parameter → invoke → tool_calls)
- Hallucinated repair: when tool_calls tags exist but contain no <invoke>, strips the orphaned tags so the text is treated as plain content
- Orphan end tag guard: returns early when closing tags outnumber opening tags (prevents size_t underflow)
- Integrated into both the unterminated-tool-call path and the invalid-tool-call recovery path
- Added 10 unit tests covering all 3 DSML styles, unterminated, malformed, hallucinated, and balanced cases
Ignore DSML inside <thinking> (1 commit)
- try_repair_dsml now finds the last </thinking> boundary and scans for DSML tags only from there — deliberately matching the same strategy used by parse_generated_message_ex, which was introduced in commit 037ee39 ("Ignore tool calls emitted inside thinking") to fix the same class of false positives reported in #167
- The hallucinated tag-stripping path also preserves the thinking section verbatim, only stripping tags from the post-thinking region
Debug visibility (1 commit)
- Logs a stderr message when require_thinking_closed=true is triggered but the model never closes <thinking>

Results

Before this fix, the 38 repaired + 3 orphan + 2 invalid cases would all have been counted as invalid, with every occurrence causing a hard finish=error. After the fix:

Tool calls: 169 | Invalid: 2 | Repaired: 38 | Orphan: 3

Only 2 cases remain unrecoverable; the other 41 are gracefully handled (38 repaired, 3 hallucinated-tags stripped). 100% repair success rate on unterminated tool calls.

During long tool-call generations (2000+ tokens), the model's attention degrades and drops closing DSML tags before reaching max_tokens. This causes finish=error with 'unterminated tool call', aborting the turn. Fix: before returning error, attempt to repair by appending missing closing tags (parameter -> invoke -> tool_calls in nesting order), then re-parse to verify the repair produces valid tool calls. - Add try_repair_dsml() to detect and fix unclosed DSML blocks - Integrate repair at the unterminated tool call error path - Add test_dsml_repair_produces_parseable_calls() with 7 scenarios covering all three DSML styles and multiple truncation patterns - Tests verify structural accuracy: tool name and arguments are correct Results: 0 finish=error across 156+ requests, 100% repair success rate on unterminated tool calls.

Long-context generations produce malformed DSML that parse_generated_message cannot parse, causing "invalid tool call" and breaking the agent loop. Three failure modes observed in stress testing (256K, q4-imatrix): Mode 1 (unterminated): model stops mid-DSML, missing closing tags Mode 2 (malformed closed): outer tags balanced but inner tags broken Mode 3 (hallucinated): tool_calls tags wrap plain reasoning text This commit addresses modes 1 and 2 via try_repair_dsml(): single-pass tag counting (O(n)) followed by appending missing closing tags in reverse nesting order (parameter -> invoke -> tool_calls). Also adds unit tests. Mode 3 is handled by antirez's commit 037ee39 which prevents DSML inside thinking from being detected as executable tool calls. Also adds orphan end tag guard: when toe>tos or ioe>ios or poe>pos, the size_t subtraction would underflow. Return false early. Signed-off-by: Rui Gu <jackygurui@gmail.com>

When parse_generated_message_ex is called with require_thinking_closed=true and the model never outputs </thinking>, the entire generation is treated as reasoning and any DSML inside is silently ignored. This stderr log makes the gate visible for debugging. Refs: antirez#167, commit 037ee39 (Ignore tool calls emitted inside thinking) Signed-off-by: Rui Gu <jackygurui@gmail.com>

try_repair_dsml scanned the full generated text for DSML tags. When the model discusses DSML syntax in its reasoning (e.g. explaining the DSML tags), those text mentions inflate the tag counts, causing false positive repairs (appending unnecessary closing tags). Fix: find the last </thinking> boundary and start counting only from there. DSML mentioned inside reasoning is model text, not executable tool calls — matches the same approach used by parse_generated_message_ex (commit 037ee39). Also updated the hallucinated strip path to copy the thinking section verbatim and only strip from the post-thinking region. Real-world validation: observed this exact false positive in production. The model was explaining how try_repair_dsml works and quoted the DSML tag syntax in its explanation. The parser mistook the quote for a real tool call and the tag counting inflated, causing a failed repair. Metrics from production use (after all fixes in this branch): Tool calls: 169 | Invalid: 2 | Repaired: 38 | Orphan: 3 Only 2 cases remain unrecoverable; the other 41 (38 repaired + 3 orphan) are now gracefully handled instead of causing finish=error. Signed-off-by: Rui Gu <jackygurui@gmail.com>

jackygurui added 4 commits May 18, 2026 23:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resilient DSML tool-call parsing and repair in long contexts#192

fix: resilient DSML tool-call parsing and repair in long contexts#192
jackygurui wants to merge 4 commits into
antirez:mainfrom
jackygurui:fix/dsml-repair-unterminated-tool-calls

jackygurui commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jackygurui commented May 18, 2026

Summary

Problem

Changes

Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant