Description
Problem (one or two sentences)
When running Qwen3 Coder Next in LM Studio connected to Roo Code (VS Code extension), tool calls are not executed incrementally during generation. Instead, they appear to be applied only after the model finishes generating its entire response.
If the model produces a large response, this causes Roo Code to time out before the tool calls are applied.
This is not a long prompt processing issue — the problem occurs during response generation.
Context (who is affected and when)
Bug Report: Tool Calls Not Applied Until End of Generation (Causes Timeout with Large Responses)
Environment
- Model: Qwen3 Coder Next
- Backend: LM Studio
- Client: Roo Code (VS Code extension)
- Connection: LM Studio local server → Roo Code
- Hardware: Strix Halo, 128 GB
- OS: Windows
Important Clarification
This is not caused by:
- Slow prompt processing
- Initial inference delay
- Hardware limitations
The issue happens specifically while the model is generating a long response.
Additional Notes
It appears Roo Code may be:
- Buffering the entire response before parsing tool calls, or
- Waiting for a full completion event before executing tools
If tool calls were processed incrementally from the stream, this timeout issue would likely not occur.
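To illustrate the difference between buffering and incremental processing, here is a minimal Python sketch. It assumes OpenAI-style streaming chunks in which each tool call arrives as `delta.tool_calls` fragments carrying an `index`, and that calls are streamed one after another, so a new `index` means the previous call's arguments are complete. The function name `completed_tool_calls` is hypothetical; this is not Roo Code's actual implementation.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def completed_tool_calls(
    chunks: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (tool name, parsed arguments) for each tool call as soon as it is complete.

    Assumption: OpenAI-style streaming chunks, with tool calls emitted one after
    another and identified by an increasing `index` in `delta.tool_calls`.
    """
    index: Optional[int] = None  # index of the tool call currently being streamed
    name, args = "", ""
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for fragment in delta.get("tool_calls") or []:
                if fragment["index"] != index:
                    # A new index means the previous call's arguments are complete,
                    # so it can be handed to the caller mid-stream.
                    if index is not None:
                        yield name, json.loads(args or "{}")
                    index, name, args = fragment["index"], "", ""
                function = fragment.get("function") or {}
                name += function.get("name") or ""
                args += function.get("arguments") or ""
            if choice.get("finish_reason") and index is not None:
                # The server signalled the end of the turn: release the last call.
                yield name, json.loads(args or "{}")
                index = None
    if index is not None:
        # The stream closed without a finish_reason: flush the pending call.
        yield name, json.loads(args or "{}")
```

Because it is a generator, a caller could run each tool inside `for name, args in completed_tool_calls(stream): ...` while later chunks are still arriving, instead of collecting the whole response first. Only the last call in a turn has to wait for the end of the stream.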
Potential Area to Investigate
- Streaming tool call parsing
- OpenAI-compatible streaming implementation
- Function/tool call handling in streaming mode
If logs or additional diagnostics are needed, I can provide them.
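As a possible diagnostic, a small script could log when tool-call fragments arrive from LM Studio's OpenAI-compatible endpoint, which would help show whether the delay originates in the server's stream or in the client. The sketch assumes the usual `data: {json}` server-sent-events framing terminated by `data: [DONE]`, with each event's JSON on a single `data:` line; `iter_sse_chunks` and `log_tool_call_arrivals` are hypothetical helper names.

```python
import json
import time
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON payload of each `data:` line in an OpenAI-style SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, ": comment" keep-alives, other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel used by OpenAI-compatible servers
        yield json.loads(payload)


def log_tool_call_arrivals(lines: Iterable[str]) -> List[Tuple[float, int]]:
    """Return (seconds since the call started, tool-call index) for every tool-call fragment."""
    start = time.monotonic()
    arrivals: List[Tuple[float, int]] = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            for fragment in (choice.get("delta") or {}).get("tool_calls") or []:
                arrivals.append((time.monotonic() - start, fragment["index"]))
    return arrivals
```

The `lines` argument could come from iterating a `urllib.request.urlopen` response for a request sent with `"stream": true`, decoding each bytes line. If all arrival times cluster at the very end of generation, the server is likely withholding tool-call deltas; if they arrive early, buffering on the client side becomes the more likely explanation.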
Reproduction steps
- Run LM Studio with Qwen3 Coder Next.
- Connect Roo Code to LM Studio.
- Trigger a coding task that produces:
  - A long generation
  - Multiple tool calls
- Observe that tool calls are not applied until the model finishes generating.
- With large outputs, Roo Code times out before the tool calls are executed.
Expected result
- Tool calls should be detected and executed as soon as they are generated (streaming tool execution).
- Roo Code should not wait for the full model completion before applying tool calls.
- Large responses should not cause timeouts if tool calls are already available in the stream.
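A related, hypothetical mitigation for the last point: if the client's timeout were a single deadline on the whole request, any sufficiently long generation would exceed it, even while tokens are still flowing. An inactivity (idle) timeout that restarts on every received chunk only fires when the stream actually stalls. A minimal asyncio sketch, where `with_idle_timeout` is an illustrative name and not an existing Roo Code setting or API:

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def with_idle_timeout(stream: AsyncIterator[T], idle_seconds: float) -> AsyncIterator[T]:
    """Re-yield items from `stream`, failing only if no item arrives within `idle_seconds`.

    Unlike one deadline for the whole request, the timer restarts for every
    chunk, so a long generation that keeps streaming never trips it.
    """
    iterator = stream.__aiter__()
    while True:
        try:
            # Wait for the next chunk only; raises asyncio.TimeoutError if it stalls.
            item = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
        except StopAsyncIteration:
            return  # the underlying stream finished normally
        yield item
```

Usage would look like `async for chunk in with_idle_timeout(stream, idle_seconds=60): ...`; a stalled stream raises `asyncio.TimeoutError`, which is an alias of the built-in `TimeoutError` on Python 3.11 and later.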
Actual result
- The model begins generating a response normally.
- Tool calls appear in the output stream.
- Roo Code does not apply the tool calls immediately.
- It waits until the model completes generation.
- For large responses, this results in a timeout before tool execution occurs.
Variations tried (optional)
No response
App Version
3.47.3
API Provider (optional)
None
Model Used (optional)
Qwen3 Coder Next
Roo Code Task Links (optional)
No response