Fix LLM callback isolation without serializing requests #4252
VedantMadane wants to merge 3 commits into crewAIInc:main from
Conversation
Not covered in this PR description:
If you prefer, I can add a follow-up commit that documents these options or adds a concurrency-focused test.
# Conflicts: # lib/crewai/src/crewai/llm.py
31fdc55 to 35483b6
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```
call_id=get_current_call_id(),
    ),
)
raise
```
set_callbacks is now dead code, never called
Low Severity
The set_callbacks static method mutates LiteLLM's global callback lists, which is exactly the pattern this PR removes. Its only call site (self.set_callbacks(callbacks or []) in __init__) was deleted, and a grep confirms no other callers exist anywhere in the codebase or tests. Leaving it around risks someone re-introducing the global-mutation pattern unintentionally.
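To make the risk concrete, here is a minimal, self-contained sketch of the global-mutation pattern that `set_callbacks` embodies. The names (`GLOBAL_CALLBACKS`, the simplified `set_callbacks`) are stand-ins for illustration, not the actual crewai or LiteLLM code; the point is only that module-level callback state lets one request's configuration clobber another's:

```python
# Hypothetical sketch of the racy pattern: GLOBAL_CALLBACKS stands in for
# LiteLLM's module-level callback lists; this is NOT the real implementation.
GLOBAL_CALLBACKS: list = []

def set_callbacks(callbacks: list) -> None:
    # The removed pattern: replace shared global state in place.
    GLOBAL_CALLBACKS.clear()
    GLOBAL_CALLBACKS.extend(callbacks)

# Interleave two "requests": B installs its handler before A's call runs.
set_callbacks(["cb_a"])   # request A configures its callback
set_callbacks(["cb_b"])   # request B overwrites the global state
# Whatever is global at call time is what fires, so A would get B's callback.
assert GLOBAL_CALLBACKS == ["cb_b"]
```

Deleting the dead method removes the last on-ramp back into this pattern.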


This is a follow-up to #4218 (auto-closed by bot) addressing the same race in LLM callback handling without holding a global lock across the network call.
What changed
Makes `test_llm_callback_replacement` deterministic by mocking `litellm.completion` (removes sleep/heisenbug).
Why
The approach in #4218 used a class-level lock held across the entire LLM request, which can serialize all concurrent agent calls. This PR keeps concurrency while still ensuring callback isolation.
Fixes #4214.
Note
Medium Risk
Touches the core `LLM.call`/`LLM.acall` execution path by changing how callbacks are wired into LiteLLM, which could affect observability/token-usage integrations and streaming/non-streaming parity under concurrency.
Overview
Stops mutating LiteLLM's global callback state for per-request handlers, and instead computes `effective_callbacks` and passes them via `params["callbacks"]` on each `completion`/`acompletion` call to avoid cross-request races. Updates tests to make callback isolation deterministic: rewrites `test_llm_callback_replacement` to mock `litellm.completion` (removing the sleep-based flakiness) and adds a new threaded concurrency test asserting that each request receives only its own callback and that token usage is not mixed.
Written by Cursor Bugbot for commit 8dfc1f4.
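The threaded isolation test described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the PR's actual test: `fake_completion` stands in for the mocked `litellm.completion`, and `llm_call` stands in for the per-request wiring in `LLM.call`:

```python
import threading

def fake_completion(**params):
    # Stand-in for the mocked litellm.completion: report which callbacks
    # this particular call received in its parameters.
    return {"callbacks": params.get("callbacks", [])}

def llm_call(my_callback):
    # Per-request wiring (illustrative): effective callbacks are computed
    # for this call and passed via params, never written to global state.
    effective_callbacks = [my_callback]
    return fake_completion(model="some-model", callbacks=effective_callbacks)

results = {}

def worker(name: str) -> None:
    # Each thread plays one concurrent agent request with its own callback.
    results[name] = llm_call(f"cb_{name}")

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Isolation: every request saw exactly its own callback and no one else's.
for name, res in results.items():
    assert res["callbacks"] == [f"cb_{name}"]
```

With the old global-mutation wiring, the final assertions would fail intermittently because a concurrently started request could overwrite the shared callback list before this one fired.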