Skip to content

[opt]Defer UCM KV Dump Waiting Until Request Completion#989

Open
qyh111 wants to merge 9 commits into
ModelEngine-Group:developfrom
qyh111:dev_qyh
Open

[opt]Defer UCM KV Dump Waiting Until Request Completion#989
qyh111 wants to merge 9 commits into
ModelEngine-Group:developfrom
qyh111:dev_qyh

Conversation

@qyh111

@qyh111 qyh111 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Purpose

Move UCM KV dump completion waiting out of wait_for_save() so model execution is no longer blocked by async dump tasks, while still preserving delayed block release correctness for finished requests.

This also fixes the non-HMA MultiConnector path by forwarding request_finished() from the outer UCMConnector to the actual inner connector, ensuring requests that UCM saves asynchronously are properly marked for delayed free before they are returned from get_finished().

Modifications

  • Added async dump task tracking for UCM Direct and LayerWise connectors.
  • Changed wait_for_save() to submit dump tasks without blocking on task completion.
  • Moved dump completion waiting to get_finished() for requests that have finished and require delayed block release.
  • Added preemption handling to wait only for pending dump tasks related to preempted requests.
  • Added per-task event handle release after dump task completion or submission failure.
  • Removed save-duration metrics from the delayed wait_for_save() path because the timing is no longer accurate after deferring completion.
  • Updated LayerWise save flow to use the same delayed dump completion mechanism.
  • Added UCMConnector.request_finished() forwarding for the non-HMA path so MultiConnector can correctly count UCM async saves.
  • Rebased the branch on latest origin/develop and resolved conflicts with the new layerwise / pipeline store metrics changes.

Test

@qyh111 qyh111 changed the title Dev qyh delay wait for save Jun 1, 2026
@qyh111 qyh111 changed the title delay wait for save [opt]Defer UCM KV Dump Waiting Until Request Completion Jun 1, 2026
Comment thread ucm/integration/vllm/ucm_connector.py
Comment thread ucm/integration/vllm/ucm_connector.py
Comment thread ucm/integration/vllm/ucm_connector.py
Comment thread ucm/integration/vllm/ucm_connector.py
@dante159753

dante159753 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

I found one issue in the CP path: UCMCPConnector.build_connector_meta() does not mirror UCMDirectConnector.build_connector_meta() by adding requests with dump_block_ids to the scheduler-side _async_dump_req_ids set. As a result, request_finished() returns False in the CP path, so vLLM may free the blocks immediately while the worker-side dump task has been deferred until get_finished(). That leaves a window where the blocks can be reused or overwritten before the dump wait completes. Please mirror the DirectConnector tracking logic in this override before returning the metadata.

@qyh111

qyh111 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

I found one issue in the CP path: UCMCPConnector.build_connector_meta() does not mirror UCMDirectConnector.build_connector_meta() by adding requests with dump_block_ids to the scheduler-side _async_dump_req_ids set. As a result, request_finished() returns False in the CP path, so vLLM may free the blocks immediately while the worker-side dump task has been deferred until get_finished(). That leaves a window where the blocks can be reused or overwritten before the dump wait completes. Please mirror the DirectConnector tracking logic in this override before returning the metadata.

I think this does not apply to the CP path. UCMCPConnector does not override build_connector_meta(); its MRO is UCMCPConnector -> UCMLayerWiseConnector -> UCMDirectConnector, and UCMLayerWiseConnector does not override build_connector_meta either. Therefore CP uses UCMDirectConnector.build_connector_meta(), which already adds dump requests to scheduler-side _async_dump_req_ids before returning metadata.

Comment thread ucm/integration/vllm/ucm_connector.py
qyh111 added 4 commits June 6, 2026 15:39
# Conflicts:
#	ucm/integration/vllm/device.py
#	ucm/integration/vllm/ucm_connector.py
# Conflicts:
#	ucm/integration/vllm/ucm_connector.py
try:
self._wait_pending_dump_task(pending_dump_task)
except Exception as e:
logger.error(f"wait for dump kv cache failed. {type(e).__name__}: {e}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Resource Leak: When an exception occurs in _wait_pending_dump_task, the event handle is not released. The _release_dump_event_handle is only called inside _wait_pending_dump_task after store.wait() succeeds. If store.wait() raises an exception, the event handle remains in pending_dump_task.event_handle and is never cleaned up. Consider adding cleanup in the except block: self._release_dump_event_handle(pending_dump_task)

for request_id, request in metadata.request_meta.items()
if len(request.dump_block_ids[0]) > 0
}
self._async_dump_req_ids.update(metadata_dump_request_ids)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ State Accumulation: _async_dump_req_ids is updated but never cleared. Over time, this set could accumulate stale request IDs from completed or preempted requests. Consider clearing this set when requests are finished, or in _flush_pending_dump_tasks when flushing matching request IDs.

) -> None:
if self.use_layerwise:
return
preempted_req_ids = (

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Type Safety: Using getattr(kv_connector_metadata, 'preempted_req_ids', None) assumes the metadata object has this attribute. If the metadata type changes, this could silently return None instead of raising an error. Consider explicit type checking or adding a hasattr check with logging if the attribute is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants