[opt]Defer UCM KV Dump Waiting Until Request Completion#989
Conversation
|
I found one issue in the CP path: |
I think this does not apply to the CP path. UCMCPConnector does not override build_connector_meta(); its MRO is UCMCPConnector -> UCMLayerWiseConnector -> UCMDirectConnector, and UCMLayerWiseConnector does not override build_connector_meta either. Therefore CP uses UCMDirectConnector.build_connector_meta(), which already adds dump requests to scheduler-side _async_dump_req_ids before returning metadata. |
# Conflicts: # ucm/integration/vllm/device.py # ucm/integration/vllm/ucm_connector.py
# Conflicts: # ucm/integration/vllm/ucm_connector.py
| try: | ||
| self._wait_pending_dump_task(pending_dump_task) | ||
| except Exception as e: | ||
| logger.error(f"wait for dump kv cache failed. {type(e).__name__}: {e}") |
There was a problem hiding this comment.
_wait_pending_dump_task, the event handle is not released. The _release_dump_event_handle is only called inside _wait_pending_dump_task after store.wait() succeeds. If store.wait() raises an exception, the event handle remains in pending_dump_task.event_handle and is never cleaned up. Consider adding cleanup in the except block: self._release_dump_event_handle(pending_dump_task)
| for request_id, request in metadata.request_meta.items() | ||
| if len(request.dump_block_ids[0]) > 0 | ||
| } | ||
| self._async_dump_req_ids.update(metadata_dump_request_ids) |
There was a problem hiding this comment.
_async_dump_req_ids is updated but never cleared. Over time, this set could accumulate stale request IDs from completed or preempted requests. Consider clearing this set when requests are finished, or in _flush_pending_dump_tasks when flushing matching request IDs.
| ) -> None: | ||
| if self.use_layerwise: | ||
| return | ||
| preempted_req_ids = ( |
There was a problem hiding this comment.
💡 Type Safety: Using getattr(kv_connector_metadata, 'preempted_req_ids', None) assumes the metadata object has this attribute. If the metadata type changes, this could silently return None instead of raising an error. Consider explicit type checking or adding a hasattr check with logging if the attribute is missing.
Purpose
Move UCM KV dump completion waiting out of
wait_for_save()so model execution is no longer blocked by async dump tasks, while still preserving delayed block release correctness for finished requests.This also fixes the non-HMA
MultiConnectorpath by forwardingrequest_finished()from the outerUCMConnectorto the actual inner connector, ensuring requests that UCM saves asynchronously are properly marked for delayed free before they are returned fromget_finished().Modifications
wait_for_save()to submit dump tasks without blocking on task completion.get_finished()for requests that have finished and require delayed block release.wait_for_save()path because the timing is no longer accurate after deferring completion.UCMConnector.request_finished()forwarding for the non-HMA path soMultiConnectorcan correctly count UCM async saves.origin/developand resolved conflicts with the new layerwise / pipeline store metrics changes.Test