feat(gadget): support user recovery handover#51
Conversation
|
@claude review |
|
Claude finished @samcday's task in 2m 31s —— View job Code Review
The overall approach is sound: quiesce in-flight I/O on SIGHUP using a background thread (so the async event loop keeps servicing I/O completions during the quiesce wait), then drop device handles to release them for adoption. The two-phase split ( Issues1.
loop {
let mut busy = false;
for dev_id in dev_ids.iter().copied() { // iterates ALL ids every pass
match ublk.start_user_recovery(dev_id).await {
Ok(()) => { ... }
Err(err) if error_is_errno(&err, libc::EBUSY) => { busy = true; ... }
Err(err) => return Err(err)...; // any other error is fatal
}
}
if !busy { return Ok(()); }
// sleep 100ms, loop back
}If device 0 enters recovery ( Fix: track which ids have been successfully started and skip them on retry. Fix this → 2.
3. Hardcoded raw kernel constants without version context
const UBLK_CMD_QUIESCE_DEV_RAW: u32 = 0x16;
const UBLK_F_QUIESCE_RAW: u64 = 1 << 12;The 4.
info.flags |=
(UBLK_F_USER_RECOVERY | UBLK_F_USER_RECOVERY_REISSUE) as u64 | UBLK_F_QUIESCE_RAW;
Minor observations
The new |
Quiesce and release existing ublk devices on SIGHUP so a successor daemon can adopt them without completing in-flight I/O as failures. Add VM coverage for preserving the device identity across handover. Signed-off-by: Sam Day <me@samcday.com>
|
@claude did a bunch of rework, previous feedback should be addressed but also the way we quiesce and deal with the uring changed a fair bit, please re-review. |
|
Claude finished @samcday's task in 8m 32s —— View job Code Review (re-review)
Previous feedback — all addressed1. 2. 3. Raw 4. New quiesce/uring logicThe rewrite is a genuine improvement. The key design points and their correctness: Two-phase drain: quiesce only starts after
Union access fixed:
Remaining issues1.
The first call is correct and necessary — it prevents 2. Items already in
When SIGHUP fires, the This is actually correct behavior: // Quiesce aborts idle FETCH commands, but already-fetched requests still
// need their completions committed before the old owner can exit cleanly.This is only partially true. Requests dispatched to the app are committed before device drop (because 3.
#ifdef UBLK_U_CMD_QUIESCE_DEV
MARK_FIX_753(UBLK_U_CMD_QUIESCE_DEV);
#endifSince 4.
The helper is identical in both files. Moving it to the shared Minor observations
The quiesce architecture is solid. The test covering drain-before-quiesce is the right invariant to pin down. Ready to merge once the comment at line 1559–1560 is updated (or once you're satisfied the behavior is documented elsewhere). |
Quiesce and release existing ublk devices on SIGHUP so a successor daemon can adopt them without completing in-flight I/O as failures. Add VM coverage for preserving the device identity across handover.