@congqixia
Contributor

Related to #44620
Related to unstable ut "internal/querycoordv2 TestServer/TestNodeUp"

Introduce SessionWatcher interface to fix race condition and goroutine leak that caused unstable unit test TestServer/TestNodeUp.

Changes:

  • Add SessionWatcher interface with EventChannel() and Stop() methods
  • Refactor WatchServices() to return SessionWatcher instead of raw channel
  • Fix cleanup order in QueryCoordV2: stop watcher before session
  • Update DataCoord, ConnectionManager to use SessionWatcher
  • Add MockSessionWatcher for testing

Fixes race condition between session context cancellation and internal loop exit. Eliminates goroutine leak by providing explicit lifecycle management.
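
For readers who have not opened the diff, here is a minimal sketch of the lifecycle the description refers to. Only the names SessionWatcher, EventChannel() and Stop() come from this PR; the SessionEvent type, the chanWatcher implementation, the newChanWatcher and runDemo helpers, and the exact method signatures are assumptions made for illustration and are not the code in internal/util/sessionutil.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// SessionEvent is a hypothetical stand-in for the real session event type.
type SessionEvent struct {
	NodeID int64
}

// SessionWatcher has the two methods named in the PR description.
// The signatures are assumed, not copied from the Milvus source.
type SessionWatcher interface {
	EventChannel() <-chan SessionEvent
	Stop()
}

// chanWatcher is an illustrative implementation that forwards events
// from a source channel until it is stopped.
type chanWatcher struct {
	events chan SessionEvent
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

func newChanWatcher(parent context.Context, source <-chan SessionEvent) *chanWatcher {
	ctx, cancel := context.WithCancel(parent)
	w := &chanWatcher{events: make(chan SessionEvent), cancel: cancel}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		defer close(w.events) // runs before wg.Done, so Stop sees a closed channel
		for {
			select {
			case <-ctx.Done():
				return
			case ev, ok := <-source:
				if !ok {
					return
				}
				select {
				case w.events <- ev:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return w
}

func (w *chanWatcher) EventChannel() <-chan SessionEvent { return w.events }

// Stop cancels the forwarding loop and blocks until its goroutine has exited,
// so the caller can release the underlying session afterwards without a leak.
func (w *chanWatcher) Stop() {
	w.cancel()
	w.wg.Wait()
}

// runDemo consumes two events, stops the watcher, and reports whether the
// event channel is closed once Stop has returned.
func runDemo() (received []int64, closed bool) {
	source := make(chan SessionEvent, 2)
	source <- SessionEvent{NodeID: 1}
	source <- SessionEvent{NodeID: 2}

	var w SessionWatcher = newChanWatcher(context.Background(), source)
	for i := 0; i < 2; i++ {
		ev := <-w.EventChannel()
		received = append(received, ev.NodeID)
	}

	w.Stop() // stop the watcher first; only then tear down the session
	_, ok := <-w.EventChannel()
	return received, !ok
}

func main() {
	received, closed := runDemo()
	fmt.Println(received, closed)
}
```

Because Stop() waits for the goroutine, a caller that stops the watcher before closing the session knows no loop is still reading from a cancelled context, which is the ordering the QueryCoordV2 change in the list above relies on.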

@sre-ci-robot sre-ci-robot added the size/XL Denotes a PR that changes 500-999 lines. label Nov 17, 2025
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related to a bug labels Nov 17, 2025
@sre-ci-robot
Contributor

[ci-v2-notice]
Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm

If you have any questions or requests, please contact @zhikunyao.

@congqixia
Contributor Author

/ci-rerun-ut-go

@mergify
Contributor

mergify bot commented Nov 17, 2025

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

@congqixia
Contributor Author

/run-cpu-e2e

@codecov

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.50%. Comparing base (caed0fe) to head (f6a9728).
⚠️ Report is 31 commits behind head on master.

Files with missing lines                   Patch %   Lines
cmd/tools/migration/migration/runner.go    0.00%     2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #45627       +/-   ##
===========================================
- Coverage   83.18%   76.50%    -6.68%     
===========================================
  Files         521     1875     +1354     
  Lines       81313   292178   +210865     
===========================================
+ Hits        67642   223539   +155897     
- Misses      13671    61240    +47569     
- Partials        0     7399     +7399     

Components   Coverage Δ
Client       78.17% <ø> (∅)
Core         83.19% <98.38%> (+0.01%) ⬆️
Go           74.62% <95.52%> (∅)

Files with missing lines                      Coverage Δ
internal/datacoord/server.go                  68.00% <100.00%> (ø)
internal/distributed/connection_manager.go    71.27% <100.00%> (ø)
internal/querycoordv2/server.go               76.03% <100.00%> (ø)
internal/util/sessionutil/session_util.go     75.59% <100.00%> (ø)
cmd/tools/migration/migration/runner.go       0.00% <0.00%> (ø)

... and 1349 files with indirect coverage changes

@mergify
Contributor

mergify bot commented Nov 17, 2025

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

@congqixia congqixia force-pushed the fix/refine_sessionwatcher_unstable_ut branch from c06be7a to d95492a Compare November 18, 2025 02:57
@mergify
Contributor

mergify bot commented Nov 18, 2025

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

…ordv2

Related to milvus-io#44620
Related to unstable ut "internal/querycoordv2 TestServer/TestNodeUp"

Introduce SessionWatcher interface to fix race condition and goroutine leak that
caused unstable unit test TestServer/TestNodeUp.

Changes:
- Add SessionWatcher interface with EventChannel() and Stop() methods
- Refactor WatchServices() to return SessionWatcher instead of raw channel
- Fix cleanup order in QueryCoordV2: stop watcher before session
- Update DataCoord, ConnectionManager to use SessionWatcher
- Add MockSessionWatcher for testing

Fixes race condition between session context cancellation and internal loop exit.
Eliminates goroutine leak by providing explicit lifecycle management.

Signed-off-by: Congqi Xia <[email protected]>
@congqixia congqixia force-pushed the fix/refine_sessionwatcher_unstable_ut branch from d95492a to 9062071 Compare November 18, 2025 07:50
Signed-off-by: Congqi Xia <[email protected]>
@congqixia
Contributor Author

/ci-rerun-ut-integration

@mergify
Contributor

mergify bot commented Nov 21, 2025

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

@congqixia
Contributor Author

/run-cpu-e2e

@congqixia
Contributor Author

/ci-rerun-ut-integration

@mergify mergify bot added the ci-passed label Nov 21, 2025
Member

@liliu-z liliu-z left a comment

/lgtm
/approve

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia, liliu-z

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit f51fcc0 into milvus-io:master Nov 21, 2025
17 of 20 checks passed

Labels

approved area/compilation ci-passed dco-passed DCO check passed. kind/bug Issues or changes related to a bug lgtm size/XL Denotes a PR that changes 500-999 lines.


3 participants