Skip to content

Conversation

@Shivs11
Copy link
Member

@Shivs11 Shivs11 commented Nov 13, 2025

What changed?

  • WISOTT

Why?

  • To be sure of the revision number mechanics we have implemented.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

  • None until we flip the DC switch :)

Note

Enables and rewrites AutoUpgrade no-bounce tests using revision numbers, adds several new lag/child/CAN/retry tests, and updates idle workflow poller to accept context for cancellable polling.

  • Tests (worker versioning v3):
    • Enable/Rewrite: Implement TestAutoUpgradeWorkflows_NoBouncingBetweenVersions using revision-number mechanics and cancellable idle pollers.
    • New/Expanded Scenarios (revision numbers):
      • Workflow/Activity TQ lag transitions: TestWorkflowTQLags_DependentActivityStartsTransition, TestActivityTQLags_DependentActivityCompletesOnTheNewVersion.
      • Child workflow behaviors across TQ/version lag: TestChildStartsWithParentRevision_SameTQ_TQAhead, TestChildStartsWithParentRevision_SameTQ_TQLags, TestChildStartsWithNoInheritedAutoUpgradeInfo_CrossTQ.
      • Continue-as-new and retry stability: TestContinueAsNewOfAutoUpgradeWorkflow_RevisionNumberMechanics, testRetryNoBounceBack (and callers) to avoid bounce-back under rollback.
    • Skip gating: Gate several tests on useRevisionNumbers instead of useNewDeploymentData.
  • Infrastructure:
    • idlePollWorkflow now takes context.Context and passes it via taskpoller.WithContext, enabling time-bounded idle pollers; update all call sites accordingly.
    • Remove ad-hoc sync.WaitGroup uses in favor of context-based cancellation/timeouts.

Written by Cursor Bugbot for commit 3a6fab3. This will update automatically on new commits. Configure here.

@Shivs11 Shivs11 marked this pull request as ready for review November 13, 2025 22:03
@Shivs11 Shivs11 requested review from a team as code owners November 13, 2025 22:03
}}, []string{}, tqTypeWf, tqTypeAct)

// Wait until all task queue partitions know that v1 is current.
s.waitForDeploymentDataPropagation(tv1, versionStatusCurrent, false, tqTypeWf, tqTypeAct)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, later when we clean up all old tests we should maybe change this function to wait for the particular revision number propagation rather than status of the version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO I think this test should be thinking of "propagation complete" == "the right revision number has been synced to matching partitions. We should have tests in the worker_deployment_suite that should test out if the right revision number is synced to these partitions alongside the statuses of each version.

I think having this differences, given we have two suites, would make test writing simpler and easier to read for a new user.


func (s *Versioning3Suite) TestAutoUpgradeWorkflows_NoBouncingBetweenVersions() {
s.T().Skip("This test is flaky right now and shall be fixed in a future PR.") // TODO (Shivam)
if !s.useNewDeploymentData {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

}

func (s *Versioning3Suite) idlePollWorkflow(
ctx context.Context,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function shall now take in a context. The reason this was done is as follows:

There are some tests that kickstart a v0 poller (v0 is just an example) for the sole purpose of checking if a task were to ever go this worker. These tests initialize them in a waitGroup and then towards the end of the test, just before it gets cleaned up, there is a wg.Wait() present which ensures that this poller completes running before the test can be terminated safely.

However, this does mean that the minimum wait time for the test will never be lesser than the minimum poll time of this v0 poller. This makes our tests long to run. Thus, passing this context and cancelling it is a safe and efficient way to terminate the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

}

func (s *Versioning3Suite) idlePollWorkflow(
ctx context.Context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@Shivs11 Shivs11 merged commit f0544f0 into main Dec 1, 2025
59 checks passed
@Shivs11 Shivs11 deleted the ss/fix-autoupgrade-bouncing-tests branch December 1, 2025 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants