Skip to content

Conversation

@gow
Copy link
Contributor

@gow gow commented Nov 6, 2025

What changed?

Note: This API is part of an experimental feature - "pause single workflow"

  • Added a new API (in Frontend & History) to pause workflow execution.
  • Updated DescribeWorkflowExecution API to show the pause info.
  • Added a dynamic config to control the access to this feature. The PauseWorkflowExecution API fails if the config is not set.
  • Added simple acceptance functional tests. These tests will evolve and new ones added as the feature implementation progresses.

Followup PRs:

  • Idempotency check in pause api.
  • Validate input sizes.
  • Intercept workflow task creation.
  • Intercept and pause activities.
  • Unpause API

Why?

This API is needed to implement the "pause single workflow" feature.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Note

Adds a new PauseWorkflowExecution API (frontend + history) gated by per-namespace config and exposes pause info via DescribeWorkflowExecution, with client support and tests.

  • Historyservice (API + server):
    • New RPC: PauseWorkflowExecution with messages PauseWorkflowExecutionRequest/Response in request_response.proto and service definitions; generated Go, gRPC handlers, and mocks.
    • Engine/Handler: Implements pause via pauseworkflow.Invoke and Handler.PauseWorkflowExecution (adds workflow paused event; no WFT creation).
    • Telemetry/Logging: Adds routing/log tags and telemetry allowlist for PauseWorkflowExecution.
  • Frontend:
    • Endpoint: WorkflowHandler.PauseWorkflowExecution forwarding to history; gated by WorkflowPauseEnabled.
    • Dynamic Config: Adds frontend.WorkflowPauseEnabled (namespace-scoped).
  • DescribeWorkflowExecution:
    • Populates WorkflowExecutionExtendedInfo.PauseInfo when present.
  • Clients:
    • Adds client, metric, and retryable wrappers for PauseWorkflowExecution.
  • Tests:
    • Functional tests for pausing a workflow and config-disabled behavior.
  • Deps:
    • Bumps go.temporal.io/api version.

Written by Cursor Bugbot for commit ed7ce3a. This will update automatically on new commits. Configure here.

@gow gow requested review from a team as code owners November 6, 2025 23:58
@gow gow marked this pull request as draft November 6, 2025 23:58
@gow gow force-pushed the cg/wf_pause_1 branch 2 times, most recently from 5a861df to 5c592f0 Compare November 14, 2025 06:48
@gow gow force-pushed the cg/wf_pause_2_api branch from c76e337 to 728fbd9 Compare November 14, 2025 14:53
@gow gow requested review from bergundy and spkane31 November 14, 2025 19:24
@gow gow marked this pull request as ready for review November 14, 2025 19:24
f, ok := t.FieldByName(fieldName)
if !ok {
fmt.Printf("\nstack trace: path: [%s]\n", path)
fmt.Println(string(debug.Stack()))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Remove Accidental Debug Output

Debug print statements were accidentally left in the code generation tool. The fmt.Printf and fmt.Println statements with stack trace output appear to be temporary debugging code that should be removed before committing.

Fix in Cursor Fix in Web

}

// VerifyChildExecutionCompletionRecorded checks if child completion result is recorded in parent workflow.
// This is only used by standby transfer close execution logic to make sure parent workflow has the result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we can add a linter for these extra spaces at the end of lines, just muddles up your PR and they're hard to undo because it's likely a diff in editor settings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I too agree this should be a lint rule. I'm not even sure which setting in my editor causes this on save.

Comment on lines 36 to 40
const (
testEndSignal = "test-end"
pauseIdentity = "functional-test"
pauseReason = "pausing workflow for acceptance test"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are duplicated in the tests, could they be package level consts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very large package. So adding a package level const is probably not a good idea. I'll initialize these in test setup so that they are done only once and scoped to this test.

Comment on lines 45 to 50
workflowFn := func(ctx workflow.Context) (string, error) {
signalCh := workflow.GetSignalChannel(ctx, testEndSignal)
var signalPayload string
signalCh.Receive(ctx, &signalPayload)
return signalPayload, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workflowFn is duplicated, could also break this out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Comment on lines 97 to 113
// TODO: currently pause workflow execution does not intercept workflow creation. Fix the reset of this test when that is implemented.
// For now sending this signal will complete the workflow and finish the test.
err = s.SdkClient().SignalWorkflow(ctx, workflowID, runID, testEndSignal, "test end signal")
s.NoError(err)

s.EventuallyWithT(func(t *assert.CollectT) {
desc, err := s.SdkClient().DescribeWorkflowExecution(ctx, workflowID, runID)
require.NoError(t, err)
info := desc.GetWorkflowExecutionInfo()
require.NotNil(t, info)
require.Equal(t, enumspb.WORKFLOW_EXECUTION_STATUS_COMPLETED, info.GetStatus())
}, 5*time.Second, 200*time.Millisecond)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making sure I understand this correctly, we don't intercept the signal yet, but we're planning on adding that in a future PR before the first release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This workflow and the tests will change a bit in subsequent PRs.

)

WorkflowPauseEnabled = NewNamespaceBoolSetting(
"history.WorkflowPauseEnabled",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frontend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it.

}

// PauseWorkflowExecution pauses a workflow execution.
func (wh *WorkflowHandler) PauseWorkflowExecution(ctx context.Context, request *workflowservice.PauseWorkflowExecutionRequest) (_ *workflowservice.PauseWorkflowExecutionResponse, retError error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does pause workflow count as an action? if so we need to update the telemetry interceptor to emit an action metric with the api is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it.


// copy pause info to the response if it exists
if executionInfo.PauseInfo != nil {
result.WorkflowExecutionInfo.PauseInfo = &workflowpb.WorkflowExecutionPauseInfo{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkflowExecutionInfo is used by visibility as well. Shall we add PauseInfo to WorkflowExtendedInfo instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I'll add it to WorkflowExtendedInfo and then update this PR.

return nil, serviceerror.NewInvalidArgument("pause request is nil.")
}

if !shard.GetConfig().WorkflowPauseEnabled(namespaceID.String()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this in frontend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return nil, err
}
return response, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Potential null pointer dereference in client shard routing

The PauseWorkflowExecution client method calls request.GetPauseRequest().GetWorkflowId() without first checking if GetPauseRequest() returns nil. If the pause_request field is nil, this will cause a nil pointer dereference. While the history service API layer checks for nil and returns an error, the client-side wrapper should be defensive, similar to how CompleteNexusOperationChasm handles potential nil values by deserializing and checking before accessing nested fields.

Fix in Cursor Fix in Web

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in generated code.

@gow gow force-pushed the cg/wf_pause_2_api branch from 408fa90 to 1d4e330 Compare November 19, 2025 06:43
Base automatically changed from cg/wf_pause_1 to main November 19, 2025 18:37
@gow gow force-pushed the cg/wf_pause_2_api branch from 36c201e to 35acc0c Compare November 20, 2025 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants