5 changes: 5 additions & 0 deletions .changeset/fix-flaky-e2e-tests.md
@@ -0,0 +1,5 @@
---
"@workflow/core": patch
---

Fix flaky E2E tests: widen promiseAny timing gaps, reduce stream chunk delay, add health checks and increase dev test timeouts
46 changes: 43 additions & 3 deletions .github/workflows/tests.yml
@@ -328,7 +328,21 @@ jobs:
run: |
cd workbench/${{ matrix.app.name }} && pnpm dev &
echo "starting tests in 10 seconds" && sleep 10
pnpm vitest run packages/core/e2e/dev.test.ts; sleep 10
pnpm vitest run packages/core/e2e/dev.test.ts
echo "Waiting for server to stabilize..."
health_ok=false
for i in $(seq 1 60); do
if curl -sf --max-time 5 "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
echo "Server healthy after ${i}s"
health_ok=true
break
fi
sleep 1
done
Comment on lines 334 to 341
Copilot AI (Feb 6, 2026)

The new manifest polling loop will break when healthy, but if it never becomes healthy within 60 iterations the script still continues to run the E2E suite, reintroducing flakiness and making failures harder to diagnose. After the loop, add an explicit failure (non-zero exit) if the health check never succeeded; also consider adding a short curl --max-time so a single request can’t hang the job indefinitely.

Suggested change
for i in $(seq 1 60); do
if curl -sf "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
echo "Server healthy after ${i}s"
break
fi
sleep 1
done
health_ok=0
for i in $(seq 1 60); do
if curl -sf --max-time 5 "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
echo "Server healthy after ${i}s"
health_ok=1
break
fi
sleep 1
done
if [ "$health_ok" -ne 1 ]; then
echo "Server failed to become healthy after 60 seconds"
exit 1
fi

if [ "$health_ok" != "true" ]; then
echo "Server did not become healthy within 60s"
exit 1
fi
pnpm run test:e2e --reporter=default --reporter=json --outputFile=e2e-local-dev-${{ matrix.app.name }}-${{ matrix.app.canary && 'canary' || 'stable' }}.json
env:
NODE_OPTIONS: "--enable-source-maps"
@@ -395,7 +409,20 @@ jobs:
- name: Run E2E Tests
run: |
cd workbench/${{ matrix.app.name }} && pnpm start &
echo "starting tests in 10 seconds" && sleep 10
echo "Waiting for server to be ready..."
health_ok=false
for i in $(seq 1 60); do
if curl -sf --max-time 5 "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
echo "Server healthy after ${i}s"
health_ok=true
break
fi
sleep 1
done
if [ "$health_ok" != "true" ]; then
echo "Server did not become healthy within 60s"
exit 1
fi
pnpm run test:e2e --reporter=default --reporter=json --outputFile=e2e-local-prod-${{ matrix.app.name }}-${{ matrix.app.canary && 'canary' || 'stable' }}.json
env:
NODE_OPTIONS: "--enable-source-maps"
@@ -481,7 +508,20 @@ jobs:
- name: Run E2E Tests
run: |
cd workbench/${{ matrix.app.name }} && pnpm start &
echo "starting tests in 10 seconds" && sleep 10
echo "Waiting for server to be ready..."
health_ok=false
for i in $(seq 1 60); do
if curl -sf --max-time 5 "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
echo "Server healthy after ${i}s"
health_ok=true
break
fi
sleep 1
done
if [ "$health_ok" != "true" ]; then
echo "Server did not become healthy within 60s"
exit 1
fi
pnpm run test:e2e --reporter=default --reporter=json --outputFile=e2e-local-postgres-${{ matrix.app.name }}-${{ matrix.app.canary && 'canary' || 'stable' }}.json
env:
NODE_OPTIONS: "--enable-source-maps"
6 changes: 3 additions & 3 deletions packages/core/e2e/dev.test.ts
@@ -57,7 +57,7 @@
restoreFiles.length = 0;
});

test('should rebuild on workflow change', { timeout: 30_000 }, async () => {
test('should rebuild on workflow change', { timeout: 60_000 }, async () => {

Check failure on line 60 in packages/core/e2e/dev.test.ts (GitHub Actions / E2E Windows Tests)
packages/core/e2e/dev.test.ts > dev e2e > should rebuild on workflow change
Error: Test timed out in 60000ms. If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout". ❯ packages/core/e2e/dev.test.ts:60:5
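As the timeout error suggests, the per-test { timeout: 60_000 } overrides in this file could instead be set once globally via Vitest's testTimeout option. A minimal sketch, assuming a vitest.config.ts at the package root (hypothetical, not part of this PR):

// Hypothetical vitest.config.ts sketch (not part of this PR): raise the test
// and hook timeouts globally instead of passing { timeout } to each test.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Applies to every test unless a test passes its own timeout option.
    testTimeout: 60_000,
    // Covers beforeAll/afterAll hooks, e.g. the health-check beforeAll added in e2e.test.ts.
    hookTimeout: 60_000,
  },
});
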
const workflowFile = path.join(appPath, workflowsDir, testWorkflowFile);

const content = await fs.readFile(workflowFile, 'utf8');
@@ -85,7 +85,7 @@
}
});

test('should rebuild on step change', { timeout: 30_000 }, async () => {
test('should rebuild on step change', { timeout: 60_000 }, async () => {

Check failure on line 88 in packages/core/e2e/dev.test.ts (GitHub Actions / E2E Windows Tests)
packages/core/e2e/dev.test.ts > dev e2e > should rebuild on step change
Error: Test timed out in 60000ms. If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout". ❯ packages/core/e2e/dev.test.ts:88:5
const stepFile = path.join(appPath, workflowsDir, testWorkflowFile);

const content = await fs.readFile(stepFile, 'utf8');
@@ -115,7 +115,7 @@

test(
'should rebuild on adding workflow file',
{ timeout: 30_000 },
{ timeout: 60_000 },
async () => {
const workflowFile = path.join(
appPath,
31 changes: 30 additions & 1 deletion packages/core/e2e/e2e.test.ts
@@ -1,7 +1,7 @@
import { withResolvers } from '@workflow/utils';
import fs from 'fs';
import path from 'path';
import { afterAll, assert, describe, expect, test } from 'vitest';
import { afterAll, assert, beforeAll, describe, expect, test } from 'vitest';
import { dehydrateWorkflowArguments } from '../src/serialization';
import {
cliHealthJson,
@@ -136,6 +136,35 @@ async function getWorkflowReturnValue(runId: string) {
// NOTE: Temporarily disabling concurrent tests to avoid flakiness.
// TODO: Re-enable concurrent tests after conf when we have more time to investigate.
describe('e2e', () => {
// Wait for the deployment to be healthy before running tests
beforeAll(async () => {
const manifestUrl = new URL(
'/.well-known/workflow/v1/manifest.json',
deploymentUrl
);
for (let i = 1; i <= 60; i++) {
try {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5_000);
const res = await fetch(manifestUrl, {
headers: getProtectionBypassHeaders(),
signal: controller.signal,
});
clearTimeout(timeout);
if (res.ok) {
console.log(`Server healthy after ${i}s`);
return;
}
} catch {
// Server not ready yet
}
await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
`Server at ${deploymentUrl} did not become healthy within 60s`
Comment on lines 145 to 164
Copilot AI (Feb 6, 2026)

The new beforeAll health check polls fetch(deploymentUrl) without getProtectionBypassHeaders(). On Vercel runs with Deployment Protection enabled, this request can stay non-2xx even though the workflow endpoints are reachable with bypass headers, causing an unnecessary 60s delay/failure. Consider polling a known workflow endpoint (e.g. /.well-known/workflow/v1/manifest.json or the ?__health endpoints) and include the protection-bypass headers (and ideally a per-request timeout via AbortController) so the check reflects actual E2E readiness.

Suggested change
for (let i = 1; i <= 60; i++) {
try {
const res = await fetch(deploymentUrl);
if (res.ok) {
console.log(`Server healthy after ${i}s`);
return;
}
} catch {
// Server not ready yet
}
await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
`Server at ${deploymentUrl} did not become healthy within 60s`
const healthUrl = new URL(
'/.well-known/workflow/v1/manifest.json',
deploymentUrl
).toString();
const headers = getProtectionBypassHeaders?.();
for (let i = 1; i <= 60; i++) {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5_000);
try {
const res = await fetch(healthUrl, {
headers,
signal: controller.signal,
});
if (res.ok) {
console.log(`Server healthy after ${i}s`);
return;
}
} catch {
// Server not ready yet or request timed out
} finally {
clearTimeout(timeoutId);
}
await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
`Server at ${healthUrl} did not become healthy within 60s`

);
}, 60_000);

// Write E2E metadata file with runIds for observability links
afterAll(() => {
writeE2EMetadata();
6 changes: 3 additions & 3 deletions workbench/example/workflows/99_e2e.ts
@@ -78,8 +78,8 @@ export async function promiseAnyWorkflow() {
'use workflow';
const winner = await Promise.any([
stepThatFails(),
specificDelay(1000, 'b'), // "b" should always win
specificDelay(3000, 'c'),
specificDelay(100, 'b'), // "b" should always win
specificDelay(10000, 'c'),
]);
return winner;
}
@@ -96,7 +96,7 @@ async function genReadableStream() {
for (let i = 0; i < 10; i++) {
console.log('enqueueing', i);
controller.enqueue(encoder.encode(`${i}\n`));
await new Promise((resolve) => setTimeout(resolve, 1000));
await new Promise((resolve) => setTimeout(resolve, 500));
}
console.log('closing controller');
controller.close();