- Widen promiseAnyWorkflow timing gaps (100ms/10s vs 1s/3s) to prevent queue scheduling jitter from causing non-deterministic winners
- Reduce readableStreamWorkflow inter-chunk delay from 1s to 500ms to reduce total stream duration and waitUntil pressure
- Replace fixed `sleep 10` in CI with health-check poll loop after dev.test.ts to prevent e2e tests from hitting a mid-rebuild server
- Increase dev.test.ts timeouts from 30s to 60s for nitro-based frameworks that do full (non-incremental) rebuilds on CI
- Add beforeAll health check in e2e.test.ts to verify server readiness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🦋 Changeset detected

Latest commit: fa30b64

The changes in this PR will be included in the next version bump. This PR includes changesets to release 15 packages.
🧪 E2E Test Results

✅ All tests passed

Details by Category

- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ✅ 🌍 Community Worlds

❌ Some E2E test jobs failed: Check the workflow run for details.
📊 Benchmark Results

Each benchmark was run in 💻 Local Development and ▲ Production (Vercel), with 🔍 Observability links for Express, Next.js (Turbopack), and Nitro (result tables omitted here):

- workflow with no steps
- workflow with 1 step
- workflow with 10 sequential steps
- workflow with 25 sequential steps
- workflow with 50 sequential steps
- Promise.all with 10 concurrent steps
- Promise.all with 25 concurrent steps
- Promise.all with 50 concurrent steps
- Promise.race with 10 concurrent steps
- Promise.race with 25 concurrent steps
- Promise.race with 50 concurrent steps
- Stream Benchmarks (includes TTFB metrics): workflow with stream

Summary

- Fastest Framework by World (winner determined by most benchmark wins)
- Fastest World by Framework (winner determined by most benchmark wins)
Pull request overview
This PR targets CI flakiness in the E2E suite by adjusting timing-sensitive workflows, extending dev-mode rebuild timeouts, and adding/strengthening “server ready” checks before running tests.
Changes:
- Adjust workflow timing in `promiseAnyWorkflow` and reduce `readableStreamWorkflow` inter-chunk delay.
- Add a `beforeAll` readiness poll in `packages/core/e2e/e2e.test.ts`.
- Replace a fixed post-`dev.test.ts` sleep with a manifest polling loop in `.github/workflows/tests.yml`, and increase `dev.test.ts` timeouts.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| workbench/example/workflows/99_e2e.ts | Tweaks Promise.any timing gaps and reduces stream chunk delay to reduce timing-related flakes. |
| packages/core/e2e/e2e.test.ts | Adds a pre-suite “deployment healthy” poll before executing E2E tests. |
| packages/core/e2e/dev.test.ts | Increases dev rebuild test timeouts to better accommodate slow CI rebuilds. |
| .github/workflows/tests.yml | Replaces fixed sleep with a manifest polling loop before running E2E tests. |
| .changeset/fix-flaky-e2e-tests.md | Publishes a patch changeset documenting the flake fixes. |
```bash
for i in $(seq 1 60); do
  if curl -sf "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
    echo "Server healthy after ${i}s"
    break
  fi
  sleep 1
done
```
The new manifest polling loop will break when healthy, but if it never becomes healthy within 60 iterations the script still continues to run the E2E suite, reintroducing flakiness and making failures harder to diagnose. After the loop, add an explicit failure (non-zero exit) if the health check never succeeded; also consider adding a short curl --max-time so a single request can’t hang the job indefinitely.
Suggested change:

```bash
health_ok=0
for i in $(seq 1 60); do
  if curl -sf --max-time 5 "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
    echo "Server healthy after ${i}s"
    health_ok=1
    break
  fi
  sleep 1
done
if [ "$health_ok" -ne 1 ]; then
  echo "Server failed to become healthy after 60 seconds"
  exit 1
fi
```
```ts
for (let i = 1; i <= 60; i++) {
  try {
    const res = await fetch(deploymentUrl);
    if (res.ok) {
      console.log(`Server healthy after ${i}s`);
      return;
    }
  } catch {
    // Server not ready yet
  }
  await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
  `Server at ${deploymentUrl} did not become healthy within 60s`
```
The new `beforeAll` health check polls `fetch(deploymentUrl)` without `getProtectionBypassHeaders()`. On Vercel runs with Deployment Protection enabled, this request can stay non-2xx even though the workflow endpoints are reachable with bypass headers, causing an unnecessary 60s delay/failure. Consider polling a known workflow endpoint (e.g. `/.well-known/workflow/v1/manifest.json` or the `?__health` endpoints) and including the protection-bypass headers (and ideally a per-request timeout via `AbortController`) so the check reflects actual E2E readiness.
Suggested change:

```ts
const healthUrl = new URL(
  '/.well-known/workflow/v1/manifest.json',
  deploymentUrl
).toString();
const headers = getProtectionBypassHeaders?.();
for (let i = 1; i <= 60; i++) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 5_000);
  try {
    const res = await fetch(healthUrl, {
      headers,
      signal: controller.signal,
    });
    if (res.ok) {
      console.log(`Server healthy after ${i}s`);
      return;
    }
  } catch {
    // Server not ready yet or request timed out
  } finally {
    clearTimeout(timeoutId);
  }
  await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
  `Server at ${healthUrl} did not become healthy within 60s`
```
- e2e.test.ts beforeAll: poll manifest endpoint with getProtectionBypassHeaders() and AbortController timeout instead of bare fetch(deploymentUrl), which fails with Deployment Protection
- tests.yml: add health_ok flag + exit 1 fallback and curl --max-time 5
- Also replace sleep 10 in local-prod and local-postgres jobs with the same health check poll pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Addresses multiple sources of CI flakiness in E2E tests, identified by analyzing the last 2-3 days of GitHub Actions runs.
Changes
- `promiseAnyWorkflow` timing fix — Widened timing gaps from 1s/3s to 100ms/10s to prevent queue scheduling jitter from causing non-deterministic winners. Matches the pattern already used by `promiseRaceWorkflow`, which passes consistently.
- `readableStreamWorkflow` chunk delay reduction — Reduced inter-chunk delay from 1000ms to 500ms (total stream time from ~10s to ~5s) to reduce pressure on `waitUntil` in the step handler.
- CI health-check poll — Replaced the fixed `sleep 10` after `dev.test.ts` with an active health-check loop polling the manifest endpoint (up to 60s). Prevents e2e tests from hitting a mid-rebuild server.
- Increased `dev.test.ts` timeouts from 30s to 60s for nitro-based frameworks that do full (non-incremental) rebuilds on CI.
- `beforeAll` health check — Added a health check in `e2e.test.ts` that polls the deployment URL before running any tests.
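To make the first change concrete, here is a minimal sketch of the widened Promise.any gap (the `sleep` helper and branch names are hypothetical, not the actual workbench workflow):

```ts
// Hypothetical sketch of the widened Promise.any timing gap.
// `sleep` stands in for whatever durable sleep/step primitive the workflow uses.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function promiseAnyWorkflowSketch(): Promise<string> {
  // Before: 1s vs 3s — close enough that queue scheduling jitter could
  // occasionally let the "slow" branch win, making assertions flaky.
  // After: 100ms vs 10s — the fast branch wins even under heavy jitter.
  const fast = sleep(100).then(() => 'fast');
  const slow = sleep(10_000).then(() => 'slow');
  return Promise.any([fast, slow]); // deterministically resolves to 'fast'
}
```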
Stream truncation (`readableStreamWorkflow`): The `waitUntil`-based stream piping architecture is correct by design — steps should complete immediately while streams pipe in the background. However, in local dev mode, `waitUntil` may not reliably keep async work alive for long durations after the step handler returns. The 500ms delay reduction mitigates this, but the underlying `waitUntil` reliability in dev mode may need further investigation.
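For context, the piping pattern in question looks roughly like the sketch below; the `StepContext`/`waitUntil` shape and chunk counts are assumptions, not the real step handler:

```ts
// Minimal sketch of waitUntil-based stream piping, assuming a context object
// with a waitUntil(promise) method similar to edge/serverless runtimes.
interface StepContext {
  waitUntil(promise: Promise<unknown>): void;
}

function streamingStepSketch(ctx: StepContext): ReadableStream<string> {
  const { readable, writable } = new TransformStream<string, string>();

  // Background pump: write a chunk every 500ms (was 1000ms), then close.
  const pump = (async () => {
    const writer = writable.getWriter();
    for (let i = 0; i < 10; i++) {
      await writer.write(`chunk-${i}`);
      await new Promise((resolve) => setTimeout(resolve, 500));
    }
    await writer.close();
  })();

  // The step "completes" immediately; only waitUntil keeps the pump alive.
  // If the runtime drops this promise early, the stream is truncated.
  ctx.waitUntil(pump);
  return readable;
}
```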
Webhook hook-token conflicts — missing server-side idempotency (confirmed bug):

Intermittent `Hook token "..." is already in use by another workflow` errors are caused by a missing idempotency check in hook creation. Investigation confirmed that replay determinism is sound — two replays of the same workflow run produce identical correlation IDs (seeded PRNG + fixed timestamp + deterministic code path). The actual bug is:

- Replays of the same run reach `handleSuspension` and try to create the same hook with the same `correlationId` and `token`
- `handleHookCreated` (`workflow-server/lib/data/events.ts:1079-1085`) checks for conflicts by `(ownerId, projectId, environment, token)` but does NOT check if the existing hook belongs to the same `runId` with the same `hookId` (`correlationId`)
- The retry is rejected with `hook_conflict` instead of being recognized as an idempotent retry
suspension-handler.ts:176-184catches 409s for duplicate steps. Hooks need the same idempotency pattern. Fix needed inworkflow-server/lib/data/events.tsandworld-local/src/storage/events-storage.ts: when a token conflict is detected, checkexistingHook.hookId === hookId && existingHook.runId === runId— if true, return success instead of conflict.Test plan
Test plan

- `promiseAnyWorkflow` passes consistently across all Vercel prod apps
- `readableStreamWorkflow` passes consistently

🤖 Generated with Claude Code