Fix flaky E2E tests #969

Draft
pranaygp wants to merge 2 commits into main from pranaygp/flaky-tests

Conversation

@pranaygp
Collaborator

@pranaygp pranaygp commented Feb 6, 2026

Summary

Addresses multiple sources of CI flakiness in E2E tests, identified by analyzing the last 2-3 days of GitHub Actions runs.

Changes

  • promiseAnyWorkflow timing fix — Widened timing gaps from 1s/3s to 100ms/10s to prevent queue scheduling jitter from causing non-deterministic winners. Matches the pattern already used by promiseRaceWorkflow, which passes consistently (see the timing sketch after this list).
  • readableStreamWorkflow chunk delay reduction — Reduced inter-chunk delay from 1000ms to 500ms (total stream time from ~10s to ~5s) to reduce pressure on waitUntil in the step handler.
  • CI health-check poll — Replaced fixed sleep 10 after dev.test.ts with an active health-check loop polling the manifest endpoint (up to 60s). Prevents e2e tests from hitting a mid-rebuild server.
  • Dev test timeout increase — Increased dev.test.ts timeouts from 30s to 60s for nitro-based frameworks that do full (non-incremental) rebuilds on CI.
  • beforeAll health check — Added health check in e2e.test.ts that polls the deployment URL before running any tests.
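
As a point of reference for the promiseAnyWorkflow change, here is a minimal timing sketch. The `sleep` helper and `promiseAnyTimingSketch` function are hypothetical stand-ins for the workbench step API; only the durations mirror the real workflow.

```typescript
// Hypothetical stand-in for the workbench step API: a plain delayed promise.
const sleep = (ms: number, value: string) =>
  new Promise<string>((resolve) => setTimeout(() => resolve(value), ms));

async function promiseAnyTimingSketch(): Promise<string> {
  // Before: 1s vs 3s gaps were close enough that queue scheduling jitter
  // could occasionally let the "slow" branch resolve first, flaking the test.
  // After: 100ms vs 10s keeps the fast branch the winner even under heavy jitter.
  return Promise.any([
    sleep(100, "fast"), // expected winner
    sleep(10_000, "slow"), // far outside any realistic scheduling delay
  ]);
}
```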

Known issues flagged for follow-up

Stream truncation (readableStreamWorkflow): The waitUntil-based stream piping architecture is correct by design — steps should complete immediately while streams pipe in the background. However, in local dev mode, waitUntil may not reliably keep async work alive for long durations after the step handler returns. The 500ms delay reduction mitigates this, but the underlying waitUntil reliability in dev mode may need further investigation.
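
A minimal sketch of the piping shape described above, assuming a `waitUntil(promise)` signature and ten 500ms chunks; the function name and chunk count are illustrative, not the actual step handler.

```typescript
// `waitUntil` is declared here as an assumption about the runtime's signature;
// the real step handler lives in the workflow runtime, not in this snippet.
declare function waitUntil(promise: Promise<unknown>): void;

function readableStreamStepSketch(): ReadableStream<string> {
  const { readable, writable } = new TransformStream<string, string>();

  // The step handler returns the readable side immediately; the writes
  // continue in the background, kept alive (in theory) by waitUntil.
  waitUntil(
    (async () => {
      const writer = writable.getWriter();
      for (let i = 0; i < 10; i++) {
        await writer.write(`chunk-${i}`);
        // 500ms between chunks: the whole stream finishes in ~5s instead of
        // ~10s, shrinking the window in which dev-mode waitUntil must keep
        // this async work alive after the handler has returned.
        await new Promise((resolve) => setTimeout(resolve, 500));
      }
      await writer.close();
    })()
  );

  return readable;
}
```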

Webhook hook-token conflicts — missing server-side idempotency (confirmed bug):

Intermittent `Hook token "..." is already in use by another workflow` errors are caused by a missing idempotency check in hook creation. Investigation confirmed that replay determinism is sound — two replays of the same workflow run produce identical correlation IDs (seeded PRNG + fixed timestamp + deterministic code path). The actual bug is:

  1. Two concurrent invocations of the same run (queue retry/race) both reach handleSuspension and try to create the same hook with the same correlationId and token
  2. The server's handleHookCreated (workflow-server/lib/data/events.ts:1079-1085) checks for conflicts by (ownerId, projectId, environment, token) but does NOT check if the existing hook belongs to the same runId with the same hookId (correlationId)
  3. The second invocation sees the hook exists and returns hook_conflict instead of recognizing it as an idempotent retry

This parallels how step creation already handles duplicate requests — suspension-handler.ts:176-184 catches 409s for duplicate steps. Hooks need the same idempotency pattern. Fix needed in workflow-server/lib/data/events.ts and world-local/src/storage/events-storage.ts: when a token conflict is detected, check existingHook.hookId === hookId && existingHook.runId === runId — if true, return success instead of conflict.
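
A minimal sketch of that check, assuming the stored hook record exposes `hookId` and `runId` fields; the `StoredHook` type and `resolveHookTokenConflict` helper are illustrative, not the actual server code.

```typescript
// Hypothetical shapes; the real ones live in workflow-server/lib/data/events.ts
// and world-local/src/storage/events-storage.ts.
interface StoredHook {
  hookId: string; // deterministic correlationId from the replay
  runId: string;
  token: string;
}

function resolveHookTokenConflict(
  existingHook: StoredHook,
  incoming: { hookId: string; runId: string }
): "ok_idempotent_retry" | "hook_conflict" {
  // Same run and same correlationId: a duplicate request from a queue
  // retry/race, so treat it as success instead of raising a conflict,
  // mirroring how suspension-handler.ts already tolerates 409s for steps.
  if (
    existingHook.hookId === incoming.hookId &&
    existingHook.runId === incoming.runId
  ) {
    return "ok_idempotent_retry";
  }
  // A different run genuinely holds the token: keep reporting the conflict.
  return "hook_conflict";
}
```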

Test plan

  • CI passes on this branch (Tests workflow, especially the previously-flaky matrix jobs)
  • Verify promiseAnyWorkflow passes consistently across all Vercel prod apps
  • Verify readableStreamWorkflow passes consistently
  • Verify local dev tests (fastify, nuxt, express, hono) pass without server crashes
  • Verify dev.test.ts doesn't timeout on nitro-based frameworks

🤖 Generated with Claude Code

- Widen promiseAnyWorkflow timing gaps (100ms/10s vs 1s/3s) to prevent
  queue scheduling jitter from causing non-deterministic winners
- Reduce readableStreamWorkflow inter-chunk delay from 1s to 500ms to
  reduce total stream duration and waitUntil pressure
- Replace fixed `sleep 10` in CI with health-check poll loop after
  dev.test.ts to prevent e2e tests from hitting a mid-rebuild server
- Increase dev.test.ts timeouts from 30s to 60s for nitro-based
  frameworks that do full (non-incremental) rebuilds on CI
- Add beforeAll health check in e2e.test.ts to verify server readiness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 6, 2026 23:32
@changeset-bot

changeset-bot bot commented Feb 6, 2026

🦋 Changeset detected

Latest commit: fa30b64

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 15 packages
Name Type
@workflow/core Patch
@workflow/builders Patch
@workflow/cli Patch
@workflow/docs-typecheck Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/web-shared Patch
workflow Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/world-testing Patch
@workflow/nuxt Patch

@vercel
Contributor

vercel bot commented Feb 6, 2026

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

🧪 E2E Test Results

All tests passed

Summary

Passed Failed Skipped Total
✅ ▲ Vercel Production 215 0 302 517
✅ 💻 Local Development 215 0 20 235
✅ 📦 Local Production 215 0 20 235
✅ 🐘 Local Postgres 215 0 20 235
✅ 🪟 Windows 0 0 47 47
✅ 🌍 Community Worlds 12 0 188 200
Total 872 0 597 1469

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 0 0 47
✅ example 0 0 47
✅ express 43 0 4
✅ fastify 0 0 47
✅ hono 43 0 4
✅ nextjs-turbopack 0 0 47
✅ nextjs-webpack 0 0 47
✅ nitro 43 0 4
✅ nuxt 43 0 4
✅ sveltekit 0 0 47
✅ vite 43 0 4
✅ 💻 Local Development
App Passed Failed Skipped
✅ express-stable 43 0 4
✅ hono-stable 43 0 4
✅ nitro-stable 43 0 4
✅ nuxt-stable 43 0 4
✅ vite-stable 43 0 4
✅ 📦 Local Production
App Passed Failed Skipped
✅ express-stable 43 0 4
✅ hono-stable 43 0 4
✅ nitro-stable 43 0 4
✅ nuxt-stable 43 0 4
✅ vite-stable 43 0 4
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ express-stable 43 0 4
✅ hono-stable 43 0 4
✅ nitro-stable 43 0 4
✅ nuxt-stable 43 0 4
✅ vite-stable 43 0 4
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 0 0 47
✅ 🌍 Community Worlds
App Passed Failed Skipped
✅ mongodb-dev 3 0 0
✅ mongodb 0 0 47
✅ redis-dev 3 0 0
✅ redis 0 0 47
✅ starter-dev 3 0 0
✅ starter 0 0 47
✅ turso-dev 3 0 0
✅ turso 0 0 47

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: failure
  • Local Prod: failure
  • Local Postgres: failure
  • Windows: failure

Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 0.030s (-30.2% 🟢) 1.008s (~) 0.978s 10 1.00x
💻 Local Next.js (Turbopack) 0.038s (~) 1.016s (~) 0.977s 10 1.28x
💻 Local Nitro 0.043s (-1.6%) 1.008s (~) 0.965s 10 1.42x
🐘 Postgres Nitro 0.191s (-11.8% 🟢) 1.015s (~) 0.824s 10 6.34x
🐘 Postgres Express 0.247s (+33.4% 🔺) 1.015s (~) 0.768s 10 8.20x
🐘 Postgres Next.js (Turbopack) 0.395s (-6.9% 🟢) 1.020s (~) 0.625s 10 13.12x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 0.612s (-17.9% 🟢) 1.708s (+6.0% 🔺) 1.097s 10 1.00x
▲ Vercel Next.js (Turbopack) 0.616s (-25.6% 🟢) 1.692s (-4.3%) 1.076s 10 1.01x
▲ Vercel Nitro 0.712s (+8.6% 🔺) 1.612s (~) 0.900s 10 1.16x

🔍 Observability: Express | Next.js (Turbopack) | Nitro

workflow with 1 step

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 1.079s (-3.4%) 2.007s (~) 0.929s 10 1.00x
💻 Local Next.js (Turbopack) 1.096s (~) 2.014s (~) 0.918s 10 1.02x
💻 Local Nitro 1.118s (~) 2.008s (~) 0.890s 10 1.04x
🐘 Postgres Express 2.195s (-8.6% 🟢) 3.015s (~) 0.820s 10 2.03x
🐘 Postgres Next.js (Turbopack) 2.284s (+1.1%) 3.020s (~) 0.736s 10 2.12x
🐘 Postgres Nitro 2.332s (+5.1% 🔺) 3.015s (~) 0.684s 10 2.16x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.477s (-1.6%) 3.523s (-5.8% 🟢) 1.046s 10 1.00x
▲ Vercel Nitro 2.556s (~) 4.023s (+7.9% 🔺) 1.468s 10 1.03x
▲ Vercel Next.js (Turbopack) 2.667s (+6.3% 🔺) 3.568s (+2.5%) 0.902s 10 1.08x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 10 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 10.561s (-2.6%) 11.017s (~) 0.456s 3 1.00x
💻 Local Next.js (Turbopack) 10.745s (~) 11.022s (~) 0.277s 3 1.02x
💻 Local Nitro 10.842s (~) 11.013s (~) 0.170s 3 1.03x
🐘 Postgres Next.js (Turbopack) 20.151s (~) 21.055s (~) 0.903s 2 1.91x
🐘 Postgres Nitro 20.437s (+32.0% 🔺) 21.034s (+31.3% 🔺) 0.597s 2 1.94x
🐘 Postgres Express 20.498s (+0.5%) 21.033s (~) 0.535s 2 1.94x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 19.430s (-3.4%) 20.259s (-2.9%) 0.829s 2 1.00x
▲ Vercel Express 19.636s (-1.8%) 20.640s (-1.1%) 1.004s 2 1.01x
▲ Vercel Next.js (Turbopack) 21.111s (+3.3%) 35.083s (+64.6% 🔺) 13.972s 1 1.09x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 25 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 26.799s (-2.5%) 27.037s (-3.5%) 0.238s 3 1.00x
💻 Local Next.js (Turbopack) 27.260s (~) 28.044s (~) 0.783s 3 1.02x
💻 Local Nitro 27.475s (~) 28.019s (~) 0.544s 3 1.03x
🐘 Postgres Next.js (Turbopack) 45.998s (-8.6% 🟢) 46.582s (-8.8% 🟢) 0.584s 2 1.72x
🐘 Postgres Nitro 50.394s (+30.9% 🔺) 51.064s (+30.8% 🔺) 0.671s 2 1.88x
🐘 Postgres Express 50.407s (~) 51.086s (~) 0.679s 2 1.88x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 48.880s (-2.0%) 49.540s (-2.5%) 0.660s 2 1.00x
▲ Vercel Express 49.148s (-2.5%) 49.910s (-2.3%) 0.762s 2 1.01x
▲ Vercel Next.js (Turbopack) 49.834s (-1.3%) 50.489s (-1.7%) 0.656s 2 1.02x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 50 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 55.741s (-2.6%) 56.053s (-3.4%) 0.312s 2 1.00x
💻 Local Next.js (Turbopack) 56.454s (~) 57.059s (~) 0.605s 2 1.01x
💻 Local Nitro 57.132s (~) 58.036s (~) 0.904s 2 1.02x
🐘 Postgres Next.js (Turbopack) 75.240s (-25.1% 🟢) 76.151s (-24.7% 🟢) 0.911s 2 1.35x
🐘 Postgres Express 100.183s (~) 101.157s (~) 0.974s 1 1.80x
🐘 Postgres Nitro 100.265s (+31.4% 🔺) 101.178s (+31.2% 🔺) 0.913s 1 1.80x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 110.048s (+3.7%) 111.257s (+4.3%) 1.209s 1 1.00x
▲ Vercel Express 120.885s (+13.8% 🔺) 122.075s (+8.5% 🔺) 1.190s 1 1.10x
▲ Vercel Nitro 137.454s (+28.5% 🔺) 138.653s (+28.2% 🔺) 1.199s 1 1.25x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

Promise.all with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 1.351s (-4.1%) 2.007s (~) 0.656s 15 1.00x
💻 Local Next.js (Turbopack) 1.387s (~) 2.011s (~) 0.624s 15 1.03x
💻 Local Nitro 1.416s (+1.0%) 2.007s (~) 0.590s 15 1.05x
🐘 Postgres Next.js (Turbopack) 2.164s (+1.1%) 3.017s (+6.4% 🔺) 0.853s 10 1.60x
🐘 Postgres Express 2.309s (+1.8%) 3.013s (~) 0.704s 10 1.71x
🐘 Postgres Nitro 2.335s (+3.5%) 3.013s (+9.6% 🔺) 0.679s 10 1.73x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 2.696s (-1.9%) 3.768s (~) 1.071s 8 1.00x
▲ Vercel Nitro 2.797s (+7.1% 🔺) 3.994s (+8.9% 🔺) 1.197s 8 1.04x
▲ Vercel Express 2.905s (+1.3%) 4.058s (+6.8% 🔺) 1.153s 8 1.08x

🔍 Observability: Next.js (Turbopack) | Nitro | Express

Promise.all with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 2.281s (-12.2% 🟢) 3.240s (+7.5% 🔺) 0.958s 10 1.00x
💻 Local Next.js (Turbopack) 2.457s (-0.5%) 3.186s (+5.5% 🔺) 0.730s 10 1.08x
💻 Local Nitro 2.545s (-1.9%) 3.017s (~) 0.472s 10 1.12x
🐘 Postgres Express 7.507s (-17.5% 🟢) 8.030s (-16.1% 🟢) 0.523s 4 3.29x
🐘 Postgres Nitro 7.664s (-30.7% 🟢) 8.058s (-31.0% 🟢) 0.394s 4 3.36x
🐘 Postgres Next.js (Turbopack) 12.908s (+1.7%) 13.377s (~) 0.468s 3 5.66x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.252s (-15.7% 🟢) 4.177s (-10.3% 🟢) 0.925s 8 1.00x
▲ Vercel Nitro 3.279s (+3.5%) 4.082s (+2.7%) 0.803s 8 1.01x
▲ Vercel Next.js (Turbopack) 3.600s (+2.1%) 4.437s (+2.9%) 0.837s 7 1.11x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.all with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 6.181s (-15.1% 🟢) 7.016s (-15.3% 🟢) 0.835s 5 1.00x
💻 Local Next.js (Turbopack) 7.033s (-1.4%) 7.674s (-1.4%) 0.641s 4 1.14x
💻 Local Nitro 7.273s (+1.2%) 8.049s (-1.2%) 0.776s 4 1.18x
🐘 Postgres Express 44.705s (-8.5% 🟢) 45.160s (-8.3% 🟢) 0.455s 1 7.23x
🐘 Postgres Nitro 49.776s (-4.6%) 50.192s (-5.5% 🟢) 0.416s 1 8.05x
🐘 Postgres Next.js (Turbopack) 54.980s (-2.1%) 55.260s (-3.4%) 0.280s 1 8.90x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.275s (-71.1% 🟢) 4.093s (-66.9% 🟢) 0.818s 8 1.00x
▲ Vercel Next.js (Turbopack) 3.426s (-68.4% 🟢) 4.285s (-62.9% 🟢) 0.859s 8 1.05x
▲ Vercel Nitro 3.609s (-75.5% 🟢) 4.515s (-71.4% 🟢) 0.906s 7 1.10x

🔍 Observability: Express | Next.js (Turbopack) | Nitro

Promise.race with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 1.378s (-3.9%) 2.007s (~) 0.629s 15 1.00x
💻 Local Next.js (Turbopack) 1.392s (~) 2.010s (~) 0.618s 15 1.01x
💻 Local Nitro 1.450s (+2.3%) 2.006s (~) 0.556s 15 1.05x
🐘 Postgres Nitro 1.966s (-12.0% 🟢) 2.519s (-2.8%) 0.554s 12 1.43x
🐘 Postgres Express 2.036s (-0.8%) 2.598s (+11.9% 🔺) 0.562s 12 1.48x
🐘 Postgres Next.js (Turbopack) 2.421s (+4.2%) 2.932s (+7.0% 🔺) 0.510s 11 1.76x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.453s (-3.2%) 3.625s (+1.9%) 1.173s 9 1.00x
▲ Vercel Express 2.537s (-19.0% 🟢) 3.699s (-12.5% 🟢) 1.162s 9 1.03x
▲ Vercel Next.js (Turbopack) 2.625s (+2.8%) 3.649s (~) 1.024s 9 1.07x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.race with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 2.374s (-12.3% 🟢) 3.324s (+9.8% 🔺) 0.950s 10 1.00x
💻 Local Next.js (Turbopack) 2.612s (-0.8%) 3.041s (-4.3%) 0.429s 10 1.10x
💻 Local Nitro 2.693s (~) 3.010s (~) 0.317s 10 1.13x
🐘 Postgres Nitro 11.062s (-6.3% 🟢) 11.723s (-2.5%) 0.661s 3 4.66x
🐘 Postgres Express 11.640s (+5.6% 🔺) 12.052s (+6.1% 🔺) 0.412s 3 4.90x
🐘 Postgres Next.js (Turbopack) 14.438s (+5.4% 🔺) 14.708s (+2.3%) 0.270s 3 6.08x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 2.974s (-6.2% 🟢) 3.958s (~) 0.984s 8 1.00x
▲ Vercel Nitro 3.067s (-8.8% 🟢) 3.872s (-11.2% 🟢) 0.805s 8 1.03x
▲ Vercel Express 3.077s (+10.8% 🔺) 4.199s (+12.8% 🔺) 1.123s 8 1.03x

🔍 Observability: Next.js (Turbopack) | Nitro | Express

Promise.race with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 6.679s (-18.2% 🟢) 7.664s (-15.7% 🟢) 0.985s 4 1.00x
💻 Local Nitro 7.860s (-0.7%) 8.800s (-1.3%) 0.940s 4 1.18x
💻 Local Next.js (Turbopack) 7.883s (+4.6%) 8.610s (+0.5%) 0.727s 4 1.18x
🐘 Postgres Nitro 47.978s (-7.9% 🟢) 48.261s (-9.1% 🟢) 0.283s 1 7.18x
🐘 Postgres Express 52.774s (+4.0%) 53.122s (+3.9%) 0.348s 1 7.90x
🐘 Postgres Next.js (Turbopack) 57.866s (+0.6%) 58.326s (~) 0.460s 1 8.66x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 11.955s (+52.8% 🔺) 12.771s (+49.5% 🔺) 0.815s 3 1.00x
▲ Vercel Next.js (Turbopack) 12.561s (+52.5% 🔺) 13.554s (+51.2% 🔺) 0.993s 3 1.05x
▲ Vercel Nitro 15.344s (+147.8% 🔺) 16.539s (+142.5% 🔺) 1.195s 2 1.28x

🔍 Observability: Express | Next.js (Turbopack) | Nitro

Stream Benchmarks (includes TTFB metrics)
workflow with stream

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 0.114s (-36.9% 🟢) 1.000s (+0.8%) 0.012s (-17.7% 🟢) 1.019s (~) 0.905s 10 1.00x
💻 Local Next.js (Turbopack) 0.147s (+3.4%) 1.004s (~) 0.016s (+7.5% 🔺) 1.027s (~) 0.880s 10 1.29x
💻 Local Nitro 0.186s (+3.3%) 0.992s (~) 0.014s (+3.0%) 1.021s (~) 0.835s 10 1.63x
🐘 Postgres Next.js (Turbopack) 1.339s (-31.6% 🟢) 1.712s (-31.5% 🟢) 0.000s (-100.0% 🟢) 2.019s (-25.7% 🟢) 0.680s 10 11.74x
🐘 Postgres Express 2.372s (+1.1%) 2.671s (-0.9%) 0.000s (-100.0% 🟢) 3.017s (~) 0.645s 10 20.81x
🐘 Postgres Nitro 2.438s (+93.2% 🔺) 2.604s (+46.4% 🔺) 0.000s (+Infinity% 🔺) 3.015s (+49.7% 🔺) 0.577s 10 21.39x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.354s (-59.2% 🟢) 2.760s (-54.9% 🟢) 0.191s (-20.0% 🟢) 3.459s (-51.6% 🟢) 1.105s 10 1.00x
▲ Vercel Express 2.440s (-60.9% 🟢) 2.795s (-52.7% 🟢) 0.159s (+0.5%) 3.482s (-52.6% 🟢) 1.042s 10 1.04x
▲ Vercel Next.js (Turbopack) 2.759s (-48.2% 🟢) 2.977s (-32.0% 🟢) 0.232s (+25.3% 🔺) 3.865s (-38.4% 🟢) 1.106s 10 1.17x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Summary

Fastest Framework by World

Winner determined by most benchmark wins

World 🥇 Fastest Framework Wins
💻 Local Express 12/12
🐘 Postgres Next.js (Turbopack) 5/12
▲ Vercel Express 5/12
Fastest World by Framework

Winner determined by most benchmark wins

Framework 🥇 Fastest World Wins
Express 💻 Local 11/12
Next.js (Turbopack) 💻 Local 11/12
Nitro 💻 Local 11/12
Column Definitions
  • Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
  • TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
  • Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
  • Wall Time: Total testbench time (trigger workflow + poll for result)
  • Overhead: Testbench overhead (Wall Time - Workflow Time)
  • Samples: Number of benchmark iterations run
  • vs Fastest: How much slower compared to the fastest configuration for this benchmark

Worlds:

  • 💻 Local: In-memory filesystem world (local development)
  • 🐘 Postgres: PostgreSQL database world (local development)
  • ▲ Vercel: Vercel production/preview deployment
  • 🌐 Starter: Community world (local development)
  • 🌐 Turso: Community world (local development)
  • 🌐 MongoDB: Community world (local development)
  • 🌐 Redis: Community world (local development)
  • 🌐 Jazz: Community world (local development)

📋 View full workflow run

Contributor

Copilot AI left a comment

Pull request overview

This PR targets CI flakiness in the E2E suite by adjusting timing-sensitive workflows, extending dev-mode rebuild timeouts, and adding/strengthening “server ready” checks before running tests.

Changes:

  • Adjust workflow timing in promiseAnyWorkflow and reduce readableStreamWorkflow inter-chunk delay.
  • Add a beforeAll readiness poll in packages/core/e2e/e2e.test.ts.
  • Replace a fixed post-dev.test.ts sleep with a manifest polling loop in .github/workflows/tests.yml, and increase dev.test.ts timeouts.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File Description
workbench/example/workflows/99_e2e.ts Tweaks Promise.any timing gaps and reduces stream chunk delay to reduce timing-related flakes.
packages/core/e2e/e2e.test.ts Adds a pre-suite “deployment healthy” poll before executing E2E tests.
packages/core/e2e/dev.test.ts Increases dev rebuild test timeouts to better accommodate slow CI rebuilds.
.github/workflows/tests.yml Replaces fixed sleep with a manifest polling loop before running E2E tests.
.changeset/fix-flaky-e2e-tests.md Publishes a patch changeset documenting the flake fixes.

Comment on lines 333 to 339
for i in $(seq 1 60); do
  if curl -sf "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
    echo "Server healthy after ${i}s"
    break
  fi
  sleep 1
done

Copilot AI Feb 6, 2026

The new manifest polling loop will break when healthy, but if it never becomes healthy within 60 iterations the script still continues to run the E2E suite, reintroducing flakiness and making failures harder to diagnose. After the loop, add an explicit failure (non-zero exit) if the health check never succeeded; also consider adding a short curl --max-time so a single request can’t hang the job indefinitely.

Suggested change
for i in $(seq 1 60); do
  if curl -sf "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
    echo "Server healthy after ${i}s"
    break
  fi
  sleep 1
done

health_ok=0
for i in $(seq 1 60); do
  if curl -sf --max-time 5 "$DEPLOYMENT_URL/.well-known/workflow/v1/manifest.json" > /dev/null 2>&1; then
    echo "Server healthy after ${i}s"
    health_ok=1
    break
  fi
  sleep 1
done
if [ "$health_ok" -ne 1 ]; then
  echo "Server failed to become healthy after 60 seconds"
  exit 1
fi

Comment on lines 141 to 154
for (let i = 1; i <= 60; i++) {
  try {
    const res = await fetch(deploymentUrl);
    if (res.ok) {
      console.log(`Server healthy after ${i}s`);
      return;
    }
  } catch {
    // Server not ready yet
  }
  await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
  `Server at ${deploymentUrl} did not become healthy within 60s`

Copilot AI Feb 6, 2026

The new beforeAll health check polls fetch(deploymentUrl) without getProtectionBypassHeaders(). On Vercel runs with Deployment Protection enabled, this request can stay non-2xx even though the workflow endpoints are reachable with bypass headers, causing an unnecessary 60s delay/failure. Consider polling a known workflow endpoint (e.g. /.well-known/workflow/v1/manifest.json or the ?__health endpoints) and include the protection-bypass headers (and ideally a per-request timeout via AbortController) so the check reflects actual E2E readiness.

Suggested change
for (let i = 1; i <= 60; i++) {
  try {
    const res = await fetch(deploymentUrl);
    if (res.ok) {
      console.log(`Server healthy after ${i}s`);
      return;
    }
  } catch {
    // Server not ready yet
  }
  await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
  `Server at ${deploymentUrl} did not become healthy within 60s`

const healthUrl = new URL(
  '/.well-known/workflow/v1/manifest.json',
  deploymentUrl
).toString();
const headers = getProtectionBypassHeaders?.();
for (let i = 1; i <= 60; i++) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 5_000);
  try {
    const res = await fetch(healthUrl, {
      headers,
      signal: controller.signal,
    });
    if (res.ok) {
      console.log(`Server healthy after ${i}s`);
      return;
    }
  } catch {
    // Server not ready yet or request timed out
  } finally {
    clearTimeout(timeoutId);
  }
  await new Promise((resolve) => setTimeout(resolve, 1_000));
}
throw new Error(
  `Server at ${healthUrl} did not become healthy within 60s`

@pranaygp pranaygp marked this pull request as draft February 6, 2026 23:47
- e2e.test.ts beforeAll: poll manifest endpoint with
  getProtectionBypassHeaders() and AbortController timeout instead
  of bare fetch(deploymentUrl) which fails with Deployment Protection
- tests.yml: add health_ok flag + exit 1 fallback and curl --max-time 5
- Also replace sleep 10 in local-prod and local-postgres jobs with
  the same health check poll pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
