perf: parallelize integration tests via GitHub Actions matrix sharding (4 shards) #863

Description

@ismisepaul

Problem

The integration-tests job in .github/workflows/test.yml takes ~4 minutes. All 40 *IT classes run serially in a single JVM against one shared MySQL instance.

Background — what's already been done

This builds on prior perf work; do not redo it:

So the remaining lever is no longer per-class cost (already minimized); it is serial execution, so we parallelize.

Why naive thread-parallelism is unsafe

The 40 classes share global state: one MySQL core schema plus in-memory static caches (ScoreboardStatus, CheatSheetStatus, CountdownHandler, FeedbackStatus, OpenRegistration, ModulePlan). Running them concurrently in one JVM against one DB would let them corrupt each other's state. reseedTestData() guarantees isolation only between classes run serially, not between classes run concurrently.

Recommended approach: GitHub Actions matrix sharding (4 shards)

Split the 40 IT classes across 4 parallel jobs, each on its own runner with its own MySQL service container (4 MySQL instances total, not 40). Because each shard is a separate JVM + separate DB, the shared-state problem disappears with no test-code changes. Expected wall-clock ~4 min → ~1–1.5 min (cost: ~4× runner-minutes).

Failsafe honors -Dit.test=<comma-separated globs> on the command line, so shards can be selected by package glob without maintaining hardcoded class lists. Suggested balanced grouping:

| Shard | Globs (-Dit.test=) | ~Classes |
|---|---|---|
| 1 | dbProcs.*IT, servlets.LoginIT, servlets.LogoutIT, servlets.SetupIT, testUtils.*IT | 7 (incl. heavy Getter/Setter) |
| 2 | servlets.admin.** | 18 (small/fast each) |
| 3 | servlets.module.lesson.*IT | 9 |
| 4 | servlets.api.*IT, servlets.module.GetModuleIT, servlets.module.challenge.*IT | 6 |

Implementation sketch: turn the job into a strategy.matrix over the 4 shards (with fail-fast: false), keep the existing services.mysql + setup steps per shard, and append -Dit.test=${{ matrix.shard.tests }} to the existing mvn verify step.
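
A minimal sketch of the matrix job described above. The existing test.yml is not shown in this issue, so the action versions, Java version, MySQL image, and password below are placeholders rather than the project's real values, and the `id` key is a hypothetical label. Only the job name, the four glob groups, fail-fast: false, and the -Dit.test flag come from this issue. The globs are passed through an environment variable and quoted so the shell does not expand the * characters.

```yaml
# Sketch only: versions, image, and credentials are placeholders.
jobs:
  integration-tests:
    name: integration-tests (shard ${{ matrix.shard.id }})
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let the other shards finish and report
      matrix:
        shard:
          - id: 1
            tests: "dbProcs.*IT,servlets.LoginIT,servlets.LogoutIT,servlets.SetupIT,testUtils.*IT"
          - id: 2
            tests: "servlets.admin.**"
          - id: 3
            tests: "servlets.module.lesson.*IT"
          - id: 4
            tests: "servlets.api.*IT,servlets.module.GetModuleIT,servlets.module.challenge.*IT"
    services:
      mysql:                      # one MySQL service container per shard
        image: mysql:8.0          # placeholder; keep whatever the existing job uses
        env:
          MYSQL_ROOT_PASSWORD: example-password   # placeholder
        ports:
          - 3306:3306
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"      # placeholder
          cache: maven
      # ...existing setup steps, unchanged...
      - name: Run integration tests for this shard
        env:
          IT_TESTS: ${{ matrix.shard.tests }}
        run: mvn -B verify "-Dit.test=${IT_TESTS}"
```

One side effect worth checking: a matrix job reports one status check per shard, so if integration-tests is a required check in branch protection, the required check names would likely need updating.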

Alternatives considered

  • Failsafe forkCount + per-fork schema (schema name derived from ${surefire.forkNumber}, e.g. core_1, core_2…): keeps it to one runner (no extra cost) but requires making the DB/schema name configurable in TestProperties (currently hardcoded "core" at TestProperties.java:494) and provisioning N schemas. More invasive than matrix sharding.
  • Cheaper wins regardless: share the compiled artifact between the build and IT jobs instead of recompiling per shard; verify the Maven cache is actually hitting.
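
The two cheaper wins could look roughly like the fragment below. The job names, artifact name, cache key, and Maven goals are all assumptions for illustration. Whether running the Failsafe goals directly works depends on how the project's pom configures the plugin (execution-level configuration, for example, is not applied to goals invoked from the command line), so the last step is something to test rather than a drop-in.

```yaml
# Sketch only: job names, artifact name, paths, and Maven goals are assumptions.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Restore Maven repository cache
        id: maven-cache
        uses: actions/cache@v4
        with:
          path: ~/.m2/repository
          key: maven-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
      - name: Report whether the Maven cache hit
        # 'true' means an exact key match; anything else means dependencies are re-downloaded
        run: echo "Maven cache hit: ${{ steps.maven-cache.outputs.cache-hit }}"
      - name: Compile main and test classes once
        run: mvn -B test-compile
      - uses: actions/upload-artifact@v4
        with:
          name: compiled-classes
          path: target/
          retention-days: 1

  integration-tests:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: compiled-classes
          path: target
      # ...Maven cache, services.mysql, and existing setup steps...
      - name: Run only the Failsafe goals against the downloaded classes
        run: mvn -B failsafe:integration-test failsafe:verify
```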

Acceptance criteria

  • IT wall-clock time meaningfully reduced (target ~1–1.5 min).
  • All 40 IT classes still execute — no shard gaps; verify total test count matches pre-change.
  • Adding a new IT class is easy to route to a shard (prefer package-glob over hardcoded class names).
  • fail-fast: false so one shard failing still reports the others.
  • No production code changes (matrix approach); if forkCount is chosen instead, only TestProperties changes.
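
One way to check the "no shard gaps" criterion automatically is sketched below. It assumes Failsafe writes one TEST-*.xml file per IT class to its default target/failsafe-reports directory; the artifact names and the verify-shard-coverage job are hypothetical. It counts classes, not individual tests, so comparing the total test count against the pre-change run would still be a separate check.

```yaml
# Sketch only: artifact names and the reports path are assumptions; 40 is the class count from this issue.
jobs:
  integration-tests:
    # ...strategy.matrix, services.mysql, and test steps for each shard...
    runs-on: ubuntu-latest
    steps:
      - name: Upload this shard's Failsafe reports
        if: always()              # keep reports even when the shard fails
        uses: actions/upload-artifact@v4
        with:
          name: failsafe-reports-shard-${{ strategy.job-index }}
          path: target/failsafe-reports/

  verify-shard-coverage:
    needs: integration-tests
    if: always()
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: failsafe-reports-shard-*
          path: reports
          merge-multiple: true
      - name: Check that all 40 IT classes produced a report
        run: |
          count=$(find reports -name 'TEST-*.xml' | wc -l)
          echo "IT classes with a report: $count"
          test "$count" -eq 40
```

The hardcoded 40 would need to be bumped whenever an IT class is added or removed, which doubles as a prompt to confirm the new class matches one of the shard globs.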

Refs: Maven Failsafe — fork options & parallel execution, GitHub Actions matrix strategy
