Improve applyCompMatr summation accuracy #781

Open
LubuSeb wants to merge 1 commit into QuEST-Kit:devel from LubuSeb:codex/unitaryhack-quest-598-compensated

LubuSeb commented Jun 5, 2026

Closes #598.

Summary

  • Replaces the naive inner accumulation in cpu_statevec_anyCtrlAnyTargDenseMatr_sub() with component-wise compensated summation over cpu_qcomp.re and cpu_qcomp.im.
  • Writes each output amplitude once after its local reduction rather than repeatedly updating amps[i] inside the summation loop.
  • Adds a deterministic complex cancellation regression for 4-target applyCompMatr, which exercises the 3+ target CompMatr path rather than the one/two-target specialisations.
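The shape of the change described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual QuEST kernel: it ignores control qubits, amplitude gathering, and threading, and the names `kahanAdd` and `applyDenseCompensated` are hypothetical. `qreal` and `qcomp` are aliased here to `double` and `std::complex<double>` as stand-ins for the backend types.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using qreal = double;                // stand-in for qreal at precision 2 (assumption)
using qcomp = std::complex<qreal>;   // stand-in for cpu_qcomp (assumption)

// One Kahan step on a single real accumulator.
// `comp` carries the rounding error of the previous additions.
inline void kahanAdd(qreal& sum, qreal& comp, qreal term) {
    const qreal y = term - comp;   // apply the carried correction to the new term
    const qreal t = sum + y;       // low-order bits of y may be rounded away here
    comp = (t - sum) - y;          // recover the part of y that was rounded away
    sum = t;
}

// Hypothetical helper: out = matr * in for a dense, row-major dim x dim matrix.
std::vector<qcomp> applyDenseCompensated(const std::vector<qcomp>& matr,
                                         const std::vector<qcomp>& in) {
    const std::size_t dim = in.size();
    assert(matr.size() == dim * dim);

    std::vector<qcomp> out(dim);
    for (std::size_t r = 0; r < dim; ++r) {
        // Independent accumulators for the real and imaginary components.
        qreal sumRe = 0, compRe = 0;
        qreal sumIm = 0, compIm = 0;

        for (std::size_t c = 0; c < dim; ++c) {
            const qcomp elem = matr[r * dim + c];
            const qcomp amp = in[c];

            // Complex product written out on real components.
            const qreal prodRe = elem.real() * amp.real() - elem.imag() * amp.imag();
            const qreal prodIm = elem.real() * amp.imag() + elem.imag() * amp.real();

            kahanAdd(sumRe, compRe, prodRe);
            kahanAdd(sumIm, compIm, prodIm);
        }

        // The output amplitude is written once, after the local reduction.
        out[r] = qcomp(sumRe, sumIm);
    }
    return out;
}
```

Note that compensated summation relies on strict floating-point evaluation order, so a sketch like this should not be compiled with value-changing options such as `-ffast-math`.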

Notes

I saw the earlier closed attempt in #777. This version keeps the same narrow two-file scope, but uses direct qreal compensation for the real and imaginary components instead of relying on complex add/sub overloads in the hot loop. I also checked base_qcomp: the current operators are ordinary component-wise arithmetic, so independent real/imaginary Kahan compensation is compatible with the backend representation.
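For reference, the standard Kahan recurrence for one real component can be written as below. The symbols are notation introduced here for illustration: $x_k$ is the $k$-th term being added, $s_k$ the running sum, and $c_k$ the running compensation.

```latex
\begin{aligned}
y_k &= x_k - c_{k-1}, \\
t_k &= s_{k-1} + y_k, \\
c_k &= (t_k - s_{k-1}) - y_k, \\
s_k &= t_k,
\end{aligned}
\qquad s_0 = 0,\quad c_0 = 0 .
```

Because complex addition acts on the real and imaginary parts separately, running this recurrence once with $x_k = \operatorname{Re}(m_k a_k)$ and once with $x_k = \operatorname{Im}(m_k a_k)$, where $m_k$ is a matrix element and $a_k$ an amplitude, compensates each component of the complex sum independently.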

Local measurements

Configuration: Windows, GCC 13.2.0, Release, single CPU (QUEST_ENABLE_OMP=OFF, QUEST_ENABLE_MPI=OFF, QUEST_ENABLE_CUDA=OFF, QUEST_ENABLE_HIP=OFF). The benchmark applies a dense CompMatr whose first output row is [large+i*large, 1-i, ..., 1-i, -large-i*large] to an all-ones state. The expected first amplitude is (2^targets - 2) - i(2^targets - 2).

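| precision | targets | baseline abs error | patched abs error | baseline avg (ms) | patched avg (ms) |
|---|---|---|---|---|---|
| 1 | 4 | 14 | 0 | 0.0114 | 0.0117 |
| 1 | 8 | 254 | 0 | 0.0779 | 0.2100 |
| 1 | 10 | 1022 | 0 | 1.0715 | 3.0297 |
| 2 | 4 | 14 | 0 | 0.0113 | 0.0121 |
| 2 | 8 | 254 | 0 | 0.0801 | 0.2008 |
| 2 | 10 | 1022 | 0 | 1.4134 | 3.2629 |
| 4 | 4 | 14 | 0 | 0.0137 | 0.0134 |
| 4 | 8 | 254 | 0 | 0.4391 | 0.4150 |
| 4 | 10 | 1022 | 0 | 7.1805 | 6.6778 |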

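The measurements show the expected accuracy improvement: the baseline loses every small term (an absolute error of 2^targets - 2), while the patched error is 0 in every configuration. The overhead is within a few percent at 4 targets but reaches roughly 2.3x to 2.8x at 8 and 10 targets in single and double precision (precision 1 and 2); quad precision (precision 4) shows no slowdown. This seems consistent with the tradeoff described in the issue.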
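The cancellation pattern in the benchmark row can be reproduced in isolation with a short sketch. Because the state is all ones, the first output amplitude is simply the sum of the first matrix row. The helper names are hypothetical, and the value of `large` is an assumption (2^53 in double precision), since the description does not state which value the benchmark uses.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using qcomp = std::complex<double>;

// Hypothetical helper: builds the row [L+iL, 1-i, ..., 1-i, -L-iL]
// of length 2^numTargets, mirroring the benchmark description.
std::vector<qcomp> makeCancellationRow(int numTargets, double large) {
    assert(numTargets >= 1);
    const std::size_t dim = std::size_t{1} << numTargets;
    std::vector<qcomp> row(dim, qcomp(1.0, -1.0));
    row.front() = qcomp(large, large);
    row.back() = qcomp(-large, -large);
    return row;
}

// Plain left-to-right accumulation, as in the baseline.
qcomp naiveRowSum(const std::vector<qcomp>& row) {
    qcomp sum(0.0, 0.0);
    for (const qcomp& elem : row)
        sum += elem;
    return sum;
}

// Kahan summation applied separately to the real and imaginary components.
qcomp kahanRowSum(const std::vector<qcomp>& row) {
    double sumRe = 0.0, compRe = 0.0;
    double sumIm = 0.0, compIm = 0.0;
    for (const qcomp& elem : row) {
        const double yRe = elem.real() - compRe;
        const double tRe = sumRe + yRe;
        compRe = (tRe - sumRe) - yRe;   // rounding error of the real addition
        sumRe = tRe;

        const double yIm = elem.imag() - compIm;
        const double tIm = sumIm + yIm;
        compIm = (tIm - sumIm) - yIm;   // rounding error of the imaginary addition
        sumIm = tIm;
    }
    return qcomp(sumRe, sumIm);
}
```

With `large = 9007199254740992.0` (2^53) and 4 targets, the naive sum drops all fourteen real `+1` terms, while the compensated sum recovers `14 - 14i`. With this particular value the naive imaginary part happens to stay exact, because 2^53 - k is still representable. The choice of `large` matters: for magnitudes far above the accumulated correction, plain Kahan summation can still lose the small terms at the final cancellation, which is the case Neumaier's variant is designed for.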

Testing

cmake -S . -B build-598-fp2 -G Ninja -D CMAKE_BUILD_TYPE=Release -D QUEST_BUILD_TESTS=ON -D QUEST_FLOAT_PRECISION=2 -D QUEST_ENABLE_OMP=OFF -D QUEST_ENABLE_MPI=OFF -D QUEST_ENABLE_CUDA=OFF -D QUEST_ENABLE_HIP=OFF
cmake --build build-598-fp2 --parallel
build-598-fp2/tests/tests.exe '*applyCompMatr*' --reporter compact
ctest --test-dir build-598-fp2 --output-on-failure -j 4
git diff --check

Results:

  • *applyCompMatr*: passed, 10 test cases / 10,003 assertions.
  • Full CPU-only double-precision ctest: passed.
  • git diff --check: passed; Git emitted only Windows line-ending conversion warnings.

Prepared with AI assistance; I reviewed the patch and ran the listed local checks.

TysonRayJones (Member) commented

How come you separate out the real and imaginary components, mr robo?
