Skip to content

test(plotting): replace bilinear-resize fallback; fix dpi propagation (#1327)#1359

Merged
Marius1311 merged 1 commit into
mainfrom
claude/quizzical-chaplygin-7aab41
Jun 5, 2026
Merged

test(plotting): replace bilinear-resize fallback; fix dpi propagation (#1327)#1359
Marius1311 merged 1 commit into
mainfrom
claude/quizzical-chaplygin-7aab41

Conversation

@Marius1311

Copy link
Copy Markdown
Collaborator

Closes #1327.

Problem

tests/_helpers.py::resize_images_to_same_sizes ran before every compare_images, bilinearly resizing the rendered figure to the baseline's pixel size whenever they differed. A size mismatch is itself a signal that something changed (matplotlib/layout/DPI/backend), and resizing both:

  • masked real regressions — a genuinely different layout was squashed to match, then passed under the RMS tolerance, and
  • manufactured fake diffs — interpolation blurs the image before the RMS comparison.

What changed

1. Harness: resize fallback → size tolerance + crop

(tests/_helpers.py, tests/test_plotting.py)

  • Remove resize_images_to_same_sizes.
  • Add assert_image_sizes_close — a relative size tolerance (SIZE_RTOL = 0.1, the size-axis analogue of STRICT_TOL): fail loudly on gross size divergence, absorb sub-pixel bbox_inches="tight" rounding drift across platforms / matplotlib versions.
  • Add crop_images_to_common_size — crop (no interpolation) so compare_images gets identical dimensions without blurring. The RMS comparison then runs on real, non-interpolated pixels.

This is consistent with #1328's philosophy (robustness via low DPI + tolerance, not pinning) rather than a strict size equality, which would flag ~17 sub-pixel-jitter tests — including baselines #1328 just regenerated.

2. dpi propagation fix

(src/cellrank/_utils/_utils.py, src/cellrank/estimators/terminal_states/_term_states_estimator.py)

Removing the resize exposed that plot_macrostates rendered ~2.6× too large: _plot_discrete and the _plot_continuous non-same-plot branch popped dpi and discarded it (the # handled at figure level comment was never implemented), so sc.pl.embedding rendered at scanpy's default ~100 dpi instead of the requested 40.

save_fig now accepts an optional dpi forwarded to fig.savefig, and both embedding paths pass it through. This restores baseline-matching resolution and recovers test_final_states / test_lin_probs (which now pass). The new param is backward-compatible — all existing callers behave identically.

Stale baselines (7 xfail)

Removing the resize surfaced 7 committed baselines it had been masking. They are genuinely stale and need regenerating on Linux CI (they can't be reliably regenerated on macOS — font metrics shift the tight bbox). They're marked xfail(strict=False) with reasons so they're tracked, not masked, and pass without XPASS errors once regenerated:

  • 5 GPCCA macrostate-scatter (test_gpcca_meta_states*, test_gpcca_final_states) — scvelo→scanpy content drift (Replace scv.pl.scatter and scvelo paga with scanpy equivalents #1302, ~69 RMS). The size half is now fixed by the dpi change; the content half remains.
  • test_proj_default_ordering, test_msc_default — later layout/figsize changes.

Coordination with #1328

These 7 should be regenerated and un-xfailed as part of #1328 (they are the canonical visual-regression tests #1328 keeps, not knob tests it converts). Ordering matters: the dpi fix must be in the tree before those macrostate-scatter baselines are regenerated, otherwise the oversized renders get baked in and have to be regenerated twice. strict=False means there's no race if a regenerated baseline lands before its marker is removed.

Testing

Local (macOS): 298 passed, 7 xfailed, 0 failed for tests/test_plotting.py; tests/test_gpcca.py plotting paths 32 passed. pre-commit clean.

🤖 Generated with Claude Code

…#1327)

The image-comparison harness called `resize_images_to_same_sizes` before every
`compare_images`, bilinearly resizing the rendered figure to the baseline's pixel
size whenever they differed. A size mismatch is itself a signal that something
changed (matplotlib/layout/DPI/backend), and resizing both *masked real
regressions* (a different layout squashed to match) and *manufactured fake diffs*
(interpolation blurs the image before the RMS comparison).

Harness (tests/_helpers.py, tests/test_plotting.py):
- Remove `resize_images_to_same_sizes`.
- Add `assert_image_sizes_close` (relative tolerance `SIZE_RTOL = 0.1`, the
  size-axis analogue of `STRICT_TOL`): fail loudly on gross size divergence,
  absorb sub-pixel `bbox_inches="tight"` rounding drift across platforms /
  matplotlib versions.
- Add `crop_images_to_common_size` (crop, no interpolation) so `compare_images`
  gets identical dimensions without blurring; the RMS comparison runs on real,
  non-interpolated pixels.

dpi propagation (src/cellrank/_utils/_utils.py, _term_states_estimator.py):
Removing the resize exposed that `plot_macrostates` rendered ~2.6x too large:
`_plot_discrete` and the `_plot_continuous` non-same-plot branch popped `dpi` and
discarded it (the "handled at figure level" comment was never implemented), so
`sc.pl.embedding` rendered at scanpy's default ~100 dpi instead of the requested
40. `save_fig` now accepts a `dpi` argument forwarded to `fig.savefig`, and both
embedding paths pass it through. This restores baseline-matching resolution and
recovers `test_final_states` / `test_lin_probs`.

Stale baselines: 7 committed figures the resize had been masking are genuinely
stale and need regenerating on Linux CI (they can't be reliably regenerated on
macOS). They are marked `xfail(strict=False)` with reasons so they are tracked,
not masked, and pass without XPASS errors once regenerated:
- 5 GPCCA macrostate-scatter tests: scvelo->scanpy content drift (#1302); the
  size half is now fixed by the dpi change, the ~69 RMS content half remains.
- test_proj_default_ordering, test_msc_default: later layout/figsize changes.

These should be regenerated and un-xfailed as part of #1328. The dpi fix must be
in the tree before that regeneration, otherwise the oversized renders get baked
in as baselines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Marius1311 Marius1311 merged commit 495a419 into main Jun 5, 2026
9 of 10 checks passed
@Marius1311 Marius1311 deleted the claude/quizzical-chaplygin-7aab41 branch June 5, 2026 13:19
Marius1311 added a commit that referenced this pull request Jun 5, 2026
Follow-up to #1359, which exposed (by removing the resize fallback) seven
committed baselines it had been masking and marked them xfail(strict=False)
for #1328 to regenerate. With the #1359 dpi fix now in the tree, regenerate
all seven from the Linux `rendered-figures` CI artifact (py3.12-stable) so
they match the renderer CI validates against, and remove the xfail markers +
`_STALE_*` reason constants:

- 5 GPCCA macrostate-scatter: test_gpcca_meta_states{,_discrete,_no_same_plot,
  _time}, test_gpcca_final_states (scvelo->scanpy content + the oversized dpi).
- test_proj_default_ordering, test_msc_default (layout/figsize).

Reviewed each regenerated figure against its predecessor: same data/content,
correct size -- benign drift, not a rendering regression. All seven pass at
STRICT_TOL=50 on macOS locally too (deterministic UMAP-positioned renders).

Also bump test_paga_pie to tol=75. Removing the resize fallback surfaced that
the paga pie graph -- the most rasterization-sensitive plot in the suite (a
dense web of edges + pie-wedge nodes) -- drifts to ~55 RMS on pre-release
matplotlib; its layout is deterministic (basis="umap"), so this is pure
cross-version rendering jitter. A modestly looser per-test tolerance absorbs
it on every platform (still far tighter than this plot's old tol=250, and
still catches real regressions), which is cleaner than a version-conditional
xfail.

Full plotting suite: 311 passed, 7 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Marius1311 added a commit that referenced this pull request Jun 5, 2026
…#1360)

Follow-up to #1359, which exposed (by removing the resize fallback) seven
committed baselines it had been masking and marked them xfail(strict=False)
for #1328 to regenerate. With the #1359 dpi fix now in the tree, regenerate
all seven from the Linux `rendered-figures` CI artifact (py3.12-stable) so
they match the renderer CI validates against, and remove the xfail markers +
`_STALE_*` reason constants:

- 5 GPCCA macrostate-scatter: test_gpcca_meta_states{,_discrete,_no_same_plot,
  _time}, test_gpcca_final_states (scvelo->scanpy content + the oversized dpi).
- test_proj_default_ordering, test_msc_default (layout/figsize).

Reviewed each regenerated figure against its predecessor: same data/content,
correct size -- benign drift, not a rendering regression. All seven pass at
STRICT_TOL=50 on macOS locally too (deterministic UMAP-positioned renders).

Also bump test_paga_pie to tol=75. Removing the resize fallback surfaced that
the paga pie graph -- the most rasterization-sensitive plot in the suite (a
dense web of edges + pie-wedge nodes) -- drifts to ~55 RMS on pre-release
matplotlib; its layout is deterministic (basis="umap"), so this is pure
cross-version rendering jitter. A modestly looser per-test tolerance absorbs
it on every platform (still far tighter than this plot's old tol=250, and
still catches real regressions), which is cleaner than a version-conditional
xfail.

Full plotting suite: 311 passed, 7 skipped.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove the bilinear-resize fallback in plotting image-comparison tests

1 participant