test(plotting): replace bilinear-resize fallback; fix dpi propagation (#1327) by Marius1311 · Pull Request #1359 · scverse/cellrank

Marius1311 · 2026-06-05T12:52:27Z

Closes #1327.

Problem

tests/_helpers.py::resize_images_to_same_sizes ran before every compare_images, bilinearly resizing the rendered figure to the baseline's pixel size whenever they differed. A size mismatch is itself a signal that something changed (matplotlib/layout/DPI/backend), and resizing both:

masked real regressions — a genuinely different layout was squashed to match, then passed under the RMS tolerance, and
manufactured fake diffs — interpolation blurs the image before the RMS comparison.

What changed

1. Harness: resize fallback → size tolerance + crop

(tests/_helpers.py, tests/test_plotting.py)

Remove resize_images_to_same_sizes.
Add assert_image_sizes_close — a relative size tolerance (SIZE_RTOL = 0.1, the size-axis analogue of STRICT_TOL): fail loudly on gross size divergence, absorb sub-pixel bbox_inches="tight" rounding drift across platforms / matplotlib versions.
Add crop_images_to_common_size — crop (no interpolation) so compare_images gets identical dimensions without blurring. The RMS comparison then runs on real, non-interpolated pixels.

This is consistent with #1328's philosophy (robustness via low DPI + tolerance, not pinning) rather than a strict size equality, which would flag ~17 sub-pixel-jitter tests — including baselines #1328 just regenerated.

2. dpi propagation fix

(src/cellrank/_utils/_utils.py, src/cellrank/estimators/terminal_states/_term_states_estimator.py)

Removing the resize exposed that plot_macrostates rendered ~2.6× too large: _plot_discrete and the _plot_continuous non-same-plot branch popped dpi and discarded it (the # handled at figure level comment was never implemented), so sc.pl.embedding rendered at scanpy's default ~100 dpi instead of the requested 40.

save_fig now accepts an optional dpi forwarded to fig.savefig, and both embedding paths pass it through. This restores baseline-matching resolution and recovers test_final_states / test_lin_probs (which now pass). The new param is backward-compatible — all existing callers behave identically.

Stale baselines (7 `xfail`)

Removing the resize surfaced 7 committed baselines it had been masking. They are genuinely stale and need regenerating on Linux CI (they can't be reliably regenerated on macOS — font metrics shift the tight bbox). They're marked xfail(strict=False) with reasons so they're tracked, not masked, and pass without XPASS errors once regenerated:

5 GPCCA macrostate-scatter (test_gpcca_meta_states*, test_gpcca_final_states) — scvelo→scanpy content drift (Replace scv.pl.scatter and scvelo paga with scanpy equivalents #1302, ~69 RMS). The size half is now fixed by the dpi change; the content half remains.
test_proj_default_ordering, test_msc_default — later layout/figsize changes.

Coordination with #1328

These 7 should be regenerated and un-xfailed as part of #1328 (they are the canonical visual-regression tests #1328 keeps, not knob tests it converts). Ordering matters: the dpi fix must be in the tree before those macrostate-scatter baselines are regenerated, otherwise the oversized renders get baked in and have to be regenerated twice. strict=False means there's no race if a regenerated baseline lands before its marker is removed.

Testing

Local (macOS): 298 passed, 7 xfailed, 0 failed for tests/test_plotting.py; tests/test_gpcca.py plotting paths 32 passed. pre-commit clean.

🤖 Generated with Claude Code

…#1327) The image-comparison harness called `resize_images_to_same_sizes` before every `compare_images`, bilinearly resizing the rendered figure to the baseline's pixel size whenever they differed. A size mismatch is itself a signal that something changed (matplotlib/layout/DPI/backend), and resizing both *masked real regressions* (a different layout squashed to match) and *manufactured fake diffs* (interpolation blurs the image before the RMS comparison). Harness (tests/_helpers.py, tests/test_plotting.py): - Remove `resize_images_to_same_sizes`. - Add `assert_image_sizes_close` (relative tolerance `SIZE_RTOL = 0.1`, the size-axis analogue of `STRICT_TOL`): fail loudly on gross size divergence, absorb sub-pixel `bbox_inches="tight"` rounding drift across platforms / matplotlib versions. - Add `crop_images_to_common_size` (crop, no interpolation) so `compare_images` gets identical dimensions without blurring; the RMS comparison runs on real, non-interpolated pixels. dpi propagation (src/cellrank/_utils/_utils.py, _term_states_estimator.py): Removing the resize exposed that `plot_macrostates` rendered ~2.6x too large: `_plot_discrete` and the `_plot_continuous` non-same-plot branch popped `dpi` and discarded it (the "handled at figure level" comment was never implemented), so `sc.pl.embedding` rendered at scanpy's default ~100 dpi instead of the requested 40. `save_fig` now accepts a `dpi` argument forwarded to `fig.savefig`, and both embedding paths pass it through. This restores baseline-matching resolution and recovers `test_final_states` / `test_lin_probs`. Stale baselines: 7 committed figures the resize had been masking are genuinely stale and need regenerating on Linux CI (they can't be reliably regenerated on macOS). They are marked `xfail(strict=False)` with reasons so they are tracked, not masked, and pass without XPASS errors once regenerated: - 5 GPCCA macrostate-scatter tests: scvelo->scanpy content drift (#1302); the size half is now fixed by the dpi change, the ~69 RMS content half remains. - test_proj_default_ordering, test_msc_default: later layout/figsize changes. These should be regenerated and un-xfailed as part of #1328. The dpi fix must be in the tree before that regeneration, otherwise the oversized renders get baked in as baselines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Follow-up to #1359, which exposed (by removing the resize fallback) seven committed baselines it had been masking and marked them xfail(strict=False) for #1328 to regenerate. With the #1359 dpi fix now in the tree, regenerate all seven from the Linux `rendered-figures` CI artifact (py3.12-stable) so they match the renderer CI validates against, and remove the xfail markers + `_STALE_*` reason constants: - 5 GPCCA macrostate-scatter: test_gpcca_meta_states{,_discrete,_no_same_plot, _time}, test_gpcca_final_states (scvelo->scanpy content + the oversized dpi). - test_proj_default_ordering, test_msc_default (layout/figsize). Reviewed each regenerated figure against its predecessor: same data/content, correct size -- benign drift, not a rendering regression. All seven pass at STRICT_TOL=50 on macOS locally too (deterministic UMAP-positioned renders). Also bump test_paga_pie to tol=75. Removing the resize fallback surfaced that the paga pie graph -- the most rasterization-sensitive plot in the suite (a dense web of edges + pie-wedge nodes) -- drifts to ~55 RMS on pre-release matplotlib; its layout is deterministic (basis="umap"), so this is pure cross-version rendering jitter. A modestly looser per-test tolerance absorbs it on every platform (still far tighter than this plot's old tol=250, and still catches real regressions), which is cleaner than a version-conditional xfail. Full plotting suite: 311 passed, 7 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…#1360) Follow-up to #1359, which exposed (by removing the resize fallback) seven committed baselines it had been masking and marked them xfail(strict=False) for #1328 to regenerate. With the #1359 dpi fix now in the tree, regenerate all seven from the Linux `rendered-figures` CI artifact (py3.12-stable) so they match the renderer CI validates against, and remove the xfail markers + `_STALE_*` reason constants: - 5 GPCCA macrostate-scatter: test_gpcca_meta_states{,_discrete,_no_same_plot, _time}, test_gpcca_final_states (scvelo->scanpy content + the oversized dpi). - test_proj_default_ordering, test_msc_default (layout/figsize). Reviewed each regenerated figure against its predecessor: same data/content, correct size -- benign drift, not a rendering regression. All seven pass at STRICT_TOL=50 on macOS locally too (deterministic UMAP-positioned renders). Also bump test_paga_pie to tol=75. Removing the resize fallback surfaced that the paga pie graph -- the most rasterization-sensitive plot in the suite (a dense web of edges + pie-wedge nodes) -- drifts to ~55 RMS on pre-release matplotlib; its layout is deterministic (basis="umap"), so this is pure cross-version rendering jitter. A modestly looser per-test tolerance absorbs it on every platform (still far tighter than this plot's old tol=250, and still catches real regressions), which is cleaner than a version-conditional xfail. Full plotting suite: 311 passed, 7 skipped. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Marius1311 merged commit 495a419 into main Jun 5, 2026
9 of 10 checks passed

Marius1311 deleted the claude/quizzical-chaplygin-7aab41 branch June 5, 2026 13:19

Marius1311 mentioned this pull request Jun 5, 2026

test(plotting): regenerate stale baselines on Linux; un-xfail (#1328) #1360

Merged

Marius1311 mentioned this pull request Jun 5, 2026

Reduce and refactor the plotting image-comparison test surface #1328

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(plotting): replace bilinear-resize fallback; fix dpi propagation (#1327)#1359

test(plotting): replace bilinear-resize fallback; fix dpi propagation (#1327)#1359
Marius1311 merged 1 commit into
mainfrom
claude/quizzical-chaplygin-7aab41

Marius1311 commented Jun 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Marius1311 commented Jun 5, 2026

Problem

What changed

1. Harness: resize fallback → size tolerance + crop

2. dpi propagation fix

Stale baselines (7 xfail)

Coordination with #1328

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Stale baselines (7 `xfail`)