test(plotting): replace bilinear-resize fallback; fix dpi propagation (#1327)#1359
Merged
Conversation
…#1327) The image-comparison harness called `resize_images_to_same_sizes` before every `compare_images`, bilinearly resizing the rendered figure to the baseline's pixel size whenever they differed. A size mismatch is itself a signal that something changed (matplotlib/layout/DPI/backend), and resizing both *masked real regressions* (a different layout squashed to match) and *manufactured fake diffs* (interpolation blurs the image before the RMS comparison). Harness (tests/_helpers.py, tests/test_plotting.py): - Remove `resize_images_to_same_sizes`. - Add `assert_image_sizes_close` (relative tolerance `SIZE_RTOL = 0.1`, the size-axis analogue of `STRICT_TOL`): fail loudly on gross size divergence, absorb sub-pixel `bbox_inches="tight"` rounding drift across platforms / matplotlib versions. - Add `crop_images_to_common_size` (crop, no interpolation) so `compare_images` gets identical dimensions without blurring; the RMS comparison runs on real, non-interpolated pixels. dpi propagation (src/cellrank/_utils/_utils.py, _term_states_estimator.py): Removing the resize exposed that `plot_macrostates` rendered ~2.6x too large: `_plot_discrete` and the `_plot_continuous` non-same-plot branch popped `dpi` and discarded it (the "handled at figure level" comment was never implemented), so `sc.pl.embedding` rendered at scanpy's default ~100 dpi instead of the requested 40. `save_fig` now accepts a `dpi` argument forwarded to `fig.savefig`, and both embedding paths pass it through. This restores baseline-matching resolution and recovers `test_final_states` / `test_lin_probs`. Stale baselines: 7 committed figures the resize had been masking are genuinely stale and need regenerating on Linux CI (they can't be reliably regenerated on macOS). They are marked `xfail(strict=False)` with reasons so they are tracked, not masked, and pass without XPASS errors once regenerated: - 5 GPCCA macrostate-scatter tests: scvelo->scanpy content drift (#1302); the size half is now fixed by the dpi change, the ~69 RMS content half remains. - test_proj_default_ordering, test_msc_default: later layout/figsize changes. These should be regenerated and un-xfailed as part of #1328. The dpi fix must be in the tree before that regeneration, otherwise the oversized renders get baked in as baselines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Marius1311
added a commit
that referenced
this pull request
Jun 5, 2026
Follow-up to #1359, which exposed (by removing the resize fallback) seven committed baselines it had been masking and marked them xfail(strict=False) for #1328 to regenerate. With the #1359 dpi fix now in the tree, regenerate all seven from the Linux `rendered-figures` CI artifact (py3.12-stable) so they match the renderer CI validates against, and remove the xfail markers + `_STALE_*` reason constants: - 5 GPCCA macrostate-scatter: test_gpcca_meta_states{,_discrete,_no_same_plot, _time}, test_gpcca_final_states (scvelo->scanpy content + the oversized dpi). - test_proj_default_ordering, test_msc_default (layout/figsize). Reviewed each regenerated figure against its predecessor: same data/content, correct size -- benign drift, not a rendering regression. All seven pass at STRICT_TOL=50 on macOS locally too (deterministic UMAP-positioned renders). Also bump test_paga_pie to tol=75. Removing the resize fallback surfaced that the paga pie graph -- the most rasterization-sensitive plot in the suite (a dense web of edges + pie-wedge nodes) -- drifts to ~55 RMS on pre-release matplotlib; its layout is deterministic (basis="umap"), so this is pure cross-version rendering jitter. A modestly looser per-test tolerance absorbs it on every platform (still far tighter than this plot's old tol=250, and still catches real regressions), which is cleaner than a version-conditional xfail. Full plotting suite: 311 passed, 7 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Marius1311
added a commit
that referenced
this pull request
Jun 5, 2026
…#1360) Follow-up to #1359, which exposed (by removing the resize fallback) seven committed baselines it had been masking and marked them xfail(strict=False) for #1328 to regenerate. With the #1359 dpi fix now in the tree, regenerate all seven from the Linux `rendered-figures` CI artifact (py3.12-stable) so they match the renderer CI validates against, and remove the xfail markers + `_STALE_*` reason constants: - 5 GPCCA macrostate-scatter: test_gpcca_meta_states{,_discrete,_no_same_plot, _time}, test_gpcca_final_states (scvelo->scanpy content + the oversized dpi). - test_proj_default_ordering, test_msc_default (layout/figsize). Reviewed each regenerated figure against its predecessor: same data/content, correct size -- benign drift, not a rendering regression. All seven pass at STRICT_TOL=50 on macOS locally too (deterministic UMAP-positioned renders). Also bump test_paga_pie to tol=75. Removing the resize fallback surfaced that the paga pie graph -- the most rasterization-sensitive plot in the suite (a dense web of edges + pie-wedge nodes) -- drifts to ~55 RMS on pre-release matplotlib; its layout is deterministic (basis="umap"), so this is pure cross-version rendering jitter. A modestly looser per-test tolerance absorbs it on every platform (still far tighter than this plot's old tol=250, and still catches real regressions), which is cleaner than a version-conditional xfail. Full plotting suite: 311 passed, 7 skipped. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1327.
Problem
tests/_helpers.py::resize_images_to_same_sizesran before everycompare_images, bilinearly resizing the rendered figure to the baseline's pixel size whenever they differed. A size mismatch is itself a signal that something changed (matplotlib/layout/DPI/backend), and resizing both:What changed
1. Harness: resize fallback → size tolerance + crop
(
tests/_helpers.py,tests/test_plotting.py)resize_images_to_same_sizes.assert_image_sizes_close— a relative size tolerance (SIZE_RTOL = 0.1, the size-axis analogue ofSTRICT_TOL): fail loudly on gross size divergence, absorb sub-pixelbbox_inches="tight"rounding drift across platforms / matplotlib versions.crop_images_to_common_size— crop (no interpolation) socompare_imagesgets identical dimensions without blurring. The RMS comparison then runs on real, non-interpolated pixels.This is consistent with #1328's philosophy (robustness via low DPI + tolerance, not pinning) rather than a strict size equality, which would flag ~17 sub-pixel-jitter tests — including baselines #1328 just regenerated.
2. dpi propagation fix
(
src/cellrank/_utils/_utils.py,src/cellrank/estimators/terminal_states/_term_states_estimator.py)Removing the resize exposed that
plot_macrostatesrendered ~2.6× too large:_plot_discreteand the_plot_continuousnon-same-plot branch poppeddpiand discarded it (the# handled at figure levelcomment was never implemented), sosc.pl.embeddingrendered at scanpy's default ~100 dpi instead of the requested 40.save_fignow accepts an optionaldpiforwarded tofig.savefig, and both embedding paths pass it through. This restores baseline-matching resolution and recoverstest_final_states/test_lin_probs(which now pass). The new param is backward-compatible — all existing callers behave identically.Stale baselines (7
xfail)Removing the resize surfaced 7 committed baselines it had been masking. They are genuinely stale and need regenerating on Linux CI (they can't be reliably regenerated on macOS — font metrics shift the
tightbbox). They're markedxfail(strict=False)with reasons so they're tracked, not masked, and pass without XPASS errors once regenerated:test_gpcca_meta_states*,test_gpcca_final_states) — scvelo→scanpy content drift (Replace scv.pl.scatter and scvelo paga with scanpy equivalents #1302, ~69 RMS). The size half is now fixed by the dpi change; the content half remains.test_proj_default_ordering,test_msc_default— later layout/figsize changes.Coordination with #1328
These 7 should be regenerated and un-
xfailed as part of #1328 (they are the canonical visual-regression tests #1328 keeps, not knob tests it converts). Ordering matters: the dpi fix must be in the tree before those macrostate-scatter baselines are regenerated, otherwise the oversized renders get baked in and have to be regenerated twice.strict=Falsemeans there's no race if a regenerated baseline lands before its marker is removed.Testing
Local (macOS):
298 passed, 7 xfailed, 0 failedfortests/test_plotting.py;tests/test_gpcca.pyplotting paths32 passed. pre-commit clean.🤖 Generated with Claude Code