@timtreis
Member

@timtreis timtreis commented Sep 30, 2025

For other downstream functions, such as #1036, one needs a function to robustly detect where the tissue is in the image and how many separate tissue pieces there are.

This function
a) implements two algorithms for identifying the tissue (Otsu & Felzenszwalb)
b) deals with arbitrary channel input
c) heuristically tries to distinguish actual samples from artifacts (dirt, Visium frame, etc.). As a fallback, one can pass in the expected number of samples, which should be more robust
d) adds the mask back to the sdata object with the same structure and transformations as the original image had
e) does everything in dask so it's quite fast
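
To make item a) concrete, here is a minimal NumPy-only sketch of Otsu-based tissue detection on a synthetic image. It is not the code from this PR: the helper names `otsu_threshold` and `tissue_mask`, the channel averaging, and the assumption that tissue is darker than a bright background are all illustrative choices.

```python
import numpy as np


def otsu_threshold(values: np.ndarray, nbins: int = 256) -> float:
    """Return the bin edge that maximises the between-class variance (Otsu)."""
    hist, bin_edges = np.histogram(values.ravel(), bins=nbins)
    hist = hist.astype(float)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2

    # Pixel counts of the "low" class (bins 0..i) and the "high" class (bins i..end).
    weight_low = np.cumsum(hist)
    weight_high = np.cumsum(hist[::-1])[::-1]

    # Class means; np.maximum guards against division by zero for empty classes.
    mean_low = np.cumsum(hist * centers) / np.maximum(weight_low, 1)
    mean_high = (np.cumsum((hist * centers)[::-1]) / np.maximum(weight_high[::-1], 1))[::-1]

    # Between-class variance for a split between bin i and bin i + 1.
    variance = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    split = int(np.argmax(variance))
    return float(bin_edges[split + 1])


def tissue_mask(image_cyx: np.ndarray) -> np.ndarray:
    """Boolean (y, x) mask. Assumption: tissue is darker than the background."""
    # Collapse an arbitrary number of channels into a single intensity plane.
    gray = image_cyx.astype(float).mean(axis=0)
    return gray < otsu_threshold(gray)


# Synthetic example: bright, noisy background with one dark square of "tissue".
rng = np.random.default_rng(0)
image = np.full((3, 64, 64), 240.0)
image[:, 16:48, 16:48] = 90.0
image += rng.normal(0.0, 5.0, size=image.shape)

mask = tissue_mask(image)
print(int(mask.sum()), "pixels classified as tissue")
```

A dark-on-bright assumption like this only covers the simplest case; fluorescence images such as DAPI would need the comparison flipped.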

import squidpy as sq

sdata = sq.datasets.visium_hne_sdata()

sq.exp.im.detect_tissue(
    sdata,
    image_key="hne",
)

sdata
SpatialData object, with associated Zarr store: /Users/tim.treis/.cache/squidpy/visium_hne_sdata.zarr
├── Images
│     └── 'hne': DataTree[cyx] (3, 11757, 11291), (3, 5878, 5645), (3, 2939, 2822), (3, 1469, 1411)
├── Labels
│     └── 'hne_tissue': DataTree[yx] (11757, 11291), (5878, 5645), (2939, 2822), (1469, 1411)
├── Shapes
│     └── 'spots': GeoDataFrame shape: (2688, 2) (2D shapes)
└── Tables
      └── 'adata': AnnData (2688, 18078)
with coordinate systems:
    ▸ 'global', with elements:
        hne (Images), hne_tissue (Labels), spots (Shapes)
with the following elements not in the Zarr store:
    ▸ hne_tissue (Labels)
import spatialdata_plot  # registers the .pl accessor on SpatialData objects

(
    sdata
    .pl.render_images("hne")
    .pl.render_labels("hne_tissue", fill_alpha=0, contour_px=10, outline_alpha=1)
    .pl.show()
)
[image: the H&E image with the outline of the detected tissue mask overlaid]

Todo

  • Manual tests on a bunch of different inputs IHC / H&E / DAPI / multichannel etc
  • Test with multiple samples in the same image
  • Write unit tests for functions

@timtreis timtreis linked an issue Sep 30, 2025 that may be closed by this pull request
Member

@selmanozleyen selmanozleyen left a comment

Hi, here is some initial feedback. I will go into more detail tomorrow.

@timtreis
Member Author

Notes to self: Works fine in the happy path (white bg, RGB specimen) but fails when the specimen is unusual. Another potential idea:

Use https://scikit-image.org/docs/0.25.x/auto_examples/segmentation/plot_trainable_segmentation.html

  • Start off by defining the corners (potentially overridable by the user) as the background class.
  • Randomly sample squares across the image; if a square is (across channels) within median +/- 1 sd, assign it as an additional bg tile; everything else is tissue
  • This essentially builds a 2-class mask, which is then fed to the classifier
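
A rough NumPy sketch of how such seed labels could be built is given here. Everything in it is an assumption made for illustration: the function name `seed_labels`, the tile and corner sizes, the label encoding, and in particular the reading that "median +/- 1 sd" refers to per-channel statistics of the corner pixels. The classifier step itself is left out.

```python
import numpy as np


def seed_labels(image_cyx, corner=16, tile=16, n_tiles=200, seed=0):
    """Sparse training labels: 0 = unlabelled, 1 = background, 2 = tissue."""
    rng = np.random.default_rng(seed)
    n_channels, height, width = image_cyx.shape
    labels = np.zeros((height, width), dtype=np.uint8)

    # The four image corners are assumed to contain only background.
    corner_slices = [
        (slice(0, corner), slice(0, corner)),
        (slice(0, corner), slice(width - corner, width)),
        (slice(height - corner, height), slice(0, corner)),
        (slice(height - corner, height), slice(width - corner, width)),
    ]
    corner_pixels = np.concatenate(
        [image_cyx[:, ys, xs].reshape(n_channels, -1) for ys, xs in corner_slices],
        axis=1,
    )

    # Per-channel background statistics estimated from the corner pixels.
    median = np.median(corner_pixels, axis=1)
    sd = corner_pixels.std(axis=1)

    # Random tiles: within median +/- 1 sd in every channel -> background, else tissue.
    for _ in range(n_tiles):
        y = rng.integers(0, height - tile + 1)
        x = rng.integers(0, width - tile + 1)
        tile_mean = image_cyx[:, y : y + tile, x : x + tile].mean(axis=(1, 2))
        is_background = np.all(np.abs(tile_mean - median) <= sd)
        labels[y : y + tile, x : x + tile] = 1 if is_background else 2

    # Re-apply the corner labels last so that overlapping tiles cannot overwrite them.
    for ys, xs in corner_slices:
        labels[ys, xs] = 1
    return labels


# Synthetic example: bright, noisy background with one dark square of "tissue".
rng = np.random.default_rng(1)
image = np.full((3, 64, 64), 240.0)
image[:, 16:48, 16:48] = 90.0
image += rng.normal(0.0, 5.0, size=image.shape)

labels = seed_labels(image)
print(np.unique(labels))
```

The resulting sparse label image could then serve as the training labels for a pixel classifier like the one in the scikit-image example linked above. Tiles that straddle the tissue border are labelled as tissue in full, which is one weakness of this simple version.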

@timtreis
Member Author

Also re-requesting a review from @flying-sheep because I had to make changes to the hatch logic for things to work at all. Not sure how the tests passed beforehand, but if I'm not mistaken, certain actions just didn't exist?

Member

@flying-sheep flying-sheep left a comment

there were no commands missing, hatch pre-defines them: https://hatch.pypa.io/latest/config/internal/testing/#scripts

unfortunately adding the diff-cover command means you have to re-define everything, since you can’t just override individual scripts in an environment (known hatch issue).

To explain how things should work:

  • all test/coverage dependencies should be in one spot (here the test extra)
  • the hatch-test env should define the commands as explained in the link above:
    • run should run the tests using pytest ...
    • run-cov should run the tests using coverage run -m pytest ...
    • cov-combine should combine the .coverage-xyz files written from the subprocesses spawned by pytest-xdist
    • cov-report should do all the reporting
  • locally, you should use hatch test [-c] [-p] ... which will use these commands behind the scenes
    • VS Code only understands pytest-cov, which is therefore only needed for VS Code’s test running GUI.
  • in CI, we sadly need to use matrix_name, so we can’t use the nice hatch test command and need to manually use the run* and cov* commands

@flying-sheep
Member

flying-sheep commented Oct 27, 2025

hmm, strange, upload successful, but the link doesn’t work, and I don’t see the “codecov” CI job below:

info - 2025-10-27 10:14:23,124 -- ci service found: github-actions
info - 2025-10-27 10:14:23,188 -- Found 1 coverage files to report
info - 2025-10-27 10:14:23,188 -- > /home/runner/work/squidpy/squidpy/coverage.xml
info - 2025-10-27 10:14:23,757 -- Your upload is now processing. When finished, results will be available at: https://app.codecov.io/github/selmanozleyen/squidpy/commit/11fedf3561e44764fd652ac51a47ae0eeba3ea64
info - 2025-10-27 10:14:24,000 -- Process Upload complete

@timtreis timtreis merged commit 900d7b3 into main Oct 27, 2025
10 checks passed
@timtreis timtreis deleted the feature/issue1042-function-to-automatically-generate-tissue-masks-in-he branch October 27, 2025 15:05
@timtreis
Member Author

Should Selman's handle be in that link? Shouldn't it be something at the org level?

@flying-sheep
Member

yeah, that confused me too. the uploads should definitely go to the scverse org, something must be broken with the config here.

@timtreis
Member Author

Did Selman maybe overwrite the codecov token with a private one?

@flying-sheep
Member

flying-sheep commented Oct 27, 2025

possible, let’s make sure this isn’t the case.

/edit: done

@selmanozleyen
Member

True, there wasn't a CODECOV token because we now take it from env variables in the gh repo settings, and I just used mine. I thought it would use the org's since I was part of it. Nice catch

selmanozleyen pushed a commit to selmanozleyen/squidpy that referenced this pull request Nov 3, 2025
* mvp for function; without tests

* added option to retain holes

* refactor + 1 test

* added missing import

* renamed test so that a plot would be generated

* added img from runner; cross-os-data-cache

* improved docstring

* added data download script to correct location

* updated hatch commands

* modified coverage combine

* removed superfluous combine step

* first download data, then run tests

* attempt to simplify

* aligned testing

* updated toml

* aligned __init__ files

* no uv cache for data download

* removed download step that'd never get hit

* simplify

* parallel

* speed up tests

---------

Co-authored-by: Phil Schaf <[email protected]>
Development

Successfully merging this pull request may close these issues.

Function to automatically generate tissue masks in H&E

4 participants