
Adapting SpatialData to naming restrictions for HEST samples #104

@rushin682


Hi @guillaumejaume @pauldoucet,

I hope you both are doing well.

I have been creating Zarr files from the HEST objects and noticed naming errors for some samples. These are caused by the new naming restrictions introduced in SpatialData: Discussion #707

See the following error and a proposed fix:

>>> st = next(iter_hest(data_path, id_list=["MISC62"]))
>>> sdata = st.to_spatial_data(fullres=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ictstr01/groups/peng/projects/rushin.gindra/hai_spatial_clip/baselines/HEST/src/hest/HESTData.py", line 687, in to_spatial_data
    new_table = TableModel.parse(new_table, region=REGION, region_key=REGION_KEY, instance_key=INSTANCE_KEY)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/groups/peng/workspace/rushin.gindra/miniforge3/envs/hescape-data/lib/python3.11/site-packages/spatialdata/models/models.py", line 1089, in parse
    validate_table_attr_keys(adata)
  File "/lustre/groups/peng/workspace/rushin.gindra/miniforge3/envs/hescape-data/lib/python3.11/site-packages/spatialdata/_core/validation.py", line 236, in validate_table_attr_keys
    with raise_validation_errors(
  File "/lustre/groups/peng/workspace/rushin.gindra/miniforge3/envs/hescape-data/lib/python3.11/site-packages/spatialdata/_core/validation.py", line 382, in __exit__
    raise ValidationError(title=self._message, errors=self._collector.errors)
spatialdata._core.validation.ValidationError: Table contains invalid names.
For renaming, please see the discussion here https://github.com/scverse/spatialdata/discussions/707 .
  var/Feature Counts in Spots Under Tissue: Name must contain only alphanumeric characters, underscores, dots and hyphens.
  var/Median Normalized Average Counts: Name must contain only alphanumeric characters, underscores, dots and hyphens.
  var/Barcodes Detected per Feature: Name must contain only alphanumeric characters, underscores, dots and hyphens.
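The error message states the rule itself: names may contain only alphanumeric characters, underscores, dots and hyphens. As a minimal sketch, offending names can be flagged before calling to_spatial_data, assuming this character rule is the only one that matters (SpatialData may enforce further rules). find_invalid_names is a hypothetical helper, not part of SpatialData or HEST, and "gene_ids" is a valid name added purely for illustration:

```python
import re

# Approximation of the rule quoted in the SpatialData error message:
# only alphanumeric characters, underscores, dots and hyphens are allowed.
# fullmatch on "+" also flags the empty string as invalid.
VALID_NAME = re.compile(r"[\w.-]+")


def find_invalid_names(names):
    """Return the names that violate the character rule (hypothetical helper)."""
    return [n for n in names if not VALID_NAME.fullmatch(n)]


var_columns = [
    "Feature Counts in Spots Under Tissue",
    "Median Normalized Average Counts",
    "Barcodes Detected per Feature",
    "gene_ids",  # illustrative valid name
]
print(find_invalid_names(var_columns))
```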

I propose the following solution:

import re

from hest import iter_hest
from spatialdata._core.validation import ValidationError, validate_table_attr_keys


def transform_name(old_name: str) -> str:
    # Replace every character SpatialData rejects with an underscore
    return re.sub(r"[^\w\._-]", "_", old_name)


st = next(iter_hest(data_path, id_list=["MISC62"]))
adata = st.adata
try:
    validate_table_attr_keys(adata)
except ValidationError as e:
    # Each error location is (attribute, key), e.g. ("var", "Feature Counts in Spots Under Tissue").
    # Only invalid adata.var column names are handled here; errors in other attributes are skipped
    # so that rename(errors="raise") does not fail on keys that are not var columns.
    renames = {err.location[1]: transform_name(err.location[1]) for err in e._errors if err.location[0] == "var"}
    adata.var = adata.var.rename(columns=renames, errors="raise")
    st.adata = adata.copy()
sdata = st.to_spatial_data(fullres=True)
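To see what transform_name produces for the three offending var columns from the traceback, here is a standalone run on plain strings, without loading a HEST sample:

```python
import re


def transform_name(old_name: str) -> str:
    # Same substitution as above: anything outside word characters, ".", "_" and "-" becomes "_"
    return re.sub(r"[^\w\._-]", "_", old_name)


for name in [
    "Feature Counts in Spots Under Tissue",
    "Median Normalized Average Counts",
    "Barcodes Detected per Feature",
]:
    print(f"{name!r} -> {transform_name(name)!r}")
```

Names that are already valid pass through unchanged, so applying the function to every key is safe.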

Of course, we could add this validation to the to_spatial_data function, but the catch is that an invalid name can appear in any of the following attributes: ("obs", "obsm", "obsp", "var", "varm", "varp", "uns", "layers"). So we may need a more general solution.
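One possible shape for such a general solution, sketched on plain dictionaries rather than a real AnnData object: compute the renames for every attribute and fail loudly if two different names would collapse to the same sanitized name (e.g. "a b" and "a_b"). sanitize_keys and plan_renames are hypothetical helpers, only top-level keys are handled, and the toy table below is an assumed stand-in for an AnnData object:

```python
import re
from typing import Any, Dict, Iterable, Mapping

# Attributes listed in the paragraph above
ATTRS = ("obs", "obsm", "obsp", "var", "varm", "varp", "uns", "layers")


def transform_name(old_name: str) -> str:
    # Same substitution as the proposed fix
    return re.sub(r"[^\w\._-]", "_", old_name)


def sanitize_keys(keys: Iterable[str]) -> Dict[str, str]:
    """Map each key to its sanitized name, raising if two keys would collide."""
    mapping: Dict[str, str] = {}
    owner: Dict[str, str] = {}  # sanitized name -> original key that produced it
    for key in keys:
        new = transform_name(key)
        if new in owner and owner[new] != key:
            raise ValueError(f"{key!r} and {owner[new]!r} would both become {new!r}")
        owner[new] = key
        mapping[key] = new
    return mapping


def plan_renames(table: Mapping[str, Mapping[str, Any]]) -> Dict[str, Dict[str, str]]:
    """Per attribute, return only the top-level keys that actually need renaming."""
    plan: Dict[str, Dict[str, str]] = {}
    for attr in ATTRS:
        changes = {
            old: new
            for old, new in sanitize_keys(table.get(attr, {})).items()
            if old != new
        }
        if changes:
            plan[attr] = changes
    return plan


# Toy stand-in for an AnnData object: attribute name -> {key: value}
toy_table = {
    "var": {"Feature Counts in Spots Under Tissue": None, "gene_ids": None},
    "uns": {"spatial": None},
}
print(plan_renames(toy_table))
```

For the dataframe attributes, a plan entry could be passed to pandas as in the fix above (e.g. adata.var.rename(columns=plan["var"])); how best to rebuild the dict-like attributes on AnnData is left open in this sketch.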

Let's discuss how to proceed with this, and I can open a PR accordingly.
