Hi @guillaumejaume @pauldoucet,
I hope you both are doing well.
I have been creating Zarr files from HEST objects and noticed that some samples fail with naming errors. This is caused by the new naming restrictions introduced in SpatialData (see Discussion #707).
Here is the error, followed by a possible fix:
>>> st = next(iter_hest(data_path, id_list=["MISC62"]))
>>> sdata = st.to_spatial_data(fullres=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ictstr01/groups/peng/projects/rushin.gindra/hai_spatial_clip/baselines/HEST/src/hest/HESTData.py", line 687, in to_spatial_data
    new_table = TableModel.parse(new_table, region=REGION, region_key=REGION_KEY, instance_key=INSTANCE_KEY)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/groups/peng/workspace/rushin.gindra/miniforge3/envs/hescape-data/lib/python3.11/site-packages/spatialdata/models/models.py", line 1089, in parse
    validate_table_attr_keys(adata)
  File "/lustre/groups/peng/workspace/rushin.gindra/miniforge3/envs/hescape-data/lib/python3.11/site-packages/spatialdata/_core/validation.py", line 236, in validate_table_attr_keys
    with raise_validation_errors(
  File "/lustre/groups/peng/workspace/rushin.gindra/miniforge3/envs/hescape-data/lib/python3.11/site-packages/spatialdata/_core/validation.py", line 382, in __exit__
    raise ValidationError(title=self._message, errors=self._collector.errors)
spatialdata._core.validation.ValidationError: Table contains invalid names.
For renaming, please see the discussion here https://github.com/scverse/spatialdata/discussions/707 .
var/Feature Counts in Spots Under Tissue: Name must contain only alphanumeric characters, underscores, dots and hyphens.
var/Median Normalized Average Counts: Name must contain only alphanumeric characters, underscores, dots and hyphens.
var/Barcodes Detected per Feature: Name must contain only alphanumeric characters, underscores, dots and hyphens.

I propose the following fix:
import re

from hest import iter_hest
from spatialdata._core.validation import validate_table_attr_keys, ValidationError


def transform_name(old_name: str) -> str:
    # Replace every character SpatialData rejects with an underscore.
    return re.sub(r"[^\w.-]", "_", old_name)


st = next(iter_hest(args.data_path, id_list=[biospecimen_id]))
adata = st.adata
try:
    validate_table_attr_keys(adata)
except ValidationError as e:
    # Each error location looks like (attribute, name), e.g. ("var", "Median Normalized Average Counts").
    # Only invalid `var` column names are handled here; other attributes are left untouched.
    renames = {err.location[1]: transform_name(err.location[1]) for err in e._errors if err.location[0] == "var"}
    # print(renames)
    adata.var = adata.var.rename(columns=renames, errors="raise")
    st.adata = adata.copy()
sdata = st.to_spatial_data(fullres=True)

Of course, we could add this validation to the to_spatial_data function itself. The catch is that an invalid name can appear in any of the following attributes: "obs", "obsm", "obsp", "var", "varm", "varp", "uns", "layers". So we may need a more general, elegant solution.
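As a starting point for that discussion, here is a rough sketch of an attribute-agnostic helper. It only builds a rename map for an arbitrary list of names and applies it to a dict-like container. The function names `build_rename_map` and `rename_mapping_keys` are hypothetical, and the collision handling (appending a numeric suffix) is my own assumption, not something SpatialData prescribes. Plain dicts stand in for the mapping-like attributes here; for `obs` and `var` the same map could presumably be passed to `DataFrame.rename(columns=...)`.

```python
import re


def transform_name(old_name: str) -> str:
    # Same rule as in the fix above: keep word characters, dots and hyphens.
    return re.sub(r"[^\w.-]", "_", old_name)


def build_rename_map(names: list[str]) -> dict[str, str]:
    """Map each invalid name to a sanitized name that does not clash with existing ones."""
    taken = set(names)
    mapping: dict[str, str] = {}
    for name in names:
        new_name = transform_name(name)
        if new_name == name:
            continue  # already valid, nothing to rename
        candidate, suffix = new_name, 1
        # Assumption: resolve collisions by appending _1, _2, ...
        while candidate in taken:
            candidate = f"{new_name}_{suffix}"
            suffix += 1
        taken.add(candidate)
        mapping[name] = candidate
    return mapping


def rename_mapping_keys(container: dict, mapping: dict[str, str]) -> None:
    """Rename keys of a dict-like container in place."""
    for old, new in mapping.items():
        container[new] = container.pop(old)


# Stand-in for a mapping-like attribute such as `uns` or `layers`.
fake_uns = {"Median Normalized Average Counts": 1, "n_counts": 2}
rename_mapping_keys(fake_uns, build_rename_map(list(fake_uns)))
print(fake_uns)
```

The idea would be to loop over the eight attributes, call `build_rename_map` on each one's keys (or columns), and apply the result, so `to_spatial_data` would not need to know in advance where the invalid names are.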
Let's discuss how to proceed with this, and I can open a PR accordingly.