Skip to content

seqyuan/trackcell

Repository files navigation

TrackCell

A Python package for processing and vis single-cell and spatial transcriptomics data.

Installation

pip install trackcell -i https://pypi.org/simple
# pip install --upgrade trackcell==0.1.9 -i https://pypi.org/simple

Usage

Reading SpaceRanger Output

Reading Cell Segmentation Data

import trackcell as tcl

# Read SpaceRanger cell segmentation output
adata = tcl.io.read_hd_cellseg(
    datapath="SpaceRanger4.0/Cse1/outs/segmented_outputs",
    sample="Cse1"
)

# The resulting AnnData object contains:
# - Expression matrix in .X
# - Cell metadata in .obs
# - Gene metadata in .var
# - Spatial coordinates in .obsm["spatial"]
# - Tissue images in .uns["spatial"][sample]["images"]
# - Scalefactors in .uns["spatial"][sample]["scalefactors"]
# - Cell geometries in .uns["spatial"][sample]["geometries"] (GeoDataFrame)
# - Cell geometries in .obs["geometry"] (WKT strings for serialization)

Subsetting Data and Synchronizing Geometries

Important: When you subset data loaded with read_hd_cellseg(), you must call sync_geometries_after_subset() to synchronize the geometries:

import trackcell as tcl
import numpy as np

# Read data
adata = tcl.io.read_hd_cellseg(
    datapath="SpaceRanger4.0/Cse1/outs/segmented_outputs",
    sample="Cse1"
)

# Subset by spatial region
x_min, x_max = 16000, 18000
y_min, y_max = 14000, 18000

spatial_coords = adata.obsm['spatial']
mask = ((spatial_coords[:, 0] >= x_min) & (spatial_coords[:, 0] <= x_max) &
        (spatial_coords[:, 1] >= y_min) & (spatial_coords[:, 1] <= y_max))

adata_subset = adata[mask].copy()

# IMPORTANT: Synchronize geometries after subsetting
tcl.io.sync_geometries_after_subset(adata_subset, sample="Cse1")

# Now you can safely plot the subset
tcl.pl.spatial_cell(adata_subset, color="classification")

Why this is necessary: When you subset an AnnData object, adata.obs and adata.obsm are automatically subset, but adata.uns["spatial"][sample]["geometries"] (GeoDataFrame) is not. Without synchronization, plotting may fail with errors like ValueError: aspect must be finite and positive.

Reading Bin-Level Data (2um/8um/16um)

import trackcell as tcl

# Read SpaceRanger bin-level output (2um/8um/16um bins)
adata = tcl.io.read_hd_bin(
    datapath="SpaceRanger4.0/Cse1/binned_outputs",
    sample="Cse1",
    binsize=16  # Bin size in micrometers (default: 16, common values: 2, 8, or 16)
)

# The function automatically handles:
# - filtered_feature_bc_matrix.h5 (preferred) or filtered_feature_bc_matrix/ directory
# - tissue_positions.parquet or tissue_positions.csv
# - tissue_hires_image.png and tissue_lowres_image.png
# - scalefactors_json.json

# The resulting AnnData object contains:
# - Expression matrix in .X
# - Bin metadata in .obs (with spatial coordinates)
# - Gene metadata in .var
# - Spatial coordinates in .obsm["spatial"]
# - Tissue images in .uns["spatial"][sample]["images"]
# - Scalefactors in .uns["spatial"][sample]["scalefactors"]
# - Bin size in .uns["spatial"][sample]["binsize"] (e.g., 2, 8, or 16)

# Access the bin size information:
print(f"Bin size: {adata.uns['spatial']['Cse1']['binsize']} um")

Visualization

Plotting with Cell Polygons
# Plot cells as polygons (requires data loaded with read_hd_cellseg)
tcl.pl.spatial_cell(
    adata, 
    color="classification",  # Color by cell type
    groups=['Cluster-2', 'Cluster-3'],  # Optional: filter specific groups
    figsize=(10, 10),
    edges_width=0.5,
    edges_color="black",
    alpha=0.8
)
# Plot continuous values (e.g., distance to a label)
tcl.pl.spatial_cell(
    adata,
    color="Cluster-2_dist",  # Distance to Cluster-2
    cmap="Reds",
    figsize=(10, 10)
)
Traditional Point-based Visualization
# Using scanpy (point-based)
sc.pl.spatial(adata, color='classification', size=2, 
              groups=['Cluster-2', 'Cluster-3'],
              legend_fontsize=12, spot_size=10, frameon=True
             )
# Using squidpy (point-based)
sq.pl.spatial_scatter(
    adata, shape=None, color=["classification"], 
    edges_width=0, size=0.1, 
    library_id="spatial", 
    groups=['Cluster-2', 'Cluster-3'],
    figsize=(5, 4), 
    #cmap='Blues'
    #palette = mycolor
    #img_key="0.3_mpp_150_buffer", 
    #basis="spatial_cropped_150_buffer"
)

Converting annohdcell Output to TrackCell Format

TrackCell provides two methods to convert annohdcell's bin2cell output into trackcell-compatible format with polygon geometries for spatial visualization.

Method 1: Convert from 2μm Bin H5AD Only

Create a new cell-level h5ad from annohdcell's 2μm bin h5ad with cell labels:

import trackcell as tcl

# Convert annohdcell 2μm bin h5ad to trackcell format
adata = tcl.io.convert_annohdcell_to_trackcell(
    bin_h5ad_path="b2c_2um.h5ad",
    output_h5ad_path="trackcell_format.h5ad",
    sample="sample1"
)

# Now visualize with trackcell
tcl.pl.spatial_cell(adata, sample="sample1")

Method 2: Add Geometries to Existing Cell H5AD

Add polygon geometries to annohdcell's final cell h5ad output (preserves exact count aggregation):

import trackcell as tcl

# Add geometries to annohdcell's final cell h5ad
adata = tcl.io.add_geometries_to_annohdcell_output(
    bin_h5ad_path="b2c_2um.h5ad",      # 2μm bin h5ad with cell labels
    cell_h5ad_path="b2c_cell.h5ad",    # Final cell h5ad from annohdcell
    output_h5ad_path="b2c_cell_with_geom.h5ad",
    sample="sample1"
)

# Now visualize with trackcell
tcl.pl.spatial_cell(adata, sample="sample1")

Key differences:

  • Method 1: Quick conversion, simple count summation
  • Method 2: Preserves annohdcell's exact count aggregation and all metadata

For detailed documentation, see docs/convert_annohdcell.md

Computing Distances to a Label (10x HD)

# Compute distance to a specific annotation label stored in adata.obs["group_col"]
tcl.tl.hd_labeldist(
    adata,
    groupby="classification",    # obs column containing cell type annotations
    label="Cluster-2",       # target label to measure distances from
    inplace=True          # add "{label}_px" and "{label}_dist" to adata.obs
)

# When inplace=False the function returns a DataFrame with the two columns:
dist_df = tcl.tl.hd_labeldist(adata, groupby="group_col", label="Neuron", inplace=False)
# Visualize distance using cell polygons
tcl.pl.spatial_cell(adata, color='Cluster-2_dist', cmap='Reds', figsize=(10, 10))

# Or using traditional point-based visualization
sc.pl.spatial(adata, color='Cluster-2_dist', size=2,
              legend_fontsize=12, spot_size=10, frameon=True
             )

Development

License

update

git tag v0.3.8
git push origin v0.3.8

# In GitHub, go to “Releases” → “Draft a new release”.

About

singlecell and spatial analysis

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages