Releases: data2intelligence/SecActpy
v0.2.5: Streaming H5AD for >5M-cell datasets
What's New
- Streaming H5AD (`streaming=True`): Two-pass chunk-reading algorithm for >5M-cell datasets. Peak memory drops from ~200 GB to ~3 GB.
- `H5ADChunkReader` for memory-efficient H5AD reading via h5py
- Fixed H5AD index column detection for the `obs.attrs['_index']` convention
- Added `"symbol"` fallback for Ensembl gene name resolution
See CHANGELOG for full details.
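The two-pass idea can be illustrated with a small NumPy-only sketch. It is not the SecActpy implementation: the release notes do not say what each pass computes, so this example assumes pass 1 accumulates per-gene means and pass 2 centers and projects each chunk. The names `iter_chunks`, `two_pass_project`, and `read_chunk` are hypothetical, and an in-memory array stands in for the on-disk H5AD matrix.

```python
import numpy as np


def iter_chunks(n_cols, chunk_size):
    """Yield (start, stop) column ranges that cover 0..n_cols."""
    for start in range(0, n_cols, chunk_size):
        yield start, min(start + chunk_size, n_cols)


def two_pass_project(read_chunk, n_genes, n_cells, T, chunk_size=1000):
    """Center genes and project with T while holding one chunk at a time.

    read_chunk(start, stop) is a hypothetical callback returning a dense
    (n_genes, stop - start) block, e.g. a slice of an on-disk dataset.
    """
    # Pass 1 (assumed): accumulate per-gene sums to get means over all cells.
    gene_sum = np.zeros(n_genes)
    for start, stop in iter_chunks(n_cells, chunk_size):
        gene_sum += read_chunk(start, stop).sum(axis=1)
    gene_mean = gene_sum / n_cells

    # Pass 2 (assumed): re-read each chunk, center it, and keep only the
    # small (n_signatures, chunk_width) projection.
    out = np.empty((T.shape[0], n_cells))
    for start, stop in iter_chunks(n_cells, chunk_size):
        chunk = read_chunk(start, stop) - gene_mean[:, None]
        out[:, start:stop] = T @ chunk
    return out


rng = np.random.default_rng(0)
Y = rng.poisson(1.0, size=(300, 2500)).astype(float)  # stand-in for on-disk data
T = rng.standard_normal((4, 300))                     # stand-in projection matrix

streamed = two_pass_project(lambda a, b: Y[:, a:b], 300, 2500, T, chunk_size=600)
reference = T @ (Y - Y.mean(axis=1, keepdims=True))
print(np.allclose(streamed, reference))
```

Only one chunk and the small output matrix are resident at any time, which is the general reason a chunked reader can cut peak memory so sharply.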
v0.2.3
Cross-Package Reproducibility
SecActPy, R SecAct, and RidgeR now produce identical z-scores when using matching RNG and grouping settings.
New Features
- rng_method parameter: Explicit control over the random number generator in all inference functions:
  - "srand": Matches R SecAct's srand/rand() (platform-dependent; Linux and macOS differ)
  - "gsl": GSL Mersenne Twister, cross-platform reproducible (matches RidgeR rng_method="gsl")
  - "numpy": Fast native NumPy RNG (when exact R matching is not needed)
  - The existing use_gsl_rng parameter is preserved for backward compatibility
- is_group_sig=True by default: Signature grouping (Pearson r >= 0.9) is now enabled by default in all three
  inference functions (secact_activity_inference, secact_activity_inference_scrnaseq, secact_activity_inference_st),
  matching R SecAct's default behavior
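To make the Pearson r >= 0.9 grouping rule concrete, here is a minimal sketch of one way such grouping could work. It is an assumption, not SecActpy's code: it uses complete-linkage hierarchical clustering on the distance 1 - r and cuts the tree at 0.1, so every pair inside a group has r >= 0.9. The helper name `group_signatures` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def group_signatures(X, min_r=0.9):
    """Return one integer group label per column of X (genes x signatures)."""
    corr = np.corrcoef(X, rowvar=False)        # signature-by-signature Pearson r
    dist = np.clip(1.0 - corr, 0.0, None)      # distance = 1 - r, never negative
    np.fill_diagonal(dist, 0.0)                # remove floating-point noise
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="complete")
    # Complete linkage cut at 1 - min_r keeps every within-group pair at r >= min_r.
    return fcluster(tree, t=1.0 - min_r, criterion="distance")


rng = np.random.default_rng(0)
base = rng.standard_normal(500)
X = np.column_stack([
    base,                                      # signature 0
    base + 0.05 * rng.standard_normal(500),    # near-duplicate of signature 0
    rng.standard_normal(500),                  # unrelated signature
])
labels = group_signatures(X)
print(labels)
```

In this toy matrix the first two columns share a label and the third gets its own, which is the behaviour the `is_group_sig` threshold describes.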
5-Way Verification
| Signature | SecAct R | RidgeR srand | SecActPy srand | RidgeR gsl | SecActPy gsl |
|---|---|---|---|---|---|
| A1BG | 19.21867 | 19.21867 | 19.21867 | 19.20790 | 19.20790 |
| A2M | 25.65251 | 25.65251 | 25.65251 | 25.04980 | 25.04980 |
Full Changelog: v0.2.2...v0.2.3
v0.2.2
Bug Fix
- Fix `ImportError` when using `batch_size`: `secact_activity_inference()` and `secact_activity_inference_st()` crashed with `ImportError: cannot import name 'ridge_batch' from 'secactpy.ridge'` when `batch_size` was set. The import was pointing to the wrong module.
New Features
Sparse mode (sparse_mode=True)
Opt-in parameter for memory-efficient processing of sparse Y matrices. Avoids densifying Y during matrix
multiplication by using the identity (Y.T @ T.T).T == T @ Y, with column z-scoring applied as lightweight
corrections on the small output matrix.
Supported in ridge(), ridge_batch(), and all high-level inference functions.
# scRNA-seq: Y stays sparse end-to-end
result = secact_activity_inference_scrnaseq(
    adata, cell_type_col="cell_type",
    is_single_cell_level=True,
    batch_size=5000,
    sparse_mode=True
)
# Spatial transcriptomics
result = secact_activity_inference_st(
    adata, batch_size=5000, sparse_mode=True
)
| | Default | sparse_mode=True |
|---|---|---|
| Memory | Full dense Y | Y stays sparse |
| Speed (<5% density) | Baseline | ~1.8x faster |
| Speed (5–10% density) | Baseline | ~25% slower |
| Results | Identical | Identical |
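The identity behind sparse mode can be checked numerically. The sketch below is illustrative only and does not call SecActpy: `T` is a random stand-in for the projection matrix, and column z-scoring with the sample standard deviation (`ddof=1`) is an assumption. It shows that z-scoring can be applied as a correction on the small output, since T @ ((Y - 1 mu^T) / sigma) = (T @ Y - (T @ 1) mu^T) / sigma.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_genes, n_samples, n_sigs = 200, 50, 5
Y = sparse.random(n_genes, n_samples, density=0.1, format="csr", random_state=0)
T = rng.standard_normal((n_sigs, n_genes))  # stand-in projection matrix

# Dense reference: densify Y, z-score each column, then project.
Y_dense = Y.toarray()
Z = (Y_dense - Y_dense.mean(axis=0)) / Y_dense.std(axis=0, ddof=1)
dense_out = T @ Z

# Sparse path: Y is never densified.
TY = np.asarray((Y.T @ T.T).T)                           # equals T @ Y
mu = np.asarray(Y.mean(axis=0)).ravel()                  # column means
sq_mean = np.asarray(Y.multiply(Y).mean(axis=0)).ravel() # column means of y^2
sigma = np.sqrt((sq_mean - mu**2) * n_genes / (n_genes - 1))

# Z-scoring becomes a correction on the (n_sigs x n_samples) output.
sparse_out = (TY - np.outer(T.sum(axis=1), mu)) / sigma

print(np.allclose(dense_out, sparse_out))
```

The correction touches only an `n_sigs x n_samples` array, so its cost is small next to the sparse product itself.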
In-flight row-mean centering (row_center=True)
New parameter in ridge_batch() for applying row-mean centering without densifying Y. Computes row-centered column
statistics analytically from sparse Y.
v0.2.1
What's Changed
Added
- Streaming output (`output_path`, `output_compression`) in all high-level inference functions: `secact_activity_inference()`, `secact_activity_inference_scrnaseq()`, and `secact_activity_inference_st()`
- `use_gsl_rng` parameter in `ridge_batch()`: enables the ~70x faster NumPy RNG path for batch processing
Fixed
- `use_gsl_rng` was accepted by `secact_activity_inference` but silently ignored by `ridge_batch`, which always used the slower GSL RNG. Now `ridge_batch` (both dense and sparse paths) respects the flag.
- Docker R build: add `libfftw3-dev` for `qqconf`/`metap` and pre-install `multtest` from Bioconductor
Changed
- Expanded README batch processing documentation: explains what batch processing is, in-memory vs streaming modes, dense vs sparse
handling, and includes downloadable example data
Full Changelog: v0.2.0...v0.2.1
secactpy_v0.2.0
Secreted Protein Activity Inference using Ridge Regression
We're excited to announce the official public release of SecActpy v0.2.0, the Python implementation of the SecAct/RidgeR algorithm for inferring secreted protein activity from gene expression data. This release marks the migration to the official data2intelligence organization repository.
Overview
SecActpy enables inference of cytokine/chemokine signaling activity from:
- Bulk RNA-seq: Differential expression or raw counts
- scRNA-seq: Single-cell or pseudo-bulk by cell type
- Spatial Transcriptomics: Visium, CosMx, and other platforms
Key Features
R-Compatible Results
Produces identical results to the R SecAct/RidgeR package using GSL-compatible random number generation.
GPU Acceleration
Optional CuPy backend provides 9–34x speedup over R implementations.
Million-Sample Scale
Batch processing with streaming H5AD output for datasets that don't fit in memory.
Sparse-Aware
Automatic memory-efficient processing for sparse single-cell data: just pass sparse matrices directly.
Smart Caching
Optional permutation table caching for faster repeated analyses.
Built-in Signatures
Includes SecAct and CytoSig signature matrices.
Installation
# From PyPI (recommended)
pip install secactpy
# From GitHub (CPU only)
pip install git+https://github.com/data2intelligence/SecActpy.git
# With GPU support (CUDA 11.x)
pip install "secactpy[gpu] @ git+https://github.com/data2intelligence/SecActpy.git"
Quick Start
from secactpy import secact_activity_inference
# Bulk RNA-seq
result = secact_activity_inference(
    diff_expr,
    is_differential=True,
    sig_matrix="secact",
    verbose=True
)
# Access results
activity = result['zscore']
pvalues = result['pvalue']

from secactpy import secact_activity_inference_st
# Spatial transcriptomics
result = secact_activity_inference_st(
    "path/to/visium/",
    min_genes=1000,
    verbose=True
)

from secactpy import ridge_batch
# Large-scale batch processing (dense or sparse)
result = ridge_batch(
    X, Y_sparse,
    batch_size=10000,
    backend='cupy',  # GPU
    verbose=True
)
Performance
| Dataset | R (Mac M1) | R (Linux) | Py (CPU) | Py (GPU) | Speedup |
|---|---|---|---|---|---|
| Bulk (1,170 sp × 1,000 samples) | 74.4s | 141.6s | 128.8s | 6.7s | 11–19x |
| scRNA-seq (1,170 sp × 788 cells) | 54.9s | 117.4s | 104.8s | 6.8s | 8–15x |
| Visium (1,170 sp × 3,404 spots) | 141.7s | 379.8s | 381.4s | 11.2s | 13–34x |
| CosMx (151 sp × 443,515 cells) | 936.9s | 976.1s | 1226.7s | 99.9s | 9–12x |
Benchmark Environment:
- Mac CPU: M1 Pro with VECLIB (8 cores)
- Linux CPU: AMD EPYC 7543P (4 cores)
- Linux GPU: NVIDIA A100-SXM4-80GB
API Reference
High-Level Functions
| Function | Description |
|---|---|
| `secact_activity_inference()` | Bulk RNA-seq inference |
| `secact_activity_inference_st()` | Spatial transcriptomics inference |
| `secact_activity_inference_scrnaseq()` | scRNA-seq inference |

Core Functions
| Function | Description |
|---|---|
| `ridge()` | Ridge regression with permutation testing |
| `ridge_batch()` | Batch processing (auto-handles dense/sparse) |
| `load_signature()` | Load built-in signature matrices |

Utilities
| Function | Description |
|---|---|
| `estimate_batch_size()` | Optimal batch size for available memory |
| `estimate_memory()` | Memory requirement estimation |
| `StreamingResultWriter` | Stream results to H5AD |
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| `sig_matrix` | `"secact"` | Signature: "secact", "cytosig", or DataFrame |
| `lambda_` | `5e5` | Ridge regularization parameter |
| `n_rand` | `1000` | Number of permutations |
| `seed` | `0` | Random seed for reproducibility |
| `backend` | `"auto"` | "auto", "numpy", or "cupy" |
| `use_cache` | `False` | Cache permutation tables to disk |
Requirements
- Python ≥ 3.9
- NumPy ≥ 1.20
- Pandas ≥ 1.3
- SciPy ≥ 1.7
- h5py ≥ 3.0
- anndata ≥ 0.8
- scanpy ≥ 1.9
- Optional: CuPy ≥ 10.0 (GPU)
Docker
# CPU
docker pull psychemistz/secactpy:latest
# GPU
docker pull psychemistz/secactpy:gpu
# With R SecAct/RidgeR for cross-validation
docker pull psychemistz/secactpy:with-r
Citation
If you use SecActpy in your research, please cite:
Beibei Ru, Lanqi Gong, Emily Yang, Seongyong Park, George Zaki, Kenneth Aldape, Lalage Wakefield, Peng Jiang. Inference of secreted protein activities in intercellular communication. GitHub: data2intelligence/SecAct
License
MIT License
Related Projects
- SecAct: Original R implementation
- RidgeR: R ridge regression package
- SpaCET: Spatial transcriptomics cell type analysis
- CytoSig: Cytokine signaling inference
Acknowledgments
Full documentation: https://github.com/data2intelligence/SecActpy