
Releases: data2intelligence/SecActpy

v0.2.5: Streaming H5AD for >5M-cell datasets

08 Mar 13:58

What's New

  • Streaming H5AD (streaming=True): Two-pass chunk-reading algorithm for >5M-cell datasets. Peak memory drops from ~200 GB to ~3 GB.
  • Added H5ADChunkReader for memory-efficient, chunked H5AD reading via h5py
  • Fixed H5AD index column detection for obs.attrs['_index'] convention
  • Added "symbol" fallback for Ensembl gene name resolution
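The two-pass idea can be sketched in plain NumPy/SciPy. This is an illustration under assumptions, not the H5ADChunkReader API: it assumes X is stored in CSR layout as separate data, indices and indptr arrays (the layout AnnData uses for sparse matrices), assumes pass 1 gathers global per-gene means and pass 2 computes per-chunk results, and uses a random stand-in matrix T in place of the real ridge projection. The helper name iter_csr_chunks is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def iter_csr_chunks(group, n_rows, n_cols, chunk_size):
    """Hypothetical helper: yield (start_row, csr_chunk) for consecutive row
    blocks of a CSR matrix stored as 'data' / 'indices' / 'indptr' arrays."""
    for start in range(0, n_rows, chunk_size):
        end = min(start + chunk_size, n_rows)
        ptr = np.asarray(group["indptr"][start:end + 1])
        lo, hi = int(ptr[0]), int(ptr[-1])
        data = np.asarray(group["data"][lo:hi])        # read only this block's values
        indices = np.asarray(group["indices"][lo:hi])
        yield start, sp.csr_matrix((data, indices, ptr - lo), shape=(end - start, n_cols))


# Stand-in for an H5AD X group (cells x genes); an h5py group such as f["X"]
# supports the same slicing, so it could be passed in place of this dict.
rng = np.random.default_rng(0)
dense = rng.random((1000, 50))
dense[dense > 0.05] = 0.0                     # keep roughly 5% of entries
full = sp.csr_matrix(dense)
store = {"data": full.data, "indices": full.indices, "indptr": full.indptr}
n_cells, n_genes = full.shape

# Pass 1: global per-gene means, accumulated chunk by chunk
gene_sum = np.zeros(n_genes)
for _, chunk in iter_csr_chunks(store, n_cells, n_genes, chunk_size=128):
    gene_sum += np.asarray(chunk.sum(axis=0)).ravel()
gene_mean = gene_sum / n_cells

# Pass 2: project each gene-centered chunk with a stand-in matrix T (signatures x genes)
T = rng.standard_normal((4, n_genes))
activity = np.empty((4, n_cells))
for start, chunk in iter_csr_chunks(store, n_cells, n_genes, chunk_size=128):
    stop = start + chunk.shape[0]
    activity[:, start:stop] = (chunk @ T.T).T - (T @ gene_mean)[:, None]
```

In this sketch, peak memory grows with chunk_size rather than with the total number of cells.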

See CHANGELOG for full details.

v0.2.3

09 Feb 05:47

Cross-Package Reproducibility

SecActPy, R SecAct, and RidgeR now produce identical z-scores when using matching RNG and grouping settings.

New Features

  • rng_method parameter: explicit control over the random number generator in all inference functions:
    • "srand": matches R SecAct's srand()/rand() (platform-dependent; Linux and macOS produce different sequences)
    • "gsl": GSL Mersenne Twister, reproducible across platforms (matches RidgeR rng_method="gsl")
    • "numpy": fast native NumPy RNG (use when exact R matching is not needed)
    • The existing use_gsl_rng parameter is preserved for backward compatibility
  • is_group_sig=True by default: signature grouping (Pearson r >= 0.9) is now enabled by default in all three
    inference functions (secact_activity_inference, secact_activity_inference_scrnaseq, secact_activity_inference_st),
    matching R SecAct's default behavior
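To make the grouping rule concrete, here is a minimal sketch that merges signature columns linked by Pearson r >= 0.9. It assumes grouping means taking connected components of the "r >= 0.9" graph (the same as single-linkage clustering on 1 − r cut at 0.1) and averaging the members; the function name group_signatures, the "A|B" naming of merged columns, and the averaging are assumptions, and R SecAct's actual procedure may differ.

```python
import numpy as np
import pandas as pd
from scipy.sparse.csgraph import connected_components


def group_signatures(sig, r_min=0.9):
    """Hypothetical helper: merge signature columns linked by Pearson r >= r_min.
    sig is a genes x signatures DataFrame; the result has the same gene index."""
    corr = np.corrcoef(sig.to_numpy().T)                   # signatures x signatures
    n_groups, labels = connected_components((corr >= r_min).astype(int), directed=False)
    merged = {}
    for g in range(n_groups):
        cols = list(sig.columns[labels == g])
        merged["|".join(cols)] = sig[cols].mean(axis=1)    # average the member columns
    return pd.DataFrame(merged, index=sig.index)


rng = np.random.default_rng(0)
a = rng.standard_normal(100)
sig = pd.DataFrame({
    "A": a,
    "B": a + 0.1 * rng.standard_normal(100),   # near-duplicate of A (r close to 1)
    "C": rng.standard_normal(100),             # unrelated signature
}, index=[f"gene{i}" for i in range(100)])
grouped = group_signatures(sig)
```

Here A and B collapse into one averaged column while C stays on its own.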

5-Way Verification

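┌───────────┬──────────┬──────────────┬────────────────┬────────────┬──────────────┐
│ Signature │ SecAct R │ RidgeR srand │ SecActPy srand │ RidgeR gsl │ SecActPy gsl │
├───────────┼──────────┼──────────────┼────────────────┼────────────┼──────────────┤
│ A1BG      │ 19.21867 │ 19.21867     │ 19.21867       │ 19.20790   │ 19.20790     │
├───────────┼──────────┼──────────────┼────────────────┼────────────┼──────────────┤
│ A2M       │ 25.65251 │ 25.65251     │ 25.65251       │ 25.04980   │ 25.04980     │
└───────────┴──────────┴──────────────┴────────────────┴────────────┴──────────────┘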

Full Changelog: v0.2.2...v0.2.3

v0.2.2

08 Feb 21:59

Bug Fix

  • Fix ImportError when using batch_size: secact_activity_inference() and
    secact_activity_inference_st() crashed with ImportError: cannot import name 'ridge_batch' from 'secactpy.ridge' when batch_size was set. The import was pointing to the wrong module.

New Features

Sparse mode (sparse_mode=True)

Opt-in parameter for memory-efficient processing of sparse Y matrices. Avoids densifying Y during matrix
multiplication by using the identity (Y.T @ T.T).T == T @ Y, with column z-scoring applied as lightweight
corrections on the small output matrix.

Supported in ridge(), ridge_batch(), and all high-level inference functions.
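The identity and the output-side corrections can be checked numerically. The sketch below assumes Y is genes × cells, that each column of Y is z-scored over genes with ddof=1, and that T is a ridge projection (XᵀX + λI)⁻¹Xᵀ built from random stand-in data; it compares a dense reference against a path that multiplies the raw sparse Y and then corrects the small output using T Z = (T Y − (T 1) μᵀ) / σ. Writing T @ Y as (Y.T @ T.T).T keeps the sparse operand on the left, so the heavy step is a sparse-times-dense multiply.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_genes, n_sig, n_cells = 200, 5, 50

# Stand-in ridge projection T = (X^T X + lambda*I)^-1 X^T, shape (n_sig, n_genes)
X = rng.standard_normal((n_genes, n_sig))
T = np.linalg.solve(X.T @ X + 10.0 * np.eye(n_sig), X.T)

# Sparse expression matrix Y (genes x cells) with roughly 10% nonzeros
dense = rng.random((n_genes, n_cells))
dense[dense > 0.1] = 0.0
Y = sp.csc_matrix(dense)

# Reference path: densify Y, z-score every column (ddof=1), then project
Yd = Y.toarray()
beta_dense = T @ ((Yd - Yd.mean(axis=0)) / Yd.std(axis=0, ddof=1))

# Sparse path: multiply the raw sparse Y, then correct the small (n_sig x n_cells) result
mu = np.asarray(Y.mean(axis=0)).ravel()                   # column means
msq = np.asarray(Y.multiply(Y).mean(axis=0)).ravel()      # column means of squares
sd = np.sqrt((msq - mu**2) * n_genes / (n_genes - 1))     # column std, ddof=1
TY = (Y.T @ T.T).T                                        # equals T @ Y; Y is never densified
beta_sparse = (TY - np.outer(T.sum(axis=1), mu)) / sd
```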

# scRNA-seq: Y stays sparse end-to-end
result = secact_activity_inference_scrnaseq(
    adata, cell_type_col="cell_type",
    is_single_cell_level=True,
    batch_size=5000,
    sparse_mode=True
)

# Spatial transcriptomics
result = secact_activity_inference_st(
    adata, batch_size=5000, sparse_mode=True
)

┌───────────────────────┬──────────────┬──────────────────┐
│                       │   Default    │ sparse_mode=True │
├───────────────────────┼──────────────┼──────────────────┤
│ Memory                │ Full dense Y │ Y stays sparse   │
├───────────────────────┼──────────────┼──────────────────┤
│ Speed (<5% density)   │ Baseline     │ ~1.8x faster     │
├───────────────────────┼──────────────┼──────────────────┤
│ Speed (5–10% density) │ Baseline     │ ~25% slower      │
├───────────────────────┼──────────────┼──────────────────┤
│ Results               │ Identical    │ Identical        │
└───────────────────────┴──────────────┴──────────────────┘

In-flight row-mean centering (row_center=True)

New parameter in ridge_batch() for applying row-mean centering without densifying Y. Computes row-centered column
statistics analytically from sparse Y.
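Row centering fits into the same correction scheme. Assuming row centering means subtracting each gene's mean r across all columns (Y_c = Y − r 1ᵀ), followed by the same column z-scoring, every quantity needed can be computed from sparse Y: column means shift by mean(r), column sums of squares expand as Σy² − 2yᵀr + rᵀr, and T Y_c = T Y − (T r) 1ᵀ. The sketch below checks this against a dense reference; T and the data are random stand-ins, and the exact statistics ridge_batch computes are not documented here.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
n_genes, n_sig, n_cells = 150, 4, 40
T = rng.standard_normal((n_sig, n_genes))         # stand-in projection matrix
dense = rng.random((n_genes, n_cells))
dense[dense > 0.1] = 0.0
Y = sp.csr_matrix(dense)                          # genes x cells

r = np.asarray(Y.mean(axis=1)).ravel()            # row (gene) means, shape (n_genes,)

# Reference: densify, row-center, column z-score (ddof=1), project
Yc = Y.toarray() - r[:, None]
ref = T @ ((Yc - Yc.mean(axis=0)) / Yc.std(axis=0, ddof=1))

# Analytic path: statistics of the columns of Yc = Y - r 1^T, from sparse Y only
n = n_genes
mu = np.asarray(Y.mean(axis=0)).ravel() - r.mean()        # column means of Yc
sum_y2 = np.asarray(Y.multiply(Y).sum(axis=0)).ravel()    # sum_g y_gj^2
sum_yr = Y.T @ r                                          # sum_g y_gj * r_g
ss = sum_y2 - 2.0 * sum_yr + r @ r                        # sum_g (y_gj - r_g)^2
sd = np.sqrt((ss - n * mu**2) / (n - 1))                  # column std of Yc, ddof=1
TYc = (Y.T @ T.T).T - (T @ r)[:, None]                    # T @ Yc without densifying Y
out = (TYc - np.outer(T.sum(axis=1), mu)) / sd
```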

v0.2.1

08 Feb 19:11

What's Changed

Added

  • Streaming output (output_path, output_compression) in all high-level inference functions: secact_activity_inference(),
    secact_activity_inference_scrnaseq(), and secact_activity_inference_st()
  • use_gsl_rng parameter in ridge_batch(), so batch processing can use the ~70x faster NumPy RNG path (use_gsl_rng=False)
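Streaming output boils down to writing each batch into a pre-sized, compressed on-disk dataset as soon as it is computed. The sketch below shows that pattern with h5py; it writes a plain HDF5 dataset rather than an H5AD file, and the file name, the dataset name zscore, the gzip choice and the random stand-in batches are all assumptions, not the layout produced by output_path/output_compression.

```python
import os
import tempfile

import h5py
import numpy as np

n_cells, n_sig, batch = 1000, 4, 256
path = os.path.join(tempfile.mkdtemp(), "activity_stream.h5")
rng = np.random.default_rng(0)
written = []  # kept only so the round trip can be checked afterwards

with h5py.File(path, "w") as f:
    # Pre-sized, chunked, gzip-compressed dataset: each batch is compressed as it is written
    dset = f.create_dataset("zscore", shape=(n_cells, n_sig), dtype="f4",
                            chunks=(batch, n_sig), compression="gzip")
    for start in range(0, n_cells, batch):
        stop = min(start + batch, n_cells)
        block = rng.standard_normal((stop - start, n_sig)).astype("f4")  # stand-in batch result
        dset[start:stop] = block  # only this batch needs to be in memory
        written.append(block)

with h5py.File(path, "r") as f:
    stored = f["zscore"][:]
```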

Fixed

  • use_gsl_rng was accepted by secact_activity_inference but silently ignored by ridge_batch, which always used the slower
    GSL RNG. Now ridge_batch (both dense and sparse paths) respects the flag.
  • Docker R build: add libfftw3-dev for qqconf/metap and pre-install multtest from Bioconductor

Changed

  • Expanded README batch processing documentation: explains what batch processing is, in-memory vs streaming modes, dense vs sparse
    handling, and includes downloadable example data

Full Changelog: v0.2.0...v0.2.1

secactpy_v0.2.0

06 Jan 17:22

Secreted Protein Activity Inference using Ridge Regression

We're excited to announce the official public release of SecActpy v0.2.0, the Python implementation of the SecAct/RidgeR algorithm for inferring secreted protein activity from gene expression data. This release marks the migration to the official data2intelligence organization repository.


🎯 Overview

SecActpy enables inference of cytokine/chemokine signaling activity from:

  • Bulk RNA-seq: differential expression or raw counts
  • scRNA-seq: single-cell or pseudo-bulk by cell type
  • Spatial Transcriptomics: Visium, CosMx, and other platforms

✨ Key Features

🔬 R-Compatible Results

Produces identical results to the R SecAct/RidgeR package using GSL-compatible random number generation.

🚀 GPU Acceleration

Optional CuPy backend provides a 9–34x speedup over R implementations.

📊 Million-Sample Scale

Batch processing with streaming H5AD output for datasets that don't fit in memory.

🧮 Sparse-Aware

Automatic memory-efficient processing for sparse single-cell data: just pass sparse matrices directly.

💾 Smart Caching

Optional permutation table caching for faster repeated analyses.
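One way permutation-table caching can work is to key a saved file on everything that determines the table (number of genes, number of permutations, seed) and reuse it on later runs. The helper below is hypothetical: its name, the file naming scheme and the use of NumPy's default_rng (rather than the srand or GSL generators) are assumptions, and SecActPy's actual cache format is not described here.

```python
import os
import tempfile

import numpy as np


def load_or_build_perm_table(n_genes, n_rand, seed, cache_dir):
    """Hypothetical helper: return an (n_rand, n_genes) table of gene permutations,
    reusing a .npy file in cache_dir when one exists for the same settings."""
    path = os.path.join(cache_dir, f"perm_{n_genes}_{n_rand}_{seed}.npy")
    if os.path.exists(path):
        return np.load(path)                      # cache hit: skip regeneration
    rng = np.random.default_rng(seed)
    table = np.stack([rng.permutation(n_genes) for _ in range(n_rand)]).astype(np.int32)
    np.save(path, table)                          # cache miss: build once, then save
    return table


cache = tempfile.mkdtemp()
first = load_or_build_perm_table(500, 100, seed=0, cache_dir=cache)   # builds and saves
second = load_or_build_perm_table(500, 100, seed=0, cache_dir=cache)  # loads from disk
```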

🔬 Built-in Signatures

Includes SecAct and CytoSig signature matrices.


📦 Installation

# From PyPI (recommended)
pip install secactpy

# From GitHub (CPU only)
pip install git+https://github.com/data2intelligence/SecActpy.git

# With GPU support (CUDA 11.x)
pip install "secactpy[gpu] @ git+https://github.com/data2intelligence/SecActpy.git"

🚀 Quick Start

from secactpy import secact_activity_inference

# Bulk RNA-seq
result = secact_activity_inference(
    diff_expr,
    is_differential=True,
    sig_matrix="secact",
    verbose=True
)

# Access results
activity = result['zscore']
pvalues = result['pvalue']

from secactpy import secact_activity_inference_st

# Spatial transcriptomics
result = secact_activity_inference_st(
    "path/to/visium/",
    min_genes=1000,
    verbose=True
)

from secactpy import ridge_batch

# Large-scale batch processing (dense or sparse)
result = ridge_batch(
    X, Y_sparse,
    batch_size=10000,
    backend='cupy',  # GPU
    verbose=True
)

📈 Performance

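┌──────────────────────────────────┬────────────┬───────────┬──────────┬──────────┬─────────┐
│ Dataset                          │ R (Mac M1) │ R (Linux) │ Py (CPU) │ Py (GPU) │ Speedup │
├──────────────────────────────────┼────────────┼───────────┼──────────┼──────────┼─────────┤
│ Bulk (1,170 sp × 1,000 samples)  │ 74.4s      │ 141.6s    │ 128.8s   │ 6.7s     │ 11–19x  │
├──────────────────────────────────┼────────────┼───────────┼──────────┼──────────┼─────────┤
│ scRNA-seq (1,170 sp × 788 cells) │ 54.9s      │ 117.4s    │ 104.8s   │ 6.8s     │ 8–15x   │
├──────────────────────────────────┼────────────┼───────────┼──────────┼──────────┼─────────┤
│ Visium (1,170 sp × 3,404 spots)  │ 141.7s     │ 379.8s    │ 381.4s   │ 11.2s    │ 13–34x  │
├──────────────────────────────────┼────────────┼───────────┼──────────┼──────────┼─────────┤
│ CosMx (151 sp × 443,515 cells)   │ 936.9s     │ 976.1s    │ 1226.7s  │ 99.9s    │ 9–12x   │
└──────────────────────────────────┴────────────┴───────────┴──────────┴──────────┴─────────┘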
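The benchmark notes do not state which baseline the Speedup column uses. Taking Py (GPU) against R (Mac M1) for the low end and against Py (CPU) for the high end reproduces every listed range, as the short check below shows; this reading is an inference from the numbers, not something the notes state. Against R (Linux), the bulk case works out to roughly 21x, outside the listed 11–19x.

```python
# Timings in seconds, copied from the table: (R Mac M1, R Linux, Py CPU, Py GPU)
timings = {
    "Bulk": (74.4, 141.6, 128.8, 6.7),
    "scRNA-seq": (54.9, 117.4, 104.8, 6.8),
    "Visium": (141.7, 379.8, 381.4, 11.2),
    "CosMx": (936.9, 976.1, 1226.7, 99.9),
}

speedups = {}
for name, (r_mac, r_linux, py_cpu, py_gpu) in timings.items():
    low, high = r_mac / py_gpu, py_cpu / py_gpu   # GPU vs. R (Mac M1) and vs. Py (CPU)
    speedups[name] = f"{round(low)}-{round(high)}x"
    print(name, speedups[name], f"(vs. R Linux: {r_linux / py_gpu:.1f}x)")
```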

Benchmark Environment:

  • Mac CPU: M1 Pro with VECLIB (8 cores)
  • Linux CPU: AMD EPYC 7543P (4 cores)
  • Linux GPU: NVIDIA A100-SXM4-80GB

📚 API Reference

High-Level Functions

Function Description
secact_activity_inference() Bulk RNA-seq inference
secact_activity_inference_st() Spatial transcriptomics inference
secact_activity_inference_scrnaseq() scRNA-seq inference

Core Functions

Function Description
ridge() Ridge regression with permutation testing
ridge_batch() Batch processing (auto-handles dense/sparse)
load_signature() Load built-in signature matrices
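For orientation, here is a minimal sketch of what ridge regression with permutation testing typically involves: fit ridge coefficients once, refit against many gene-shuffled copies of Y, and turn the null distribution into a z-score and a p-value. The function name ridge_permutation is hypothetical, and the permutation scheme (shuffling gene rows of Y), the sample standard deviation and the +1-corrected two-sided p-value are assumptions; it also uses NumPy's generator rather than srand or GSL, so it will not reproduce SecActPy's numbers.

```python
import numpy as np


def ridge_permutation(X, Y, lambda_=5e5, n_rand=1000, seed=0):
    """Hypothetical sketch: ridge coefficients with a gene-permutation null.
    X: genes x signatures, Y: genes x samples (rows aligned by gene)."""
    n_genes, n_sig = X.shape
    # T = (X^T X + lambda*I)^-1 X^T is computed once and reused for every permutation
    T = np.linalg.solve(X.T @ X + lambda_ * np.eye(n_sig), X.T)
    beta = T @ Y
    rng = np.random.default_rng(seed)
    s1 = np.zeros_like(beta)        # running sum of permuted coefficients
    s2 = np.zeros_like(beta)        # running sum of squares
    n_ge = np.zeros_like(beta)      # count of |beta_perm| >= |beta|
    for _ in range(n_rand):
        b = T @ Y[rng.permutation(n_genes)]   # shuffle the gene rows of Y
        s1 += b
        s2 += b * b
        n_ge += np.abs(b) >= np.abs(beta)
    mean = s1 / n_rand
    sd = np.sqrt(np.maximum(s2 / n_rand - mean**2, 0.0) * n_rand / (n_rand - 1))
    zscore = (beta - mean) / sd
    pvalue = (n_ge + 1) / (n_rand + 1)
    return beta, zscore, pvalue


# Toy data: 3 signatures, 2 samples, one planted positive and one planted negative effect
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 3))
B_true = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, -2.0]])
Y = X @ B_true + 0.5 * rng.standard_normal((500, 2))
beta, zscore, pvalue = ridge_permutation(X, Y, lambda_=1.0, n_rand=200)
```

The small lambda_ in the toy call suits the tiny random data; the 5e5 default listed under Key Parameters is meant for real expression scales.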

Utilities

Function Description
estimate_batch_size() Optimal batch size for available memory
estimate_memory() Memory requirement estimation
StreamingResultWriter Stream results to H5AD
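As a rough feel for what a batch-size estimate involves, one can divide a memory budget by the bytes one sample needs. The function below is a hypothetical back-of-envelope model, not estimate_batch_size(): it counts only one dense float64 Y column and a few result vectors per sample and applies a 50% safety factor, whereas the real estimate may also account for permutation buffers, sparsity and the backend. The 20,000-gene figure in the example call is illustrative.

```python
def rough_batch_size(mem_bytes, n_genes, n_sig, n_outputs=4, bytes_per_value=8, safety=0.5):
    """Hypothetical estimate of how many samples fit in one batch.

    Per sample: one dense float64 column of Y (n_genes values) plus n_outputs
    result vectors of length n_sig (e.g. beta, se, zscore, pvalue). Only a
    `safety` fraction of the budget is used.
    """
    per_sample = bytes_per_value * (n_genes + n_outputs * n_sig)
    return max(1, int(mem_bytes * safety // per_sample))


# e.g. an 8 GiB budget, 20,000 genes and 1,170 signatures
size = rough_batch_size(8 * 2**30, n_genes=20_000, n_sig=1_170)
```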

🔧 Key Parameters

Parameter Default Description
sig_matrix "secact" Signature: "secact", "cytosig", or DataFrame
lambda_ 5e5 Ridge regularization parameter
n_rand 1000 Number of permutations
seed 0 Random seed for reproducibility
backend "auto" "auto", "numpy", or "cupy"
use_cache False Cache permutation tables to disk
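backend="auto" suggests a fall-back pattern: use CuPy when it imports and sees a CUDA device, otherwise NumPy. The helper below sketches that pattern; pick_backend is a hypothetical name, and SecActPy's actual selection logic may differ. Because CuPy mirrors much of the NumPy API, code written against the returned module xp can run on either.

```python
import numpy as np


def pick_backend(backend="auto"):
    """Hypothetical helper: return an array module (CuPy or NumPy)."""
    if backend == "numpy":
        return np
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()   # raises if no usable CUDA device
        return cp
    except Exception:
        if backend == "cupy":
            raise                          # GPU explicitly requested: surface the error
        return np                          # "auto": fall back to CPU


xp = pick_backend("auto")
a = xp.ones((3, 3))
total = float(a.sum())
```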

📋 Requirements

  • Python ≥ 3.9
  • NumPy ≥ 1.20
  • Pandas ≥ 1.3
  • SciPy ≥ 1.7
  • h5py ≥ 3.0
  • anndata ≥ 0.8
  • scanpy ≥ 1.9
  • Optional: CuPy ≥ 10.0 (GPU)

๐Ÿณ Docker

# CPU
docker pull psychemistz/secactpy:latest

# GPU
docker pull psychemistz/secactpy:gpu

# With R SecAct/RidgeR for cross-validation
docker pull psychemistz/secactpy:with-r

📖 Citation

If you use SecActpy in your research, please cite:

Beibei Ru, Lanqi Gong, Emily Yang, Seongyong Park, George Zaki, Kenneth Aldape, Lalage Wakefield, Peng Jiang. Inference of secreted protein activities in intercellular communication. GitHub: data2intelligence/SecAct


📄 License

MIT License


🔗 Related Projects

  • SecAct: original R implementation
  • RidgeR: R ridge regression package
  • SpaCET: spatial transcriptomics cell type analysis
  • CytoSig: cytokine signaling inference

๐Ÿ™ Acknowledgments

  • Original R implementation: SecAct and RidgeR
  • Signature databases: SecAct, CytoSig

Full documentation: https://github.com/data2intelligence/SecActpy