Releases · data2intelligence/SecActpy

08 Mar 13:58

psychemistz

v0.2.5

0a8484c

v0.2.5 — Streaming H5AD for >5M-cell datasets Latest

What's New

Streaming H5AD (streaming=True): Two-pass chunk-reading algorithm for >5M-cell datasets. Peak memory drops from ~200 GB to ~3 GB.
H5ADChunkReader for memory-efficient H5AD reading via h5py
Fixed H5AD index column detection for obs.attrs['_index'] convention
Added "symbol" fallback for Ensembl gene name resolution

See CHANGELOG for full details.
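The chunk-reading idea can be illustrated with plain h5py. The sketch below is not SecActpy's H5ADChunkReader. It builds a toy file that mimics two H5AD conventions (a dense `X` dataset and an `obs` group whose `_index` attribute names the index column), then streams `X` in row chunks twice. What each pass computes here (per-gene means first, then a per-cell quantity that needs those means) is an assumption chosen for illustration, and `iter_row_chunks` is a hypothetical helper.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy cells x genes matrix standing in for a large dataset.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(1000, 50)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "toy.h5ad")
with h5py.File(path, "w") as f:
    f.create_dataset("X", data=X, chunks=(128, 50))
    obs = f.create_group("obs")
    obs.attrs["_index"] = "cell_id"  # the attribute names the index column
    ids = np.array([f"cell{i}" for i in range(X.shape[0])], dtype="S")
    obs.create_dataset("cell_id", data=ids)


def iter_row_chunks(dset, chunk_size):
    """Yield (start_row, block) pairs without loading the full dataset."""
    for start in range(0, dset.shape[0], chunk_size):
        yield start, dset[start:start + chunk_size]


with h5py.File(path, "r") as f:
    # Resolve the obs index column through the '_index' attribute.
    index_col = f["obs"].attrs["_index"]
    if isinstance(index_col, bytes):
        index_col = index_col.decode()
    cell_ids = f["obs"][index_col][:].astype(str)

    n_cells, n_genes = f["X"].shape

    # Pass 1: accumulate per-gene sums to obtain global means.
    gene_sum = np.zeros(n_genes, dtype=np.float64)
    for _, block in iter_row_chunks(f["X"], 256):
        gene_sum += block.sum(axis=0, dtype=np.float64)
    gene_mean = gene_sum / n_cells

    # Pass 2: use the global means while streaming the data again.
    centered_total = np.empty(n_cells, dtype=np.float64)
    for start, block in iter_row_chunks(f["X"], 256):
        centered_total[start:start + block.shape[0]] = (block - gene_mean).sum(axis=1)
```

Only one chunk is resident at a time, so peak memory scales with the chunk size rather than with the number of cells.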

09 Feb 05:47

psychemistz

v0.2.3

235e546

v0.2.3

Cross-Package Reproducibility

SecActPy, R SecAct, and RidgeR now produce identical z-scores when using matching RNG and grouping settings.

New Features

rng_method parameter: explicit control over the random number generator in all inference functions:
- "srand": matches R SecAct's srand()/rand() (platform-dependent; Linux and macOS differ)
- "gsl": GSL Mersenne Twister, cross-platform reproducible (matches RidgeR rng_method="gsl")
- "numpy": fast native NumPy RNG (for when exact R matching is not needed)
- The existing use_gsl_rng parameter is preserved for backward compatibility

is_group_sig=True by default: signature grouping (Pearson r >= 0.9) is now enabled by default in all three inference functions (secact_activity_inference, secact_activity_inference_scrnaseq, secact_activity_inference_st), matching R SecAct's default behavior.
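To make the grouping threshold concrete, here is a minimal sketch of grouping signature columns whose pairwise Pearson correlation is at least 0.9. It assumes complete-linkage hierarchical clustering on the distance 1 - r, cut at 0.1; the release notes do not state which clustering method SecActpy uses, so treat the linkage choice and the toy data as assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Toy signature matrix (genes x signatures): columns 0 and 1 are nearly
# identical, column 2 is unrelated.
sig = np.column_stack([
    base,
    base + 0.05 * rng.normal(size=200),
    rng.normal(size=200),
])

corr = np.corrcoef(sig, rowvar=False)   # Pearson r between signatures
dist = 1.0 - corr                       # r >= 0.9  <=>  dist <= 0.1
dist = (dist + dist.T) / 2.0            # remove floating-point asymmetry
np.fill_diagonal(dist, 0.0)

tree = linkage(squareform(dist, checks=False), method="complete")
labels = fcluster(tree, t=0.1, criterion="distance")
```

With complete linkage, every pair inside a group satisfies r >= 0.9, so the two near-duplicate columns share a label and the unrelated column gets its own.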

5-Way Verification

Signature	SecAct R	RidgeR srand	SecActPy srand	RidgeR gsl	SecActPy gsl
A1BG	19.21867	19.21867	19.21867	19.20790	19.20790
A2M	25.65251	25.65251	25.65251	25.04980	25.04980
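The srand and gsl columns differ because the permutation table depends on the generator, not only on the seed. The sketch below shows that effect with two NumPy bit generators. It does not reproduce SecActpy's "srand" or "gsl" streams (NumPy seeds its Mersenne Twister differently from GSL), and `permutation_table` is a hypothetical helper.

```python
import numpy as np


def permutation_table(bit_generator_cls, seed, n_items, n_rand):
    """Build an (n_rand, n_items) table of permutations from one generator."""
    gen = np.random.Generator(bit_generator_cls(seed))
    return np.stack([gen.permutation(n_items) for _ in range(n_rand)])


same_a = permutation_table(np.random.MT19937, 0, 20, 5)
same_b = permutation_table(np.random.MT19937, 0, 20, 5)
other = permutation_table(np.random.PCG64, 0, 20, 5)

# Same generator and seed: identical permutations, hence identical z-scores.
# Different generator, same seed: different permutations, hence z-scores that
# agree only approximately.
reproducible = np.array_equal(same_a, same_b)
differs = not np.array_equal(same_a, other)
```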

Full Changelog: v0.2.2...v0.2.3

08 Feb 21:59

psychemistz

v0.2.2

74aaaa4

v0.2.2

Bug Fix

Fix ImportError when using batch_size: secact_activity_inference() and
secact_activity_inference_st() crashed with ImportError: cannot import name 'ridge_batch' from 'secactpy.ridge' when batch_size was set. The import was pointing to the wrong module.

New Features

Sparse mode (`sparse_mode=True`)

Opt-in parameter for memory-efficient processing of sparse Y matrices. Avoids densifying Y during matrix
multiplication by using the identity (Y.T @ T.T).T == T @ Y, with column z-scoring applied as lightweight
corrections on the small output matrix.

Supported in ridge(), ridge_batch(), and all high-level inference functions.
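The identity and the z-score correction can be checked numerically. This is a sketch under stated assumptions, not SecActpy's code: Y is genes x samples, T is a dense projection matrix (features x genes), and columns are z-scored with the sample standard deviation (ddof=1). Because z-scoring is an affine map per column, T @ Yz equals (T @ Y - rowsum(T) * mu) / sigma, which only touches the small output matrix.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_genes, n_samples, n_features = 300, 40, 5

Y = sp.random(n_genes, n_samples, density=0.05, format="csr",
              random_state=0, dtype=np.float64)
T = rng.normal(size=(n_features, n_genes))

# Column statistics computed without densifying Y.
mu = np.asarray(Y.mean(axis=0)).ravel()
sq_mean = np.asarray(Y.multiply(Y).mean(axis=0)).ravel()
sigma = np.sqrt((sq_mean - mu**2) * n_genes / (n_genes - 1))

# Sparse-first product: (Y.T @ T.T).T equals T @ Y.
TY = np.asarray((Y.T @ T.T).T)

# Apply the z-scoring as a correction on the (n_features x n_samples) output.
beta_sparse = (TY - T.sum(axis=1, keepdims=True) * mu) / sigma

# Dense reference for comparison only.
Yd = Y.toarray()
beta_dense = T @ ((Yd - Yd.mean(axis=0)) / Yd.std(axis=0, ddof=1))
```

The sparse path never materializes a dense copy of Y; the dense reference is built here only to confirm that both routes agree.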

# scRNA-seq: Y stays sparse end-to-end
result = secact_activity_inference_scrnaseq(
    adata, cell_type_col="cell_type",
    is_single_cell_level=True,
    batch_size=5000,
    sparse_mode=True
)

# Spatial transcriptomics
result = secact_activity_inference_st(
    adata, batch_size=5000, sparse_mode=True
)

┌───────────────────────┬──────────────┬──────────────────┐
│                       │   Default    │ sparse_mode=True │
├───────────────────────┼──────────────┼──────────────────┤
│ Memory                │ Full dense Y │ Y stays sparse   │
├───────────────────────┼──────────────┼──────────────────┤
│ Speed (<5% density)   │ Baseline     │ ~1.8x faster     │
├───────────────────────┼──────────────┼──────────────────┤
│ Speed (5–10% density) │ Baseline     │ ~25% slower      │
├───────────────────────┼──────────────┼──────────────────┤
│ Results               │ Identical    │ Identical        │
└───────────────────────┴──────────────┴──────────────────┘

In-flight row-mean centering (`row_center=True`)

New parameter in ridge_batch() for applying row-mean centering without densifying Y. Computes row-centered column
statistics analytically from sparse Y.
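One way to obtain row-centered column statistics without densifying Y is to expand the centered sums algebraically. The sketch below assumes Y is genes x samples and that row centering subtracts each gene's mean r_i across samples; the formulas follow from sum_i (Y_ij - r_i)^2 = sum_i Y_ij^2 - 2 * sum_i Y_ij * r_i + sum_i r_i^2. It is an illustration of the idea, not the code in ridge_batch().

```python
import numpy as np
import scipy.sparse as sp

n_genes, n_samples = 300, 40
Y = sp.random(n_genes, n_samples, density=0.05, format="csr",
              random_state=1, dtype=np.float64)

# Row means (one value per gene); this vector is small and dense.
r = np.asarray(Y.mean(axis=1)).ravel()

# Column mean of the row-centered matrix: shift by the mean of the row means.
col_mean = np.asarray(Y.mean(axis=0)).ravel() - r.mean()

# Column sum of squares of the row-centered matrix, from sparse operations only.
col_sumsq = (
    np.asarray(Y.multiply(Y).sum(axis=0)).ravel()
    - 2.0 * (Y.T @ r)
    + r @ r
)

# Sample variance (ddof=1) of each row-centered column.
col_var = (col_sumsq - n_genes * col_mean**2) / (n_genes - 1)

# Dense reference for comparison only.
Yc = Y.toarray() - r[:, None]
```

The same trick applies to the projection itself: T @ (Y - r 1^T) equals T @ Y minus the column vector T @ r broadcast across samples.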

08 Feb 19:11

psychemistz

v0.2.1

d099e8a

v0.2.1

What's Changed

Added

Streaming output (output_path, output_compression) in all high-level inference functions: secact_activity_inference(),
secact_activity_inference_scrnaseq(), and secact_activity_inference_st()
use_gsl_rng parameter in ridge_batch() — enables the ~70x faster NumPy RNG path for batch processing
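Streaming output generally means appending each batch of results to a resizable on-disk dataset instead of holding everything in memory. The sketch below shows that pattern with h5py. The dataset name "zscore", the samples x features layout, and the `write_batches` helper are assumptions for illustration and do not describe the file layout SecActpy writes.

```python
import os
import tempfile

import h5py
import numpy as np


def write_batches(path, batches, n_features, compression="gzip"):
    """Append (batch_size x n_features) blocks to a growable HDF5 dataset."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "zscore",
            shape=(0, n_features),
            maxshape=(None, n_features),   # unlimited along the sample axis
            chunks=(256, n_features),
            dtype="f8",
            compression=compression,
        )
        for block in batches:
            start = dset.shape[0]
            dset.resize(start + block.shape[0], axis=0)
            dset[start:] = block           # only this block is in memory


rng = np.random.default_rng(0)
blocks = [rng.normal(size=(n, 4)) for n in (300, 300, 120)]

out_path = os.path.join(tempfile.mkdtemp(), "results.h5")
write_batches(out_path, iter(blocks), n_features=4)

with h5py.File(out_path, "r") as f:
    stored = f["zscore"][:]
```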

Fixed

use_gsl_rng was accepted by secact_activity_inference but silently ignored by ridge_batch, which always used the slower
GSL RNG. Now ridge_batch (both dense and sparse paths) respects the flag.
Docker R build: add libfftw3-dev for qqconf/metap and pre-install multtest from Bioconductor

Changed

Expanded README batch processing documentation: explains what batch processing is, in-memory vs streaming modes, dense vs sparse
handling, and includes downloadable example data

Full Changelog: v0.2.0...v0.2.1

06 Jan 17:22

psychemistz

v0.2.0

9b74b47

secactpy_v0.2.0

Secreted Protein Activity Inference using Ridge Regression

We're excited to announce the official public release of SecActpy v0.2.0, the Python implementation of the SecAct/RidgeR algorithm for inferring secreted protein activity from gene expression data. This release marks the migration to the official data2intelligence organization repository.

🎯 Overview

SecActpy enables inference of cytokine/chemokine signaling activity from:

Bulk RNA-seq — Differential expression or raw counts
scRNA-seq — Single-cell or pseudo-bulk by cell type
Spatial Transcriptomics — Visium, CosMx, and other platforms

✨ Key Features

🔬 R-Compatible Results

Produces identical results to the R SecAct/RidgeR package using GSL-compatible random number generation.

🚀 GPU Acceleration

Optional CuPy backend provides an 8–34x speedup over the CPU runs in the Performance table.

📊 Million-Sample Scale

Batch processing with streaming H5AD output for datasets that don't fit in memory.

🧮 Sparse-Aware

Automatic memory-efficient processing for sparse single-cell data — just pass sparse matrices directly.

💾 Smart Caching

Optional permutation table caching for faster repeated analyses.

🔬 Built-in Signatures

Includes SecAct and CytoSig signature matrices.

📦 Installation

# From PyPI (recommended)
pip install secactpy

# From GitHub (CPU only)
pip install git+https://github.com/data2intelligence/SecActpy.git

# With GPU support (CUDA 11.x)
pip install "secactpy[gpu] @ git+https://github.com/data2intelligence/SecActpy.git"

🚀 Quick Start

from secactpy import secact_activity_inference

# Bulk RNA-seq
result = secact_activity_inference(
    diff_expr,
    is_differential=True,
    sig_matrix="secact",
    verbose=True
)

# Access results
activity = result['zscore']
pvalues = result['pvalue']

from secactpy import secact_activity_inference_st

# Spatial transcriptomics
result = secact_activity_inference_st(
    "path/to/visium/",
    min_genes=1000,
    verbose=True
)

from secactpy import ridge_batch

# Large-scale batch processing (dense or sparse)
result = ridge_batch(
    X, Y_sparse,
    batch_size=10000,
    backend='cupy',  # GPU
    verbose=True
)

📈 Performance

Dataset	R (Mac M1)	R (Linux)	Py (CPU)	Py (GPU)	Speedup (Py GPU)
Bulk (1,170 sp × 1,000 samples)	74.4s	141.6s	128.8s	6.7s	11–19x
scRNA-seq (1,170 sp × 788 cells)	54.9s	117.4s	104.8s	6.8s	8–15x
Visium (1,170 sp × 3,404 spots)	141.7s	379.8s	381.4s	11.2s	13–34x
CosMx (151 sp × 443,515 cells)	936.9s	976.1s	1226.7s	99.9s	9–12x

Benchmark Environment:

Mac CPU: M1 Pro with VECLIB (8 cores)
Linux CPU: AMD EPYC 7543P (4 cores)
Linux GPU: NVIDIA A100-SXM4-80GB

📚 API Reference

High-Level Functions

Function	Description
`secact_activity_inference()`	Bulk RNA-seq inference
`secact_activity_inference_st()`	Spatial transcriptomics inference
`secact_activity_inference_scrnaseq()`	scRNA-seq inference

Core Functions

Function	Description
`ridge()`	Ridge regression with permutation testing
`ridge_batch()`	Batch processing (auto-handles dense/sparse)
`load_signature()`	Load built-in signature matrices

Utilities

Function	Description
`estimate_batch_size()`	Optimal batch size for available memory
`estimate_memory()`	Memory requirement estimation
`StreamingResultWriter`	Stream results to H5AD

🔧 Key Parameters

Parameter	Default	Description
`sig_matrix`	`"secact"`	Signature: "secact", "cytosig", or DataFrame
`lambda_`	`5e5`	Ridge regularization parameter
`n_rand`	`1000`	Number of permutations
`seed`	`0`	Random seed for reproducibility
`backend`	`"auto"`	"auto", "numpy", or "cupy"
`use_cache`	`False`	Cache permutation tables to disk
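To show how `lambda_`, `n_rand`, and `seed` interact, here is a minimal NumPy sketch of ridge regression with a permutation test. It assumes X is genes x signatures and Y is genes x samples, permutes the gene order of Y, and uses the (count + 1) / (n_rand + 1) p-value convention. The exact definitions inside SecActpy may differ, and `ridge_permutation` is a hypothetical name.

```python
import numpy as np


def ridge_permutation(X, Y, lam=5e5, n_rand=1000, seed=0):
    """Ridge coefficients with permutation-based z-scores and p-values."""
    rng = np.random.default_rng(seed)
    n_genes, n_sig = X.shape

    # Projection matrix T = (X'X + lam*I)^-1 X', shape (n_sig, n_genes).
    T = np.linalg.solve(X.T @ X + lam * np.eye(n_sig), X.T)
    beta = T @ Y

    perm_sum = np.zeros_like(beta)
    perm_sumsq = np.zeros_like(beta)
    exceed = np.zeros_like(beta)
    for _ in range(n_rand):
        # Shuffle genes in Y to break the gene-to-signature association.
        beta_perm = T @ Y[rng.permutation(n_genes)]
        perm_sum += beta_perm
        perm_sumsq += beta_perm**2
        exceed += np.abs(beta_perm) >= np.abs(beta)

    mean = perm_sum / n_rand
    std = np.sqrt(np.maximum(perm_sumsq / n_rand - mean**2, 0.0))
    return {
        "beta": beta,
        "zscore": (beta - mean) / std,
        "pvalue": (exceed + 1.0) / (n_rand + 1.0),
    }


# Toy data: sample 0 is driven by signature 0, sample 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
Y = np.column_stack([
    3.0 * X[:, 0] + rng.normal(size=500),
    rng.normal(size=500),
])
result = ridge_permutation(X, Y, n_rand=200, seed=0)
```

The z-score is scale-free with respect to the permutation null, so heavy shrinkage from a large lambda lowers beta but leaves a strong signal clearly separated from the permuted values.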

📋 Requirements

Python ≥ 3.9
NumPy ≥ 1.20
Pandas ≥ 1.3
SciPy ≥ 1.7
h5py ≥ 3.0
anndata ≥ 0.8
scanpy ≥ 1.9
Optional: CuPy ≥ 10.0 (GPU)

🐳 Docker

# CPU
docker pull psychemistz/secactpy:latest

# GPU
docker pull psychemistz/secactpy:gpu

# With R SecAct/RidgeR for cross-validation
docker pull psychemistz/secactpy:with-r

📖 Citation

If you use SecActpy in your research, please cite:

Beibei Ru, Lanqi Gong, Emily Yang, Seongyong Park, George Zaki, Kenneth Aldape, Lalage Wakefield, Peng Jiang. Inference of secreted protein activities in intercellular communication. GitHub: data2intelligence/SecAct

📄 License

MIT License

🔗 Related Projects

SecAct — Original R implementation
RidgeR — R ridge regression package
SpaCET — Spatial transcriptomics cell type analysis
CytoSig — Cytokine signaling inference

🙏 Acknowledgments

Original R implementation: SecAct and RidgeR
Signature databases: SecAct, CytoSig

Full documentation: https://github.com/data2intelligence/SecActpy
