scope-ml uses machine learning to classify light curves from the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST). The documentation is hosted at https://zwickytransientfacility.github.io/scope-ml/. For a local preview, install `mkdocs-material` and run `mkdocs serve`.
Feature generation includes period-finding (Conditional Entropy, Analysis of Variance, Lomb-Scargle, FPW) and Fourier decomposition via the `periodfind` library. Fourier features are computed using a batched weighted linear least-squares solver with BIC model selection, replacing the previous per-source `scipy.optimize.curve_fit` loop.
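The batched solver lives inside scope-ml; as a rough illustration of the underlying idea, here is a minimal per-source sketch of a weighted linear least-squares Fourier fit with BIC-based selection of the number of harmonics. The model form, weighting, and BIC convention shown are assumptions for illustration, not the package's exact implementation.

```python
import numpy as np

def fourier_fit_bic(t, y, dy, period, max_harmonics=5):
    """Weighted linear least-squares Fourier fit with BIC model selection.

    Illustrative per-source sketch (not the scope-ml batched implementation):
    fits y(t) ~ a0 + sum_k [a_k cos(2*pi*k*t/P) + b_k sin(2*pi*k*t/P)]
    for k = 1..K and keeps the K that minimizes the BIC.
    """
    t, y, dy = map(np.asarray, (t, y, dy))
    phase = 2.0 * np.pi * (t % period) / period
    w = 1.0 / dy  # scale rows by inverse uncertainty for weighted least squares
    best = None
    for k in range(1, max_harmonics + 1):
        # Design matrix: constant term plus a cos/sin pair per harmonic.
        cols = [np.ones_like(t, dtype=float)]
        for h in range(1, k + 1):
            cols += [np.cos(h * phase), np.sin(h * phase)]
        A = np.column_stack(cols)
        coeffs, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
        chi2 = np.sum(((y - A @ coeffs) / dy) ** 2)
        n, p = len(t), A.shape[1]
        bic = chi2 + p * np.log(n)  # one common Gaussian-likelihood BIC convention
        if best is None or bic < best[0]:
            best = (bic, k, coeffs)
    return best  # (bic, n_harmonics, coefficient vector)
```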
The pipeline supports running against Rubin Data Preview 1 (DP1) data stored as local parquet files, bypassing the TAP API entirely. This is the recommended approach for large-scale feature generation.
You need three gzip-compressed parquet files downloaded from the Rubin Science Platform and placed in a single directory:
```
/path/to/dp1_data/
    Object.parquet.gzip
    ForcedSource.parquet.gzip
    Visit.parquet.gzip
```
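If you want to sanity-check the download before running anything, the files can be read directly with pandas. The column name used below (`objectId`) is illustrative; check it against the DP1 schema.

```python
import pandas as pd

data_dir = "/path/to/dp1_data"

# Load the three DP1 tables (gzip-compressed parquet is handled
# transparently by pandas/pyarrow).
objects = pd.read_parquet(f"{data_dir}/Object.parquet.gzip")
forced = pd.read_parquet(f"{data_dir}/ForcedSource.parquet.gzip")
visits = pd.read_parquet(f"{data_dir}/Visit.parquet.gzip")

print(f"{len(objects)} objects, {len(forced)} forced-source rows, {len(visits)} visits")

# A single light curve is the set of ForcedSource rows for one object,
# e.g. (assuming an `objectId` column):
# lc = forced[forced["objectId"] == objects["objectId"].iloc[0]]
```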
Tell scope-ml to use local files instead of the TAP API by setting `data_path` in `config.yaml`:

```yaml
rubin:
  data_path: /path/to/dp1_data/
```

Or set the environment variable:

```bash
export RUBIN_DATA_PATH=/path/to/dp1_data/
```

When `data_path` (or `RUBIN_DATA_PATH`) is set, all Rubin commands automatically use the local parquet backend (`RubinLocalClient`) instead of the TAP client. No token is needed for local mode.
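As a quick way to confirm which backend your settings would select, a check along the following lines can help. The precedence of the config value over the environment variable shown here is an assumption, so set only one of the two to avoid ambiguity.

```python
import os
import yaml

# Illustrative check only, not scope-ml code: see whether a local DP1 path
# is configured via config.yaml or the RUBIN_DATA_PATH environment variable.
with open("config.yaml") as f:
    config = yaml.safe_load(f) or {}

data_path = (config.get("rubin") or {}).get("data_path") or os.environ.get("RUBIN_DATA_PATH")
if data_path:
    print(f"Local parquet backend (RubinLocalClient) would be used: {data_path}")
else:
    print("No data_path set; the TAP client (token required) would be used.")
```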
For a small number of sources (e.g. a cone search or a short object list):
```bash
# Cone search
generate-features-rubin --ra 62.0 --dec -37.0 --radius 60 --doCPU

# From a CSV file with an objectId column
generate-features-rubin --objectid-file my_objects.csv --doCPU
```

Output is written to `generated_features_rubin/gen_features_rubin.parquet` by default.
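The output is an ordinary parquet table, so it can be inspected directly. The exact feature columns depend on the options you enable; the snippet below just peeks at whatever was written.

```python
import pandas as pd

# Inspect the generated feature table.
feats = pd.read_parquet("generated_features_rubin/gen_features_rubin.parquet")
print(feats.shape)
print(feats.columns.tolist()[:20])  # first few feature columns
print(feats.head())
```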
For processing the full DP1 catalog, use the chunked SLURM workflow:
1. Prepare chunks: scan the local parquet files, filter to objects with enough detections, and split them into chunk CSVs:

```bash
prepare-rubin-chunks \
    --data-path /path/to/dp1_data/ \
    --chunk-size 5000 \
    --min-n-lc-points 50 \
    --output-dir rubin_chunks
```

This writes `rubin_chunks/chunk_000.csv`, `rubin_chunks/chunk_001.csv`, etc., plus a master list `rubin_chunks/all_eligible_objectids.csv`.
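For reference, the chunking step is roughly equivalent to the following simplified sketch, which assumes detections are counted per object in `ForcedSource` via an `objectId` column; the real command handles filtering details that this omits.

```python
from pathlib import Path
import pandas as pd

# Simplified sketch of what prepare-rubin-chunks is described to do.
forced = pd.read_parquet("/path/to/dp1_data/ForcedSource.parquet.gzip")

counts = forced.groupby("objectId").size()  # detections per object
eligible = counts[counts >= 50].index.to_frame(index=False, name="objectId")  # --min-n-lc-points

out_dir = Path("rubin_chunks")
out_dir.mkdir(exist_ok=True)
eligible.to_csv(out_dir / "all_eligible_objectids.csv", index=False)

chunk_size = 5000  # --chunk-size
for i in range(0, len(eligible), chunk_size):
    chunk = eligible.iloc[i:i + chunk_size]
    chunk.to_csv(out_dir / f"chunk_{i // chunk_size:03d}.csv", index=False)
```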
2. Generate the SLURM array script:
```bash
generate-features-rubin-slurm \
    --chunk-dir rubin_chunks \
    --output-dir rubin_slurm \
    --venv /path/to/your/.venv \
    --cpus-per-task 8 \
    --top-n-periods 50
```

This writes `rubin_slurm/run_rubin_features.sh`. Edit the script to adjust the partition, account, memory, and module loads for your cluster.
3. Submit the array job:
```bash
sbatch rubin_slurm/run_rubin_features.sh
```

Each array task processes one chunk and writes `generated_features_rubin/gen_features_rubin_<TASK_ID>.parquet`.
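Before combining, it can be worth confirming that every chunk produced an output. The small check below assumes the SLURM task IDs map one-to-one to chunk indices, which is an assumption about the generated script rather than a documented guarantee.

```python
from pathlib import Path

# Illustrative completeness check: compare chunk CSVs against per-task outputs.
n_chunks = len(list(Path("rubin_chunks").glob("chunk_*.csv")))
done = {
    int(p.stem.rsplit("_", 1)[-1])
    for p in Path("generated_features_rubin").glob("gen_features_rubin_*.parquet")
}
missing = sorted(set(range(n_chunks)) - done)
print(f"{len(done)}/{n_chunks} chunks finished; missing task IDs: {missing}")
```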
4. Combine results after all jobs finish:
```bash
combine-rubin-features \
    --input-dir generated_features_rubin \
    --output generated_features_rubin/dp1_features_combined.parquet
```
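The combine step is essentially a concatenation of the per-chunk tables. If you ever need to do it by hand, a sketch like this suffices (not the `combine-rubin-features` implementation):

```python
import glob
import pandas as pd

# Concatenate the per-task outputs into one feature table.
parts = sorted(glob.glob("generated_features_rubin/gen_features_rubin_*.parquet"))
combined = pd.concat([pd.read_parquet(p) for p in parts])
combined.to_parquet("generated_features_rubin/dp1_features_combined.parquet")
print(f"Combined {len(parts)} chunk files into {len(combined)} rows")
```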
If you don't have a SLURM cluster, you can run chunks sequentially (or resume after an interruption) with:

```bash
python tools/run_rubin_chunked.py \
    --objectid-file rubin_chunks/all_eligible_objectids.csv \
    --doCPU \
    --chunk-size 5000 \
    --top-n-periods 50
```

Completed chunks are saved to `generated_features_rubin/chunks/` and skipped on restart, so the job is resumable.
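The resume behaviour amounts to checking for an existing per-chunk output before doing any work. Here is a minimal sketch; the chunk output naming under `generated_features_rubin/chunks/` is an assumption, not the script's exact convention.

```python
from pathlib import Path

# Hypothetical sketch of resumable chunk processing: skip any chunk whose
# output parquet already exists from a previous run.
chunk_out = Path("generated_features_rubin/chunks")
chunk_out.mkdir(parents=True, exist_ok=True)

for chunk_csv in sorted(Path("rubin_chunks").glob("chunk_*.csv")):
    out_file = chunk_out / f"{chunk_csv.stem}.parquet"  # assumed naming
    if out_file.exists():
        continue  # finished in an earlier run; safe to skip
    # ... generate features for the objects listed in chunk_csv,
    #     then write the result to out_file ...
```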
| Command | Description |
|---|---|
| `get-rubin-ids` | Discover object IDs via cone search or read from CSV |
| `generate-features-rubin` | Generate features for a set of Rubin sources |
| `prepare-rubin-chunks` | Scan local parquet files and split eligible objects into chunk CSVs |
| `generate-features-rubin-slurm` | Generate a SLURM array job script from chunk files |
| `combine-rubin-features` | Merge per-chunk parquet outputs into a single file |
We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for Accelerated AI Algorithms for Data-Driven Discovery (A3D3) under Cooperative Agreement No. PHY-2117997.

