time-series-machine-learning
diff --git a/tsml_eval/publications/clustering/__init__.py b/tsml_eval/publications/clustering/__init__.py
Lines changed: 1 addition & 0 deletions
diff --git a/tsml_eval/publications/clustering/kasba/README.md b/tsml_eval/publications/clustering/kasba/README.md
Lines changed: 109 additions & 0 deletions
diff --git a/tsml_eval/publications/clustering/kasba/__init__.py b/tsml_eval/publications/clustering/kasba/__init__.py
Lines changed: 1 addition & 0 deletions
diff --git a/tsml_eval/publications/clustering/kasba/_experiment_script.py b/tsml_eval/publications/clustering/kasba/_experiment_script.py
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1 @@
+"""Files for clustering publications."""
@@ -0,0 +1,109 @@
+# 📘 KASBA: k-means Accelerated Stochastic Subgradient Barycentre Averaging
+**Official Repository for the KASBA Time Series Clustering Paper**
+
+This repository accompanies the paper:
+
+> **Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering**
+>
+> https://arxiv.org/abs/2411.17838
+
+KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at every stage of clustering. It applies a randomised stochastic subgradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence, and exploits the metric property of the MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for real-world time series clustering (TSCL) applications, and it lets practitioners balance runtime against clustering performance when similarity is best measured by an elastic distance.
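
The idea of using the metric property to skip distance calculations can be illustrated with a generic triangle-inequality bound. The sketch below is an illustration under stated assumptions, not KASBA's actual procedure: Euclidean distance (`math.dist`) stands in for MSM, and the function name `assign_with_pruning` is hypothetical.

```python
import math


def assign_with_pruning(points, centroids, dist=math.dist):
    """Assign each point to its nearest centroid, skipping provably useless calls.

    `dist` must be a metric (it must satisfy the triangle inequality).
    Returns the labels and the number of point-to-centroid distances computed.
    """
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per assignment pass.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]

    labels, computed = [], 0
    for x in points:
        best, best_d = 0, dist(x, centroids[0])
        computed += 1
        for j in range(1, k):
            # Triangle inequality: d(x, c_j) >= d(c_best, c_j) - d(x, c_best).
            # If d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(x, centroids[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


# Toy data: two well-separated groups and two centroids.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 9.9)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels, computed = assign_with_pruning(points, centroids)
print(labels, computed)  # [0, 0, 1, 1] 6, versus 8 calls without pruning
```

The bound is only valid because the distance is a metric; with a non-metric elastic distance such as DTW the skipped comparisons could change the assignment.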
+
+KASBA delivers state-of-the-art clustering performance while running 1–3 orders of magnitude faster than existing elastic distance–based $k$-means algorithms.
+
+This repository contains the exact model configurations, experiment scripts, and visualisation tools used to produce the results in the paper.
+
+---
+
+## 📁 Repository Structure
+
+    kasba/
+    ├── README.md                   # This file
+    ├── __init__.py
+    ├── _utils.py                   # Internal utilities used across the project
+    ├── _model_configuration.py     # Definitions of all models and configurations used in experiments
+    ├── _experiment_script.py       # Script used to run experiments on datasets
+    ├── kasba.ipynb                 # Notebook demonstrating how to run KASBA
+    ├── result_visualisation.ipynb  # Notebook for generating CD diagrams, MCM plots, etc.
+    └── results/                    # Raw CSV result files used in the paper
+        ├── combined/               # Combined results
+        │   ├── k-shape-compare/    # Results for Section 5.4
+        │   └── section-5.1/        # Results for Section 5.1
+        └── train-test/             # Train and test results
+            ├── section-5.1/        # Results for Section 5.1
+            ├── section-5.2/        # Results for Section 5.2
+            └── section-5.3/        # Results for Section 5.3
+
+
+## 🚀 Getting Started
+
+### Install dependencies
+
+Create and activate a virtual environment, then install tsml-eval from the repository root:
+
+    python3 -m venv venv
+    source venv/bin/activate
+    pip install -e .
+
+While we wait for a new aeon release, you will need to install a specific branch
+of aeon. Run the following commands to install it:
+
+    pip uninstall aeon
+    pip install git+https://github.com/aeon-toolkit/aeon@kasba-results#egg=aeon
+
+Note: The project uses aeon, numpy, matplotlib, and other standard scientific Python packages.
+
+---
+
+## 🧪 Running KASBA
+
+Minimal example from the kasba.ipynb notebook:
+
+    from aeon.clustering import KASBA
+    from aeon.datasets import load_classification
+
+    X, y = load_classification("GunPoint")
+
+    model = KASBA(
+        n_clusters=2,
+        distance="msm",
+        distance_params={
+            "c": 1.0
+        },
+    )
+
+    labels = model.fit_predict(X)
+
+The notebook demonstrates:
+
+- How to use KASBA with different elastic distances
+- How to cluster multivariate or unequal-length time series
+- How to run multiple initialisations
+- How to inspect convergence behaviour
+
+---
+
+## 📊 Reproducing Figures (CD & MCM)
+
+Use the result_visualisation.ipynb notebook to generate:
+
+- Critical Difference diagrams
+- Model Comparison Matrices
+- Ranking curves and statistical tests
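
Critical Difference diagrams are built from the average rank of each algorithm across datasets. The notebook's own code is not reproduced here; the following is a minimal sketch of the rank-averaging step only. The function name `average_ranks` is hypothetical, and the scores are made-up toy numbers, not results from the paper.

```python
def average_ranks(scores, higher_is_better=True):
    """Return the mean rank of each algorithm (1 = best); ties share a rank.

    `scores` maps an algorithm name to a list with one score per dataset.
    """
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {name: 0.0 for name in names}

    for d in range(n_datasets):
        # Order algorithms from best to worst on dataset d.
        row = sorted(names, key=lambda n: scores[n][d], reverse=higher_is_better)
        i = 0
        while i < len(row):
            # Find the run of algorithms tied with row[i].
            j = i
            while j + 1 < len(row) and scores[row[j + 1]][d] == scores[row[i]][d]:
                j += 1
            shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
            for name in row[i : j + 1]:
                totals[name] += shared
            i = j + 1

    return {name: totals[name] / n_datasets for name in names}


# Toy scores for three hypothetical clusterers on three datasets.
toy_scores = {
    "A": [0.9, 0.8, 0.7],
    "B": [0.8, 0.8, 0.9],
    "C": [0.5, 0.6, 0.6],
}
ranks = average_ranks(toy_scores)
print(ranks)  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
```

A full CD diagram additionally groups algorithms whose ranks are not significantly different, which requires a statistical test that this sketch omits.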
+
+---
+
+## 📜 Citation
+
+If you use KASBA in academic work, please cite the paper:
+
+    C. Holder, A. Bagnall, Rock the KASBA: Blazingly fast and accurate time
+    series clustering, arXiv preprint arXiv:2411.17838 (2024)
+
+(A full BibTeX entry will be added once the paper is published.)
+
+---
+
+## 🤝 Contact
+
+For questions, please open an issue on the tsml-eval repository.
@@ -0,0 +1 @@
+"""Files for Rock the KASBA."""
@@ -0,0 +1,149 @@
+import sys
+
+import numpy as np
+
+from tsml_eval.experiments import (
+    run_clustering_experiment as tsml_clustering_experiment,
+)
+from tsml_eval.publications.clustering.kasba._model_configuration import (
+    EXPERIMENT_MODELS,
+)
+from tsml_eval.publications.clustering.kasba._utils import (
+    _parse_command_line_bool,
+    check_experiment_results_exist,
+    load_dataset_from_file,
+)
+
+
+def run_threaded_clustering_experiment(
+    dataset: str,
+    clusterer_name: str,
+    dataset_path: str,
+    results_path: str,
+    combine_test_train: bool,
+    resample_id: int,
+):
+    """Run a single clustering experiment and write its results to file.
+
+    Parameters
+    ----------
+    dataset : str
+        Dataset name.
+    clusterer_name : str
+        Key of the model to run. Must be present in ``EXPERIMENT_MODELS``.
+    dataset_path : str
+        Path to the dataset.
+    results_path : str
+        Path to the results.
+    combine_test_train : bool
+        Boolean indicating if data should be combined for test and train.
+    resample_id : int
+        Integer indicating the resample id. Also used as the clusterer's
+        random state.
+
+    Returns
+    -------
+    str or None
+        A "[SKIP]" message if results already exist, otherwise None.
+
+    Raises
+    ------
+    ValueError
+        If ``clusterer_name`` is not a key of ``EXPERIMENT_MODELS``.
+    """
+    if clusterer_name not in EXPERIMENT_MODELS:
+        raise ValueError(f"Unknown clusterer_name '{clusterer_name}'")
+
+    # Skip if results already exist
+    if check_experiment_results_exist(
+        model_name=clusterer_name,
+        dataset=dataset,
+        combine_test_train=combine_test_train,
+        path_to_results=results_path,
+        resample_id=resample_id,
+    ):
+        # Report the skip so it is visible when the script is run directly.
+        msg = (
+            f"[SKIP] {clusterer_name} (resample {resample_id}): "
+            f"results already exist."
+        )
+        print(msg)
+        return msg
+
+    X_train, y_train, X_test, y_test = load_dataset_from_file(
+        dataset,
+        dataset_path,
+        normalize=True,
+        combine_test_train=combine_test_train,
+        resample_id=0,
+    )
+    n_clusters = np.unique(y_train).size
+
+    factory = EXPERIMENT_MODELS[clusterer_name]
+    clusterer = factory(
+        n_clusters=n_clusters,
+        random_state=resample_id,
+        n_jobs=1,
+    )
+
+    tsml_clustering_experiment(
+        X_train=X_train,
+        y_train=y_train,
+        clusterer=clusterer,
+        results_path=results_path,
+        X_test=X_test,
+        y_test=y_test,
+        n_clusters=n_clusters,
+        clusterer_name=clusterer_name,
+        dataset_name=dataset,
+        resample_id=resample_id,
+        data_transforms=None,
+        build_test_file=not combine_test_train,
+        build_train_file=True,
+        benchmark_time=True,
+    )
+    print(f"[DONE] {clusterer_name} (resample {resample_id})")
+
+
+# Boolean to toggle if running locally or via command line.
+RUN_LOCALLY = True
+
+if __name__ == "__main__":
+    # NOTE: To run with command line arguments, set RUN_LOCALLY to False.
+    if RUN_LOCALLY:
+        print("RUNNING WITH TEST CONFIG")
+
+        dataset = "GunPoint"
+        clusterer_name = "KASBA"
+        combine_test_train = True
+
+        dataset_path = (
+            "/Users/chrisholder/Documents/Research/datasets/UCR/Univariate_ts"
+        )
+        results_path = "/Users/chrisholder/projects/kasba-experiments/full_results"
+        run_threaded_clustering_experiment(
+            dataset=dataset,
+            clusterer_name=clusterer_name,
+            dataset_path=dataset_path,
+            results_path=results_path,
+            combine_test_train=combine_test_train,
+            resample_id=0,
+        )
+
+    else:
+        if len(sys.argv) != 6:
+            print(
+                "Usage: python _experiment_script.py "
+                "<dataset> <clusterer_name> <dataset_path> <results_path> "
+                "<combine_test_train>"
+            )
+            sys.exit(1)
+
+        dataset = str(sys.argv[1])
+        clusterer_name = str(sys.argv[2])
+        dataset_path = str(sys.argv[3])
+        results_path = str(sys.argv[4])
+        combine_test_train = _parse_command_line_bool(sys.argv[5])
+
+        run_threaded_clustering_experiment(
+            dataset=dataset,
+            clusterer_name=clusterer_name,
+            dataset_path=dataset_path,
+            results_path=results_path,
+            combine_test_train=combine_test_train,
+            resample_id=1,
+        )