Optimize `build_condensed_hierarchy`

`ML::HDBSCAN::detail::Condense::build_condensed_hierarchy` becomes a bottleneck in end-to-end HDBSCAN runs once the kNN construction is optimized (i.e., when NN Descent is used for kNN).

Below is the time breakdown of running HDBSCAN on subsamples of the Appliances Amazon review dataset (1.8M x 768).
In particular, when the MR (mutual reachability) kNN step is optimized with NN Descent, `build_condensed_hierarchy` takes up a very large portion of the e2e time.

<img width="773" height="466" alt="Image" src="https://github.com/user-attachments/assets/e2a47bfd-bd2c-4783-a4a4-b1203fbc5fbc" />

If the tree turns out to be lopsided or very deep (which can happen depending on the data distribution), or if the branching factor is small at the top of the tree, `build_condensed_hierarchy` can take up a large share of the runtime. Below is the breakdown for a `make_blobs` synthetic dataset.

<img width="567" height="344" alt="Image" src="https://github.com/user-attachments/assets/dffaed8c-56b9-404e-8daf-866fa0319343" />
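To illustrate why tree shape matters, here is a minimal Python sketch. It assumes the hierarchy is walked top-down one level at a time (a level-synchronous frontier traversal). That is an assumption for illustration only, not a description of the actual cuML implementation, and the helper names `frontier_rounds`, `balanced_tree`, and `chain_tree` are hypothetical. Under that assumption, the number of sequential rounds equals the tree depth plus one, so a lopsided tree needs far more rounds than a balanced one with the same number of leaves, and each round has very little parallel work.

```python
def frontier_rounds(children, root=0):
    """Walk a tree level by level; return (rounds, frontier size per round).

    `children` maps an internal node id to a tuple of its child ids.
    Leaves simply do not appear as keys.
    """
    frontier = [root]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        # Next frontier: all children of the nodes in the current frontier.
        frontier = [c for node in frontier for c in children.get(node, ())]
    return len(sizes), sizes


def balanced_tree(n_leaves):
    """Complete binary tree (n_leaves must be a power of two), heap numbering."""
    return {i: (2 * i + 1, 2 * i + 2) for i in range(n_leaves - 1)}


def chain_tree(n_leaves):
    """Maximally lopsided binary tree: every split peels off a single leaf."""
    n_internal = n_leaves - 1
    children = {}
    for i in range(n_internal):
        leaf = n_internal + i
        # The last internal node has two leaf children.
        nxt = i + 1 if i + 1 < n_internal else 2 * n_leaves - 2
        children[i] = (leaf, nxt)
    return children


if __name__ == "__main__":
    for name, build in [("balanced", balanced_tree), ("chain", chain_tree)]:
        rounds, sizes = frontier_rounds(build(1024))
        print(f"{name}: rounds={rounds}, max frontier={max(sizes)}, nodes={sum(sizes)}")
```

Both trees have the same 2047 nodes, but the balanced tree finishes in 11 rounds while the chain needs 1024 rounds with at most two nodes per round. If each round carried a fixed overhead (for example, a kernel launch on a GPU), the lopsided case would be dominated by that overhead.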

Optimize `build_condensed_hierarchy` #7377
