@aknayar aknayar commented Oct 30, 2025

This PR adds IndexFlatL2Panorama, integrating Panorama (as specified in the paper) into IndexFlatL2. This is the first step in creating an IndexRefinePanorama, which will use IndexFlatL2Panorama (or an IndexPreTransform with an IndexFlatL2Panorama) as its refine_index.

Refactoring

Since the bulk of Panorama's refinement logic would be duplicated between IndexFlatL2Panorama and IndexIVFFlatPanorama, it has been factored out into a new Panorama struct. This struct contains key parameters (batch_size, d, etc.) and the following utility functions:

  • copy_codes_to_level_layout: Writes new vectors to codes following Panorama's storage layout
  • compute_cumulative_sums: Computes the cumulative sums for new vectors
  • compute_query_cum_sums: Computes the cumulative sums for a new query
  • progressive_filter_batch: Performs Panorama refinement on a batch of vectors

These utilities will be shared by most Panorama indexes, which is why I have refactored them into their own utility.
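As a rough pure-Python sketch of what the first two utilities manage (not the actual Faiss implementation; the level widths, the level-major ordering, and the use of suffix sums here are my reading of the paper's layout):

```python
# Sketch of Panorama's level-oriented storage, assuming a batch is stored
# level-major: all vectors' level-0 chunks first, then level-1 chunks, etc.
# This mirrors what copy_codes_to_level_layout / compute_cumulative_sums do,
# but is NOT the actual Faiss code.

def copy_codes_to_level_layout(vectors, n_levels):
    """Interleave a batch of d-dim vectors into level-major order."""
    d = len(vectors[0])
    base, rem = divmod(d, n_levels)  # levels may divide d unevenly
    widths = [base + (1 if i < rem else 0) for i in range(n_levels)]
    out = []
    start = 0
    for w in widths:
        for v in vectors:
            out.extend(v[start:start + w])
        start += w
    return out

def compute_cumulative_sums(vector, n_levels):
    """Per-level suffix sums of squared norms: entry i is the energy left
    in levels i..n_levels-1, used to bound the unseen part of a distance."""
    d = len(vector)
    base, rem = divmod(d, n_levels)
    widths = [base + (1 if i < rem else 0) for i in range(n_levels)]
    sums = []
    start = 0
    for w in widths:
        sums.append(sum(x * x for x in vector[start:start + w]))
        start += w
    for i in range(n_levels - 2, -1, -1):
        sums[i] += sums[i + 1]
    return sums
```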

IndexRefinePanorama

While the IndexFlatL2Panorama implemented in this PR technically contains all the functionality needed to implement IndexRefinePanorama (performing search on a subset of indices), it is not yet ready to be used as a refine_index. The current implementation is not optimized for the IndexRefine case, where search runs on a very small subset of the datapoints. This leads to highly scattered memory accesses during the search, to the point where the overhead of maintaining active_indices and exact_distances can cancel out Panorama's speedups.

As such, to optimize for IndexRefine we will need a standalone implementation of search_subset which instead does the following:

  1. Iterate over the subset of indices (rather than batches of codes)
  2. For each index i, compute its distance via Panorama refinement on its own, effectively with batch_size = 1. (For this very reason, I have made batch_size a constructor parameter: IndexRefine will require it to be 1 due to noncontiguous memory accesses, while typical workloads benefit from values in the 128-1024 range.)

This will unfortunately mean we cannot reuse the search utilities in the Panorama struct in this specific case, but will allow us to squeeze 2-5x speedups during the reordering phase of IndexRefine.

Testing

  1. Unit tests can be found in tests/test_flat_l2_panorama.py
  2. A benchmark can be found in benchs/bench_flat_l2_panorama.py, yielding the following results:
======Flat
        Recall@10: 0.980000, speed: 263.214254 ms/query, dims scanned: 100.00%
======PCA960,FlatL2Panorama8_512
        Recall@10: 0.980000, speed: 37.080264 ms/query, dims scanned: 12.62%

The recall of less than 1.0 is likely due to small discrepancies between the Faiss results and the precomputed ground_truth values.
[benchmark plot: bench_flat_l2_panorama]

@meta-cla meta-cla bot added the CLA Signed label Oct 30, 2025
"""Test when n_levels doesn't evenly divide dimension"""
test_cases = [(65, 4), (63, 8), (100, 7)]

# TODO(aknayar): Test functions like get_single_code().
Will add these tests in a follow-up PR.

aknayar commented Oct 30, 2025

I rebuilt with AVX2 on Linux and was unable to reproduce the failing tests seen here; any ideas as to what may have happened?

@limqiying

Hey, let me just copy what @mnorris11 said and we can resume the thread from there.

When checking the log, it is just 2 elements wrong in one unit test. Some ideas:

There could be a tie in distance, and different ids happen to get returned. If this is the case, they could update the test where if there is a mismatch, then check distances, and if they are equal then still do not fail the test.

Do you have faiss installed with numpy2? It's a recent integration, and that could be the reason for the difference. Let me know the conda steps you took to repro!

aknayar commented Oct 31, 2025

@limqiying Thank you for the ideas! It was, in fact, a corner case involving a tie. Panorama also suffers from floating-point imprecision due to how we calculate the squared L2 distance: Faiss computes $\|q - p\|_2^2$ directly, while Panorama computes $\|q\|_2^2 + \|p\|_2^2 - 2(q \cdot p)$. As a result, cases where ties produce different results are possible (albeit rare). Simply changing the random seed on the dataset generator for the test case fixed things!
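To illustrate how the two mathematically equivalent formulas can disagree in floating point, here is a deliberately extreme pure-Python example (the values are contrived; real discrepancies are far smaller but can still flip ties):

```python
# Two ways of computing squared L2 distance that agree in exact arithmetic
# but not in floating point.

def l2sqr_direct(q, p):
    # ||q - p||^2, what IndexFlatL2 computes
    return sum((a - b) ** 2 for a, b in zip(q, p))

def l2sqr_expanded(q, p):
    # ||q||^2 + ||p||^2 - 2(q . p), the decomposition Panorama uses
    qq = sum(a * a for a in q)
    pp = sum(b * b for b in p)
    qp = sum(a * b for a, b in zip(q, p))
    return qq + pp - 2.0 * qp

q = [1e8, 1.0]
p = [1e8, 0.0]
print(l2sqr_direct(q, p))    # 1.0
print(l2sqr_expanded(q, p))  # 0.0 -- the "+1" is lost when added to 1e16
```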


mnorris11 commented Nov 9, 2025

Hi @aknayar , when doing some benchmarks, I cannot seem to get a speedup from IndexFlatPanorama. It could be due to my configuration for nlevels and batch_size. How do you choose these in relation to database size, dimensions, topk, number of queries etc? Say nb = 1 million vectors, d=256, k = 10 or 100, nq = 1000. (I chose 8 for nlevels and 512 for batch_size, which could be suboptimal)

        nlevels = 8
        batch_size = 512
        index_regular = faiss.index_factory(d, "Flat")
        index = faiss.IndexFlatPanorama(d, faiss.METRIC_L2, nlevels, batch_size)
        index_panorama = faiss.IndexPreTransform(faiss.PCAMatrix(d, d), index)
        # add vectors, then query index_regular and index_panorama etc


aknayar commented Nov 9, 2025

Hi @mnorris11, this is interesting—your parameters seem fine.

The theoretical ideal for batch_size is 1, as it allows the most pruning, but in practice anywhere from 256-1024 yields the best speedups (too low will incur more overhead while too high will cause less pruning). It also must be larger than k, but that is already the case here.

nlevels depends mostly on the dimensionality of the dataset. For 256, either 4 or 8 should be best (for 128 dims I'd recommend 4 levels and for 1024 dims and above I'd recommend 8-32 levels). When you increase nlevels, you'll prune more, but also incur more overhead.
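For reference, the guidance above could be encoded as a rule-of-thumb helper (entirely my own sketch; Faiss exposes no such function, and the exact cutoffs are judgment calls):

```python
# Rule of thumb for picking nlevels from dimensionality, following the
# advice above: ~4 levels for low d, ~8 for mid-range, 8-32 for high d.

def suggest_nlevels(d):
    if d <= 128:
        return 4
    if d < 1024:
        return 8
    return min(32, d // 64)  # caps at 32 levels for very high dimensions
```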

I tried your setup on our server and observed a 6.98x speedup on GIST1M:

akash@lutex:~/faiss-pano$ python3 benchs/bench_flat_l2_panorama.py 
======Flat
        Recall@10: 0.980000, speed: 266.571403 ms/query, dims scanned: 100.00%
======PCA960,FlatL2Panorama8_512
        Recall@10: 0.980000, speed: 38.172317 ms/query, dims scanned: 12.62%

Do you mind sharing your benchmarking script so I could give it a try locally? I'm also curious what faiss.cvar.indexPanorama_stats.ratio_dims_scanned outputs after the Panorama query. Thanks!

@mnorris11


Thanks for the info. I was checking some internal datasets, along with the SyntheticDataset present within Faiss. It looks something like this, piecemeal from the actual script.

import timeit

import faiss
import numpy as np
from faiss.contrib.datasets import SyntheticDataset

def generate_data(d, nt, nb, nq, seed=42):
    ds = SyntheticDataset(d, nt, nb, nq, seed=seed)
    return np.concatenate([ds.get_database(), ds.get_queries(), ds.get_train()])

def get_avg_query_speed(index, xq, k):
    trials = 3
    result = timeit.timeit(
        stmt="index.search(xq, k)",
        number=trials,
        globals={"index": index, "xq": xq, "k": k},
    )
    return result / trials * 1000.0  # ms

d, nb, nt, nq, nlevels, k = 2048, 1_000_000, 150_000, 1000, 8, 100
x = generate_data(d, nt, nb, nq, seed=42)
xb = x[:nb]
xq = x[nb : nb + nq]
xt = x[nb + nq : nb + nq + nt]

nlevel_list = [8, 16, 32, 64]
batch_sizes = [128, 256, 512]
for nl in nlevel_list:
    for bs in batch_sizes:
        index_regular = faiss.index_factory(d, "Flat")
        index = faiss.IndexFlatPanorama(d, faiss.METRIC_L2, nl, bs)
        index_panorama = faiss.IndexPreTransform(faiss.PCAMatrix(d, d), index)
        # index_panorama = faiss.index_factory(d, f"PCA{d},FlatL2Panorama{nl}_{bs}") # Saw some errors related to SWIG with this but didn't check deeper. Could be an issue running it on our internal build system due to SWIG version etc.
        index_regular.train(xt)
        index_regular.add(xb)
        index_panorama.train(xt)
        index_panorama.add(xb)
        first_str = f"nlevel={nl}, batch_size={bs}"

        _, I_regular = index_regular.search(xq, k)
        regular_q_speed = get_avg_query_speed(index_regular, xq, k)
        _, I_panorama = index_panorama.search(xq, k)
        panorama_q_speed = get_avg_query_speed(index_panorama, xq, k)
        _, gold_I = faiss.knn(xq, xb, k)
        result_line = f"""
{first_str}, \
regular_recall={compute_recall(gold_I, I_regular)}, \
regular_q_speed={regular_q_speed}, \
panorama_recall={compute_recall(gold_I, I_panorama)}, \
panorama_q_speed={panorama_q_speed}
"""
        print(result_line, flush=True)

The Flat results are all around 3000ms for this configuration for SyntheticDataset. Panorama results vary from 5000 to 8000.

Definitely let me know if you see issues in the methodology here, this was whipped together somewhat quickly. I can check the cvar a bit later.
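The script above references a compute_recall helper that isn't shown; a minimal pure-Python stand-in (assuming gold_I and I are parallel per-query lists of neighbor ids) might be:

```python
def compute_recall(gold_I, I):
    """Fraction of ground-truth neighbors recovered, averaged over queries.

    gold_I and I are parallel sequences of per-query id lists: row i of
    each corresponds to query i.
    """
    hits = total = 0
    for gold_row, row in zip(gold_I, I):
        hits += len(set(gold_row) & set(row))
        total += len(gold_row)
    return hits / total
```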


aknayar commented Nov 10, 2025

@mnorris11 Thank you so much for sharing your code—I now notice something super important that I completely forgot to mention: when nq > 20 and there is no sel applied, IndexFlatL2 will trigger BLAS code to compute the kNN. Since Panorama does a batch-wise refinement, we cannot accelerate it with BLAS. As such, Panorama will likely perform slower than IndexFlatL2 if nq > 20.

That being said, Panorama shines when any of the following is true:

  • nq <= 20
  • There is a sel applied
  • You are operating on a subset of the dataset

The third point is most important since this is exactly what IndexRefine does. As such, using IndexRefinePanorama (which will rely on this IndexFlatPanorama) will open the door to speedups regardless of nq or sel.

In the meantime, to properly gauge IndexFlatPanorama's speedups, it is best to set nq to be less than 20. My apologies for forgetting to mention this earlier—I hope this helps.
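One way to act on this in a benchmark is to issue queries in chunks of at most 20, so each baseline search call stays on the non-BLAS path. This is a hypothetical sketch: chunked_search is my own name, and search stands in for any index.search-like callable.

```python
# Split a query set into chunks of at most max_nq queries so each
# search() call stays at or below the nq > 20 BLAS threshold above.

def chunked_search(search, queries, k, max_nq=20):
    all_D, all_I = [], []
    for start in range(0, len(queries), max_nq):
        D, I = search(queries[start:start + max_nq], k)
        all_D.extend(D)
        all_I.extend(I)
    return all_D, all_I
```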

@mnorris11


I actually do not see this line of code being hit in Run_search_L2sqr at all when debugging the search() section. progressive_filter_batch seems to just call fvec_inner_product. Am I missing something?

I tried to just pass

    params = faiss.SearchParameters()
    params.sel = faiss.IDSelectorAll()
    _, I_panorama = index_panorama.search(xq, k, params=params)

but this did not result in any change. (nq still 1000).


aknayar commented Nov 11, 2025

@mnorris11 The issue is that IndexFlatL2 has this BLAS codepath which gets triggered for nq > 20 and when there's no sel, so I think the solution would be to pass these params for index_regular instead of the Panorama index.

@mnorris11

My mistake. Thanks for the swift replies. I see great speedup on all datasets now after passing the params correctly to the Panorama index!

It would be great if we can make this the default for Panorama. What do you think about the Panorama passing IDSelectorAll inside SearchParameters if they are not specified?


aknayar commented Nov 11, 2025

@mnorris11 No worries! I'm very glad you're now seeing some speedups! So it turns out that Panorama actually doesn't need the params=params at all—that doesn't have any effect on index_panorama since Panorama only has one codepath. The only reason for adding params=params at all is to make sure we don't trigger index_base's BLAS code (to ensure a fair comparison, since Panorama fundamentally can't utilize BLAS in the same manner).

Just as a test, if you change your code to the following, you should still observe speedups:

...
# pass params into index_regular
_, I_regular = index_regular.search(xq, k, params=params)

# don't pass params into index_panorama
_, I_panorama = index_panorama.search(xq, k)
...


mdouze commented Nov 12, 2025

you can also change the variable distance_compute_blas_threshold

FAISS_API extern int distance_compute_blas_threshold;

@mdouze mdouze left a comment
Thanks!

}

// IndexFlatL2Panorama
if (match("FlatL2Panorama([0-9]+)(_[0-9]+)?")) {

I can add IndexFlatPanorama and IndexFlatL2Panorama to the wiki. Should this be done post-merge? I'm also unsure on which section in the wiki this would fall into (besides the tree image that displays the hierarchy).


@mdouze I can add to the wiki (both the Guidelines to Choosing an Index and the index factory) after merge.


aknayar commented Nov 12, 2025

@mdouze Thanks for the comments! I have addressed them and implemented reconstruct (and reconstruct_n) for IndexFlatPanorama.

@aknayar aknayar requested a review from mdouze November 12, 2025 17:18
}

void IndexFlatPanorama::reconstruct_n(idx_t i, idx_t n, float* recons) const {
Index::reconstruct_n(i, n, recons);
This overrides IndexFlatCodes' implementation of reconstruct_n.


meta-codesync bot commented Nov 17, 2025

@mnorris11 merged this pull request in 9cd408b.


mnorris11 commented Nov 17, 2025

Thanks @aknayar @AlSchlo, the Flat PR has been merged!


aknayar commented Nov 18, 2025

@mnorris11 Thank you! I have a local build for IndexRefinePanorama ready and will aim to get that PR out for review tonight!
