
perf: implement bitpack encoding for LID and MID blocks #328

Open
cheb0 wants to merge 7 commits into main from 312-bitpack-encoding


@cheb0 (Member) commented Jan 26, 2026

Description

Replaces varint encoding with faster delta bitpacking. Both LID and MID blocks now use bitpacking. Currently, the intcomp library is used. It doesn't use SIMD, so we might switch to another library in the future.
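To illustrate the idea, here is a minimal scalar sketch of delta bitpacking for a sorted block of uint32 IDs. It assumes a simple layout (the first ID stored separately, one fixed bit width per block). The helpers packDeltas and unpackDeltas are hypothetical: they do not reflect intcomp's API or seq-db's actual on-disk format.

```go
package main

import (
	"fmt"
	"math/bits"
)

// packDeltas (hypothetical helper) delta-encodes a sorted slice of uint32 IDs and
// bit-packs the deltas with one fixed bit width: the width of the largest delta.
// It returns the first ID, the bit width and the packed 32-bit words.
func packDeltas(ids []uint32) (first uint32, width uint, packed []uint32) {
	if len(ids) == 0 {
		return 0, 0, nil
	}
	first = ids[0]
	var maxDelta uint32
	for i := 1; i < len(ids); i++ {
		if d := ids[i] - ids[i-1]; d > maxDelta {
			maxDelta = d
		}
	}
	width = uint(bits.Len32(maxDelta))
	if width == 0 {
		return first, 0, nil // all deltas are zero (or there is a single ID)
	}
	n := len(ids) - 1 // number of deltas
	packed = make([]uint32, (n*int(width)+31)/32)
	bitPos := 0
	for i := 1; i < len(ids); i++ {
		d := ids[i] - ids[i-1]
		word, off := bitPos/32, uint(bitPos%32)
		packed[word] |= d << off // low bits go into the current word
		if off+width > 32 {
			packed[word+1] |= d >> (32 - off) // high bits spill into the next word
		}
		bitPos += int(width)
	}
	return first, width, packed
}

// unpackDeltas (hypothetical helper) reverses packDeltas; n is the number of IDs.
func unpackDeltas(first uint32, width uint, packed []uint32, n int) []uint32 {
	if n == 0 {
		return nil
	}
	ids := make([]uint32, n)
	ids[0] = first
	mask := uint32(1)<<width - 1 // for width == 32 this wraps to 0xFFFFFFFF
	bitPos := 0
	for i := 1; i < n; i++ {
		var d uint32
		if width > 0 {
			word, off := bitPos/32, uint(bitPos%32)
			d = packed[word] >> off
			if off+width > 32 {
				d |= packed[word+1] << (32 - off)
			}
			d &= mask
		}
		ids[i] = ids[i-1] + d // prefix sum restores the original IDs
		bitPos += int(width)
	}
	return ids
}

func main() {
	ids := []uint32{100, 103, 104, 110, 111, 125}
	first, width, packed := packDeltas(ids)
	fmt.Println(first, width, len(packed))                    // 100 4 1
	fmt.Println(unpackDeltas(first, width, packed, len(ids))) // [100 103 104 110 111 125]
}
```

Decoding needs only shifts and masks with a width that is fixed per block, with no per-byte continuation checks as in varint decoding, which is one reason bitpacking tends to decode faster. Real libraries typically work on fixed-size blocks and unroll or vectorize these loops.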

Measurements

compression
Bitpacking compresses much better than varint encoding: zstd achieves a ratio of ~1.7-2.0 on varint-encoded data, but only ~1.3 on delta-bitpacked data. Therefore, we could potentially disable zstd on benchmarks at the cost of a slight increase in dataset size.
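A rough illustration of why zstd has less left to squeeze out after bitpacking, assuming a hypothetical block of 128 small deltas with values 1..20. As unsigned varints (Go's encoding/binary format), each such delta occupies a full byte in which the continuation bit and the two high payload bits are always zero; bitpacking at 5 bits per value removes that redundancy. The numbers are illustrative, not measurements from seq-db.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// varintSize returns how many bytes the deltas take as unsigned varints:
// 7 payload bits plus 1 continuation bit per byte, whole bytes per value.
func varintSize(deltas []uint32) int {
	buf := make([]byte, binary.MaxVarintLen32)
	total := 0
	for _, d := range deltas {
		total += binary.PutUvarint(buf, uint64(d))
	}
	return total
}

// bitpackSize returns how many bytes the same deltas take when packed with a
// single bit width (the width of the largest delta), rounded up to whole bytes.
func bitpackSize(deltas []uint32) int {
	var maxDelta uint32
	for _, d := range deltas {
		if d > maxDelta {
			maxDelta = d
		}
	}
	width := bits.Len32(maxDelta)
	return (len(deltas)*width + 7) / 8
}

func main() {
	// Hypothetical block: 128 small deltas between consecutive sorted IDs.
	deltas := make([]uint32, 128)
	for i := range deltas {
		deltas[i] = uint32(1 + i%20) // values 1..20 fit in 5 bits
	}
	fmt.Println("varint bytes:", varintSize(deltas))   // varint bytes: 128
	fmt.Println("bitpack bytes:", bitpackSize(deltas)) // bitpack bytes: 80
}
```

In this example, three of every eight bits in the varint stream are predictable zeros, which is exactly the kind of redundancy a general-purpose compressor like zstd can remove; the bitpacked stream no longer contains it.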

dataset size
Overall, the dataset size stays approximately the same. In some environments there is a small reduction of around 3% of the total dataset size.

search latency (prod fractions)

The improvement mainly affects cold search requests. I measured search latency on a single repacked fraction. For example:

  • message:"XYZ" AND NOT k8s_service_name:"ABC" AND NOT request_host:"google.com" AND NOT cluster_name:"zxc"
    12 ms => 8 ms (cold)

For aggregations, the benefit is smaller simply because they involve more CPU work. It would have a higher benefit if …

  • service:xyz group by k8s_pod
    130 ms => 110 ms (cold)

Overall, the latency improvement for cold search requests is around 5-30%, depending on the particular request.
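For reference, the relative latency reduction r for the two cold-request examples above, with t the measured latency before and after the change, works out as:

```latex
r = \frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}}, \qquad
r_{\text{search}} = \frac{12 - 8}{12} \approx 33\%, \qquad
r_{\text{agg}} = \frac{130 - 110}{130} \approx 15\%
```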

search latency (logbench)

TODO: I will measure it on my PC, since the network disk makes measurements of cold search requests unreliable (results vary a lot).

Fixes #312


@dkharms added the "performance" label (Features or improvements that positively affect seq-db performance) Feb 10, 2026
@cheb0 changed the title from "perf: bitpack encoding for LID and MID blocks (draft, work in progress)" to "perf: implement bitpack encoding for LID and MID blocks (draft, work in progress)" Mar 10, 2026
@cheb0 force-pushed the 312-bitpack-encoding branch from 3accfd6 to 53ef5eb March 11, 2026 10:27
@cheb0 changed the title from "perf: implement bitpack encoding for LID and MID blocks (draft, work in progress)" to "perf: implement bitpack encoding for LID and MID blocks" Mar 11, 2026
@cheb0 (Member, Author) commented Mar 12, 2026

@seqbenchbot up main bulk

@seqbenchbot commented Mar 12, 2026

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
Identifier of your ongoing benchmark: dc2d4d40.

Here is a list of helpful links:

  • Take a look at the Grafana dashboard;
  • Live-tailing logs are also available.

Have a great time!

@github-actions (Contributor)

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
| Name | Metric | Previous (03387c) | Current (b4fae3) | Ratio | Verdict |
|---|---|---|---|---|---|
| AggDeep/size=10000-4 | ns/op | 45145.00 | 52104.00 | 1.15 | 🔴 |
| AndTree/size=1000-4 | ns/op | 4.26 | 4.85 | 1.14 | 🔴 |
| Block_Pack-4 | B/op | n/a | 27344.00 | n/a | 🔴 |
| Block_Pack-4 | allocs/op | n/a | 4.00 | n/a | 🔴 |
| Block_Pack-4 | ns/op | n/a | 82987.00 | n/a | 🔴 |
| Block_Unpack-4 | B/op | n/a | 262186.00 | n/a | 🔴 |
| Block_Unpack-4 | allocs/op | n/a | 2.00 | n/a | 🔴 |
| Block_Unpack-4 | ns/op | n/a | 80490.00 | n/a | 🔴 |
| FindSequence_Random/small-4 | MB/s | 6427.67 | 5452.96 | 0.85 | 🔴 |
| FindSequence_Random/small-4 | ns/op | 40.85 | 46.95 | 1.15 | 🔴 |

@cheb0 (Member, Author) commented Mar 12, 2026

@seqbenchbot down dc2d4d40

@seqbenchbot commented Mar 12, 2026

Nice, @cheb0 <(-^,^-)=b!

The benchmark with identifier dc2d4d40 has finished.
I've prepared a summary for you:

Query: bulk, type: warm

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 60.99 | 60.79 | -0.33% |
| stddev (ms) | 23.17 | 22.77 | -1.74% |
| p(50) (ms) | 54.00 | 54.00 | 0.00% |
| p(95) (ms) | 107.00 | 107.00 | 0.00% |
| p(99) (ms) | 146.00 | 145.00 | -0.68% |
| iterations | 9694.00 | 9652.00 | -0.43% |

Have a great time!

@cheb0 force-pushed the 312-bitpack-encoding branch from bf8f3d8 to 985f11f March 12, 2026 10:45
@github-actions (Contributor)

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
| Name | Metric | Previous (03387c) | Current (035be1) | Ratio | Verdict |
|---|---|---|---|---|---|
| Block_Pack-4 | B/op | n/a | 27344.00 | n/a | 🔴 |
| Block_Pack-4 | allocs/op | n/a | 4.00 | n/a | 🔴 |
| Block_Pack-4 | ns/op | n/a | 72861.00 | n/a | 🔴 |
| Block_Unpack-4 | B/op | n/a | 262183.00 | n/a | 🔴 |
| Block_Unpack-4 | allocs/op | n/a | 2.00 | n/a | 🔴 |
| Block_Unpack-4 | ns/op | n/a | 73565.00 | n/a | 🔴 |
| Indexer-4 | B/op | 721669786.00 | 805572232.00 | 1.12 | 🔴 |

@cheb0 force-pushed the 312-bitpack-encoding branch from 985f11f to 4c802d2 March 12, 2026 12:06
@github-actions (Contributor)

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
| Name | Metric | Previous (03387c) | Current (b3e5ee) | Ratio | Verdict |
|---|---|---|---|---|---|
| AggDeep/size=10000-4 | ns/op | 45145.00 | 51940.00 | 1.15 | 🔴 |
| AggDeep/size=1000000-4 | ns/op | 4554859.00 | 5474331.00 | 1.20 | 🔴 |
| And/size=1000-4 | ns/op | 4.29 | 4.83 | 1.13 | 🔴 |
| Block_Pack-4 | B/op | n/a | 27344.00 | n/a | 🔴 |
| Block_Pack-4 | allocs/op | n/a | 4.00 | n/a | 🔴 |
| Block_Pack-4 | ns/op | n/a | 78396.00 | n/a | 🔴 |
| Block_Unpack-4 | B/op | n/a | 262177.00 | n/a | 🔴 |
| Block_Unpack-4 | allocs/op | n/a | 2.00 | n/a | 🔴 |
| Block_Unpack-4 | ns/op | n/a | 86552.00 | n/a | 🔴 |
| FindSequence_Random/small-4 | MB/s | 6427.67 | 5668.64 | 0.88 | 🔴 |
| Indexer-4 | B/op | 721669786.00 | 805627342.00 | 1.12 | 🔴 |

@cheb0 marked this pull request as ready for review March 13, 2026 09:27

Labels

performance (Features or improvements that positively affect seq-db performance)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use bitpack encoding for LID/MID blocks

3 participants