Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add frozen support to roaring64 #688

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from
Draft

Conversation

SLieve
Copy link
Contributor

@SLieve SLieve commented Jan 11, 2025

This PR makes the following changes:

  • Change the internals of both ART and roaring64 to be based on indices into arrays rather than pointers.
  • Use uint64_t as the value of the ART rather than a pointer, which allows the ART to self-contain its serialization code.
  • Add support for frozen (de)serialization. This is done by serializing the arrays of objects and indices into them, which means that during deserialization the calls to memcpy are limited and the bitmap can mostly act as a view into a buffer.
  • Add some benchmarks using synthetic data for roaring64 / roaring 64 C++ / std::set. These benchmarks focus more on the higher-order bits than the current microbenchmarks, and thus are a better test case for 64-bit bitmaps.

This is a large change so there is lots of room for mistakes, please let me know what can be improved.

@SLieve SLieve requested review from lemire and Dr-Emann January 11, 2025 16:18
Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI failure (VS-clang) is unrelated. This PR looks good.

/**
* Creates a readonly bitmap that is a view of the given buffer. The buffer
* should be created with `roaring64_bitmap_frozen_serialize()`, and must be
* aligned by 64 bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 64 byte alignment really required? It seems 8 byte (64 bit) alignment is more likely

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell from here, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SLieve I am not sure that it is required. I think it is preferred (for performance).

include/roaring/roaring64.h Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
This adds the index of the next free node into a newly freed node, or `capacity` if there are no more free indices.

This significantly speeds up finding the next free index, which is important for add+remove workloads.

Benchmarks
Old:
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
r64InsertRemoveRandom/0        127 ns          127 ns      5461079
r64InsertRemoveRandom/1      31633 ns        31604 ns        24028
r64InsertRemoveRandom/2      30782 ns        30769 ns        21859
r64InsertRemoveRandom/3      31985 ns        31969 ns        21558
r64InsertRemoveRandom/4        356 ns          356 ns      1962694
r64InsertRemoveRandom/5      28972 ns        28962 ns        21366
r64InsertRemoveRandom/6      30632 ns        30623 ns        22682
r64InsertRemoveRandom/7        448 ns          448 ns      1601550
r64InsertRemoveRandom/8      32506 ns        32495 ns        21591
r64InsertRemoveRandom/9        689 ns          689 ns      1002237
cppInsertRemoveRandom/0        131 ns          131 ns      5319673
cppInsertRemoveRandom/1      16106 ns        16104 ns        43632
cppInsertRemoveRandom/2       3881 ns         3881 ns       180087
cppInsertRemoveRandom/3       3582 ns         3582 ns       171298
cppInsertRemoveRandom/4        403 ns          402 ns      1666697
cppInsertRemoveRandom/5        993 ns          993 ns       706038
cppInsertRemoveRandom/6       4039 ns         4038 ns       172421
cppInsertRemoveRandom/7        469 ns          469 ns      1440197
cppInsertRemoveRandom/8       1454 ns         1454 ns       633551
cppInsertRemoveRandom/9        654 ns          654 ns      1091588
setInsertRemoveRandom/0       1944 ns         1943 ns       368926
setInsertRemoveRandom/1       1955 ns         1953 ns       404931
setInsertRemoveRandom/2       1911 ns         1910 ns       358466
setInsertRemoveRandom/3       1953 ns         1951 ns       362351
setInsertRemoveRandom/4       2104 ns         2102 ns       321387
setInsertRemoveRandom/5       1944 ns         1943 ns       354836
setInsertRemoveRandom/6       1835 ns         1835 ns       359099
setInsertRemoveRandom/7       1970 ns         1968 ns       372625
setInsertRemoveRandom/8       1894 ns         1892 ns       355456
setInsertRemoveRandom/9       1659 ns         1659 ns       355902

New:
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
r64InsertRemoveRandom/0        128 ns          128 ns      5614266
r64InsertRemoveRandom/1        935 ns          935 ns       739679
r64InsertRemoveRandom/2        916 ns          916 ns       739944
r64InsertRemoveRandom/3        936 ns          936 ns       690708
r64InsertRemoveRandom/4        368 ns          368 ns      1957642
r64InsertRemoveRandom/5       1141 ns         1140 ns       592505
r64InsertRemoveRandom/6       1139 ns         1138 ns       657840
r64InsertRemoveRandom/7        481 ns          481 ns      1434967
r64InsertRemoveRandom/8       1447 ns         1446 ns       484463
r64InsertRemoveRandom/9        721 ns          721 ns      1017456
cppInsertRemoveRandom/0        134 ns          134 ns      5524804
cppInsertRemoveRandom/1      15616 ns        15608 ns        47666
cppInsertRemoveRandom/2       3855 ns         3854 ns       180265
cppInsertRemoveRandom/3       3809 ns         3808 ns       183595
cppInsertRemoveRandom/4        412 ns          412 ns      1695708
cppInsertRemoveRandom/5       1012 ns         1011 ns       713501
cppInsertRemoveRandom/6       3410 ns         3409 ns       199214
cppInsertRemoveRandom/7        474 ns          474 ns      1496740
cppInsertRemoveRandom/8       1421 ns         1420 ns       465868
cppInsertRemoveRandom/9        564 ns          564 ns      1148076
setInsertRemoveRandom/0       1956 ns         1956 ns       351283
setInsertRemoveRandom/1       1959 ns         1958 ns       355766
setInsertRemoveRandom/2       1886 ns         1885 ns       357406
setInsertRemoveRandom/3       1905 ns         1904 ns       355235
setInsertRemoveRandom/4       1945 ns         1944 ns       364599
setInsertRemoveRandom/5       1902 ns         1902 ns       350312
setInsertRemoveRandom/6       1907 ns         1906 ns       346962
setInsertRemoveRandom/7       1937 ns         1936 ns       356168
setInsertRemoveRandom/8       1881 ns         1880 ns       341472
setInsertRemoveRandom/9       1962 ns         1961 ns       350643
src/art/art.c Outdated Show resolved Hide resolved
src/roaring64.c Outdated Show resolved Hide resolved
src/art/art.c Outdated
Comment on lines 1874 to 1878
if (art->root != CROARING_ART_NULL_REF) {
art->root = art_move_node_to_shrink(art, art->root);
art_shrink_at(art, art->root);
}
return art_shrink_node_arrays(art);
Copy link
Member

@Dr-Emann Dr-Emann Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works. I need to think more to verify, but imagine the case where we have (all nodes of size 4):

Capacity: 8
Root: idx 5
Free List: 6 -> 4 -> 8

The root will not be moved (because it's before first_free). At some point, the item at idx 7 will be moved into slot 6. But the item at 5 cannot be moved into free slot 4, because we already visited the root.

@SLieve SLieve marked this pull request as draft January 26, 2025 19:55
This avoids a bug in the following scenario:
  art->leaves = [2,0,x]
  art->first_free[leaf_type] = 1

Where `2` and `0` are pointers to the next free index, and `x` is an occupied
leaf. In this case, if `art_shrink_to_fit` was called, then we would have the
following result:
  art->leaves = [2,x,0]
  art->first_free[leaf_type] = 0

This is not fully shrunken, and therefore wrong. Sorting the free indices fixes
this scenario. Before `art_shrink_to_fit`:
  art->leaves = [1,2,x]
  art->first_free[leaf_type] = 0

After `art_shrink_to_fit`:
  art->leaves = [x,2,3]
  art->first_free[leaf_type] = 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants