feat: add counting bloom filter #519
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #519 +/- ##
==========================================
+ Coverage 95.57% 95.59% +0.01%
==========================================
Files 80 81 +1
Lines 33178 33351 +173
==========================================
+ Hits 31710 31882 +172
+ Misses 1267 1266 -1
- Partials 201 203 +2
I guess a method for getting the distinct count is still needed so that users can tell whether this filter is close to full or not. And instead of
Do you want to remove
I feel the
@rueian
The wiki you referenced above also says that count-min sketch and counting bloom filter are essentially the same data structure. Could you elaborate more on their differences? I have no preference between count-min sketch and counting bloom filter. I just want this filter to have more functionality than the previous standard bloom filter.
As I understand it, both data structures use almost the same logic, but their purpose, functionality, and internal data structure are different.
I see. But this implementation may have a bigger error rate than a count-min sketch (this is uncertain).
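To make the comparison in this thread concrete, here is a minimal in-memory sketch of the two structures. It is not the rueidisprob implementation (which stores its state in Redis); the type names, the FNV-based double hashing, and the sizes are all assumptions chosen for illustration. The textbook difference it shows: a counting bloom filter bumps k counters in one shared array and answers membership, while a count-min sketch bumps one counter per row and answers a frequency estimate as the minimum over rows.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// indexes derives k positions in [0, m) from key using double hashing
// (h1 + i*h2). The hashing scheme is an assumption made for this sketch.
func indexes(key string, k, m uint64) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, (sum>>32)|1
	out := make([]uint64, k)
	for i := uint64(0); i < k; i++ {
		out[i] = (h1 + i*h2) % m
	}
	return out
}

// countingBloom keeps one shared counter array; every key touches k slots.
type countingBloom struct {
	counters []uint32
	k        uint64
}

func newCountingBloom(m, k uint64) *countingBloom {
	return &countingBloom{counters: make([]uint32, m), k: k}
}

func (f *countingBloom) Add(key string) {
	for _, i := range indexes(key, f.k, uint64(len(f.counters))) {
		f.counters[i]++
	}
}

// Exists reports possible membership: every slot for key must be non-zero.
func (f *countingBloom) Exists(key string) bool {
	for _, i := range indexes(key, f.k, uint64(len(f.counters))) {
		if f.counters[i] == 0 {
			return false
		}
	}
	return true
}

// Remove decrements the slots for key only if key appears to be present,
// and never lets a counter drop below zero.
func (f *countingBloom) Remove(key string) bool {
	if !f.Exists(key) {
		return false
	}
	for _, i := range indexes(key, f.k, uint64(len(f.counters))) {
		if f.counters[i] > 0 {
			f.counters[i]--
		}
	}
	return true
}

// countMin keeps d independent rows; every key touches one slot per row.
type countMin struct {
	rows [][]uint32
}

func newCountMin(d, w int) *countMin {
	rows := make([][]uint32, d)
	for r := range rows {
		rows[r] = make([]uint32, w)
	}
	return &countMin{rows: rows}
}

func (s *countMin) Add(key string) {
	w := uint64(len(s.rows[0]))
	for r, i := range indexes(key, uint64(len(s.rows)), w) {
		s.rows[r][i]++
	}
}

// Estimate returns the minimum counter across rows, which never
// underestimates how many times key was added.
func (s *countMin) Estimate(key string) uint32 {
	w := uint64(len(s.rows[0]))
	est := ^uint32(0)
	for r, i := range indexes(key, uint64(len(s.rows)), w) {
		if c := s.rows[r][i]; c < est {
			est = c
		}
	}
	return est
}

func main() {
	f := newCountingBloom(64, 3)
	f.Add("a")
	fmt.Println(f.Exists("a"), f.Remove("a"), f.Exists("a"))

	s := newCountMin(4, 32)
	s.Add("a")
	s.Add("a")
	fmt.Println(s.Estimate("a"))
}
```

In this sketch the counting bloom filter can only say "possibly present" or "definitely absent", whereas the count-min sketch returns a number; which one has the lower error for a given memory budget depends on the sizing, which the thread leaves open.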
This makes it look the same as the original bloom filter, except that it could still report
rueidisprob/countingbloomfilter.go
Outdated
for _, key := range keys {
	if _, ok := keySet[key]; !ok {
		keySet[key] = struct{}{}
		deduplicatedKeys = append(deduplicatedKeys, key)
Should we do this deduplication? It feels to me that duplication should be allowed.
I added deduplication to keep the logic simple and to guarantee that the count never goes negative. Removing it makes the logic more complex and can hurt performance.
I see. It is indeed complex but looks good to me now.
Merged. Thank you @proost!
Previous Discussion: #510
There are two differences from the original design.
Count: returns the number of all inserted items, not the number of distinct items.
RemoveMulti: deduplicates keys. A count in a Counting Bloom Filter can't be negative. Because of this, the RemoveMulti operation can become complicated and its performance can be bad, so this constraint was added to avoid those difficulties.
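The key deduplication described for RemoveMulti can be sketched as a standalone helper. This mirrors the loop quoted in the review thread, but the function name dedupKeys and its signature are hypothetical, introduced only for this example; it is not presented as the merged code.

```go
package main

import "fmt"

// dedupKeys returns keys with duplicates dropped, keeping first-seen order.
// Hypothetical helper: it illustrates the idea, not the library's API.
func dedupKeys(keys []string) []string {
	seen := make(map[string]struct{}, len(keys))
	out := make([]string, 0, len(keys))
	for _, key := range keys {
		if _, ok := seen[key]; !ok {
			seen[key] = struct{}{}
			out = append(out, key)
		}
	}
	return out
}

func main() {
	// Each key is removed at most once per call, so a single call
	// cannot decrement the same counters twice for one key.
	fmt.Println(dedupKeys([]string{"a", "b", "a", "c", "b"})) // prints [a b c]
}
```

The trade-off raised in the review still applies: deduplicating means a caller cannot remove two occurrences of the same key in one call.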