feat: add counting bloom filter #519

proost · 2024-03-31T14:09:13Z

Previous Discussion: #510

This differs from the original design in two ways:

Count returns all inserted items, not distinct items.
RemoveMulti deduplicates keys. A count in a Counting Bloom Filter can't be negative. Because of this, the RemoveMulti operation can become complicated and slow, so I added this constraint to avoid those difficulties.
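For readers unfamiliar with the structure, the two points above can be illustrated with a minimal in-memory sketch. This is not the code in this PR: the type `CountingBloom`, its fields, and the double-hashing scheme are all hypothetical, and decrementing the total on removal is an assumption made only for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountingBloom is a hypothetical in-memory stand-in used only for
// illustration; it is not the rueidisprob implementation.
type CountingBloom struct {
	counters []uint32 // one counter per slot; must never go below zero
	k        uint32   // number of hash positions per key
	total    uint64   // every insertion, duplicates included
}

func NewCountingBloom(size int, k uint32) *CountingBloom {
	return &CountingBloom{counters: make([]uint32, size), k: k}
}

// indexes derives k slot positions from one FNV-1a hash (double hashing).
func (f *CountingBloom) indexes(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)|1
	out := make([]int, f.k)
	for i := uint32(0); i < f.k; i++ {
		out[i] = int((h1 + i*h2) % uint32(len(f.counters)))
	}
	return out
}

// Add increments every counter of the key; the same key may be added again.
func (f *CountingBloom) Add(key string) {
	for _, i := range f.indexes(key) {
		f.counters[i]++
	}
	f.total++
}

// Remove decrements the key's counters only if none would drop below zero.
func (f *CountingBloom) Remove(key string) bool {
	need := make(map[int]uint32) // a slot may be hit more than once
	for _, i := range f.indexes(key) {
		need[i]++
	}
	for i, n := range need {
		if f.counters[i] < n {
			return false
		}
	}
	for i, n := range need {
		f.counters[i] -= n
	}
	f.total-- // assumption: removals reduce the total
	return true
}

// Exists reports whether every counter of the key is positive.
func (f *CountingBloom) Exists(key string) bool {
	for _, i := range f.indexes(key) {
		if f.counters[i] == 0 {
			return false
		}
	}
	return true
}

// Count returns all insertions, not the number of distinct keys.
func (f *CountingBloom) Count() uint64 { return f.total }

func main() {
	f := NewCountingBloom(1024, 3)
	f.Add("a")
	f.Add("a")
	f.Add("b")
	fmt.Println(f.Count())     // 3: two distinct keys, three insertions
	fmt.Println(f.Remove("a")) // true
	fmt.Println(f.Exists("a")) // true: "a" was added twice
}
```

The guard in `Remove` is what makes batch removal awkward: each removal has to be checked against the current counters before any of them is decremented.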

codecov-commenter · 2024-03-31T14:30:04Z

Codecov Report

Attention: Patch coverage is 93.78238%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 95.59%. Comparing base (a74b679) to head (f959dc3).
Report is 14 commits behind head on main.

Files	Patch %	Lines
rueidisprob/countingbloomfilter.go	93.40%	8 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #519      +/-   ##
==========================================
+ Coverage   95.57%   95.59%   +0.01%     
==========================================
  Files          80       81       +1     
  Lines       33178    33351     +173     
==========================================
+ Hits        31710    31882     +172     
+ Misses       1267     1266       -1     
- Partials      201      203       +2

rueidisprob/countingbloomfilter.go

rueian · 2024-03-31T15:38:06Z

Count returns all inserted items, not distinct items.

I guess, a method for getting the distinct count is still needed for users to know whether this filter is close to full or not.

And instead of Exists, should we provide a method for getting the minimum count of a key? Otherwise, users may just use the standard bloom filter.

proost · 2024-04-01T14:14:52Z

@rueian

And instead of Exists, should we provide a method for getting the minimum count of a key?

Do you want to remove Exists, or add a new method? By definition, a Counting Bloom Filter allows the same key to be inserted several times.

rueian · 2024-04-01T15:22:37Z

I feel that Exists is not necessary if we have a method to get the minimum count of a key, but we can keep it anyway. It is still a handy method.

rueidisprob/countingbloomfilter.go

proost · 2024-04-02T13:58:02Z

@rueian
I understand what you said. If we had a method that returns the minimum count of a key, it would be a count-min sketch, not a counting bloom filter. Do you think a count-min sketch is more appropriate? Or do you want to integrate both data structures?

rueian · 2024-04-02T14:14:06Z

Do you think a count-min sketch is more appropriate? Or do you want to integrate both data structures?

The wiki you referenced above also says that count-min sketch and counting bloom filter are essentially the same data structure. Could you elaborate more on their differences?

I have no preference between count-min sketch and counting bloom filter. I just want this filter to have more functionalities than the previous standard bloom filter.

proost · 2024-04-02T14:48:13Z

The wiki you referenced above also says that count-min sketch and counting bloom filter are essentially the same data structure.

As I understand it, both data structures use almost the same logic:

They apply hash functions to the given key to reduce space usage.
Each hashed position stores a count.

But as far as I know, their purpose, functionality, and internal data structure differ.
In terms of purpose, a Count-Min Sketch stores and retrieves an item's frequency, but a Counting Bloom Filter checks whether an item is present or not.
In terms of internal data structure, a Count-Min Sketch uses a 2D array, but a Counting Bloom Filter uses a 1D array.
In terms of functionality, a Count-Min Sketch explicitly supports inserting the same data multiple times, e.g. update(data, delta), but a Counting Bloom Filter doesn't. AddMulti in this PR is just a batch function for calling Add conveniently; it is not an operation defined in the original paper.
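To make the comparison concrete, here is a minimal count-min sketch in Go showing the 2D layout and the update(data, delta) operation mentioned above. The names `CountMin`, `Update`, and `Estimate`, and the use of the row number as a hash seed, are assumptions made for this example and are not part of rueidis.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a hypothetical count-min sketch: depth rows of width counters,
// i.e. a 2D array, with one hash function per row.
type CountMin struct {
	rows [][]uint64
}

func NewCountMin(depth, width int) *CountMin {
	rows := make([][]uint64, depth)
	for i := range rows {
		rows[i] = make([]uint64, width)
	}
	return &CountMin{rows: rows}
}

// slot hashes the key for one row; the row number acts as a per-row seed.
func (s *CountMin) slot(row int, key string) int {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(len(s.rows[row])))
}

// Update adds delta to one counter in every row.
func (s *CountMin) Update(key string, delta uint64) {
	for r := range s.rows {
		s.rows[r][s.slot(r, key)] += delta
	}
}

// Estimate returns the minimum counter across rows. Collisions can only
// inflate a counter, so the result never underestimates the true frequency.
func (s *CountMin) Estimate(key string) uint64 {
	lowest := s.rows[0][s.slot(0, key)]
	for r := 1; r < len(s.rows); r++ {
		if c := s.rows[r][s.slot(r, key)]; c < lowest {
			lowest = c
		}
	}
	return lowest
}

func main() {
	s := NewCountMin(4, 256)
	s.Update("x", 5) // insert the same key five times in one call
	s.Update("y", 2)
	fmt.Println(s.Estimate("x") >= 5) // true: estimates never undercount
}
```

A counting bloom filter would instead map all of its hash functions into a single shared 1D array of counters.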

proost · 2024-04-02T14:52:55Z

I have no preference between count-min sketch and counting bloom filter. I just want this filter to have more functionalities than the previous standard bloom filter.

I see. But this implementation may have a higher error rate than a count-min sketch (I am not certain of this).

rueian · 2024-04-03T17:01:18Z

but a Counting Bloom Filter checks whether an item is present or not.

This makes it look the same as the original bloom filter, except that it could still report exists=true even after removing an item (since the item could have been added multiple times). From this perspective, it seems that having a method to retrieve the minimum count of an item is still reasonable.
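A small Go sketch of the idea: the minimum count of an item is the smallest counter among the slots the item hashes to. The function name `minCount` and the fixed slot list are hypothetical; a real filter would derive the slots from its hash functions.

```go
package main

import "fmt"

// minCount returns the smallest counter among the slots a key hashes to.
// It is an upper bound on how many times the key is currently stored,
// because other keys may share the same slots.
func minCount(counters []uint32, slots []int) uint32 {
	if len(slots) == 0 {
		return 0
	}
	lowest := counters[slots[0]]
	for _, s := range slots[1:] {
		if counters[s] < lowest {
			lowest = counters[s]
		}
	}
	return lowest
}

func main() {
	counters := []uint32{0, 2, 1, 3, 2}
	slots := []int{1, 3, 4} // hypothetical slots for one key

	fmt.Println(minCount(counters, slots)) // 2

	// Removing the key once decrements each of its slots.
	for _, s := range slots {
		counters[s]--
	}
	fmt.Println(minCount(counters, slots)) // 1, so exists would still be true
}
```

With only a boolean exists, the caller cannot tell the difference between the two states above; the minimum count exposes it.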

proost · 2024-04-06T01:38:21Z

@rueian
abda0c8
Added ItemMinCount and ItemMinCountMulti. But considering the purpose of a Counting Bloom Filter, keeping Exists and ExistsMulti makes sense to me.

rueian · 2024-04-07T09:09:27Z

rueidisprob/countingbloomfilter.go

+	for _, key := range keys {
+		if _, ok := keySet[key]; !ok {
+			keySet[key] = struct{}{}
+			deduplicatedKeys = append(deduplicatedKeys, key)


Should we do this deduplication? It feels to me that duplication should be allowed.

f959dc3

To avoid complex logic and guarantee that a count is never negative, I had added deduplication. I have now removed it, although the resulting logic is more complex and may degrade performance.

I see. It is indeed complex but looks good to me now.
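One way to allow duplicate keys in a batch removal while keeping every counter non-negative is to process the keys in order and skip any key whose counters are already exhausted. The Go sketch below is an in-memory illustration under that assumption; it is not the code merged in this PR, and `removeMulti` and the `slotsOf` lookup table are hypothetical names. It also assumes each key maps to distinct slots.

```go
package main

import "fmt"

// removeMulti processes keys in order, duplicates included. A key is removed
// only if all of its counters are still positive, so no counter can drop
// below zero. It returns how many removals were applied.
// Assumption: the slots listed for a key are distinct.
func removeMulti(counters []uint32, slotsOf map[string][]int, keys []string) int {
	removed := 0
	for _, key := range keys {
		slots := slotsOf[key]
		ok := len(slots) > 0
		for _, s := range slots {
			if counters[s] == 0 {
				ok = false
				break
			}
		}
		if !ok {
			continue // skip: removing would make a counter negative
		}
		for _, s := range slots {
			counters[s]--
		}
		removed++
	}
	return removed
}

func main() {
	counters := []uint32{2, 2, 0, 1}
	slotsOf := map[string][]int{ // hypothetical precomputed hash positions
		"a": {0, 1},
		"b": {3},
	}
	// "a" was added twice, so only two of the three removals apply.
	n := removeMulti(counters, slotsOf, []string{"a", "a", "a", "b"})
	fmt.Println(n, counters) // 3 [0 0 0 0]
}
```

The per-key check is what deduplication avoided: without it, a repeated key in one batch could decrement a counter that has already reached zero.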

rueian · 2024-04-07T16:03:56Z

Merged. Thank you @proost!

proost added 2 commits March 31, 2024 23:04

feat: add counting bloom filter

0fb6c1f

feat: allow duplication of adding

7566423

rueian reviewed Mar 31, 2024

style: make short

7069d90

rueian reviewed Apr 1, 2024

proost added 2 commits April 6, 2024 10:34

feat: add item min count multi

abda0c8

style: shorter return

bc2f494

proost requested a review from rueian April 6, 2024 01:38

rueian reviewed Apr 7, 2024

refactor: remove deduplicated remove multi

f959dc3

proost requested a review from rueian April 7, 2024 13:08

rueian approved these changes Apr 7, 2024

rueian merged commit 6a8ba48 into redis:main Apr 7, 2024
2 checks passed

proost deleted the feat-counting-bloom-filter branch April 8, 2024 02:24
