bloom filter intersection failure #57
Comments
Sorry you are running into issues. Can you provide a code sample where you are seeing this issue? Using the following code, I was unable to reproduce the error you are seeing.

    from probables import BloomFilter

    blm1 = BloomFilter(est_elements=16000000, false_positive_rate=.05)
    blm2 = BloomFilter(est_elements=16000000, false_positive_rate=.05)
    for i in range(0, 50000):
        blm1.add(str(i))
        blm2.add(str(i*2))
    print(blm1.estimate_elements())
    blm3 = blm1.intersection(blm2)
    print(blm3.estimate_elements())

Yes, it currently returns |
Working:
15560749
13664688
7690975
0.4361343208524868

Not working:
15560749
13664688
|
Wow, this is a great catch. This issue arises from the desire to be able to export Bloom filters to disk and import them again. To make that work, we pack the false positive rate into a C struct and use that packed value for importing and exporting. In this exact case, the packed value causes us to need a number of bins that differs by one, which causes the overflow. I have a PR coming that should resolve the issue by storing the original false positive rate and using it whenever a new fpr is needed.
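For anyone following along, here is a minimal sketch of the effect described above. It assumes the rate is packed as a 4-byte C float and that the filter is sized with the textbook formula m = ceil(-n * ln(p) / ln(2)^2); the helper names `float32_round_trip` and `bit_count` are made up for illustration and the library's own sizing code may differ.

```python
import math
import struct


def float32_round_trip(value: float) -> float:
    """Pack a Python float as a 4-byte C float and unpack it again."""
    return struct.unpack("f", struct.pack("f", value))[0]


def bit_count(est_elements: int, false_positive_rate: float) -> int:
    """Textbook Bloom filter sizing: m = ceil(-n * ln(p) / ln(2)^2).

    Assumption: illustrative only; the library's exact formula may differ.
    """
    return math.ceil(
        -est_elements * math.log(false_positive_rate) / math.log(2) ** 2
    )


original_fpr = 0.05
restored_fpr = float32_round_trip(original_fpr)

# 0.05 has no exact float32 representation, so the round trip changes it.
print(original_fpr == restored_fpr)  # False

# Sizes computed from the two rates can land on either side of a ceil()
# boundary, which is how a one-bin mismatch between filters could appear.
print(bit_count(16_000_000, original_fpr), bit_count(16_000_000, restored_fpr))
```

Whether the two sizes actually differ depends on how close the unrounded size is to an integer for a given `est_elements`, which would be consistent with 16000000 failing while 16000001 works.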
I tried an intersection of two Bloom filters, both with est_elements=16000000, and got a "list index out of range" error.
It works fine if both have est_elements=16000001.
If one is 160000000 and the other is 16000001, the intersection returns None rather than raising an error that explains what the problem is.
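Until the library reports this case itself, a caller could guard against the silent None on their side. The sketch below is a hypothetical wrapper, not part of the library: `checked_intersection` is a made-up name, and it relies only on the behaviour reported above, namely that `intersection` returns None when it cannot combine the two filters.

```python
def checked_intersection(first, second):
    """Intersect two filters, raising instead of silently returning None.

    Hypothetical helper: works with any object exposing .intersection(other).
    """
    result = first.intersection(second)
    if result is None:
        raise ValueError(
            "intersection returned None; the two Bloom filters were probably "
            "created with different est_elements or false_positive_rate"
        )
    return result


# Usage with the filters from this thread would look like:
#   blm3 = checked_intersection(blm1, blm2)
```

This does not help with the "list index out of range" case, which is raised inside the library before any value is returned.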