Fix the range check for range index on raw column #9453
// Compare unsigned
boolean lowerUnbounded = min + Long.MIN_VALUE <= Long.MIN_VALUE;
boolean upperUnbounded = max + Long.MIN_VALUE >= columnMax + Long.MIN_VALUE;
if (lowerUnbounded && upperUnbounded) {
MutableRoaringBitmap all = new MutableRoaringBitmap();
all.add(0, _numDocs);
all.add(0L, _numDocs);
return all;
}
RangeBitmap rangeBitmap = mapRangeBitmap();
if (lowerUnbounded) {
return rangeBitmap.lte(max).toMutableRoaringBitmap();
}
if (upperUnbounded) {
return rangeBitmap.gte(min).toMutableRoaringBitmap();
}
return rangeBitmap.between(min, max).toMutableRoaringBitmap();
These are non-functional changes and I don't think they are improvements, I suggest removing them to remove noise in the PR (which is an important bug-fix)
It will improve the case of full match, where we don't need to read the range bitmap. But agree we can focus on just the bugfix in this PR
The data structure already detects these cases and skips them, the mapping should take ~10ns and eliminating that should be balanced against needing to read the code
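For readers unfamiliar with the `+ Long.MIN_VALUE` idiom in the hunk above, here is a minimal sketch of why it works. Adding `Long.MIN_VALUE` flips the sign bit, so a signed comparison of the shifted values orders the originals as unsigned. The class and method names are hypothetical and not part of the PR; the sketch only checks the idiom against the JDK's `Long.compareUnsigned`.

```java
final class UnsignedCompareSketch {
  // Hypothetical helper: unsigned "a <= b" using the sign-bit flip from the hunk.
  static boolean lessOrEqualUnsigned(long a, long b) {
    return a + Long.MIN_VALUE <= b + Long.MIN_VALUE;
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1L, Long.MAX_VALUE, Long.MIN_VALUE, -1L};
    for (long a : samples) {
      for (long b : samples) {
        boolean viaOffset = lessOrEqualUnsigned(a, b);
        boolean viaJdk = Long.compareUnsigned(a, b) <= 0;
        if (viaOffset != viaJdk) {
          throw new AssertionError("mismatch for " + a + " and " + b);
        }
      }
    }
    System.out.println("offset idiom agrees with Long.compareUnsigned on all samples");
  }
}
```

Under this reading, `min + Long.MIN_VALUE <= Long.MIN_VALUE` is the unsigned test `min <= 0`, which holds only when `min == 0`.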
// Compare unsigned
boolean lowerUnbounded = min + Long.MIN_VALUE <= Long.MIN_VALUE;
boolean upperUnbounded = max + Long.MIN_VALUE >= columnMax + Long.MIN_VALUE;
if (lowerUnbounded && upperUnbounded) {
return _numDocs;
}
RangeBitmap rangeBitmap = mapRangeBitmap();
if (Long.compareUnsigned(max, columnMax) < 0) {
if (Long.compareUnsigned(min, 0) > 0) {
return (int) rangeBitmap.betweenCardinality(min, max);
}
if (lowerUnbounded) {
return (int) rangeBitmap.lteCardinality(max);
} else {
if (Long.compareUnsigned(min, 0) > 0) {
return (int) rangeBitmap.gteCardinality(min);
}
return (int) _numDocs;
}
if (upperUnbounded) {
return (int) rangeBitmap.gteCardinality(min);
}
return (int) rangeBitmap.betweenCardinality(min, max);
These are non-functional changes and I don't think they are improvements, I suggest removing them to remove noise in the PR (which is an important bug-fix)
// TODO: RangeBitmap has a bug in version 0.9.28 which gives wrong result computing between for 2 doubles with
// different sign. The bug is tracked here: https://github.com/RoaringBitmap/RoaringBitmap/issues/586.
// Uncomment this line after the bug is fixed.
// double prev = quantiles[0] - 1;
A temporary workaround is provided on the issue, I suggest applying the suggestion in this PR and revisit this when the bug is fixed (though that should be soon).
Sounds good
21b5026 to a03ef1a
// TODO: Handle this before reading the range index
if (min > max || min > _max || max < _min) {
  return 0;
}
The check for `min > _max || max < _min` is critical and can only be performed here for raw columns. However, ranges with `min > max` making it down to this level is a problem with the query planner and should be fixed there (even on the broker)
Ideally we should also handle `min > _max || max < _min` using the column metadata during the planning
Indeed, I don’t think the index should need to do these checks at all, but if you need a quick fix maybe this is the quick solution.
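To make the two concerns in this thread concrete, here is a minimal sketch that separates them. All names are hypothetical and this is not the PR's code: one predicate covers the inverted range (`min > max`), which the reviewers argue belongs in the planner, and the other covers a query range that misses the column's `[_min, _max]` entirely, which for raw columns can only be detected next to the index.

```java
final class RangePreCheckSketch {
  // Planner-level concern per the discussion: an inverted range can never match.
  static boolean isInvertedRange(long min, long max) {
    return min > max;
  }

  // Index-level concern for raw columns: the query range lies wholly outside
  // the column's [columnMin, columnMax] interval.
  static boolean isOutsideColumn(long min, long max, long columnMin, long columnMax) {
    return min > columnMax || max < columnMin;
  }

  // Combined guard with the same shape as the check in the hunk above.
  static boolean matchesNothing(long min, long max, long columnMin, long columnMax) {
    return isInvertedRange(min, max) || isOutsideColumn(min, max, columnMin, columnMax);
  }

  public static void main(String[] args) {
    long columnMin = 0;
    long columnMax = 10;
    System.out.println(matchesNothing(5, 3, columnMin, columnMax));    // inverted range
    System.out.println(matchesNothing(11, 20, columnMin, columnMax));  // above the column
    System.out.println(matchesNothing(-20, -1, columnMin, columnMax)); // below the column
    System.out.println(matchesNothing(2, 8, columnMin, columnMax));    // overlaps the column
  }
}
```

Keeping the predicates separate would make it straightforward to move the first one into planning later without touching the index-side check.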
I don't like the changes I flagged as cosmetic and would prefer you revert them, but since this fixes a bug with invalid ranges I feel it is pragmatic to approve.
When a range index is applied to a raw (no-dictionary) column, we need to check `max >= _min` before reading the range bitmap. Failing to do so results in matching all docs instead of no docs as expected (note that a negative `max - _min` will be treated as a huge unsigned long).
This PR fixes the range check, and also handles the case where the `_max` value is not available from the metadata.
During this bug fix, found a bug in RangeIndex, tracked under the issue RoaringBitmap/RoaringBitmap#586.
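The wrap-around mentioned in the description can be reproduced in isolation. The sketch below assumes, as the description implies, that query bounds are turned into offsets from the column minimum before the bitmap is consulted; the class and method names are hypothetical.

```java
final class OffsetWrapSketch {
  // Hypothetical helper: offset of a query bound from the column minimum.
  static long offset(long value, long columnMin) {
    return value - columnMin;
  }

  public static void main(String[] args) {
    long columnMin = 100L;
    long columnMax = 200L;
    long queryMax = 50L; // below the column minimum, so nothing should match

    long maxOffset = offset(queryMax, columnMin);        // -50 as a signed long
    long columnMaxOffset = offset(columnMax, columnMin); // 100

    // Read as unsigned, -50 is 2^64 - 50, which exceeds every real offset.
    System.out.println(Long.toUnsignedString(maxOffset)); // prints 18446744073709551566
    // An unsigned "upper bound covers the column" test therefore passes by mistake.
    System.out.println(Long.compareUnsigned(maxOffset, columnMaxOffset) >= 0); // prints true
  }
}
```

This is why a signed check of `max >= _min` has to come before any unsigned comparison on the offsets.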