Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the range check for range index on raw column #9453

Merged
merged 2 commits into from
Sep 23, 2022

Conversation

Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Sep 23, 2022

When range index is applied to a raw (no-dictionary) column, we need to check max >= _min before reading the range bitmap. Failing to do so will result in matching all docs instead of no docs as expected (note that negative max - _min will be treated as a huge unsigned long).

This PR fixes the range check, and also handles the case of _max value not available from the metadata

During this bug fix, found a bug in RangeIndex and tracked under the issue: RoaringBitmap/RoaringBitmap#586

Comment on lines 160 to 175
// Compare unsigned
boolean lowerUnbounded = min + Long.MIN_VALUE <= Long.MIN_VALUE;
boolean upperUnbounded = max + Long.MIN_VALUE >= columnMax + Long.MIN_VALUE;
if (lowerUnbounded && upperUnbounded) {
MutableRoaringBitmap all = new MutableRoaringBitmap();
all.add(0, _numDocs);
all.add(0L, _numDocs);
return all;
}
RangeBitmap rangeBitmap = mapRangeBitmap();
if (lowerUnbounded) {
return rangeBitmap.lte(max).toMutableRoaringBitmap();
}
if (upperUnbounded) {
return rangeBitmap.gte(min).toMutableRoaringBitmap();
}
return rangeBitmap.between(min, max).toMutableRoaringBitmap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are non-functional changes and I don't think they are improvements, I suggest removing them to remove noise in the PR (which is an important bug-fix)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will improve the case of full match, where we don't need to read the range bitmap. But agree we can focus on just the bugfix in this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data structure already detects these cases and skips them, the mapping should take ~10ns and eliminating that should be balanced against needing to read the code

Comment on lines 179 to 192
// Compare unsigned
boolean lowerUnbounded = min + Long.MIN_VALUE <= Long.MIN_VALUE;
boolean upperUnbounded = max + Long.MIN_VALUE >= columnMax + Long.MIN_VALUE;
if (lowerUnbounded && upperUnbounded) {
return _numDocs;
}
RangeBitmap rangeBitmap = mapRangeBitmap();
if (Long.compareUnsigned(max, columnMax) < 0) {
if (Long.compareUnsigned(min, 0) > 0) {
return (int) rangeBitmap.betweenCardinality(min, max);
}
if (lowerUnbounded) {
return (int) rangeBitmap.lteCardinality(max);
} else {
if (Long.compareUnsigned(min, 0) > 0) {
return (int) rangeBitmap.gteCardinality(min);
}
return (int) _numDocs;
}
if (upperUnbounded) {
return (int) rangeBitmap.gteCardinality(min);
}
return (int) rangeBitmap.betweenCardinality(min, max);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are non-functional changes and I don't think they are improvements, I suggest removing them to remove noise in the PR (which is an important bug-fix)

Comment on lines 280 to 283
// TODO: RangeBitmap has a bug in version 0.9.28 which gives wrong result computing between for 2 doubles with
// different sign. The bug is tracked here: https://github.com/RoaringBitmap/RoaringBitmap/issues/586.
// Uncomment this line after the bug is fixed.
// double prev = quantiles[0] - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A temporary workaround is provided on the issue, I suggest applying the suggestion in this PR and revisit this when the bug is fixed (though that should be soon).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Comment on lines +63 to +66
// TODO: Handle this before reading the range index
if (min > max || min > _max || max < _min) {
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for min > _max || max < _min is critical and can only be performed here for raw columns. However, ranges with min > max making it down to this level is a problem with the query planner and should be fixed there (even on the broker)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should also handle min > _max || max < _min using the column metadata during the planning

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I don’t think the index should need to do these checks at all, but if you need a quick fix maybe this is the quick solution.

Copy link
Member

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the changes I flagged as cosmetic and would prefer you revert them, but since this fixes a bug with invalid ranges I feel it is pragmatic to approve.

@Jackie-Jiang Jackie-Jiang merged commit 96f4d1e into apache:master Sep 23, 2022
@Jackie-Jiang Jackie-Jiang deleted the fix_range_index branch September 23, 2022 16:57
61yao pushed a commit to 61yao/pinot that referenced this pull request Oct 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants