-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support st_contains using H3 index #8498
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8498 +/- ##
============================================
+ Coverage 63.60% 68.79% +5.18%
- Complexity 4322 4326 +4
============================================
Files 1648 1694 +46
Lines 86917 88960 +2043
Branches 13265 13497 +232
============================================
+ Hits 55281 61197 +5916
+ Misses 27595 23431 -4164
- Partials 4041 4332 +291
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/H3Utils.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
@@ -210,11 +226,47 @@ public void testH3Index() | |||
Assert.assertNotNull(aggregationResult); | |||
Assert.assertEquals((long) aggregationResult.get(0), NUM_RECORDS); | |||
} | |||
|
|||
// Test st contains in polygon | |||
testQueryStContain("SELECT COUNT(*) FROM testTable WHERE ST_Contains(ST_GeomFromText('POLYGON ((\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add some tests of the corner cases, such as on the line/vertex of the polygon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ST_Contains seems doesn't include the point or line lying in polygon boundary. https://postgis.net/docs/ST_Contains.html
but let me try create a test with points very close to the boundary.
for (long h3IndexId : _h3Ids) { | ||
partialMatchDocIds[i++] = _h3IndexReader.getDocIds(h3IndexId); | ||
} | ||
MutableRoaringBitmap mutableRoaringBitmap = BufferFastAggregation.or(partialMatchDocIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just approximation, but it doesn't return same result of ST_Contains
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think
expressionFilterOperator.getNextBlock().getBlockDocIdSet().iterator();
MutableRoaringBitmap result = docIdIterator.applyAnd(partialMatchDocIds);
will do the final check on the potential match set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, it could be further optimized to skip the cells fully contained in the polygon. But we can leave it for futture (add a TODO) if you find it too difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will create a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue to track here: #8547
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I did the optimization. but still will use the issue to track if H3 support different polyfill natively.
High-level comment is that using H3 index can speed up query evaluation, but we need to ensure it does not change query results. H3's Polyfill returns a set of H3 for approximation, but we need one more round of evaluation to check the borderlines. |
2717ee1
to
d8f86c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for adding the tests and glad to see the improved index handling
// arg0 is the literal | ||
geometry = GeometrySerializer.deserialize(BytesUtils.toBytes(arguments.get(0).getLiteral())); | ||
// must be some h3 index | ||
assert _h3IndexReader != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw some meaningful error message for better debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. add error message for the assert.
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
dc7b903
to
15e03bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments
@@ -56,6 +59,10 @@ | |||
|
|||
|
|||
public class FilterPlanNode implements PlanNode { | |||
|
|||
private static final Set<String> CAN_APPLY_H3_INCLUSION_INDEX_FUNCTION_NAMES = | |||
ImmutableSet.of("st_within", "stwithin", "st_contains", "stcontains"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) No need to check st_within
and st_contains
because the function name is already canonicalized (we keep 2 function names for st_distance
for backward compatibility). We may directly use the value equals check into the canApplyH3IndexForInclusionCheck()
instead of using a set for 2 values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
if (arguments.get(0).getType() == ExpressionContext.Type.IDENTIFIER | ||
&& arguments.get(1).getType() == ExpressionContext.Type.LITERAL) { | ||
String columnName = arguments.get(0).getIdentifier(); | ||
return columnName != null && _indexSegment.getDataSource(columnName).getH3Index() != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) columnName
will never be null
return columnName != null && _indexSegment.getDataSource(columnName).getH3Index() != null; | |
return _indexSegment.getDataSource(columnName).getH3Index() != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update on line 183 as well.
@Override | ||
protected FilterBlock getNextBlock() { | ||
// get the set of H3 cells at the specified resolution which completely cover the input shape and potential cover. | ||
final Pair<LongSet, LongSet> fullCoverAndPotentialCoverCells = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) We don't usually put final
for local variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
// return filtered num_docs | ||
ImmutableRoaringBitmap[] potentialMatchDocIds = new ImmutableRoaringBitmap[potentialCoverH3Cells.size()]; | ||
int i = 0; | ||
for (long h3IndexId : potentialCoverH3Cells) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use iterator (LongIterator.nextLong()
) instead to avoid the unnecessary boxing/unboxing, same for other places when iterating over LongSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. updated.
LongSet polyfilledSet = new LongOpenHashSet(H3_CORE.polyfill( | ||
Arrays.stream(polygon.getExteriorRing().getCoordinates()) | ||
.map(coordinate -> new GeoCoord(coordinate.y, coordinate.x)) | ||
.collect(Collectors.toList()), ImmutableList.of(), resolution)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor)
.collect(Collectors.toList()), ImmutableList.of(), resolution)); | |
.collect(Collectors.toList()), Collections.emptyList(), resolution)); |
f9bba02
to
3dcf59b
Compare
3dcf59b
to
094d2f1
Compare
This is a follow up PR for : #7252 since that one seems doesn't get update for long time.
The idea is to convert input geometry into a list of h3 cells by using polyfill. But h3 polyfill only fills with the hexagons whose centers are contained by the geometry.
so creating a method
coverGeometryInH3
to return the set of H3 cells at the specified resolution which completely cover the input shape.Instructions:
feature
(**) Use
release-notes
label for scenarios like: