-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support creating dictionary at runtime for an existing column #9678
Conversation
0e91579
to
c2ba236
Compare
Codecov Report
@@ Coverage Diff @@
## master #9678 +/- ##
=============================================
- Coverage 68.71% 24.65% -44.07%
+ Complexity 4971 53 -4918
=============================================
Files 1939 1938 -1
Lines 103671 104096 +425
Branches 15727 15778 +51
=============================================
- Hits 71242 25664 -45578
- Misses 27350 75822 +48472
+ Partials 5079 2610 -2469
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
c2ba236
to
980c45d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm, some minor nits
...java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandlerTest.java
Outdated
Show resolved
Hide resolved
bdecd0a
to
73c5e5a
Compare
Let's add e2e query execution tests as well (need not be in integration but one of the subclasses of BaseQueriesTest) |
...c/test/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandlerTest.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
String fwdIndexFileExtension; | ||
if (isSingleValue) { | ||
fwdIndexFileExtension = | ||
existingColMetadata.isSorted() ? V1Constants.Indexes.SORTED_SV_FORWARD_INDEX_FILE_EXTENSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case will never be hit. If the column is sorted then dictionary should have already existed. Sorted forward index is always dictionary encoded / dictionary based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand, if someone had added a sorted column under the noDict list in the TableConfig what happens? For dict disable case, will this operation be allowed for sorted columns? Should this fail or be a no-op? Perhaps need to check this code path and add relevant Precondition checks / returns as no-op code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to write a unit test (please see ForwardIndexHandlerTest) where I created a sorted column without dictionary.
I understand the comment. But I believe it still makes sense to have this support just incase someone decides to change the behavior? We're just covering all cases by having this support. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the behavior for sorted columns on the segment creation code path? How does it interact with the noDict columns list? The behavior should be consistent IMO. So if that code path allows creation of raw forward index for sorted columns then this code path should do. If not, I'd recommend tacking adding support for raw forward index for sorted columns as a separate change and tackle both segment creation and reload code paths. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. @vvivekiyer you may want to update this thread
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
65cc0c0
to
0e95302
Compare
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
bbe56c6
to
07e5e7a
Compare
@@ -223,6 +223,42 @@ public void testSelectionOverRangeFilter(String query, int min, int max, boolean | |||
} | |||
} | |||
|
|||
@Test(dataProvider = "selectionTestCases") | |||
public void testSelectionOverRangeFilterAfterReload(String query, int min, int max, boolean inclusive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query execution tests seem to be focused on range index only. Any reason ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add other tests like
- Run query on a raw column using that in filter
- Enable dict + reload
- Run same query again and it should use dict based predicate evaluator and return same result
Same goes for using the column in SELECT clause. It should correctly return the same result after doing dual lookup in rewriten fwd index and dict as it did before with raw fwd index. Things like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of types for both SV and MV should be considered
Decided to merge this in the interest of unblocking @somandal 's reload support work on #6473 Main work is done here. @vvivekiyer is going to follow-up immediately with
|
operationMap = fwdIndexHandler.computeOperation(writer); | ||
assertEquals(operationMap, Collections.EMPTY_MAP); | ||
|
||
// TEST5: Change compression | ||
// TEST8: Add text index and disable forward index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is misleading. You mean to say "Add text index and enable dictionary"
?
segmentLocalFSDirectory.close(); | ||
|
||
validateIndexMap(column, true); | ||
validateForwardIndex(column, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also check if ColumnIndexEntry
exists / dictionary index buffer exists by calling hasIndex()
?
validateIndex(ColumnIndexType.FORWARD_INDEX, EXISTING_INT_COL_RAW_MV, 18499, 15, _schema, false, false, false, 0, | ||
false, 13, ChunkCompressionType.LZ4, false, DataType.INT, 106688); | ||
|
||
// Enable dictionary and inverted index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) should be enable dict, inv index and range index
_indexLoadingConfig.getRangeIndexColumns().add(EXISTING_INT_COL_RAW); | ||
checkForwardIndexCreation(EXISTING_INT_COL_RAW, 42242, 16, _schema, false, true, false, 0, null, true, 0, | ||
DataType.INT, 100000); | ||
validateIndex(ColumnIndexType.RANGE_INDEX, EXISTING_INT_COL_RAW, 42242, 16, _schema, false, true, false, 0, true, 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually validate that range index got correctly rewritten to be dict based ? I guess that's where query exec tests will be useful
checkForwardIndexCreation(EXISTING_STRING_COL_DICT, 9, 4, _schema, false, true, false, 26, null, true, 0, | ||
DataType.STRING, 100000); | ||
|
||
// TEST 3: EXISTING_STRING_COL_RAW. Enable dictionary. Also add inverted index and text index. Reload code path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) you can potentially reorg tests to make them more readable
- simple enable dict on SV, no other change
- simple enable dict on MV, no other change
- enable dict + on SV
- enable dict + on MV
@@ -232,4 +268,38 @@ public void testCountOverRangeFilter(String query, int expectedCount) { | |||
assertEquals(aggregationResult.size(), 1); | |||
assertEquals(((Number) aggregationResult.get(0)).intValue(), expectedCount, query); | |||
} | |||
|
|||
@Test(dataProvider = "countTestCases") | |||
public void testCountOverRangeFilterAfterReload(String query, int expectedCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best for us to control the query. Not really sure if the query being sent here by the provider is indeed using the concerned column in parts where we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- SELECT FROM foo LIMIT -- this will test out that dict is correct and fwd index got rewritten
- Try the same in GROUP BY as GROUP BY uses dict.
- Try MIN and MAX aggregations -- they can be answered from dict.
- WHERE clause like mentioned in another commend
OSS issue: #9348
Label: Feature
Document
With this PR, we add support to enable dictionary on a raw column for immutable segments during segment Reload. This support is added for both SV and MV columns.
Added Unit Tests. As the PR is being reviewed, I'm planning to add an integration test where we query on a column where dictionary is enabled.