-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support disabling dictionary at runtime for an existing column #9868
Conversation
938c27a
to
59c5d27
Compare
Codecov Report
@@ Coverage Diff @@
## master #9868 +/- ##
============================================
+ Coverage 70.15% 70.34% +0.19%
- Complexity 5410 5520 +110
============================================
Files 1956 1972 +16
Lines 104975 105869 +894
Branches 15892 16020 +128
============================================
+ Hits 73641 74473 +832
+ Misses 26195 26175 -20
- Partials 5139 5221 +82
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
pinot-core/src/test/java/org/apache/pinot/queries/ForwardIndexHandlerReloadQueriesTest.java
Outdated
Show resolved
Hide resolved
List<Object[]> beforeResultRows4 = resultTable.getRows(); | ||
|
||
// TEST5 | ||
query = "SELECT column1, max(column1), sum(column10) from testTable WHERE column6 = 1001 GROUP BY " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is column6 the column where dictionary is getting deleted ?
May be compare results for the same query before with dict and after without dict ? Also might want to double check in debugger once that query plan uses scan based evaluator and inv index is no longer being used (dropped infact)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We do compare that the query results exactly match before and after reload.
The list of operations is documented in L726
- column1 -> change compression.
- column6 -> disable dictionary
- column9 -> disable dictionary
- column3 -> Enable dictionary.
- column2 -> Enable dictionary. Add inverted index.
- column7 -> Enable dictionary. Add inverted index.
- column10 -> Enable dictionary.
BrokerResponseNative brokerResponseNative = getBrokerResponse(query); | ||
assertTrue(brokerResponseNative.getProcessingExceptions() == null | ||
|| brokerResponseNative.getProcessingExceptions().size() == 0); | ||
ResultTable resultTable1 = brokerResponseNative.getResultTable(); | ||
assertEquals(brokerResponseNative.getNumRowsResultSet(), 1); | ||
assertEquals(brokerResponseNative.getTotalDocs(), 400_000L); | ||
assertEquals(brokerResponseNative.getNumDocsScanned(), 103280L); | ||
assertEquals(brokerResponseNative.getNumDocsScanned(), 40224L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So rangeIndex will be rewritten to be raw value based right ? I wonder why should numDocsScanned
/ numEntriesScannedInFilter
change ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually changed the query by adding another WHERE clause.
Set<String> noDictionaryColumns = new HashSet<>(_noDictionaryColumns); | ||
indexLoadingConfig.setNoDictionaryColumns(noDictionaryColumns); | ||
indexLoadingConfig.getNoDictionaryColumns().remove("column2"); | ||
indexLoadingConfig.getNoDictionaryColumns().remove("column3"); | ||
indexLoadingConfig.getNoDictionaryColumns().remove("column7"); | ||
indexLoadingConfig.getNoDictionaryColumns().remove("column10"); | ||
indexLoadingConfig.getNoDictionaryColumns().add("column6"); | ||
indexLoadingConfig.getNoDictionaryColumns().add("column9"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we handle basic validation (similar to what we did in noForwardIndex
work) at the table config level ?
For example, if someone wants to disable dictionary on a column, but does not remove the column from the invertedIndexColumn
list in the tableConfig when updating the tableConfig, then we should throw error right there.
Other way would be to detect this invalid combination during reload but I guess that's too late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's already there in TableConfigUtils.
pinot/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java
Line 699 in a222502
private static void validateIndexingConfig(@Nullable IndexingConfig indexingConfig, @Nullable Schema schema) { |
// dictionary will only be allowed if FST and inverted index are also disabled. | ||
if (_indexLoadingConfig.getInvertedIndexColumns().contains(column) || _indexLoadingConfig.getFSTIndexColumns() | ||
.contains(column)) { | ||
LOGGER.warn("Cannot disabled dictionary as column={} has FST index or inverted index or both.", column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, ideally we should not have permitted the user updating the tableConfig in the first place since it is an invalid combination. May want to take a look at TableConfigValidator code once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this here if validator can catch it ? Reload is invoked after tableConfig is updated successfully so I don't think this check is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @vvivekiyer
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessorTest.java
Show resolved
Hide resolved
checkForwardIndexCreation(COLUMN1_NAME, 51594, 16, _schema, false, false, false, 0, null, true, 0, DataType.INT, | ||
100000); | ||
|
||
// TEST 3: Disable dictionary for a column (Column10) that has range index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do / can we verify that range index indeed got rebuilt with raw values ? May be compare offset and size of range index buffer from index_map before and after ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an additional validation based on rangeIndex size from indexMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Can we can come up with another test query where range predicate is full match and thus results in reading the entire range index ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to make sure that there are no index out of bounds / seg fault type situations due to incorrectly building index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @vvivekiyer
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
private int getMaxRowLengthForMVColumn(String column, ForwardIndexReader reader, Dictionary dictionary) | ||
throws Exception { | ||
ColumnMetadata existingColMetadata = _segmentMetadata.getColumnMetadataFor(column); | ||
AbstractColumnStatisticsCollector statsCollector = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably nothing wrong in theory but I wonder why do we need to go through StatsCollector
framework to gather this info. We can simply read the forward index to collect this right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, using statsCollector is convenient and it can be the single place that requires change if we alter anything in the future. Without StatsCollector, we might have duplicate the logic for each datatype here that might lead to unnecessarily more code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Fair enough
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Show resolved
Hide resolved
49ab0d7
to
ab194a3
Compare
cf52f5f
to
a4aee7d
Compare
Overall looks good. @vvivekiyer tagged you on two comment threads that you may want to address in follow-ups |
OSS issue: #9348
Label: Feature
Document
With this PR, we add support to disable dictionary on a dict-based column for immutable segments during segment reload. This support is added for both SV and MV columns.
Added Tests.