Support segment storage format without forward index #9333
Codecov Report
@@ Coverage Diff @@
## master #9333 +/- ##
============================================
- Coverage 69.88% 69.85% -0.04%
- Complexity 4871 5235 +364
============================================
Files 1927 1928 +1
Lines 102694 102842 +148
Branches 15586 15629 +43
============================================
+ Hits 71771 71841 +70
- Misses 25858 25921 +63
- Partials 5065 5080 +15
We should allow disabling the forward index for a sorted column. This is a no-op, because the inverted index of a sorted column can also serve as the forward index. I believe the implementation already allows it, because the default column is always sorted. There is no need to add this to the validation.
Thanks for the suggestion @Jackie-Jiang. From going through the code it looks like for sorted columns the forward index and inverted index are essentially the same. Code snippet from the
I also see code on the segment creation path that skips creating the inverted index when the column is sorted, even if the inverted index is enabled for that column. With this in mind, we didn't think it made much sense to disable the forward index for such columns. Let me know your thoughts. cc @siddharthteotia
@somandal We want to disable the forward index to save storage. For a sorted column, we store the index in inverted format (a map from dictId to doc range) and also use it as the forward index. We should allow disabling the forward index for it in the table config, and simply ignore the setting when creating the index.
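To illustrate why the flag can be a no-op for sorted columns, here is a minimal sketch of a single structure that answers both lookups. It is not Pinot's actual sorted index reader: the class name, method names, and array layout are hypothetical, and it assumes every dictId occurs in at least one document.

```java
import java.util.Arrays;

/** Hypothetical sketch of a sorted-column index; not Pinot's implementation. */
final class SortedIndexSketch {
    // startDocIds[dictId] is the first docId holding that dictId. Because the
    // column is sorted, each dictId owns one contiguous doc range, and the
    // starts are strictly increasing (assumption: no empty ranges).
    private final int[] startDocIds;
    private final int numDocs;

    SortedIndexSketch(int[] startDocIds, int numDocs) {
        this.startDocIds = startDocIds.clone();
        this.numDocs = numDocs;
    }

    /** Inverted-style lookup: inclusive {start, end} doc range for a dictId. */
    int[] getDocIdRange(int dictId) {
        int start = startDocIds[dictId];
        int next = dictId + 1 < startDocIds.length ? startDocIds[dictId + 1] : numDocs;
        return new int[] {start, next - 1};
    }

    /** Forward-style lookup: dictId stored at docId, via binary search over range starts. */
    int getDictId(int docId) {
        if (docId < 0 || docId >= numDocs) {
            throw new IndexOutOfBoundsException("docId out of range: " + docId);
        }
        int pos = Arrays.binarySearch(startDocIds, docId);
        // When docId is not itself a range start, binarySearch returns
        // -(insertionPoint) - 1; the owning range is the one just before it.
        return pos >= 0 ? pos : -pos - 2;
    }
}
```

Since the same ranges serve both directions, there is no separate forward index to drop, which matches the point that the setting can simply be ignored for sorted columns.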
Thanks for discussing this, done!
Thanks for the contribution @somandal. Please also share the doc on follow-up changes, as discussed in the last meeting with OSS.
This PR adds an option to disable the forward index for a given column via the FieldConfig properties list. This is a PR to solve issue #6473.
If the forward index is disabled, only a subset of queries will work: those that use the column in WHERE clauses but do not need to select, transform, group by, or order by it.
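The rule above can be sketched as a simple set check. This is an illustration only, not code from the PR: the class and method names are hypothetical, and it assumes the caller has already collected the columns a query references outside its WHERE clause.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical illustration of the query restriction; not part of Pinot. */
final class ForwardIndexDisabledQueryCheck {
    private ForwardIndexDisabledQueryCheck() {
    }

    /**
     * @param forwardIndexDisabledColumns columns configured without a forward index
     * @param nonFilterColumns columns used in SELECT, transform, GROUP BY, or ORDER BY
     * @return true if the query touches forward-index-disabled columns only in its filter
     */
    static boolean isSupported(Set<String> forwardIndexDisabledColumns, Set<String> nonFilterColumns) {
        // The query is servable only when no disabled column is needed outside the filter.
        return Collections.disjoint(forwardIndexDisabledColumns, nonFilterColumns);
    }
}
```

For example, with the forward index disabled on a hypothetical column `country`, a query that selects `city` and filters on `country` passes the check, while one that groups by `country` does not.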
Disabling the forward index currently comes with the following requirements (depending on how the feature is used, we may relax some of them over time):
The above checks have been added in the code in the following places:
Sorted Column Handling:
For sorted columns we decided to allow disabling the forward index, but internally it is a no-op (we already create only a single index for sorted columns and use it as both the forward and the inverted index, and we'll continue doing the same irrespective of the forwardIndexDisabled flag).
The ability to disable the forward index is currently implemented for the following cases:
This PR does not add support for disabling the forward index for:
Validations with other indices can be broadly classified into the following paths:
indexRow function creates all the indices needed at the same time for a given row.
No-op creators have been added for the forward index disabled columns. This is useful for:
All code paths that need to read the forward index now have null checks added and appropriate exceptions thrown if the forward index doesn't exist.
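The two mechanisms described above, a no-op creator on the write path and a null check on the read path, might look roughly like the following sketch. The interfaces here are simplified, hypothetical stand-ins rather than Pinot's real creator and reader interfaces, and the exception type is an assumption, since the description only says that appropriate exceptions are thrown.

```java
/** Hypothetical, simplified stand-ins; not Pinot's actual interfaces. */
final class ForwardIndexGuardSketch {
    private ForwardIndexGuardSketch() {
    }

    interface ForwardIndexCreator {
        void putDictId(int dictId);
    }

    interface ForwardIndexReader {
        int getDictId(int docId);
    }

    /** Write path: accepts every value and stores nothing, so callers need no special casing. */
    static final class NoOpForwardIndexCreator implements ForwardIndexCreator {
        @Override
        public void putDictId(int dictId) {
            // Intentionally empty: the forward index is disabled for this column.
        }
    }

    /** Read path: fail fast with a clear message instead of a NullPointerException later. */
    static ForwardIndexReader requireForwardIndex(String column, ForwardIndexReader reader) {
        if (reader == null) {
            // Exception type is an assumption made for this sketch.
            throw new UnsupportedOperationException(
                "Forward index is disabled for column: " + column);
        }
        return reader;
    }
}
```

A no-op creator lets a row-at-a-time indexing loop treat every column uniformly, while the guard keeps the failure close to the code path that actually needed the forward index.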
cc @siddharthteotia @Jackie-Jiang