-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support rewriting forward index upon changing compression type for existing raw MV column #9510
Conversation
9512ae4
to
b577d85
Compare
Codecov Report
@@ Coverage Diff @@
## master #9510 +/- ##
=============================================
- Coverage 28.10% 15.56% -12.54%
- Complexity 53 174 +121
=============================================
Files 1922 1883 -39
Lines 102937 101236 -1701
Branches 15657 15459 -198
=============================================
- Hits 28926 15755 -13171
- Misses 71178 84294 +13116
+ Partials 2833 1187 -1646
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@somandal to help review this. |
I’ll review this |
b577d85
to
629dde1
Compare
629dde1
to
b01b7db
Compare
pom.xml
Outdated
<version>${netty.version}</version> | ||
<type>pom</type> | ||
<scope>import</scope> | ||
<artifactId>netty-buffer</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific need for this pom.xml change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, was a mistake! Removed.
b01b7db
to
fb09698
Compare
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
@vvivekiyer - can you rebase please ? |
fb09698
to
1b374e8
Compare
1b374e8
to
a0e7a34
Compare
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
Preconditions.checkState(lengthOfLongestEntry >= 0, | ||
"lengthOfLongestEntry cannot be negative. segment=" + segmentName + " column={}" + column); | ||
int maxNumberOfMVEntries = existingColMetadata.getMaxNumberOfMultiValues(); | ||
int maxRowLengthInBytes = lengthOfLongestEntry - (Integer.BYTES * maxNumberOfMVEntries) - Integer.BYTES; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is not at all obvious. Can you explain this part here? e.g. why are you subtracting (Integer.BYTES * maxNumberOfMVEntries)
for MultiValueVarByteRawIndexCreator
I see why this will work and that this will actually get passed to the forward index creator and used
int maxLengthPrefixes = Integer.BYTES * maxNumberOfElements;
int totalMaxLength = Integer.BYTES + maxRowLengthInBytes + maxLengthPrefixes;
For MultiValueFixedByteRawIndexCreator
they calculate it as:
int totalMaxLength = Integer.BYTES + (maxNumberOfMultiValueElements * valueType.getStoredType().size());
The maxRowLengthInBytes
isn't actually passed to this forward index creator though and is thus never actually used.
Can you perhaps add a comment about why it's okay to set this up in this way (that it's only used by the variable length types)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this calculation should not live here because this is dependent on the file format layout / version of the originally created raw forward index.
If the latter changes, this will fail. So I suggest moving this into a static function inside the raw chunk forward index writer that controls the format.
Whoever changes the format should keep the math inside the function backward compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've added comments explaining the logic. I've also moved the math inside static functions in MultiValueVarByteRawIndexCreator
.
IMO, it's still a little confusing because of our existing naming - e.g lengthOfLongestEntry
maxRowLengthInBytes
, etc. Cleaning these names everywhere is a larger undertaking and better left as a future cleanup. Thoughts?
PinotSegmentColumnReader columnReader = new PinotSegmentColumnReader(reader, null, null, maxNumberOfMVEntries); | ||
|
||
for (int i = 0; i < numDocs; i++) { | ||
Object[] values = (Object[]) columnReader.getValue(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if it's a better idea to expose the getIntMV, etc functions in the columnReader
or directly use the ForwardIndexReader
instead of doing it this way. In your code, you're doing double copies, first to construct the Object[] values
and then to construct the actual value type array.
The ForwardIndexReader
directly provides APIs to get the array in the desired type:
default int getIntMV(int docId, int[] valueBuffer, T context) {
throw new UnsupportedOperationException();
}
default int[] getIntMV(int docId, T context) {
throw new UnsupportedOperationException();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Did this and also refactored the code quite a bit.
...l/src/main/java/org/apache/pinot/segment/local/segment/readers/PinotSegmentColumnReader.java
Outdated
Show resolved
Hide resolved
} else { | ||
Preconditions.checkState(_maxNumValuesPerMVEntry >= 0, "maxNumValuesPerMVEntry is negative for an MV column."); | ||
|
||
switch (_forwardIndexReader.getStoredType()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this. I do see value in keeping this code even if you decide to either add more efficient functions to directly fetch the type array or if you decide to directly use the ForwardIndexReader
.
...c/test/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandlerTest.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...rg/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
Outdated
Show resolved
Hide resolved
...rg/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
Outdated
Show resolved
Hide resolved
45a89db
to
91a6dcf
Compare
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
Tagging @Jackie-Jiang in case he wants to take a look. This is a follow-up to #9454 |
I don't have concern merging it given the added tests |
OSS issue: #9348
Label: Feature
In PR #9454, we added support to change Compression Type for raw SV columns during segmentReload. This PR extends the support to raw MV columns.