Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support rewriting forward index upon changing compression type for existing raw MV column #9510

Merged
merged 6 commits into from
Oct 17, 2022

Conversation

vvivekiyer
Copy link
Contributor

OSS issue: #9348
Label: Feature

In PR #9454, we added support to change Compression Type for raw SV columns during segmentReload. This PR extends the support to raw MV columns.

@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2022

Codecov Report

Merging #9510 (26777bd) into master (c581fd9) will decrease coverage by 12.53%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master    #9510       +/-   ##
=============================================
- Coverage     28.10%   15.56%   -12.54%     
- Complexity       53      174      +121     
=============================================
  Files          1922     1883       -39     
  Lines        102937   101236     -1701     
  Branches      15657    15459      -198     
=============================================
- Hits          28926    15755    -13171     
- Misses        71178    84294    +13116     
+ Partials       2833     1187     -1646     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests2 15.56% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...tor/impl/fwd/MultiValueVarByteRawIndexCreator.java 0.00% <0.00%> (ø)
...ocal/segment/index/loader/ForwardIndexHandler.java 0.00% <0.00%> (ø)
...ocal/segment/readers/PinotSegmentColumnReader.java 0.00% <0.00%> (ø)
...inot/segment/spi/creator/IndexCreationContext.java 0.00% <0.00%> (ø)
...src/main/java/org/apache/pinot/sql/FilterKind.java 0.00% <0.00%> (-100.00%) ⬇️
...ain/java/org/apache/pinot/core/data/table/Key.java 0.00% <0.00%> (-100.00%) ⬇️
...n/java/org/apache/pinot/core/data/table/Table.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/core/data/table/Record.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/core/util/GroupByUtils.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/common/auth/AuthConfig.java 0.00% <0.00%> (-100.00%) ⬇️
... and 987 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@siddharthteotia
Copy link
Contributor

@somandal to help review this.

@somandal
Copy link
Contributor

somandal commented Oct 4, 2022

I’ll review this

@vvivekiyer vvivekiyer marked this pull request as ready for review October 11, 2022 08:35
pom.xml Outdated
<version>${netty.version}</version>
<type>pom</type>
<scope>import</scope>
<artifactId>netty-buffer</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific need for this pom.xml change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, was a mistake! Removed.

@siddharthteotia
Copy link
Contributor

@vvivekiyer - can you rebase please ?

Preconditions.checkState(lengthOfLongestEntry >= 0,
"lengthOfLongestEntry cannot be negative. segment=" + segmentName + " column={}" + column);
int maxNumberOfMVEntries = existingColMetadata.getMaxNumberOfMultiValues();
int maxRowLengthInBytes = lengthOfLongestEntry - (Integer.BYTES * maxNumberOfMVEntries) - Integer.BYTES;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not at all obvious. Can you explain this part here? e.g. why are you subtracting (Integer.BYTES * maxNumberOfMVEntries)

for MultiValueVarByteRawIndexCreator I see why this will work and that this will actually get passed to the forward index creator and used

    int maxLengthPrefixes = Integer.BYTES * maxNumberOfElements;
    int totalMaxLength = Integer.BYTES + maxRowLengthInBytes + maxLengthPrefixes;

For MultiValueFixedByteRawIndexCreator they calculate it as:

    int totalMaxLength = Integer.BYTES + (maxNumberOfMultiValueElements * valueType.getStoredType().size());

The maxRowLengthInBytes isn't actually passed to this forward index creator though and is thus never actually used.

Can you perhaps add a comment about why it's okay to set this up in this way (that it's only used by the variable length types)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this calculation should not live here because this is dependent on the file format layout / version of the originally created raw forward index.

If the latter changes, this will fail. So I suggest moving this into a static function inside the raw chunk forward index writer that controls the format.

Whoever changes the format should keep the math inside the function backward compatible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've added comments explaining the logic. I've also moved the math inside static functions in MultiValueVarByteRawIndexCreator.

IMO, it's still a little confusing because of our existing naming - e.g lengthOfLongestEntry maxRowLengthInBytes, etc. Cleaning these names everywhere is a larger undertaking and better left as a future cleanup. Thoughts?

PinotSegmentColumnReader columnReader = new PinotSegmentColumnReader(reader, null, null, maxNumberOfMVEntries);

for (int i = 0; i < numDocs; i++) {
Object[] values = (Object[]) columnReader.getValue(i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if it's a better idea to expose the getIntMV, etc functions in the columnReader or directly use the ForwardIndexReader instead of doing it this way. In your code, you're doing double copies, first to construct the Object[] values and then to construct the actual value type array.

The ForwardIndexReader directly provides APIs to get the array in the desired type:

  default int getIntMV(int docId, int[] valueBuffer, T context) {
    throw new UnsupportedOperationException();
  }

  default int[] getIntMV(int docId, T context) {
    throw new UnsupportedOperationException();
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Did this and also refactored the code quite a bit.

} else {
Preconditions.checkState(_maxNumValuesPerMVEntry >= 0, "maxNumValuesPerMVEntry is negative for an MV column.");

switch (_forwardIndexReader.getStoredType()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. I do see value in keeping this code even if you decide to either add more efficient functions to directly fetch the type array or if you decide to directly use the ForwardIndexReader.

@vvivekiyer vvivekiyer requested review from siddharthteotia and somandal and removed request for siddharthteotia October 14, 2022 09:46
@siddharthteotia
Copy link
Contributor

Tagging @Jackie-Jiang in case he wants to take a look. This is a follow-up to #9454

@Jackie-Jiang
Copy link
Contributor

I don't have concern merging it given the added tests

@siddharthteotia siddharthteotia changed the title Allow changing compressionType for raw MV columns during segmentReload Support rewriting forward index upon changing compression type for existing raw MV column Oct 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants