Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix column metadata getMaxValue NPE bug and expose maxNumMultiValues #7918

Merged
merged 1 commit into from
Jan 8, 2022

Conversation

mqliang
Copy link
Contributor

@mqliang mqliang commented Dec 16, 2021

Description

This PR:

When the maxValue/minValue exceed METADATA_PROPERTY_LENGTH_LIMIT, SegmentColumnarIndexCreator will not set minValue/minValue. In this case, `metadata.getMaxValue()' will return NULL.

  • expose maxNumMultiValues information for MV columns
  • add integration test case for MV columns and column name encoding

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2021

Codecov Report

Merging #7918 (6ac29a1) into master (aa2da07) will decrease coverage by 43.61%.
The diff coverage is 11.30%.

❗ Current head 6ac29a1 differs from pull request most recent head daa7e73. Consider uploading reports for the commit daa7e73 to get more accurate results
Impacted file tree graph

@@              Coverage Diff              @@
##             master    #7918       +/-   ##
=============================================
- Coverage     71.32%   27.71%   -43.62%     
=============================================
  Files          1589     1586        -3     
  Lines         82139    82388      +249     
  Branches      12270    12306       +36     
=============================================
- Hits          58589    22833    -35756     
- Misses        19578    57468    +37890     
+ Partials       3972     2087     -1885     
Flag Coverage Δ
integration1 ?
integration2 27.71% <11.30%> (+0.12%) ⬆️
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...a/org/apache/pinot/common/metrics/BrokerMeter.java 100.00% <ø> (ø)
...ces/PinotSegmentUploadDownloadRestletResource.java 47.82% <0.00%> (-10.44%) ⬇️
...ler/api/resources/TableConfigsRestletResource.java 0.00% <0.00%> (-73.13%) ⬇️
...troller/helix/core/retention/RetentionManager.java 27.35% <0.00%> (-55.41%) ⬇️
...rg/apache/pinot/core/minion/RawIndexConverter.java 0.00% <0.00%> (-56.61%) ⬇️
...not/core/operator/combine/BaseCombineOperator.java 74.32% <0.00%> (-13.52%) ⬇️
...core/query/executor/ServerQueryExecutorV1Impl.java 80.00% <0.00%> (-3.34%) ⬇️
...ry/optimizer/statement/JsonStatementOptimizer.java 0.00% <ø> (-77.64%) ⬇️
...ore/query/scheduler/fcfs/BoundedFCFSScheduler.java 0.00% <0.00%> (ø)
...nverttorawindex/ConvertToRawIndexTaskExecutor.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1205 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa2da07...daa7e73. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add some tests to verify the expected behavior and guard it from regression?

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @Jackie-Jiang 's comment. also bug1 seems to be an improvement right? improving the inefficiency of encode/decode.

Comment on lines 220 to 223
ByteArray maxValueByteArray =
storedDataType == DataType.BYTES ? ((ByteArray) columnMetadata.getMaxValue())
: BytesUtils.toByteArray((String) columnMetadata.getMaxValue());
columnLength = maxValueByteArray.length();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add test for this?

Copy link
Contributor

@siddharthteotia siddharthteotia Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mqliang This code change does not look correct to me. AFAIK, we want to fix couple of problems

  • Use of*to allow fetching metadata for all columns instead of individual column. Looks like this has been fixed by encoding * ? Please add tests for the same

  • A weird scenario where the use of particular column name (internal) fails the API call for some reason. That column type is STRING and stored type is also STRING (UTF-8 bytes essentially). I don't think this is related to the handling of BYTES / BYTE_ARRAY. While column of data types / stored types as BYTES and BYTE_ARRAY should also be handled, the problem we wanted to fix does not seem to be related to it or at least there is no evidence from debugging. I suggest debugging this more to understand the root cause and then come back on this part of the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with @mqliang

As suggested in the previous comment, let's not fix the BYTES problem in this PR. We can investigate and fix that separately. The url encoding for (with and without *) and that corner case problem for a particular problem should be fixed (which @mqliang found out is due to min/max value being null and server throwing NPE).
Let's make sure to add tests for with and without encoded query string -- everything after columns= can be encoded. I think in this PR, controller is re-encoding. We should remove that

@mqliang mqliang changed the title Fix query parameter URL encoding bug and ByteArray storetype bug Fix query parameter URL encoding bug and ByteArray datatype bug Dec 19, 2021
@mqliang mqliang force-pushed the url-encode-bug branch 2 times, most recently from 1573c31 to 16f9d50 Compare January 5, 2022 23:43
@mqliang mqliang changed the title Fix query parameter URL encoding bug and ByteArray datatype bug Fix ByteArray datatype column metadata getMaxValue NPE bug and expose maxNumMultiValues Jan 5, 2022
@mqliang mqliang changed the title Fix ByteArray datatype column metadata getMaxValue NPE bug and expose maxNumMultiValues Fix column metadata getMaxValue NPE bug and expose maxNumMultiValues Jan 6, 2022
columnLengthMap.merge(column, (double) columnLength, Double::sum);
columnCardinalityMap.merge(column, (double) columnCardinality, Double::sum);
maxNumMultiValuesMap.merge(column, (double) maxNumMultiValues, Double::sum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid populating this map for single valued column since maxNumMultiValues is only applicable to MV columns ? The other option is to make sure it is either 0 or -1 for SV columns

@siddharthteotia
Copy link
Contributor

@mqliang can you please update the PR description to correctly reflect the actual changes that we finally agreed to do in this PR ?

Let's remove the stuff related to byte because it turned out that was not the issue and let's describe the reason for the NPE issue we were seeing due to METADATA_PROPERTY_LENGTH_LIMIT (it is described in our internal ticket).

Regarding the addition of maxNumMultiValues, can we ensure in tests that it is non-zero only for columns that are multi-value and 0 or -1 for single value columns as it will never be used for them

@siddharthteotia siddharthteotia merged commit 06888d8 into apache:master Jan 8, 2022
@mqliang mqliang deleted the url-encode-bug branch January 8, 2022 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants