Fix column metadata getMaxValue NPE bug and expose maxNumMultiValues #7918
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #7918      +/-   ##
=============================================
- Coverage   71.32%   27.71%   -43.62%
=============================================
  Files        1589     1586        -3
  Lines       82139    82388      +249
  Branches    12270    12306       +36
=============================================
- Hits        58589    22833    -35756
- Misses      19578    57468    +37890
+ Partials     3972     2087     -1885
=============================================
Flags with carried forward coverage won't be shown.
Continue to review full report at Codecov.
Shall we add some tests to verify the expected behavior and guard it from regression?
Branch force-pushed from 6f1726b to 37fcfc5.
+1 to @Jackie-Jiang's comment. Also, bug1 seems to be an improvement, right? It improves the inefficiency of encode/decode.
ByteArray maxValueByteArray =
    storedDataType == DataType.BYTES ? ((ByteArray) columnMetadata.getMaxValue())
        : BytesUtils.toByteArray((String) columnMetadata.getMaxValue());
columnLength = maxValueByteArray.length();
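Note that this snippet throws an NPE when columnMetadata.getMaxValue() returns null. The sketch below shows the kind of null guard a reviewer might expect; DataType and the UTF-8 conversion are simplified stand-ins for Pinot's ByteArray/BytesUtils types, not the actual implementation:

```java
import java.nio.charset.StandardCharsets;

public class ColumnLengthSketch {
    // Simplified stand-in for Pinot's DataType enum, for illustration only.
    enum DataType { BYTES, STRING }

    static int columnLengthFor(DataType storedDataType, Object maxValue) {
        // Guard against null: segment metadata may omit min/max values
        // (e.g. when they exceed METADATA_PROPERTY_LENGTH_LIMIT).
        if (maxValue == null) {
            return 0;
        }
        // The real code uses ByteArray / BytesUtils.toByteArray (hex decoding);
        // here we just take UTF-8 bytes for the STRING case.
        byte[] bytes = storedDataType == DataType.BYTES
            ? (byte[]) maxValue
            : ((String) maxValue).getBytes(StandardCharsets.UTF_8);
        return bytes.length;
    }

    public static void main(String[] args) {
        System.out.println(columnLengthFor(DataType.STRING, "abc")); // 3
        System.out.println(columnLengthFor(DataType.BYTES, null));   // 0
    }
}
```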
Could you add a test for this?
@mqliang This code change does not look correct to me. AFAIK, we want to fix a couple of problems:
- Use of `*` to allow fetching metadata for all columns instead of an individual column. Looks like this has been fixed by encoding `*`? Please add tests for the same.
- A weird scenario where the use of a particular (internal) column name fails the API call for some reason. That column's type is STRING and its stored type is also STRING (essentially UTF-8 bytes). I don't think this is related to the handling of BYTES / BYTE_ARRAY. While columns with data types / stored types of BYTES and BYTE_ARRAY should also be handled, the problem we wanted to fix does not seem to be related to it, or at least there is no evidence from debugging. I suggest debugging this more to understand the root cause and then coming back to this part of the change.
Discussed offline with @mqliang.
As suggested in the previous comment, let's not fix the BYTES problem in this PR; we can investigate and fix that separately. The URL encoding (with and without `*`) and the corner-case problem for a particular column should be fixed (which @mqliang found out is due to the min/max value being null and the server throwing an NPE).
Let's make sure to add tests with and without an encoded query string -- everything after `columns=` can be encoded. I think in this PR the controller is re-encoding; we should remove that.
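The double-encoding pitfall mentioned here can be illustrated with a minimal sketch using the standard library (this is not Pinot's actual controller code; the `%2A` value is just a hypothetical client-encoded `*`):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ColumnsParamEncoding {
    // Re-encoding an already-encoded parameter value escapes the '%' itself.
    static String reEncode(String alreadyEncoded) {
        return URLEncoder.encode(alreadyEncoded, StandardCharsets.UTF_8);
    }

    static String decodeOnce(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String onceEncoded = "%2A"; // a client already encoded "*" once
        // If the controller re-encodes instead of passing it through,
        // a single decode on the server no longer recovers "*":
        String doubleEncoded = reEncode(onceEncoded);
        System.out.println(doubleEncoded);             // %252A
        System.out.println(decodeOnce(doubleEncoded)); // %2A, not "*"
        System.out.println(decodeOnce(onceEncoded));   // * (the intended value)
    }
}
```

This is why the controller should forward the already-encoded `columns=` value as-is rather than encoding it a second time.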
Branch force-pushed from dfc8138 to 1697650.
Branch force-pushed from 1573c31 to 16f9d50.
Branch force-pushed from 16f9d50 to 6f2e6ae.
columnLengthMap.merge(column, (double) columnLength, Double::sum);
columnCardinalityMap.merge(column, (double) columnCardinality, Double::sum);
maxNumMultiValuesMap.merge(column, (double) maxNumMultiValues, Double::sum);
Can we avoid populating this map for single-value columns, since maxNumMultiValues is only applicable to MV columns? The other option is to make sure it is either 0 or -1 for SV columns.
@mqliang, can you please update the PR description to correctly reflect the actual changes that we finally agreed to do in this PR? Let's remove the stuff related to bytes, because it turned out that was not the issue, and let's describe the reason for the NPE issue we were seeing due to METADATA_PROPERTY_LENGTH_LIMIT (it is described in our internal ticket). Regarding the addition of maxNumMultiValues, can we ensure in tests that it is non-zero only for columns that are multi-value, and 0 or -1 for single-value columns, as it will never be used for them?
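The reviewer's first option (skip SV columns entirely) could look like the following sketch; the `aggregate` helper and its parameters are hypothetical, standing in for the map-population code shown in the diff:

```java
import java.util.HashMap;
import java.util.Map;

public class MaxNumMultiValuesSketch {
    // Hypothetical aggregation helper: only populate maxNumMultiValuesMap
    // for multi-value (MV) columns, as the stat is meaningless for SV columns.
    static void aggregate(Map<String, Double> maxNumMultiValuesMap,
                          String column, boolean singleValue, int maxNumMultiValues) {
        if (singleValue) {
            return; // skip SV columns entirely
        }
        maxNumMultiValuesMap.merge(column, (double) maxNumMultiValues, Double::sum);
    }

    public static void main(String[] args) {
        Map<String, Double> map = new HashMap<>();
        aggregate(map, "svCol", true, -1);  // ignored
        aggregate(map, "mvCol", false, 4);  // segment 1
        aggregate(map, "mvCol", false, 6);  // segment 2
        System.out.println(map.containsKey("svCol")); // false
        System.out.println(map.get("mvCol"));         // 10.0
    }
}
```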
Branch force-pushed from 6f2e6ae to a0432a0.
… maxNumMultiValues
Branch force-pushed from a0432a0 to daa7e73.
Description
This PR fixes a bug where metadata.getMaxValue() may return null; see the function here:
pinot/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java
Lines 667 to 675 in 77a7069
When the maxValue/minValue exceeds METADATA_PROPERTY_LENGTH_LIMIT, SegmentColumnarIndexCreator will not set the minValue/maxValue. In this case, metadata.getMaxValue() will return null.
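The writer-side behavior described above can be sketched as follows. This is a simplified illustration, not the actual SegmentColumnarIndexCreator code; the limit value and helper name are assumptions:

```java
public class MetadataLengthLimitSketch {
    // Hypothetical stand-in for SegmentColumnarIndexCreator's limit.
    static final int METADATA_PROPERTY_LENGTH_LIMIT = 512;

    // Writer side: a min/max value longer than the limit is not persisted,
    // so the reader side later sees metadata.getMaxValue() == null.
    static String maxValueToPersist(String maxValue) {
        if (maxValue == null || maxValue.length() > METADATA_PROPERTY_LENGTH_LIMIT) {
            return null; // skipped; readers must handle the null
        }
        return maxValue;
    }

    public static void main(String[] args) {
        System.out.println(maxValueToPersist("short") != null);          // true
        System.out.println(maxValueToPersist("x".repeat(1000)) != null); // false
    }
}
```

Any reader that dereferences the returned max value without a null check, as the pre-fix code did, will throw an NPE for such columns.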
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
(If yes, please label as backward-incompat, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
(If yes, please label as backward-incompat, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes? Things to consider:
(If yes, please label as release-notes and complete the section on Release Notes)
Release Notes
Documentation