Update cardinality when converting raw column to dict based #9875
Codecov Report
@@ Coverage Diff @@
## master #9875 +/- ##
=============================================
- Coverage 70.45% 25.08% -45.37%
+ Complexity 5545 44 -5501
=============================================
Files 1982 1973 -9
Lines 106558 106385 -173
Branches 16151 16152 +1
=============================================
- Hits 75077 26690 -48387
- Misses 26248 76967 +50719
+ Partials 5233 2728 -2505
@Jackie-Jiang @siddharthteotia please review. Ideally, I would like to add a unit test, but it wasn't straightforward. I'm trying to see if I can add some sort of testing in |
@vvivekiyer Adding a test in |
Mostly good. Thanks for adding (also cleaning up) the test
...ests/src/test/java/org/apache/pinot/integration/tests/LLCRealtimeClusterIntegrationTest.java
...on-tests/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTestSet.java
queryResponse = postQuery(query);
long numTotalDocsAfterReload = queryResponse.get("totalDocs").asLong();
assertEquals(numTotalDocs, numTotalDocsAfterReload);
We want to validate that the total doc count and the query response are always correct during the reload. We can also pick a query that returns different metadata depending on whether the column is dictionary encoded, to ensure the index change actually happens. E.g. if we pick a value that doesn't exist, SELECT COUNT(*) FROM myTable WHERE ActualElapsedTime = <val> will have docs scanned in filter for the raw index, but no scan for the dictionary.
Oh good point. Added this test.
Just want to mention I enabled both dictionary and inverted index because "docs scanned in filter" is zero only if inverted index is added.
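The check discussed in this thread can be sketched roughly as follows. This is only an illustration, not the test added in the PR: the class `ReloadCheckSketch` and its methods are hypothetical, the broker response is modeled as a plain map rather than the JSON node returned by `postQuery`, the key `numEntriesScannedInFilter` is assumed to be the metadata referred to above as "docs scanned in filter", and the numbers in `main` are made up.

```java
import java.util.Map;

// Hypothetical helper; models broker response metadata as a simple map.
public class ReloadCheckSketch {

    // "totalDocs" appears in the snippet above; the second key is an assumption.
    public static final String TOTAL_DOCS = "totalDocs";
    public static final String ENTRIES_SCANNED_IN_FILTER = "numEntriesScannedInFilter";

    // The total document count must not change across a reload.
    public static boolean totalDocsUnchanged(Map<String, Long> before, Map<String, Long> after) {
        return before.get(TOTAL_DOCS).equals(after.get(TOTAL_DOCS));
    }

    // For a filter on a value that does not exist, the raw column is scanned,
    // while a column with dictionary and inverted index needs no scan.
    public static boolean indexChangeVisible(Map<String, Long> before, Map<String, Long> after) {
        return before.get(ENTRIES_SCANNED_IN_FILTER) > 0
            && after.get(ENTRIES_SCANNED_IN_FILTER) == 0;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the PR.
        Map<String, Long> before = Map.of(TOTAL_DOCS, 1000L, ENTRIES_SCANNED_IN_FILTER, 1000L);
        Map<String, Long> after = Map.of(TOTAL_DOCS, 1000L, ENTRIES_SCANNED_IN_FILTER, 0L);
        System.out.println(totalDocsUnchanged(before, after));
        System.out.println(indexChangeVisible(before, after));
    }
}
```

Keeping the two checks separate means a failure shows whether the reload changed the document count or whether the index change simply did not take effect.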
Fixes issue #9874.
Fixes two issues:
metadata.properties file was failing because Commons Configuration 1.10 does not support file paths containing '%'.
Also tested manually. Results are below.
Tested manually using breakpoints and REST API calls in LLCRealtimeClusterIntegrationTest.
Created a realtime segment. The metadata.properties file for the noDictionary column ActualElapsedTime is as follows:
SegmentName = mytable__0__0__20221202T0322Z
Now, issued a REST call to enable dictionary for ActualElapsedTime. Issued reloadSegment. Verified that dictionary load also succeeds. The updated metadata is as follows: