-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default column handling of noForwardIndex and regeneration of forward index on reload path #9810
Default column handling of noForwardIndex and regeneration of forward index on reload path #9810
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9810 +/- ##
=============================================
+ Coverage 15.86% 70.47% +54.60%
- Complexity 176 5074 +4898
=============================================
Files 1931 1987 +56
Lines 104306 107133 +2827
Branches 15901 16272 +371
=============================================
+ Hits 16551 75497 +58946
+ Misses 86531 26366 -60165
- Partials 1224 5270 +4046
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
03d37ce
to
cf4e87d
Compare
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
private void updateMetadataProperties(File indexDir, Map<String, String> metadataProperties) | ||
static void updateMetadataProperties(File indexDir, Map<String, String> metadataProperties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm wondering if this will actually work in stateless / static fashion. Caller need not do anything after properties.save()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. properties.save()
will save the properties into the metadata file. After this the next step is to cleanup indexes and stuff which is not related to this.
|
||
|
||
/** | ||
* Base class for all of the {@link IndexHandler} classes which need to build their indexes from the forward index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused by this.
IIUC, a particular IndexHandler
is supposed to create that index or remove / cleanup / update that index.
We recently added ForwardIndexHandler
as well which can rewrite forward index (for work done in #9348) and also remove forward index for noForwardIndex feature.
Similarly this PR uses ForwardIndexHandler to rebuild forward index using some other structures.
So I am not sure if above statement "all of the IndexHandler classes which need to build their indexes from the forward index. is exactly true or intuitive ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a list of indexes that require the forward index to build their indexes, such as inverted index, JSON index, etc. For most scenarios within the ForwardIndexHandler
we too need the forward index to modify the forward index (except when we enable forward index for columns where it's disabled). Does that clarify your question? Yes each IndexHandler
needs to deal with remove / cleanup / update of its own index, but they often rely on forward index to do so.
I'm trying to abstract away the common logic needed by all such IndexHandlers
as they all may require the need to create a temp forward index and clean it up later on. I can avoid adding this abstract class but then we'll have to duplicate this logic in all such IndexHandlers
. Let me know if this is preferred over what I'm doing now.
...a/org/apache/pinot/segment/local/segment/index/loader/BaseForwardIndexBasedIndexHandler.java
Outdated
Show resolved
Hide resolved
...a/org/apache/pinot/segment/local/segment/index/loader/BaseForwardIndexBasedIndexHandler.java
Outdated
Show resolved
Hide resolved
...a/org/apache/pinot/segment/local/segment/index/loader/BaseForwardIndexBasedIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java
Outdated
Show resolved
Hide resolved
fc074ef
to
30246e3
Compare
3704d19
to
af94ed7
Compare
...rg/apache/pinot/segment/local/segment/index/loader/defaultcolumn/V3DefaultColumnHandler.java
Show resolved
Hide resolved
...rg/apache/pinot/segment/local/segment/index/loader/defaultcolumn/V3DefaultColumnHandler.java
Show resolved
Hide resolved
...a/org/apache/pinot/segment/local/segment/index/loader/BaseForwardIndexBasedIndexHandler.java
Outdated
Show resolved
Hide resolved
...a/org/apache/pinot/segment/local/segment/index/loader/BaseForwardIndexBasedIndexHandler.java
Outdated
Show resolved
Hide resolved
...a/org/apache/pinot/segment/local/segment/index/loader/BaseForwardIndexBasedIndexHandler.java
Outdated
Show resolved
Hide resolved
@@ -845,6 +845,69 @@ public void testSelectQueriesWithReload() | |||
assertEquals(resultRow[0], 240528); | |||
assertEquals(resultRow[1], "gFuH"); | |||
} | |||
|
|||
// Re-enable forward index for column9, column11, and column6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for a simple SELECT *
or SELECT reenabledColumn FROM Foo LIMIT <MAX> ORDER BY <something>
.
Basically no WHERE clause.
This will have to read the entire forward index and can be a good correctness test that entire forward index was generated correctly.
Something like
- Generate proper regular segment that has forward index.
- Gather resultSet1
- Disable forward index on a particular column via the reload path
- Query should fail
- Rebuild forward index via the reload path
- Gather resultSet2
resultSet1 should be same as resultSet2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest doing this for both SV and MV columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. one issue here, the MV column test is taking 7-8 minutes to run :( it passes but maybe this is too long. Any suggestions to improve this?
Also couldn't run SELECT *
because some other columns still have forward index disabled in these tests. So aded a test for selecting the specific column instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect 7-8 mins ? That's quite long for reload to take on a single segment single column.
I think it will be useful to double check the implementation once and see if there isn't anything obvious sub-optimal stuff going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reload isn't taking 7-8 minutes. The verification is taking that long. Need to walk through 400K rows, construct the sets for before and after and validate that they match. I did validate that the reload part is quick, so there shouldn't be a major performance issue with the reload part itself.
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
private void trackLengthOfLongestEntry(Dictionary dictionary, int[] lengthOfLongestEntry, int dictId) { | ||
switch (_columnMetadata.getFieldSpec().getDataType().getStoredType()) { | ||
case STRING: | ||
lengthOfLongestEntry[0] = Math.max(dictionary.getStringValue(dictId).length(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to track the length as physical number of bytes (UTF-8 encoded)
as opposed to String.length()
which reports number of characters ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, good catch! i had a note in my TODO list for this and forgot to update it. I've fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I am worried how were the tests passing then ? ideally any test that runs a SELECT query and tries to read forward index should have failed because of the incorrect value of metadata ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried printing the value and walking through the debugger too and found that both values come out to be the same in my tests:
Example log that I added:
LOGGER.warn("*********** old len: {}, new len: {}", dictionary.getStringValue(dictId).length(), dictionary.getStringValue(dictId).getBytes(UTF_8).length);
test logs:
19:17:05.932 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 18, new len: 18
19:17:05.939 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 7, new len: 7
19:17:05.939 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 15, new len: 15
19:17:05.940 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 6, new len: 6
19:17:05.940 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 6, new len: 6
19:17:05.940 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 4, new len: 4
19:17:05.941 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 26, new len: 26
19:17:05.941 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 8, new len: 8
19:17:05.941 WARN [InvertedIndexAndDictionaryBasedForwardIndexCreator] [main] *********** old len: 19, new len: 19
So essentially they calculate the same thing, each UTF8 character is stored as a single byte. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks for confirming and fixing.
...t/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java
Outdated
Show resolved
Hide resolved
…ard index from inverted index and dictionary
2646460
to
d537783
Compare
@somandal - looks good overall. Since this is quite big already and we have iterated upon it few times, I am merging it. I think the runtime of MV column should be looked into once to see there isn't any performance issue. Secondly, it will be good to take another pass at the new file added to see if some methods can be refactored into smaller ones. Thanks for working on this. |
Thanks so much, appreciate your patience with this long PR! I'll look into:
As for this:
The MV test taking 7-8 minutes is spending most of that time on verification and not on reload. So I don't anticipate a major issue in the reload path for performance purposes, but I can definitely look into whether there are any optimization opportunities. |
Related OSS Issue: #6473
Related OSS PRs:
#9333
#9740
Document outlining the breakup of PRs and the approach to solve various sub-problems of this issue: https://docs.google.com/document/d/1MNLLhYCg5e-UFBQ6wTBODd41sDsbjevwRfwoGuNowWw/edit?usp=sharing
This PR adds support for the following operations related to forward index disabled columns:
This PR does not add support for the following (and there will be follow up PRs to handle these):
Tests includes:
cc @siddharthteotia @Jackie-Jiang @vvivekiyer