Use oldest offset on newly detected partitions #7756
Fixed the issue where Pinot was losing data when it detected new stream partitions (depending on table configuration).
Manual testing by running LLCRealtimeClusterIntegrationTest via debugger:
- Change the table config to start from the largest offset.
- Force the test to detect only one partition; notice that roughly half the rows are ingested, and only partition 0 shows up in the idealstate.
- Run the RealtimeSegmentValidationManager job via swagger to force detection of the new partition.
- Confirmed that all rows are now present.
Issue apache#7741
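The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `StartOffsetSketch`, `OffsetCriteria`, `PartitionOffsets`, and `chooseStartOffset` are hypothetical names, and the model of a partition as a pair of smallest/largest offsets is an assumption made for the example. The idea is that a partition detected after the table is already live starts from its smallest offset regardless of the configured criteria, so rows produced before detection are not skipped.

```java
/** Hypothetical sketch; none of these names are Pinot's actual API. */
final class StartOffsetSketch {

  /** Assumed stand-in for the table's configured offset criteria. */
  enum OffsetCriteria { SMALLEST, LARGEST }

  /** Assumed model: the offsets currently available in one stream partition. */
  static final class PartitionOffsets {
    final long smallest;
    final long largest;

    PartitionOffsets(long smallest, long largest) {
      this.smallest = smallest;
      this.largest = largest;
    }
  }

  /**
   * Picks the offset at which consumption of a partition should begin.
   *
   * @param offsets     offsets available in the partition
   * @param configured  offset criteria from the table config
   * @param isLiveTable true if the table is already consuming, i.e. the
   *                    partition was detected after table creation
   */
  static long chooseStartOffset(PartitionOffsets offsets, OffsetCriteria configured, boolean isLiveTable) {
    if (isLiveTable) {
      // Newly detected partition on a live table: always start from the
      // oldest available offset so no rows are lost.
      return offsets.smallest;
    }
    // Table creation: honour the configured criteria.
    return configured == OffsetCriteria.SMALLEST ? offsets.smallest : offsets.largest;
  }
}
```

With the table configured for the largest offset, a partition holding offsets 0 through 500 would start at 500 at table creation but at 0 when it is detected later, which matches the manual test where all rows became present after the validation job ran.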
Codecov Report
@@            Coverage Diff             @@
##             master    #7756    +/-   ##
==========================================
- Coverage     71.63%   71.55%   -0.08%
  Complexity     4064     4064
==========================================
  Files          1577     1577
  Lines         80542    80589     +47
  Branches      11965    11974      +9
==========================================
- Hits          57694    57665     -29
- Misses        18971    19049     +78
+ Partials       3877     3875      -2
String startOffset = partitionGroupMetadata.getStartOffset().toString();
StreamPartitionMsgOffset startOffset;
if (isLiveTable) {
  startOffset = getPartitionGroupSmallestOffset(streamConfig, partitionGroupId);
This method actually overrides the offset criteria and calls getNewPartitionGroupMetadataList() to get all PartitionGroupMetadata, but only returns the offset for one partition. Handling it here is not efficient, and it also does not avoid the problem of overriding the stream config. I slightly prefer the fix in #7743.
This is not in the performance path, so I am not worried about efficiency at all. Also, the override is very localized, so it is evident as you read the method that the offset criteria is being overridden (by constructing a new, appropriately named object).
Fair enough. Let's go with this approach and probably restructure the code in the future.
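For readers following the thread, the trade-off being discussed can be sketched like this. It is a simplified illustration under stated assumptions, not the method in this PR: `LocalizedOverrideSketch`, `StreamSettings`, `OffsetFetcher`, and `smallestOffsetFor` are hypothetical names. It shows both points raised: the override is confined to a new, clearly named copy of the config (the caller's config is untouched), and the lookup fetches start offsets for every partition even though only one is returned.

```java
import java.util.Map;

/** Hypothetical sketch of the review discussion; not Pinot's actual API. */
final class LocalizedOverrideSketch {

  /** Immutable stand-in for a stream config (assumed shape). */
  static final class StreamSettings {
    final String topic;
    final String offsetCriteria; // "smallest" or "largest"

    StreamSettings(String topic, String offsetCriteria) {
      this.topic = topic;
      this.offsetCriteria = offsetCriteria;
    }

    /** Returns a copy with different offset criteria; the original is unchanged. */
    StreamSettings withOffsetCriteria(String criteria) {
      return new StreamSettings(topic, criteria);
    }
  }

  /** Stand-in for the stream metadata call: start offsets for all partitions. */
  interface OffsetFetcher {
    Map<Integer, Long> fetchStartOffsets(StreamSettings settings);
  }

  static long smallestOffsetFor(StreamSettings configured, int partitionId, OffsetFetcher fetcher) {
    // The override lives only in this local, appropriately named copy.
    StreamSettings smallestOffsetSettings = configured.withOffsetCriteria("smallest");

    // All partitions are fetched although a single offset is needed; this is
    // the inefficiency noted in review, accepted because the call is not on
    // a performance-sensitive path.
    Map<Integer, Long> allStartOffsets = fetcher.fetchStartOffsets(smallestOffsetSettings);

    Long offset = allStartOffsets.get(partitionId);
    if (offset == null) {
      throw new IllegalStateException("Partition " + partitionId + " not found in stream metadata");
    }
    return offset;
  }
}
```

A later restructuring, as suggested in the last comment, could fetch the smallest offsets once per detection cycle and reuse them across all new partitions.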
* Use oldest offset on newly detected partitions
* Fixed linter error