Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow segment upload via Metadata in MergeRollup Minion task #9825
Allow segment upload via Metadata in MergeRollup Minion task #9825
Changes from 5 commits
a0a425a
8944ea4
7280b64
ce6c8fe
0cd020b
2bed0e6
cced16b
c691c0c
b868117
169332f
198c296
4486ae8
1ffb275
72efe9e
8b4dd1a
ee76795
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that we should check push mode first before moving the segment. Otherwise, this middle step is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
As @zhtaoxiang mentioned, we should NOT move the segments to deep storage when we do the segment tar file push. In that case, the controller will move the segment to the deep storage during the upload phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the Minion somehow got killed (e.g. restart) after finish the segment move to outputPinotFS but before finishing the metadata push to controller?
I think that this can potentially leave some files in the deep storage without any Pinot side segment metadata pointing those files. This will potentially become the problem unless we have the cleanup mechanism. Do we have any design/solution to address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One solution can be to always mention a seperate OUTPUT_DIR and always clean that dir in case the task fails. Even if we fail to clean the outputDir it should not be a problem if it is seperate from the deep store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think this may not be a problem. Because Minion task should be idempotent, a later retry should generate segments with the same name and the existing ones will be reused or be override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not assume that the later retry will always come to finish this particular task. Similarly, the controller can be down and fail the metadata update. If the controller recovery happens after the minion's retry attempt fails, I think that this will cause some files to sit at the deep storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here