Validate the numbers of input and output files in HadoopSegmentCreationJob #8098
Codecov Report
@@ Coverage Diff @@
## master #8098 +/- ##
============================================
- Coverage 71.20% 64.73% -6.47%
- Complexity 4303 4304 +1
============================================
Files 1617 1572 -45
Lines 83800 81919 -1881
Branches 12517 12313 -204
============================================
- Hits 59669 53030 -6639
- Misses 20079 25125 +5046
+ Partials 4052 3764 -288
What if a user wants to replace an existing segment with a newly generated one?
Force-pushed from 82cd206 to 5f1f61a
If a user wants to replace an existing segment, they can still use the current logic to do that. I've updated the PR to validate the number of input and output files. Since there is a 1:1 mapping between input and output files, the job should fail if these two numbers don't match. Sample log from the mapper:
LGTM. The changes look simple enough, but did you happen to test it?
@sajjad-moradi Yes, that's tested in the Hadoop env.
Description
If the exclude.sequence.id property is set to true, the segment name will not have the sequence id as a suffix. If there are multiple input files for the same day within the batch job, all of the new segments will share the same segment name, so only 1 Pinot segment instead of N ends up stored in the file system.
This PR adds a validation between the number of input files and the number of output files, so that if these two numbers don't match, an exception is thrown and the job fails.
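The collision and the check described above can be illustrated with a small, self-contained sketch. This is not the code from this PR or from HadoopSegmentCreationJob: the class name SegmentCountCheck, the naming scheme (table name, day, optional sequence id), and the sample table and day values are all hypothetical, chosen only to show why N input files can collapse into one stored segment and how a count comparison catches it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and naming scheme are assumptions,
// not the actual Pinot implementation.
class SegmentCountCheck {

  // Assumed naming scheme: <table>_<day>, plus _<sequenceId> unless it is excluded.
  static String segmentName(String table, String day, int sequenceId, boolean excludeSequenceId) {
    String base = table + "_" + day;
    return excludeSequenceId ? base : base + "_" + sequenceId;
  }

  // Simulates an output directory keyed by segment name: a segment written
  // under an already-used name replaces the earlier one.
  static Set<String> writeSegments(String table, List<String> inputDays, boolean excludeSequenceId) {
    Set<String> stored = new HashSet<>();
    for (int i = 0; i < inputDays.size(); i++) {
      stored.add(segmentName(table, inputDays.get(i), i, excludeSequenceId));
    }
    return stored;
  }

  // With a 1:1 mapping between input files and output segments,
  // any mismatch in the counts means at least one segment was lost.
  static void validate(int numInputFiles, int numOutputSegments) {
    if (numInputFiles != numOutputSegments) {
      throw new IllegalStateException("Number of input files (" + numInputFiles
          + ") does not match number of output segments (" + numOutputSegments + ")");
    }
  }

  public static void main(String[] args) {
    // Three input files that all belong to the same (made-up) day.
    List<String> days = Arrays.asList("2022-01-30", "2022-01-30", "2022-01-30");

    // Sequence id kept: three distinct names, so the check passes.
    Set<String> withSequenceId = writeSegments("myTable", days, false);
    validate(days.size(), withSequenceId.size());

    // Sequence id excluded: all three names collide, so the check fails the job.
    Set<String> withoutSequenceId = writeSegments("myTable", days, true);
    try {
      validate(days.size(), withoutSequenceId.size());
    } catch (IllegalStateException e) {
      System.out.println("Job failed: " + e.getMessage());
    }
  }
}
```

In this sketch the check runs after all segments are written, which matches the description's intent of failing the job on a mismatch rather than silently keeping a single segment.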
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
(If yes, label as backward-incompat, and complete the section below on Release Notes)
Does this PR fix a zero-downtime upgrade introduced earlier?
(If yes, label as backward-incompat, and complete the section below on Release Notes)
Does this PR otherwise need attention when creating release notes?
(If yes, label as release-notes and complete the section on Release Notes)
Release Notes
Documentation