Fix the replication in segment assignment strategy #9816
@mcvsubbu @sajjad-moradi @jugomezv @snleee: please review
@GSharayu Do we have a case where we pick up BalancedNumSegmentAssignment for a realtime table? I thought that we haven't wired the segment assignment strategy for realtime tables yet.
@@ -51,6 +51,10 @@ public void init(HelixManager helixManager, TableConfig tableConfig) {
SegmentsValidationAndRetentionConfig validationAndRetentionConfig = tableConfig.getValidationConfig();
Preconditions.checkState(validationAndRetentionConfig != null, "Validation Config is null");
_replication = validationAndRetentionConfig.getReplicationNumber();
// Number of replicas per partition of low-level consumers check is for the real time tables only
if (validationAndRetentionConfig.getReplicasPerPartition() != null) {
We need to check if the table is offline or real-time and then decide where to pull the replication data.
Is it possible to wrap all the logic for whether to use replication or replicasPerPartition within the TableConfig itself? That way, tableConfig.getReplicationNumber should just return the correct value depending on whether it is an offline or realtime table.
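A minimal sketch of what such a single accessor could look like, assuming a simplified stand-in for the table config. The class `ReplicationResolver`, the nested `ValidationSettings` type, and the method `getEffectiveReplication` are hypothetical names for illustration, not part of Pinot's API:

```java
// Hypothetical sketch only: these types are simplified stand-ins, not Pinot classes.
class ReplicationResolver {
  enum TableType { OFFLINE, REALTIME }

  /** Holds the two replication settings that the validation config exposes. */
  static final class ValidationSettings {
    final int replication;
    final int replicasPerPartition;

    ValidationSettings(int replication, int replicasPerPartition) {
      this.replication = replication;
      this.replicasPerPartition = replicasPerPartition;
    }
  }

  /**
   * Returns the replica count for the given table type, so that callers
   * never have to choose between the two settings themselves.
   */
  public static int getEffectiveReplication(TableType tableType, ValidationSettings settings) {
    return tableType == TableType.REALTIME
        ? settings.replicasPerPartition
        : settings.replication;
  }
}
```

With an accessor like this, each assignment strategy would call one method instead of repeating the table-type check.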
We saw realtime tables using the balanced strategy in production. @jugomezv, can you please confirm?
@navina We can wire that into the table config, but it would also mean cleaning up all the other places where we use this. I would keep that out of scope for this fix.
Please tag this PR with the (open) issue #8804.
Since we have a problem in production, we should fix it in some way (add TODOs as needed), but fix the overall replication/replicasPerPartition handling as a part of that issue.
done
Codecov Report
@@ Coverage Diff @@
## master #9816 +/- ##
============================================
+ Coverage 60.58% 70.26% +9.68%
+ Complexity 5281 4998 -283
============================================
Files 1949 1964 +15
Lines 104632 105033 +401
Branches 15847 15896 +49
============================================
+ Hits 63389 73803 +10414
+ Misses 36540 26096 -10444
- Partials 4703 5134 +431
Can we also add a test case to cover this? We should add the test where we set
@@ -51,6 +52,10 @@ public void init(HelixManager helixManager, TableConfig tableConfig) {
SegmentsValidationAndRetentionConfig validationAndRetentionConfig = tableConfig.getValidationConfig();
Preconditions.checkState(validationAndRetentionConfig != null, "Validation Config is null");
_replication = validationAndRetentionConfig.getReplicationNumber();
// Number of replicas per partition of low-level consumers check is for the real time tables only
if (tableConfig.getTableType() == TableType.REALTIME) {
Let's add the TODO comment that we can clean this up once the table config has the new API that picks up the correct replication depending on the table type.
added a todo and a unit test.
Looks good. Could you add a unit test?
_replication = validationAndRetentionConfig.getReplicationNumber();
// Number of replicas per partition of low-level consumers check is for the real time tables only
if (tableConfig.getTableType() == TableType.REALTIME) {
_replication = validationAndRetentionConfig.getReplicasPerPartitionNumber();
}
nit:
_replication = tableConfig.getTableType() == TableType.REALTIME
? validationAndRetentionConfig.getReplicasPerPartitionNumber()
: validationAndRetentionConfig.getReplicationNumber();
+1 👍
updated
9edbf3f to ee3aab0
LGTM
The Balanced and Replica group segment assignment strategies assume that replication is always read from the table config as tableConfig.getValidationConfig().getReplicationNumber(), irrespective of whether it is a realtime table or an offline table.
But for realtime tables, if tableConfig.getValidationConfig().getReplicasPerPartitionNumber() is available, it should be preferred over the table replication.
No additional changes are required
Reference Issue: #9309
Also, issue #8804
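The rule in the description above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code merged in this PR: the class `ReplicationSelectionSketch` and the method `selectReplication` are hypothetical, and both settings are modeled as plain strings, with a null `replicasPerPartition` meaning "not set".

```java
// Hypothetical sketch only: models the two settings as strings, not Pinot's config classes.
class ReplicationSelectionSketch {
  /**
   * Picks the replica count for segment assignment.
   * For realtime tables, replicasPerPartition wins when it is set;
   * otherwise the table-level replication is used.
   */
  public static int selectReplication(boolean isRealtime, String replication, String replicasPerPartition) {
    if (isRealtime && replicasPerPartition != null) {
      return Integer.parseInt(replicasPerPartition);
    }
    return Integer.parseInt(replication);
  }

  public static void main(String[] args) {
    System.out.println(selectReplication(false, "1", "3")); // offline: prints 1
    System.out.println(selectReplication(true, "1", "3"));  // realtime: prints 3
    System.out.println(selectReplication(true, "2", null)); // realtime, not set: prints 2
  }
}
```

The fallback in the third call is an assumption of this sketch. The final diff in the conversation selects on table type alone.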