Add support for ALL_RESOURCES key to disabled partitions #2848
Overall LGTM, one minor comment. Nice work!
helix-core/src/test/java/org/apache/helix/integration/rebalancer/TestInstanceOperation.java
...core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java
lgtm
@GrantPSpencer This test failed, and the failure is related to your change. Please make sure it is fixed:
Test failed: testDisablePartition(org.apache.helix.integration.TestDisablePartition) Time elapsed: 30.279 s <<< FAILURE!
I was not cleaning up the resources I added solely for the new test method (testDisableAllPartitions) in the TestDisablePartition class, which caused the downstream failure seen in testDisablePartition. Pushing the fix now; I will trigger a few CI runs on my personal fork to confirm the test is not flaky.
Pull request approved by @junkaixue and @zpinto.
Issues
With the recent changes to InstanceOperation, Helix allows users to set their instances to DISABLE, which gracefully transitions all of the instance's partitions to the offline state. Likewise, users can disable specific partitions to transition them to offline (CRUSHED) or move them off the node (WAGED). However, because a node can have only one active InstanceOperation at a time, a user cannot DISABLE an evacuating or SWAP_IN instance without overriding that operation. There are scenarios where a user may want the instance being operated on to immediately transition its partitions downward, or to receive no upward state transitions while the operation is in progress.
Description
This change adds support for the ALL_RESOURCES key in the disabled partitions map of an instance's config. It mirrors the existing behavior of disabling a specific partition on a node, but extends it to every partition of every resource in the cluster that is placed on that node.
The main areas this change affects are:
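As a minimal sketch of the lookup semantics described above (not the actual Helix implementation), assume the disabled partitions map is a `Map<String, List<String>>` from resource name to disabled partition names, and that the mere presence of the `ALL_RESOURCES` key disables every partition on the instance. The class, method, and constant names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Helix code: checks whether a partition is disabled on one
 * instance, honoring the ALL_RESOURCES key. Assumes the disabled partitions map is
 * resource name -> list of disabled partition names.
 */
final class DisabledPartitionLookup {
  // Key name from the PR description; the constant itself is hypothetical.
  static final String ALL_RESOURCES = "ALL_RESOURCES";

  /** Returns true if the partition is disabled directly or via the ALL_RESOURCES key. */
  static boolean isPartitionDisabled(
      Map<String, List<String>> disabledPartitions, String resource, String partition) {
    if (disabledPartitions == null || disabledPartitions.isEmpty()) {
      return false;
    }
    // Assumption: the presence of ALL_RESOURCES disables every partition of every resource.
    if (disabledPartitions.containsKey(ALL_RESOURCES)) {
      return true;
    }
    List<String> partitions = disabledPartitions.get(resource);
    return partitions != null && partitions.contains(partition);
  }

  public static void main(String[] args) {
    Map<String, List<String>> disabled = new HashMap<>();
    disabled.put("DB_0", List.of("DB_0_3"));
    System.out.println(isPartitionDisabled(disabled, "DB_0", "DB_0_3")); // true
    System.out.println(isPartitionDisabled(disabled, "DB_1", "DB_1_0")); // false

    // Adding the ALL_RESOURCES key disables partitions of every resource on the instance.
    disabled.put(ALL_RESOURCES, List.of());
    System.out.println(isPartitionDisabled(disabled, "DB_1", "DB_1_0")); // true
  }
}
```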
WAGED placement calculation - AssignableNode and ReplicaActivateConstraint
DelayedAutoRebalancer - computeBestPossiblePartitionState (responsible for forcing partitions down to offline)
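The two affected paths can be illustrated with a simplified, hypothetical sketch. `canAssign` stands in for a placement-side check in the spirit of AssignableNode/ReplicaActivateConstraint, and `applyDisabled` stands in for the step in computeBestPossiblePartitionState that forces disabled replicas down to offline. It assumes the disabled partitions map is resource name to partition list and that `ALL_RESOURCES` disables everything; none of these names are Helix APIs.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Helix code: how placement and state computation might honor
 * disabled partitions, including the ALL_RESOURCES key.
 */
final class DisabledPartitionRebalanceSketch {
  static final String ALL_RESOURCES = "ALL_RESOURCES";
  static final String OFFLINE = "OFFLINE";

  // Assumption: presence of ALL_RESOURCES disables every partition on the instance.
  static boolean isDisabled(Map<String, List<String>> disabled, String resource, String partition) {
    if (disabled.containsKey(ALL_RESOURCES)) {
      return true;
    }
    List<String> partitions = disabled.get(resource);
    return partitions != null && partitions.contains(partition);
  }

  /** Placement side: do not place a replica on an instance that has the partition disabled. */
  static boolean canAssign(Map<String, List<String>> disabledOnInstance, String resource,
      String partition) {
    return !isDisabled(disabledOnInstance, resource, partition);
  }

  /** State side: force instances that have the partition disabled down to OFFLINE. */
  static Map<String, String> applyDisabled(Map<String, String> idealStates,
      Map<String, Map<String, List<String>>> disabledByInstance, String resource,
      String partition) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : idealStates.entrySet()) {
      Map<String, List<String>> disabled =
          disabledByInstance.getOrDefault(entry.getKey(), Map.of());
      result.put(entry.getKey(),
          isDisabled(disabled, resource, partition) ? OFFLINE : entry.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> ideal = new LinkedHashMap<>();
    ideal.put("host1", "MASTER");
    ideal.put("host2", "SLAVE");

    // host1 has the ALL_RESOURCES key set, e.g. while it is also being evacuated.
    Map<String, List<String>> host1Disabled = new HashMap<>();
    host1Disabled.put(ALL_RESOURCES, List.of());
    Map<String, Map<String, List<String>>> disabledByInstance = new HashMap<>();
    disabledByInstance.put("host1", host1Disabled);

    System.out.println(applyDisabled(ideal, disabledByInstance, "DB_0", "DB_0_1"));
    // {host1=OFFLINE, host2=SLAVE}
    System.out.println(canAssign(host1Disabled, "DB_0", "DB_0_1")); // false
  }
}
```

Keeping the two checks separate reflects the two areas listed: one path keeps new replicas off the instance, the other pushes replicas already on it down to offline.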
How best to reduce the complexity introduced by the "ALL_RESOURCES" key will likely need further discussion. Depending on which methods it relies on, future code may need to check for this key explicitly, which will likely not be obvious to developers.
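To illustrate why this may not be obvious, here is a hypothetical example, assuming the disabled partitions map is resource name to partition list: a caller that only reads the resource's own entry silently misses `ALL_RESOURCES`, while a key-aware check does not. The method names are invented for illustration and are not Helix APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: a per-resource lookup silently ignores ALL_RESOURCES. */
final class AllResourcesPitfall {
  static final String ALL_RESOURCES = "ALL_RESOURCES";

  // Naive lookup that only reads the resource's own entry in the map.
  static List<String> disabledPartitionsFor(Map<String, List<String>> disabled, String resource) {
    return disabled.getOrDefault(resource, Collections.emptyList());
  }

  // Key-aware check: the caller must remember to consult ALL_RESOURCES too.
  static boolean isDisabled(Map<String, List<String>> disabled, String resource,
      String partition) {
    return disabled.containsKey(ALL_RESOURCES)
        || disabledPartitionsFor(disabled, resource).contains(partition);
  }

  public static void main(String[] args) {
    Map<String, List<String>> disabled = new HashMap<>();
    disabled.put(ALL_RESOURCES, Collections.emptyList());
    // The naive lookup reports DB_0_1 as enabled ...
    System.out.println(disabledPartitionsFor(disabled, "DB_0").contains("DB_0_1")); // false
    // ... while the key-aware check reports it as disabled.
    System.out.println(isDisabled(disabled, "DB_0", "DB_0_1")); // true
  }
}
```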
Tests
Basic Functionality
testDisableAllPartitions in TestDisablePartition.java
Test Asserting Behavior Alongside Instance Operation
testEvacuateWithDisabledPartition in TestInstanceOperation.java
Changes that Break Backward Compatibility (Optional)
N/A
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)