Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ALL_RESOURCES key to disabled partitions #2848

Merged
merged 5 commits into from
Jul 25, 2024

Conversation

GrantPSpencer
Copy link
Contributor

@GrantPSpencer GrantPSpencer commented Jul 24, 2024

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

With the recent changes to InstanceOperation, Helix allows users to set their instances to DISABLE where all partitions are gracefully transited to the offline state. Likewise, customers can also disable specific partitions to transit them to offline (CRUSHED) or move them off the node (WAGED). However, because there can only be 1 active InstanceOperation on a node at a time, a user cannot DISABLE an evacuating or SWAP_IN instance without overriding that operation. There are scenarios where a user may want to ensure that the instance being operated on immediately downward state transits its partitions or does not receive any upward state transitions while it is being operated on.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

This change adds support for the ALL_RESOURCES key in the disabled partitions map of an instance's config. This change mirrors the current behavior of disabling a specific partition on a node, but extends that to all partitions in the cluster.

The main area this change effects are:
WAGED placement calculation - AssignableNode and ReplicaActivateConstraint
DelayedAutoRebalancer - computeBestPossiblePartitionState (responsible for forcing partitions down to offline)

There will likely need to be discussion had on how to best reduce complexity of adding the "ALL_RESOURCES" key. Future code may need to explicitly check for this key depending on what methods they are leveraging. which will likely not be obvious for developers

Tests

  • The following tests are written for this issue:
    Basic Functionality
    testDisableAllPartitions in TestDisablePartitions.java

Test Asserting Behavior Alongside Instance Operation
testEvacuateWithDisabledPartition in TestInstanceOperation.java

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

N/A

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

Copy link
Contributor

@zpinto zpinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, one minor comment. Nice work!

Copy link
Contributor

@junkaixue junkaixue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@junkaixue
Copy link
Contributor

@GrantPSpencer This test failed: Test failed: testDisablePartition(org.apache.helix.integration.TestDisablePartition) Time elapsed: 30.279 s <<< FAILURE!

It is related to your change. Make sure this is fixed.

@GrantPSpencer
Copy link
Contributor Author

@GrantPSpencer This test failed: Test failed: testDisablePartition(org.apache.helix.integration.TestDisablePartition) Time elapsed: 30.279 s <<< FAILURE!

It is related to your change. Make sure this is fixed.

I was not cleaning up the resources I added solely for the new test method (testDisableAllPartitions) in testDisablePartition class. This caused a downstream issue seen in the failed test. Pushing change now and will trigger a few cI runs on my personal fork to confirm not flaky

@GrantPSpencer
Copy link
Contributor Author

Pull request approved @junkaixue, @zpinto
Commit message: Add support for ALL_RESOURCES key to disabled partitions

@junkaixue junkaixue merged commit eda45fc into apache:master Jul 25, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants