Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test 2886 #19

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Test 2886 #19

wants to merge 3 commits into from

Conversation

htin1
Copy link
Owner

@htin1 htin1 commented Aug 22, 2024

Issues

Description

  • Here are some details about my PR, including screenshots of any UI changes:

(Write a concise description including what, why, how)

Tests

  • The following tests are written for this issue:

(List the names of added unit/integration tests)

  • The following is the result of the "mvn test" command on the appropriate module:

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the provided GitHub pull request patch:

Issues

No specific issues were mentioned in the provided patch.

Description

This pull request introduces changes to the Apache Helix codebase, primarily focusing on improving the maintenance management service and the ZkBucketDataAccessor. The main changes include:

  1. Modifying the stoppable check process to perform the minimum active replica check sequentially in maintenance zones.
  2. Refactoring the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED as a static string.
  3. Enhancing the ZkBucketDataAccessor to ensure scheduled garbage collection (GC) tasks are always completed.

These changes aim to improve the reliability and efficiency of instance health checks and data management in Apache Helix.

Test

New tests and modifications to existing tests were introduced:

  1. TestMaintenanceManagementService: Modified to accommodate changes in the MaintenanceManagementService class.
  2. TestInstancesAccessor: Added a new test method testMultipleReplicasInSameMZ() to verify the behavior of stoppable checks with multiple replicas in the same maintenance zone.
  3. TestZkBucketDataAccessor: Added a new test method testGCScheduler() to ensure bucket GC still occurs in high-frequency write scenarios.

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 22, 2024
@htin1 htin1 reopened this Aug 22, 2024
Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the provided GitHub pull request patch:

Issues

No specific issues were mentioned in the provided patch.

Description

This pull request introduces changes to the Apache Helix codebase, primarily focusing on improving the maintenance management service and the ZkBucketDataAccessor. The main changes include:

  1. Modifying the stoppable check process to perform the minimum active replica check sequentially in maintenance zones.
  2. Refactoring the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED as a static string.
  3. Enhancing the ZkBucketDataAccessor to ensure scheduled garbage collection (GC) tasks are always completed.

These changes aim to improve the reliability and efficiency of instance health checks and data management in Apache Helix.

Test

Several tests were modified and added:

  1. TestMaintenanceManagementService: Updated to reflect changes in the MaintenanceManagementService class.
  2. TestInstancesAccessor: Added a new test testMultipleReplicasInSameMZ() to verify the behavior of stoppable checks with multiple replicas in the same maintenance zone.
  3. TestZkBucketDataAccessor: Added a new test testGCScheduler() to ensure bucket GC still occurs in high-frequency write scenarios.

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 22, 2024
@htin1 htin1 reopened this Aug 22, 2024
Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the provided GitHub pull request patch:

Issues

No specific issues were mentioned in the provided patch.

Description

This pull request introduces changes to the Apache Helix codebase, primarily focusing on improving the maintenance management service and the ZkBucketDataAccessor. The main changes include:

  1. Modifying the stoppable check process to perform the minimum active replica check sequentially in maintenance zones.
  2. Refactoring the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED as a static string.
  3. Enhancing the ZkBucketDataAccessor to ensure scheduled garbage collection (GC) tasks are always completed.

These changes aim to improve the reliability and efficiency of instance health checks and data management in Apache Helix.

Test

New tests and modifications to existing tests were introduced:

  1. TestMaintenanceManagementService: Modified to accommodate changes in the MaintenanceManagementService class.
  2. TestInstancesAccessor: Added a new test method testMultipleReplicasInSameMZ() to verify the behavior of stoppable checks with multiple replicas in the same maintenance zone.
  3. TestZkBucketDataAccessor: Added a new test method testGCScheduler() to ensure bucket GC still occurs in high-frequency write scenarios.

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 22, 2024
@htin1 htin1 reopened this Aug 22, 2024
Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the provided GitHub pull request patch:

Issues

No specific issues were mentioned in the provided patch.

Description

This pull request introduces changes to the Apache Helix codebase, primarily focusing on improving the maintenance management service and the ZkBucketDataAccessor. The main changes include:

  1. Modifying the stoppable check process to perform the minimum active replica check sequentially in maintenance zones.
  2. Refactoring the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED as a static string.
  3. Enhancing the ZkBucketDataAccessor to ensure scheduled garbage collection (GC) tasks are always completed.

These changes aim to improve the reliability and efficiency of instance health checks and data management in Apache Helix.

Test

Several tests were modified and added:

  1. TestMaintenanceManagementService: Updated to reflect changes in the MaintenanceManagementService class.
  2. TestInstancesAccessor: Added a new test testMultipleReplicasInSameMZ() to verify the behavior of stoppable checks with multiple replicas in the same maintenance zone.
  3. TestZkBucketDataAccessor: Added a new test testGCScheduler() to ensure bucket GC still occurs in high-frequency write scenarios.

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 22, 2024
@htin1 htin1 reopened this Aug 22, 2024
Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the pull request based on the provided patch:

Issues

[apache#2885] - Controller can violate fault zone placement constraints during intermediate state
[apache#200] - Throw new Exception to avoid ugly NPE

Description

This pull request addresses two main issues:

  1. It changes the stoppable check in MaintenanceManagementService to perform the minimum active replica check sequentially instead of in parallel. This is to account for scenarios where multiple replicas might exist in the same maintenance zone.

  2. It modifies the ZkBucketDataAccessor to ensure that scheduled garbage collection (GC) tasks are always completed, even in high-frequency write scenarios.

The changes include:

  • Modifying the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED check
  • Updating the MaintenanceManagementService to perform the min active replica check sequentially
  • Refactoring the ZkBucketDataAccessor to improve GC scheduling and execution
  • Adding a new test case testGCScheduler to verify GC behavior in high-frequency write scenarios

Test

New test added:

  • testGCScheduler in TestZkBucketDataAccessor to ensure bucket GC still occurs in high-frequency write scenarios

Modified test:

  • TestMaintenanceManagementService updated to reflect changes in stoppable check logic

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 22, 2024
@htin1 htin1 reopened this Aug 22, 2024
Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the pull request based on the provided patch:

Issues

[apache#2885] - Controller can violate fault zone placement constraints during intermediate state
[apache#200] - Throw new Exception to avoid ugly NPE

Description

This pull request addresses two main issues:

  1. It changes the stoppable check in MaintenanceManagementService to perform the minimum active replica check sequentially instead of in parallel. This is to account for scenarios where multiple replicas might exist in the same maintenance zone.

  2. It modifies the ZkBucketDataAccessor to ensure that scheduled garbage collection (GC) tasks are always completed, even in high-frequency write scenarios.

The changes include:

  • Modifying the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED check
  • Updating the MaintenanceManagementService to perform the min active replica check sequentially
  • Refactoring the ZkBucketDataAccessor to improve GC scheduling and execution

Test

New test added:

  • testMultipleReplicasInSameMZ in TestInstancesAccessor to verify stoppable check behavior with multiple replicas in the same maintenance zone
  • testGCScheduler in TestZkBucketDataAccessor to ensure GC occurs in high-frequency write scenarios

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 22, 2024
@htin1 htin1 reopened this Aug 22, 2024
Copy link

palmier-bot bot commented Aug 22, 2024

Here's the summary of the pull request based on the provided patch and related issues:

Issues

[apache#2885] - Controller can violate fault zone placement constraints during intermediate state
[apache#200] - Throw new Exception to avoid ugly NPE

Description

This pull request addresses two main issues:

  1. It changes the stoppable check in MaintenanceManagementService to perform the minimum active replica check sequentially instead of in parallel. This is done to account for scenarios where multiple replicas might exist in the same maintenance zone, which was previously not considered.

  2. It modifies the ZkBucketDataAccessor to ensure that scheduled garbage collection (GC) tasks are always completed, even in high-frequency write scenarios.

The changes include:

  • Modifying the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED check
  • Updating the MaintenanceManagementService to perform the min active replica check sequentially
  • Refactoring the ZkBucketDataAccessor to improve GC scheduling and execution
  • Adding a new test case to verify GC behavior in high-frequency write scenarios

Test

A new test method testGCScheduler() was added to TestZkBucketDataAccessor to ensure bucket GC still occurs in high-frequency write scenarios.

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
Copy link

palmier-bot bot commented Aug 23, 2024

Here's the summary of the pull request based on the provided patch:

Issues

[apache#2873] - ZkBucketDataAccessor always complete scheduled GC

Description

This pull request modifies the ZkBucketDataAccessor class to improve the garbage collection (GC) process for stale versions in ZooKeeper. The main changes include:

  1. Refactoring the GC scheduling logic to ensure that GC tasks are always completed, even in high-frequency write scenarios.
  2. Modifying the HealthCheck enum to separate the MIN_ACTIVE_REPLICA_CHECK_FAILED check from the main enum list.
  3. Changing the MaintenanceManagementService class to perform the min active replica check sequentially, improving the accuracy of stoppable instance selection.

These changes aim to enhance the reliability and efficiency of the ZooKeeper bucket data management and maintenance processes in Apache Helix.

Test

New test added:

  • testGCScheduler() in TestZkBucketDataAccessor: Ensures that bucket GC still occurs in high-frequency write scenarios.

Modified test:

  • TestMaintenanceManagementService: Updated to reflect changes in the MaintenanceManagementService class.

Changes that Break Backward Compatibility

The MIN_ACTIVE_REPLICA_CHECK_FAILED check has been removed from the HealthCheck enum and is now a separate static String. This may affect code that directly references this enum value.

Documentation

N/A

@GrantPSpencer
Copy link

Great code. Brilliant code. Some might even say this is the best code ever written

@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
Copy link

palmier-bot bot commented Aug 23, 2024

Description

This pull request introduces changes to improve the garbage collection (GC) mechanism in the ZkBucketDataAccessor class. The main changes include:

  1. Modifying the GC scheduling logic to ensure that GC tasks are always completed, even in high-frequency write scenarios.
  2. Refactoring the deleteStaleVersions method to determine the most recent version at execution time rather than passing it as a parameter.
  3. Changing the MIN_ACTIVE_REPLICA_CHECK_FAILED from an enum value to a static String in the HealthCheck class.
  4. Updating the MaintenanceManagementService class to use the new MIN_ACTIVE_REPLICA_CHECK_FAILED String constant.

These changes aim to address the issue where GC tasks were being constantly rescheduled and cancelled in high-frequency write scenarios, preventing them from executing.

Test

A new test method testGCScheduler has been added to TestZkBucketDataAccessor class. This test ensures that bucket GC still occurs in high-frequency write scenarios by:

  1. Creating a ZkBucketDataAccessor with a short GC TTL (1 second).
  2. Performing multiple writes in quick succession.
  3. Verifying that the number of child nodes is less than the number of writes, indicating that stale versions have been cleaned up.

Changes that Break Backward Compatibility

The MIN_ACTIVE_REPLICA_CHECK_FAILED enum value in the HealthCheck class has been changed to a static String constant. This may break backward compatibility for code that directly references this enum value.

Documentation

N/A

@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
Copy link

palmier-bot bot commented Aug 23, 2024

Description

This pull request introduces changes to improve the garbage collection (GC) mechanism in the ZkBucketDataAccessor class. The main changes include:

  1. Modifying the GC scheduling logic to ensure that GC tasks are always completed, even in high-frequency write scenarios.
  2. Refactoring the deleteStaleVersions method to determine the most recent version at execution time rather than passing it as a parameter.
  3. Changing the MIN_ACTIVE_REPLICA_CHECK_FAILED from an enum value to a static String in the HealthCheck class.
  4. Updating the MaintenanceManagementService class to use the new MIN_ACTIVE_REPLICA_CHECK_FAILED String constant.

These changes aim to address the issue where GC tasks were being constantly rescheduled and cancelled in high-frequency write scenarios, preventing them from executing.

Test

A new test method testGCScheduler has been added to TestZkBucketDataAccessor class. This test ensures that bucket GC still occurs in high-frequency write scenarios by:

  1. Creating a ZkBucketDataAccessor with a short GC TTL (1 second).
  2. Performing multiple writes in quick succession.
  3. Verifying that the number of child nodes is less than the number of writes, indicating that stale versions have been cleaned up.

Changes that Break Backward Compatibility

The MIN_ACTIVE_REPLICA_CHECK_FAILED enum value in the HealthCheck class has been changed to a static String constant. This may break backward compatibility for code that directly references this enum value.

Documentation

N/A

@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
Copy link

palmier-bot bot commented Aug 23, 2024

Here is my summary of the pull request based on the provided patch:

Description

This pull request makes changes to the ZkBucketDataAccessor class to improve the garbage collection (GC) scheduling mechanism. The main changes include:

  1. Modifying the scheduleStaleVersionGC method (formerly updateGCTimer) to only schedule a GC task if one is not already scheduled for the given root path. This ensures that GC tasks are not constantly rescheduled and cancelled, allowing them to complete.

  2. Refactoring the deleteStaleVersions method to determine the most recent version at execution time, rather than passing it as a parameter.

  3. Adding a new getLastSuccessfulWriteVersion helper method to retrieve the latest version from ZooKeeper.

  4. Changing the _gcTaskFutureMap from HashMap to ConcurrentHashMap for thread safety.

  5. Updating the HealthCheck enum to move MIN_ACTIVE_REPLICA_CHECK_FAILED from an enum value to a static String constant.

These changes aim to guarantee that a GC will occur within the TTL delay from any write, addressing the issue where GC tasks were not being executed in high-frequency write scenarios.

Test

A new test method testGCScheduler has been added to TestZkBucketDataAccessor class. This test ensures that bucket GC still occurs in high-frequency write scenarios by:

  1. Creating a ZkBucketDataAccessor with a short GC TTL (1 second).
  2. Performing multiple writes in quick succession.
  3. Verifying that the number of version nodes is less than the number of writes, indicating that stale versions have been cleaned up.

Changes that Break Backward Compatibility

N/A

Documentation

N/A

@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
@@ -800,6 +802,35 @@ protected Map<String, Boolean> getInstanceHealthStatus(String clusterId, String
return healthStatus;
}

// Adds the result of the min_active replica check for each stoppable check passed in futureStoppableCheckByInstance
private void addInstanceMinActiveReplicaCheck(Map<String, Future<StoppableCheck>> futureStoppableCheckByInstance,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues: Potential race condition in addInstanceMinActiveReplicaCheck method
Rationale: The method uses a shared Set (possibleToStopInstances) which is modified inside a loop. This could lead to race conditions if the method is called concurrently. Consider using a thread-safe collection or synchronizing access to this set.

- Set<String> possibleToStopInstances = new HashSet<>(toBeStoppedInstances);
+ Set<String> possibleToStopInstances = Collections.synchronizedSet(new HashSet<>(toBeStoppedInstances));

@Test(dependsOnMethods = "testLargeWriteAndRead")
public void testGCScheduler() throws IOException, InterruptedException {
long gcTTL = 1000; // GC schedule for 1 second after write
ZkBucketDataAccessor fastGCBucketDataAccessor = new ZkBucketDataAccessor(_zkClient, 50 * 1024, gcTTL);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues: Potential resource leak in TestZkBucketDataAccessor
Rationale: The ZkBucketDataAccessor instance 'fastGCBucketDataAccessor' is created but never closed. This could lead to resource leaks. Consider using try-with-resources or explicitly closing the accessor after use.

- ZkBucketDataAccessor fastGCBucketDataAccessor = new ZkBucketDataAccessor(_zkClient, 50 * 1024, gcTTL);
+ try (ZkBucketDataAccessor fastGCBucketDataAccessor = new ZkBucketDataAccessor(_zkClient, 50 * 1024, gcTTL)) {

@htin1 htin1 closed this Aug 23, 2024
@htin1 htin1 reopened this Aug 23, 2024
Copy link

palmier-bot bot commented Aug 23, 2024

Description

This pull request makes several changes to improve the behavior of the ZkBucketDataAccessor class and related components:

  1. It modifies the ZkBucketDataAccessor to ensure that garbage collection (GC) tasks are always completed, even in high-frequency write scenarios. The key changes include:

    • Replacing the updateGCTimer method with scheduleStaleVersionGC
    • Using a ConcurrentHashMap for _gcTaskFutureMap instead of a regular HashMap
    • Refactoring the deleteStaleVersions method to determine the current version at runtime
  2. The HealthCheck enum is updated to remove MIN_ACTIVE_REPLICA_CHECK_FAILED as an enum value and instead define it as a static String. This change affects how the minimum active replica check is performed in the MaintenanceManagementService.

  3. The MaintenanceManagementService is modified to perform the minimum active replica check sequentially, separate from other health checks.

These changes address issues related to GC task scheduling in high-frequency write scenarios and improve the handling of minimum active replica checks.

Test

New test added:

  • testGCScheduler in TestZkBucketDataAccessor: This test ensures that bucket GC still occurs in high-frequency write scenarios.

Modified test:

  • testMultipleReplicasInSameMZ in TestInstancesAccessor: Updated to reflect changes in how the MIN_ACTIVE_REPLICA_CHECK_FAILED check is performed.

Changes that Break Backward Compatibility

The removal of MIN_ACTIVE_REPLICA_CHECK_FAILED from the HealthCheck enum and its conversion to a static String may potentially break backward compatibility for code that directly references this enum value.

Documentation

N/A

@htin1
Copy link
Owner Author

htin1 commented Aug 28, 2024

@palmier describe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants