
Fix Flaky Test - testGCScheduler in TestZkBucketDataAccessor #3003

Merged
merged 1 commit into from
Feb 18, 2025


GrantPSpencer
Contributor

@GrantPSpencer GrantPSpencer commented Feb 6, 2025

Issues

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Issue #2937 is likely caused by slow ZK performance. GC runs on an asynchronous thread, so GC may already be in progress while the deletion of the ZNodes has not yet completed. The test's read is then served before the deletion has been written to the ZK server.
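The race described above can be reproduced in miniature. The sketch below is hypothetical and is not Helix code: an in-memory map stands in for ZK, and a delayed task on a separate daemon thread stands in for the asynchronous GC running against a slow ZK server. A read issued right after GC is triggered still sees the stale node, which is why a strict, immediate assertion is flaky.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the race; not Helix's ZkBucketDataAccessor.
class AsyncGcRaceDemo {
  // In-memory map standing in for ZNodes, keyed by path.
  static final Map<String, byte[]> store = new ConcurrentHashMap<>();

  // Single daemon thread standing in for the asynchronous GC thread.
  static final ScheduledExecutorService gcExecutor =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "simulated-gc");
        t.setDaemon(true); // let the JVM exit without an explicit shutdown
        return t;
      });

  // Triggers GC of stalePath on the GC thread; the delay models a slow ZK server.
  static void triggerAsyncGc(String stalePath, long simulatedZkLatencyMs) {
    gcExecutor.schedule(() -> { store.remove(stalePath); }, simulatedZkLatencyMs,
        TimeUnit.MILLISECONDS);
  }

  // Writes a node, triggers GC, and reads back immediately. Returns true if the stale
  // node is still visible, which is exactly what a strict "must be deleted" check trips over.
  static boolean readImmediatelyAfterGc(String stalePath, long simulatedZkLatencyMs) {
    store.put(stalePath, new byte[] {1});
    triggerAsyncGc(stalePath, simulatedZkLatencyMs);
    return store.containsKey(stalePath);
  }
}
```

GC has been triggered when the read happens, but the deletion has not been applied yet; only waiting (or polling) makes the outcome deterministic.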

This PR makes three changes to TestZkBucketDataAccessor.testGCScheduler:

  1. Rename testGCScheduler --> testGCCompletesUnderHighFrequency
  2. Change the assertion logic. The strict check, which could be flaky on a slow ZK connection, is removed. The test now uses a verifier that continuously writes to ZK at an interval shorter than the GC timeout and exits only once GC has occurred. This preserves the test's intent: GC must still occur under high-frequency writes.
  3. Isolate paths between test methods
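The new assertion strategy can be sketched as follows. This is a minimal in-memory simulation, not the actual test: `VersionedStoreSim`, its GC interval, the write interval, and the "oldest version removed" check are all hypothetical stand-ins for ZkBucketDataAccessor's versioned writes and its asynchronous GC. The verifier keeps writing faster than GC runs and returns as soon as a stale version is observed to be gone, failing only if the timeout elapses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a versioned ZK store whose stale versions are
// garbage-collected asynchronously. Not Helix's ZkBucketDataAccessor.
class VersionedStoreSim {
  private final ConcurrentSkipListMap<Long, byte[]> versions = new ConcurrentSkipListMap<>();
  private final AtomicLong nextVersion = new AtomicLong();
  private final ScheduledExecutorService gc;

  VersionedStoreSim(long gcIntervalMs) {
    gc = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "simulated-gc");
      t.setDaemon(true); // let the JVM exit without an explicit shutdown
      return t;
    });
    // Asynchronous GC: periodically delete every version except the newest one.
    gc.scheduleAtFixedRate(this::collectStaleVersions, gcIntervalMs, gcIntervalMs,
        TimeUnit.MILLISECONDS);
  }

  void write(byte[] data) {
    versions.put(nextVersion.getAndIncrement(), data);
  }

  // True once GC has removed the very first version written (version 0).
  boolean gcHasRun() {
    Map.Entry<Long, byte[]> oldest = versions.firstEntry();
    return oldest != null && oldest.getKey() > 0;
  }

  private void collectStaleVersions() {
    Map.Entry<Long, byte[]> latest = versions.lastEntry();
    if (latest != null) {
      versions.headMap(latest.getKey()).clear(); // headMap excludes the latest key
    }
  }

  // Verifier: keep writing every writeIntervalMs (shorter than the GC interval) and return
  // true as soon as GC is observed, or false if timeoutMs elapses first.
  static boolean verifyGcUnderHighFrequencyWrites(VersionedStoreSim store, long writeIntervalMs,
      long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (deadline - System.nanoTime() > 0) {
      store.write(new byte[] {1});
      if (store.gcHasRun()) {
        return true;
      }
      Thread.sleep(writeIntervalMs);
    }
    return store.gcHasRun();
  }
}
```

Creating a fresh store per test method plays the role of the path isolation above: leftover versions from another test cannot satisfy or break the check. The timeout bounds how long a slow environment may take, instead of assuming the deletion is visible at one fixed moment.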

Tests

  • The following tests are written for this issue:

testGCCompletesUnderHighFrequency

  • The following is the result of the "mvn test" command on the appropriate module:
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.helix.manager.zk.TestZkBucketDataAccessor
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.113 s - in org.apache.helix.manager.zk.TestZkBucketDataAccessor
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

N/A

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

Contributor

@xyuanlu xyuanlu left a comment


LGTM
Thanks for making the fix.

@GrantPSpencer
Contributor Author

Pull request approved by: @xyuanlu
Commit message: Fix Flaky Test - testGCScheduler in TestZkBucketDataAcessor by refactoring assertion to use verifier with timeout. The GC happens on an async thread, so slow ZK performance can cause GC to not complete before the read is performed by test assertion.

@xyuanlu xyuanlu merged commit e73f389 into apache:master Feb 18, 2025
5 checks passed

Successfully merging this pull request may close these issues.

[Failed CI Test] testGCScheduler(org.apache.helix.manager.zk.TestZkBucketDataAccessor)
2 participants