[BUG] ResourceAwareTasksTests flaky on Windows. #5063

Closed
mch2 opened this issue Nov 3, 2022 · 2 comments · Fixed by #5077
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run untriaged

mch2 (Member) commented Nov 3, 2022

Describe the bug
When running gradle check for the Windows distribution (#4924), the tests in ResourceAwareTasksTests frequently fail. They fail because the CPU time recorded at the end of each test is zero, while the tests assert a positive value: threadMXBean.getThreadCpuTime(threadId) is returning zero here.
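For context, here is a minimal sketch of the kind of measurement involved, assuming the per-thread CPU time is read through the standard java.lang.management API. The class ThreadCpuTimeProbe and its method cpuTimeOf are hypothetical illustrations, not code from ResourceAwareTasksTests. It reads the counter before and after a short task; on a platform that only advances the counter in coarse steps, the difference for a very short task can be 0.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical probe, not taken from the OpenSearch test code.
final class ThreadCpuTimeProbe {
    // Returns the CPU time (ns) ThreadMXBean charged to the current thread while
    // `task` ran, or -1 if thread CPU time is unsupported or disabled in this JVM.
    static long cpuTimeOf(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadCpuTime(id);
        task.run();
        long after = bean.getThreadCpuTime(id);
        // With a coarse counter (one update per clock tick), a short task can yield 0.
        return after - before;
    }

    public static void main(String[] args) {
        long delta = cpuTimeOf(() -> {
            double acc = 0;
            for (int i = 1; i <= 1_000; i++) {
                acc += Math.sqrt(i);
            }
        });
        System.out.println("CPU time charged to task: " + delta + " ns");
    }
}
```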

To Reproduce
Steps to reproduce the behavior:

  1. On any Windows machine, run ResourceAwareTasksTests. A failure occurs at least once every one to two runs.

Expected behavior
Test should pass.

mch2 added the bug, untriaged, and flaky-test labels on Nov 3, 2022
mch2 (Member, Author) commented Nov 4, 2022

The issue is that on Windows the clock tick is ~15 ms. The runnable in the test completes before a full tick has elapsed, so threadMXBean still reports a CPU time of 0 for the thread.
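One way to check the tick size on a given machine is to busy-spin and record the smallest positive step by which the thread's CPU time advances. This is a hypothetical diagnostic (CpuClockGranularity and smallestStepNanos are illustrative names, not part of the codebase). If the ~15 ms tick described above applies, the reported step on Windows should be on that order, while platforms with a fine-grained per-thread clock will typically report much smaller steps.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic for observing how coarsely thread CPU time advances.
final class CpuClockGranularity {
    // Busy-spins on the current thread and returns the smallest positive step seen
    // across `samples` changes of ThreadMXBean CPU time, or -1 if unavailable.
    static long smallestStepNanos(int samples) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        long id = Thread.currentThread().getId();
        long last = bean.getThreadCpuTime(id);
        long smallest = Long.MAX_VALUE;
        int seen = 0;
        while (seen < samples) {
            // Spinning itself consumes CPU, so the counter eventually advances.
            long now = bean.getThreadCpuTime(id);
            if (now > last) {
                smallest = Math.min(smallest, now - last);
                last = now;
                seen++;
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        long step = smallestStepNanos(5);
        System.out.printf("smallest observed CPU-time step: %.3f ms%n", step / 1e6);
    }
}
```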

We can resolve this by waiting inside ResourceAwareNodesAction.doRun until a full tick has been registered:

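                    if (Constants.WINDOWS) {
                        // Windows charges thread CPU time once per clock tick (~15 ms): spin until
                        // one tick registers. getThreadCpuTime returns -1 if measurement is disabled.
                        while (threadMXBean.getThreadCpuTime(Thread.currentThread().getId()) <= 0) {
                            // intentionally empty busy-wait
                        }
                    }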
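Because ThreadMXBean.getThreadCpuTime returns -1 when CPU time measurement is disabled, the unbounded loop above could spin forever in that configuration. A hypothetical hardened variant adds a wall-clock deadline; the class WaitForCpuTick, the method spinUntilCpuTimeRegistered, and the timeoutMillis parameter are assumptions for illustration, not part of the proposed change.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical bounded version of the busy-wait discussed above.
final class WaitForCpuTick {
    // Spins until ThreadMXBean reports a positive CPU time for the current thread,
    // or until timeoutMillis of wall-clock time has passed. Returns the last value read.
    static long spinUntilCpuTimeRegistered(long timeoutMillis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long cpu = bean.getThreadCpuTime(id);
        // -1 (measurement disabled) also satisfies "cpu <= 0", so the deadline is
        // what guarantees the loop exits in that case.
        while (cpu <= 0 && System.nanoTime() - deadline < 0) {
            cpu = bean.getThreadCpuTime(id);
        }
        return cpu;
    }

    public static void main(String[] args) {
        System.out.println("thread CPU time after wait: " + spinUntilCpuTimeRegistered(1_000L) + " ns");
    }
}
```

Comparing System.nanoTime() values by subtraction rather than with a direct less-than keeps the deadline check correct even if the nanoTime counter wraps.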

This would also push the memory consumed outside the limits set in the existing test assertions, so we'd need to widen the buffer applied in assertMemoryUsageWithinLimits. @Bukhtawar @ketanv3, could either of you shed light on why we need those assertions? Would asserting a positive value not suffice for these tests and be less flaky?

Alternatively, we could simply update all assertions to expect a CPU time >= 0 rather than > 0, or skip the tests entirely on Windows.

mch2 (Member, Author) commented Nov 4, 2022

I've cut a PR here that simply allows a zero CPU value on Windows. Since we populate the CPU value directly from ThreadMXBean, I think this is safe. I'd rather go this route than introduce additional flakiness into the memory assertion.
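A minimal sketch of the relaxed check, assuming the assertion only needs to distinguish Windows from other platforms. CpuTimeAssertions and assertCpuTime are hypothetical names, and the PR's actual code is not reproduced here. The WINDOWS flag is derived from the os.name system property, similar in spirit to Lucene's Constants.WINDOWS. Zero is accepted on Windows, but -1 (measurement disabled) is still rejected everywhere.

```java
// Hypothetical helper illustrating "allow zero CPU time on Windows only".
final class CpuTimeAssertions {
    // Assumption: mirrors the idea of Lucene's Constants.WINDOWS via os.name.
    static final boolean WINDOWS = System.getProperty("os.name", "").startsWith("Windows");

    // On Windows a task shorter than one clock tick can legitimately report 0 ns,
    // so >= 0 is accepted there; elsewhere a strictly positive value is required.
    static void assertCpuTime(long cpuNanos, boolean windows) {
        boolean ok = windows ? cpuNanos >= 0 : cpuNanos > 0;
        if (!ok) {
            throw new AssertionError("unexpected CPU time " + cpuNanos + " ns (windows=" + windows + ")");
        }
    }

    public static void main(String[] args) {
        System.out.println("running on Windows: " + WINDOWS);
        assertCpuTime(0L, true);          // accepted: zero is allowed on Windows
        assertCpuTime(1_000_000L, false); // accepted: positive value elsewhere
        try {
            assertCpuTime(0L, false);
        } catch (AssertionError e) {
            System.out.println("rejected as expected: " + e.getMessage());
        }
    }
}
```

In a real test the second argument would simply be the platform flag (here WINDOWS); passing it explicitly keeps the helper easy to exercise on any OS.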
