Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timer: TestTicker fail #46474

Closed
toralf opened this issue May 31, 2021 · 7 comments
Closed

timer: TestTicker fail #46474

toralf opened this issue May 31, 2021 · 7 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@toralf
Copy link

toralf commented May 31, 2021

I do maintain and run a build bot for Gentoo packages. Inside one of the running images (bubblewrap'ed images) the mentioned test case fails. The approproiate Gentoo dev [2] is arguing that the environment of the tinderbox harms the test case scenario.
The build bot is a recent zen3 with enough RAM. For a good overall throuput however I do chain each image via cgroups to 2 CPUs. Furthermore I do set [3]

EGO_BUILD_FLAGS="-p ${jobs}"
GO19CONCURRENTCOMPILATION=0

with jobs=2 or jobs=3 usually.
Does this by any chance harms the test case ?

[1] https://github.com/toralf/tinderbox
[2] https://bugs.gentoo.org/792963#c7
[3] https://github.com/toralf/tinderbox/blob/master/bin/setup_img.sh#L393

@seankhliao seankhliao changed the title dev-lang/go-1.16.4 FAIL: TestTicker timer: TestTicker fail May 31, 2021
@seankhliao
Copy link
Member

Is this a reproducible failure or a one off? how are you limiting CPU resources? and how loaded is the system (hardware) overall?

@seankhliao seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels May 31, 2021
@ianlancetaylor
Copy link
Member

From https://bugs.gentoo.org/792963 the beginning of the failure report is

--- FAIL: TestTicker (2.80s)
    tick_test.go:79: saw 5 errors
    tick_test.go:37: 10 20ms ticks took 599.673629ms, expected [240ms,360ms]
    tick_test.go:37: 10 20ms ticks took 500.305963ms, expected [240ms,360ms]
    tick_test.go:37: 10 20ms ticks took 586.319661ms, expected [240ms,360ms]

This certainly suggests that there is something wrong but it's hard to know what. I don't know what tinderbox does. The environment variables you mention should not make any difference to this test.

This test can fail in this way if the machine is very busy, or if it is run in an environment in which the epoll system call does not honor its timeout with a reasonable granularity. Because the test is affected by system busyness, the test is run five times with a delay in between, and only if all five cases fail is a failure reported.

@toralf
Copy link
Author

toralf commented Jun 1, 2021

Each image is given a cgroup v1 quota of usually 2.1 cpu if run with "make -j2" (https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh#L30) I can see that go spawns always much more threads as if could be used all 32 CPUs. The system is under load, yes - it is a build bot.

I will repeat the tests.

@ianlancetaylor
Copy link
Member

Is there some way that the Go runtime could know that it only has a limited number of CPUs available? Currently on Linux systems the Go runtime determines the number of CPUs available by using the sched_getaffinity system call.

@bcmills
Copy link
Contributor

bcmills commented Jun 1, 2021

This looks like the same failure mode as #35692 and #37332.

On CL 313849, I suggested that this measurement would be better suited to a benchmark rather than a test.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/339892 mentions this issue: time: fallback to slower TestTicker test after one failure

gopherbot pushed a commit that referenced this issue Oct 6, 2021
TestTicker is sensitive to overloaded or slow systems, where a 20ms
ticker running for 10 ticks has a total run time out of the range
[110ms, 290ms]. To counter this flakiness, it tries five times to
get a successful result. This is insufficient--an overloaded test
machine can introduce more than 100ms of delay across the test.

Reduce the five attempts to two, but use a 1s ticker for 8 ticks
in the second attempt.

Updates #46474.
Updates #35692.

Change-Id: Ibd5187b00ccceeb981b652f2af9a1c3766357b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/339892
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@joedian
Copy link

joedian commented Aug 16, 2022

Fixed.

@joedian joedian removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 16, 2022
rcrozean pushed a commit to rcrozean/go that referenced this issue Oct 5, 2022
TestTicker is sensitive to overloaded or slow systems, where a 20ms
ticker running for 10 ticks has a total run time out of the range
[110ms, 290ms]. To counter this flakiness, it tries five times to
get a successful result. This is insufficient--an overloaded test
machine can introduce more than 100ms of delay across the test.

Reduce the five attempts to two, but use a 1s ticker for 8 ticks
in the second attempt.

Updates golang#46474.
Updates golang#35692.

Change-Id: Ibd5187b00ccceeb981b652f2af9a1c3766357b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/339892
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
rcrozean pushed a commit to rcrozean/go that referenced this issue Oct 12, 2022
TestTicker is sensitive to overloaded or slow systems, where a 20ms
ticker running for 10 ticks has a total run time out of the range
[110ms, 290ms]. To counter this flakiness, it tries five times to
get a successful result. This is insufficient--an overloaded test
machine can introduce more than 100ms of delay across the test.

Reduce the five attempts to two, but use a 1s ticker for 8 ticks
in the second attempt.

Updates golang#46474.
Updates golang#35692.

Change-Id: Ibd5187b00ccceeb981b652f2af9a1c3766357b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/339892
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
rcrozean pushed a commit to rcrozean/go that referenced this issue Oct 12, 2022
TestTicker is sensitive to overloaded or slow systems, where a 20ms
ticker running for 10 ticks has a total run time out of the range
[110ms, 290ms]. To counter this flakiness, it tries five times to
get a successful result. This is insufficient--an overloaded test
machine can introduce more than 100ms of delay across the test.

Reduce the five attempts to two, but use a 1s ticker for 8 ticks
in the second attempt.

Updates golang#46474.
Updates golang#35692.

Change-Id: Ibd5187b00ccceeb981b652f2af9a1c3766357b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/339892
Trust: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@golang golang locked and limited conversation to collaborators Aug 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants