timer: TestTicker fail #46474

toralf · 2021-05-31T19:31:19Z

I do maintain and run a build bot for Gentoo packages. Inside one of the running images (bubblewrap'ed images) the mentioned test case fails. The approproiate Gentoo dev [2] is arguing that the environment of the tinderbox harms the test case scenario.
The build bot is a recent zen3 with enough RAM. For a good overall throuput however I do chain each image via cgroups to 2 CPUs. Furthermore I do set [3]

EGO_BUILD_FLAGS="-p ${jobs}"
GO19CONCURRENTCOMPILATION=0

with jobs=2 or jobs=3 usually.
Does this by any chance harms the test case ?

[1] https://github.com/toralf/tinderbox
[2] https://bugs.gentoo.org/792963#c7
[3] https://github.com/toralf/tinderbox/blob/master/bin/setup_img.sh#L393

The text was updated successfully, but these errors were encountered:

seankhliao · 2021-05-31T20:22:14Z

Is this a reproducible failure or a one off? how are you limiting CPU resources? and how loaded is the system (hardware) overall?

ianlancetaylor · 2021-05-31T22:41:48Z

From https://bugs.gentoo.org/792963 the beginning of the failure report is

--- FAIL: TestTicker (2.80s)
    tick_test.go:79: saw 5 errors
    tick_test.go:37: 10 20ms ticks took 599.673629ms, expected [240ms,360ms]
    tick_test.go:37: 10 20ms ticks took 500.305963ms, expected [240ms,360ms]
    tick_test.go:37: 10 20ms ticks took 586.319661ms, expected [240ms,360ms]

This certainly suggests that there is something wrong but it's hard to know what. I don't know what tinderbox does. The environment variables you mention should not make any difference to this test.

This test can fail in this way if the machine is very busy, or if it is run in an environment in which the epoll system call does not honor its timeout with a reasonable granularity. Because the test is affected by system busyness, the test is run five times with a delay in between, and only if all five cases fail is a failure reported.

toralf · 2021-06-01T07:20:46Z

Each image is given a cgroup v1 quota of usually 2.1 cpu if run with "make -j2" (https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh#L30) I can see that go spawns always much more threads as if could be used all 32 CPUs. The system is under load, yes - it is a build bot.

I will repeat the tests.

ianlancetaylor · 2021-06-01T17:02:27Z

Is there some way that the Go runtime could know that it only has a limited number of CPUs available? Currently on Linux systems the Go runtime determines the number of CPUs available by using the sched_getaffinity system call.

bcmills · 2021-06-01T19:13:47Z

This looks like the same failure mode as #35692 and #37332.

On CL 313849, I suggested that this measurement would be better suited to a benchmark rather than a test.

gopherbot · 2021-08-04T23:14:27Z

Change https://golang.org/cl/339892 mentions this issue: time: fallback to slower TestTicker test after one failure

TestTicker is sensitive to overloaded or slow systems, where a 20ms ticker running for 10 ticks has a total run time out of the range [110ms, 290ms]. To counter this flakiness, it tries five times to get a successful result. This is insufficient--an overloaded test machine can introduce more than 100ms of delay across the test. Reduce the five attempts to two, but use a 1s ticker for 8 ticks in the second attempt. Updates #46474. Updates #35692. Change-Id: Ibd5187b00ccceeb981b652f2af9a1c3766357b78 Reviewed-on: https://go-review.googlesource.com/c/go/+/339892 Trust: Damien Neil <[email protected]> Run-TryBot: Damien Neil <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>

joedian · 2022-08-16T15:18:47Z

Fixed.

TestTicker is sensitive to overloaded or slow systems, where a 20ms ticker running for 10 ticks has a total run time out of the range [110ms, 290ms]. To counter this flakiness, it tries five times to get a successful result. This is insufficient--an overloaded test machine can introduce more than 100ms of delay across the test. Reduce the five attempts to two, but use a 1s ticker for 8 ticks in the second attempt. Updates golang#46474. Updates golang#35692. Change-Id: Ibd5187b00ccceeb981b652f2af9a1c3766357b78 Reviewed-on: https://go-review.googlesource.com/c/go/+/339892 Trust: Damien Neil <[email protected]> Run-TryBot: Damien Neil <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>

seankhliao changed the title ~~dev-lang/go-1.16.4 FAIL: TestTicker~~ timer: TestTicker fail May 31, 2021

seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels May 31, 2021

joedian removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 16, 2022

seankhliao closed this as completed Aug 20, 2022

golang locked and limited conversation to collaborators Aug 20, 2023

gopherbot added the FrozenDueToAge label Aug 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

timer: TestTicker fail #46474

timer: TestTicker fail #46474

toralf commented May 31, 2021

seankhliao commented May 31, 2021

ianlancetaylor commented May 31, 2021

toralf commented Jun 1, 2021

ianlancetaylor commented Jun 1, 2021

bcmills commented Jun 1, 2021

gopherbot commented Aug 4, 2021

joedian commented Aug 16, 2022

timer: TestTicker fail #46474

timer: TestTicker fail #46474

Comments

toralf commented May 31, 2021

seankhliao commented May 31, 2021

ianlancetaylor commented May 31, 2021

toralf commented Jun 1, 2021

ianlancetaylor commented Jun 1, 2021

bcmills commented Jun 1, 2021

gopherbot commented Aug 4, 2021

joedian commented Aug 16, 2022