Flaky scm06 test? (Error (criu/sk-unix.c:1651): unix: Can't bind id 0x9 ino 433398 addr : Address already in use") #2537
@carnil For Fedora and RHEL we are excluding a couple of tests, but not this one: https://gitlab.com/redhat/centos-stream/rpms/criu/-/blob/c10s/tests/run-zdtm.sh?ref_type=heads That doesn't answer your question, but just as an additional data point.
@adrianreber thanks. While this is indeed not an answer to the question, it helps as an idea for how to improve the test runs on our end (i.e. try again if a test run fails, to work around it and see whether it was just flaky). Still, I wonder whether the failing scm06 test shows a real problem or is really known to be flaky. Thanks a lot!
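The retry idea mentioned above could look roughly like the following. This is only a minimal sketch, not what Debian's or Fedora's test scripts actually do: the helper name run_with_retries is hypothetical, and the zdtm.py command line in the comment is an assumption about how a single test might be invoked.

```python
import subprocess
import sys


def run_with_retries(cmd, attempts=3):
    """Run cmd up to `attempts` times; return (passed, tries_used).

    A test that only passes on a later attempt is a hint that it is flaky
    rather than consistently broken.
    """
    for attempt in range(1, attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return True, attempt
    return False, attempts


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. A real run might use
    # something like ["./zdtm.py", "run", "-t", "zdtm/static/scm06"]
    # (assumed invocation, not taken from this thread).
    passed, tries = run_with_retries([sys.executable, "-c", "raise SystemExit(0)"])
    print(passed, tries)
```

Retrying hides a race rather than fixing it, so it is mainly useful for telling flaky failures apart from reproducible ones.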
@carnil I think it is a side effect of unix_gc in the Linux kernel. All dumped processes have been destroyed, but some sockets are destroyed asynchronously.
The kernel releases a test socket asynchronously, so the restore can fail if it is executed before the kernel actually destroys the socket. Fixes checkpoint-restore#2537 Signed-off-by: Andrei Vagin <[email protected]>
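To illustrate the race described in the commit message: one way to avoid binding too early is to poll until the address can be bound again. The sketch below is not the actual CRIU fix; it assumes the socket lives in the Linux abstract namespace (address starting with a NUL byte), and the helper name wait_until_bindable is hypothetical.

```python
import errno
import socket
import time


def wait_until_bindable(addr, timeout=5.0, interval=0.05):
    """Poll until `addr` can be bound, or give up after `timeout` seconds.

    Assumes an abstract-namespace address (Linux only), so closing the
    probe socket releases the name again and leaves nothing on disk.
    """
    deadline = time.monotonic() + timeout
    while True:
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            probe.bind(addr)
            return True  # the previous owner of the name is gone
        except OSError as e:
            if e.errno != errno.EADDRINUSE:
                raise  # unrelated failure, do not mask it
        finally:
            probe.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a still-open socket holding the name, the function returns False after the timeout; once the kernel has destroyed that socket, it returns True and a later bind (such as the one done on restore) should no longer hit "Address already in use".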
Thank you @avagin (and @adrianreber)
Hi

In the meantime we run almost all tests for criu in Debian, only excluding the apparmor_stacking and fd01 tests.

What we see is that occasionally the scm06 test fails: https://ci.debian.net/data/autopkgtest/testing/amd64/c/criu/55087421/log.gz

Is this by chance known to be flaky, and should I rather disable the test, or is there an indication of a real problem we need to address? I guess this indicates a race condition, as the address is already in use in the case above.