Checkpoint fails with type NOTIFY errno 0 #1202

Closed
kunalkushwaha opened this issue Nov 28, 2016 · 15 comments

@kunalkushwaha

System information
$ uname -r
4.4.0-47-generic

$ runc --version
runc version 1.0.0-rc1
commit: 6b8f696614e472bc404d31a327fb308d039d2d7a-dirty
spec: 1.0.0-rc1-dev

$ criu --version
Version: 2.6

Steps to reproduce.

$ sudo runc create alpine
$ sudo runc start alpine
$ sudo runc list                             
ID          PID         STATUS      BUNDLE                                CREATED
alpine      0           running     /home/kunal/work/runc-images/alpine   2016-11-28T02:44:25.680187033Z
$ sudo runc checkpoint alpine
criu failed: type NOTIFY errno 0
log file: /run/runc/alpine/criu.work/dump.log

Log

(00.101993) Dumping opened files (pid: 32148)
(00.101995) ----------------------------------------
(00.102002) Sent msg to daemon 12 0 0
pie: 1: __fetched msg: 12 0 0
pie: 1: __sent ack msg: 12 12 0
pie: 1: Daemon waits for command
(00.102020) Wait for ack 12 on daemon socket
(00.102025) Fetched ack: 12 12 0
(00.102052) 32148 fdinfo 0: pos:                0 flags:           100002/0
(00.102065) tty: Dumping tty 10 with id 0x3
(00.102068) Error (criu/files-reg.c:1122): Can't lookup mount=21 for fd=0 path=/dev/pts/2
(00.102074) ----------------------------------------
(00.102080) Error (criu/cr-dump.c:1313): Dump files (pid: 32148) failed with -1
(00.102105) Waiting for 32148 to trap
(00.102108) Daemon 32148 exited trapping
(00.102112) Sent msg to daemon 5 0 0
pie: 1: __fetched msg: 5 0 0
pie: 1: 1: new_sp=0x7f3a1a5a7008 ip 0x7f3a1a3781e0
(00.102706) 32148 was trapped
(00.102708) `- Expecting exit
(00.102717) 32148 was trapped
(00.102719) 32148 is going to execute the syscall 15
(00.102729) 32148 was stopped
(00.102738) 32148 was trapped
(00.102741) 32148 is going to execute the syscall 186
(00.102747) 32148 was trapped
(00.102749) `- Expecting exit
(00.102755) 32148 was trapped
(00.102757) 32148 is going to execute the syscall 1
(00.102764) 32148 was trapped
(00.102765) `- Expecting exit
(00.102771) 32148 was trapped
(00.102773) 32148 is going to execute the syscall 11
(00.102789) 32148 was stopped
(00.102848) Unlock network
(00.102851) Running network-unlock scripts
(00.102852)     RPC
(00.104537) Unfreezing tasks into 1
(00.104542)     Unseizing 32148 into 1
(00.104552) Error (criu/cr-dump.c:1628): Dumping FAILED.
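
For anyone triaging a long dump.log like this one, the lines that matter are the `Error (file:line): message` entries. Below is a minimal sketch of a filter for them. It is not part of runc or CRIU. The helper name `criu_errors` and the regular expression are assumptions based only on the log format shown in this issue.

```python
import re

# Matches CRIU error lines such as:
#   (00.102068) Error (criu/files-reg.c:1122): Can't lookup mount=21 ...
# An optional leading line number (e.g. from an editor paste) is tolerated.
ERROR_RE = re.compile(
    r"^\s*(?:\d+\s+)?\((?P<ts>\d+\.\d+)\)\s+Error\s+"
    r"\((?P<file>[^:()]+):(?P<line>\d+)\):\s*(?P<msg>.*)$"
)


def criu_errors(log_text):
    """Return (timestamp, source_file, source_line, message) per Error line."""
    found = []
    for raw in log_text.splitlines():
        m = ERROR_RE.match(raw)
        if m:
            found.append(
                (float(m["ts"]), m["file"], int(m["line"]), m["msg"].strip())
            )
    return found


if __name__ == "__main__":
    # Three lines copied from the log in this issue.
    sample = (
        "(00.102065) tty: Dumping tty 10 with id 0x3\n"
        "(00.102068) Error (criu/files-reg.c:1122): "
        "Can't lookup mount=21 for fd=0 path=/dev/pts/2\n"
        "(00.102080) Error (criu/cr-dump.c:1313): "
        "Dump files (pid: 32148) failed with -1\n"
    )
    for ts, src, line, msg in criu_errors(sample):
        print(f"{ts:.6f} {src}:{line} {msg}")
```

The first error in time order is usually the root cause. In this log it is the failed mount lookup for `/dev/pts/2` on fd 0.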

I can reproduce this with ubuntu and redis containers too.

@cyphar
Member

cyphar commented Nov 28, 2016

/cc @avagin

@avagin
Contributor

avagin commented Nov 28, 2016

This is a known issue. This container uses an "external" tty.
https://criu.org/Inheriting_FDs_on_restore#External_TTYs

@cyphar
Member

cyphar commented Nov 29, 2016

@avagin Does #1018 unbreak this?

@kunalkushwaha
Author

@avagin @cyphar Does #1018 fix this?

I will try to test the PR and update.

@marcosnils
Contributor

> @avagin @cyphar Does #1018 fix this?
>
> I will try to test the PR and update.

No, it's a CRIU thing.

@avagin
Contributor

avagin commented Dec 1, 2016

The problem is that one end of a pty pair is used externally. You can look at how runc handles pipes now and do something similar for the console. It should not be hard. Unfortunately, I don't have time for this now.
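
To make the pipe versus console distinction concrete, here is a small sketch of how a tool can tell what kind of file a descriptor refers to by reading its `/proc/self/fd` link on Linux. This is not runc's or CRIU's code. `classify_fd` is a hypothetical helper, and the only facts it relies on are that anonymous pipes show up as `pipe:[inode]` and pty slaves as `/dev/pts/N` (compare `fd=0 path=/dev/pts/2` in the log above).

```python
import os


def classify_fd(fd):
    """Classify a descriptor by the target of its /proc/self/fd link (Linux)."""
    target = os.readlink(f"/proc/self/fd/{fd}")
    if target.startswith("pipe:["):
        kind = "pipe"        # anonymous pipe, shown as pipe:[inode]
    elif target.startswith("/dev/pts/"):
        kind = "pty-slave"   # slave end of a pseudoterminal
    else:
        kind = "other"       # regular files, sockets, /dev/null, ...
    return kind, target


if __name__ == "__main__":
    # A pipe whose other end lives outside the container is the case
    # that is already handled; a pty slave is the case discussed here.
    read_end, write_end = os.pipe()
    print(classify_fd(read_end))
    print(classify_fd(write_end))
    os.close(read_end)
    os.close(write_end)
```

A checkpoint tool needs this kind of information for each stdio descriptor, because the peer of an external pipe or pty is not part of the process tree being dumped.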

@kunalkushwaha
Author

Thanks, I will look into it and make a PR.

@cyphar
Member

cyphar commented Dec 2, 2016

@kunalkushwaha I would wait for #1018 to be merged before touching any console code, just because that patch is very large and very annoying to keep up-to-date with master when people touch all of the various parts it touches.

@kunalkushwaha
Author

@cyphar I agree. Actually, I am studying the #1018 patch to understand the whole new console fix :)

@cyphar
Member

cyphar commented Dec 8, 2016

@keloyang #1018 has been merged now. I've just tested this and it no longer fails on the /dev/console resolution. However, now it fails with a different error in the log.

% recvtty test.sock & # in a separate terminal
% runc create --console-socket test.sock ctr
% runc start ctr
% runc checkpoint ctr
criu failed: type NOTIFY errno 0
log file: /run/runc/ctr/criu.work/dump.log

Here is the relevant tail of the log:

(00.416041) mnt: Dumping mountpoints
(00.416043) mnt:        287: 87:/ @ ./sys/firmware
(00.416050) mnt: Path `/sys/firmware' resolved to `./sys/firmware' mountpoint
(00.443342) mnt:        286: 76:/null @ ./proc/sched_debug
(00.443347) mnt: Something is mounted on top of ./dev
(00.443374) Error (criu/mount.c:1053): mnt: Can't create a temporary directory: Read-only file system
(00.443447) Unlock network
(00.443449) Running network-unlock scripts
(00.443449)     RPC
(00.471881) Unfreezing tasks into 1
(00.471893)     Unseizing 30626 into 1
(00.471908) Error (criu/cr-dump.c:1635): Dumping FAILED.
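
The "Something is mounted on top of ./dev" line suggests that `/dev` has other mounts beneath it. One way to check that from outside CRIU is to read the container's `/proc/<pid>/mountinfo` and list the children of a mount. The sketch below does this with the field layout documented in proc(5). The helper names and the sample table are hypothetical and are not taken from this container.

```python
def parse_mountinfo(text):
    """Parse /proc/<pid>/mountinfo text into dicts (layout per proc(5))."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # A lone "-" separates the optional fields from the fs info.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # blank or malformed line
        mounts.append({
            "id": int(fields[0]),        # mount ID
            "parent": int(fields[1]),    # ID of the parent mount
            "root": fields[3],           # root of the mount within its fs
            "mountpoint": fields[4],     # mount point in this namespace
            "fstype": fields[sep + 1],
        })
    return mounts


def children_of(mounts, mountpoint):
    """Return mount points whose parent is the mount at `mountpoint`."""
    ids = {m["id"] for m in mounts if m["mountpoint"] == mountpoint}
    return [m["mountpoint"] for m in mounts if m["parent"] in ids]


if __name__ == "__main__":
    # Hypothetical mount table, for illustration only.
    sample = (
        "21 20 0:19 / /dev rw,nosuid - tmpfs tmpfs rw,mode=755\n"
        "22 21 0:20 / /dev/pts rw,nosuid,noexec - devpts devpts rw,gid=5\n"
        "23 21 0:21 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs shm rw\n"
        "24 20 0:22 / /proc rw,nosuid - proc proc rw\n"
    )
    print(children_of(parse_mountinfo(sample), "/dev"))
```

In practice you would pass the contents of `/proc/<container-init-pid>/mountinfo` instead of the sample string.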

@kunalkushwaha
Author

I am able to reproduce this issue.

This now looks like a CRIU bug. CRIU is not able to mount the /dev directory inside the criu namespace, and the function get_clean_mnt() is failing.

avagin self-assigned this Dec 24, 2016
@avagin
Contributor

avagin commented Dec 27, 2016

criupatchwork pushed a commit to criupatchwork/criu that referenced this issue Jan 4, 2017
Now Docker creates a pty pair from a container devpts to use it as a console.
A slave tty is set as a control tty for the init process and bind-mounted
into /dev/console. The master tty is handled externally.

Now CRIU can handle external resources, but here we have internal resources
which are used externally.

opencontainers/runc#1202
Signed-off-by: Andrei Vagin <[email protected]>
criupatchwork pushed a commit to criupatchwork/criu that referenced this issue Jan 4, 2017
If we can't create a temporary directory for a detached mount,
we can clone a whole mount namespace, open a mount and release
the created namespace. The result will be the same.

https://jira.sw.ru/browse/PSBM-57135
opencontainers/runc#1202
Signed-off-by: Andrei Vagin <[email protected]>
@kunalkushwaha
Author

kunalkushwaha commented Jan 5, 2017

@avagin I am still able to reproduce this issue.

$ criu --version
Version: 2.9
GitID: v1.2-4041-gdcb5df7

$ sudo ~/go/src/github.com/opencontainers/runc/runc --version        
runc version 1.0.0-rc2
commit: 3e947028ff9a544c5829588ebec878b6699e2aa0-dirty
spec: 1.0.0-rc3

A runc build from https://github.com/avagin/runc/tree/cr-console hits the check at cr-console/libcontainer/container_linux.go#L543. Skipping this condition results in the same issue.

You can find config.json file for my container at https://gist.github.com/kunalkushwaha/c57b94660543c5a19d5a9f95b58c9093

The steps I followed are exactly those in #1202 (comment)

@avagin
Contributor

avagin commented Jan 10, 2017

@kunalkushwaha which criu do you use? Did you get it from my repo?

@avagin
Contributor

avagin commented Jan 11, 2017

> $ criu --version
> Version: 2.9
> GitID: v1.2-4041-gdcb5df7

I didn't notice this part of your answer. You got the right criu.

> runc from https://github.com/avagin/runc/tree/cr-console build hits cr-console/libcontainer/container_linux.go#L543 . So skipping this condition results in same issue.

This is very suspicious. I have never seen this error and I can't reproduce this issue with your config. I think runc may be using the wrong criu. Could you update runc from my repo and try to reproduce the issue? The error will now contain the current and required criu versions.

If you are sure that the right criu is used, I need dump.log, restore.log and runc.log to investigate the issue. Thank you for your time.
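
To illustrate the kind of version check discussed above (an error that names both the current and the required criu version), here is a minimal sketch. It is not runc's implementation. The function names are hypothetical, the required version passed in is an arbitrary example, and the only input format assumed is the `criu --version` output quoted in this thread.

```python
import re


def parse_criu_version(output):
    """Extract (major, minor, sublevel) from `criu --version` output."""
    m = re.search(r"^Version:\s*(\d+)\.(\d+)(?:\.(\d+))?", output, re.MULTILINE)
    if m is None:
        raise ValueError("no 'Version:' line found")
    major, minor, sub = m.groups()
    return int(major), int(minor), int(sub or 0)


def check_criu_version(output, required):
    """Return the parsed version, or raise an error naming both versions."""
    current = parse_criu_version(output)
    if current < required:  # tuples compare field by field
        raise RuntimeError(
            "criu version {}.{}.{} is older than required {}.{}.{}".format(
                *current, *required
            )
        )
    return current


if __name__ == "__main__":
    out = "Version: 2.9\nGitID: v1.2-4041-gdcb5df7\n"
    # (2, 6, 0) is a hypothetical minimum, not the value runc requires.
    print(check_criu_version(out, (2, 6, 0)))
```

If several criu binaries are installed, feeding this the output of the exact binary that runc invokes (rather than whatever is first in your shell's PATH) helps rule out a version mismatch.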

xemul pushed a commit to checkpoint-restore/criu that referenced this issue Feb 13, 2017
Now Docker creates a pty pair from a container devpts to use it as a console.
A slave tty is set as a control tty for the init process and bind-mounted
into /dev/console. The master tty is handled externally.

Now CRIU can handle external resources, but here we have internal resources
which are used externally.

opencontainers/runc#1202
travis-ci: success for A few fixes to c/r a docker container with a console (rev3)
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Pavel Emelyanov <[email protected]>
xemul pushed a commit to checkpoint-restore/criu that referenced this issue Feb 13, 2017
If we can't create a temporary directory for a detached mount,
we can clone a whole mount namespace, open a mount and release
the created namespace. The result will be the same.

https://jira.sw.ru/browse/PSBM-57135
opencontainers/runc#1202
travis-ci: success for A few fixes to c/r a docker container with a console (rev3)
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Pavel Emelyanov <[email protected]>