-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
checkpoint: set default work-dir to image-path #3029
Conversation
What prevents you from using Ahh I see -- standalone criu binary works as described above, but runc c/r doesn't. I think we need to modify runc to NOT set @liusdu can you implement the above? |
Hi @kolyshkin, thanks to you reply. if
And if
Yes, It can fix it this way~ |
de742d4
to
0710bbb
Compare
@kolyshkin done, please take a look. |
@liusdu needs a rebase |
@kolyshkin sorry for this, done~ |
libcontainer/container_linux.go
Outdated
@@ -1012,6 +997,19 @@ func (c *linuxContainer) Checkpoint(criuOpts *CriuOpts) error { | |||
LazyPages: proto.Bool(criuOpts.LazyPages), | |||
} | |||
|
|||
// if criuOpts.WorkDirectory is not set. let criu sets it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// if criuOpts.WorkDirectory is not set. let criu sets it. | |
// If criuOpts.WorkDirectory is not set, criu default is used | |
// (which is to reuse ImagesDirectory as WorkDirectory). |
I think you need to amend the commit message to explain how it works (i.e. by not setting the WorkDirectory we let criu use its default, which is to reuse ImagesDirectory, or some such). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few typos in the commit message:
Dump/restrore logs are very useful when process fails.
restore
New runc puts these files in c.root, which will be deleted
I think you mean "Now".
when container exits. So if checkpinting/restoring failed,
checkpointing
Now runc puts dump/restore logs in c.root defaultly, which will be deleted when container exits. So if checkpinting/restoring failed, we can not get these logs and analyze why. This patch lets criu use its default if --work-path is not set: - Use WorkDirectory found in criu's configfile. - Use ImageDirectory. Signed-off-by: Liu Hua <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@adrianreber PTAL
The only thing I am worried about is if Docker or Podman are relying on this somehow. Would be good to verify that neither of them break with this change. |
@adrianreber Podman and Docker/Containerd all set WorkDirectory. So this change does not affect them.
// workPath will be used to store dump.log and stats-dump
workPath := ctr.bundlePath()
logrus.Debugf("Writing checkpoint to %s", imagePath)
logrus.Debugf("Writing checkpoint logs to %s", workPath)
logrus.Debugf("Pre-dump the container %t", options.PreCheckPoint)
args := []string{}
args = append(args, r.runtimeFlags...)
args = append(args, "checkpoint")
args = append(args, "--image-path")
args = append(args, imagePath)
args = append(args, "--work-path")
args = append(args, workPath)
# restore command
podman container restore 06fd1a24a1f0992c9349e7de45796b80e253a10fc1b7adcfc89052cd460a4288
# output
Error: OCI runtime error: criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/overlay-containers/06fd1a24a1f0992c9349e7de45796b80e253a10fc1b7adcfc89052cd460a4288/userdata/restore.log
# restore.log is perisist on disk
ls -l /var/lib/containers/storage/overlay-containers/06fd1a24a1f0992c9349e7de45796b80e253a10fc1b7adcfc89052cd460a4288/userdata/restore.log
-rw------- 1 root root 25921 6月 17 11:03 /var/lib/containers/storage/overlay-containers/06fd1a24a1f0992c9349e7de45796b80e253a10fc1b7adcfc89052cd460a4288/userdata/restore.log
if p.CriuWorkPath == "" {
// if criu work path not set, use container WorkDir
p.CriuWorkPath = p.WorkDir
}
# restore
docker start --checkpoint checkpoint1 c1b1a73a91
# restore output
Error response from daemon: failed to retrieve OCI runtime container pid: open /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/c1b1a73a91b8bbbf61318f7bfa391a8ce6ad42952aba10076372e60e3a34b7ae/init.pid: no such file or directory: unknown
# I can not found restore.log
find /var -name restore.log |
Thanks for looking into it. So Docker and Podman both seem to always set WorkDirectory and ImagesDirectory. This PR should not lead to different behaviour on the container engine level. Patch LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sump/restrore logs are very useful when process fails.
So put them to image path defaultly,otherwise these files
will be deleted when container exits.
Signed-off-by: Liu Hua [email protected]'
Proposed changelog entry
--work-path
is not specified, it defaults to the value of--image-path
(same ascriu
does).