Storage gets corrupted after podman pull is killed #14003

Closed
luluz66 opened this issue Apr 25, 2022 · 12 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@luluz66

luluz66 commented Apr 25, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Podman storage gets corrupted if Podman is killed when a layer is incomplete.

Steps to reproduce the issue:

  1. podman pull gcr.io/tensorflow-testing/nosla-cuda11.2-cudnn8.1-ubuntu18.04-manylinux2010-multipython@sha256:5102e2651975df6c131c4f0cb22454b81d509a7be2a3d98351a876d3f85ef2b8

  2. Kill the pull process while one of the layers is still incomplete, monitoring /var/lib/containers/storage/overlay-layers/layers.json (a sketch that automates this kill follows these steps):

watch 'cat /var/lib/containers/storage/overlay-layers/layers.json | grep incomplete'

  3. Run the following command in three different terminals:
term1$ podman pull gcr.io/tensorflow-testing/nosla-cuda11.2-cudnn8.1-ubuntu18.04-manylinux2010-multipython@sha256:5102e2651975df6c131c4f0cb22454b81d509a7be2a3d98351a876d3f85ef2b8 
term2$ podman pull gcr.io/tensorflow-testing/nosla-cuda11.2-cudnn8.1-ubuntu18.04-manylinux2010-multipython@sha256:5102e2651975df6c131c4f0cb22454b81d509a7be2a3d98351a876d3f85ef2b8 
term3$ podman pull gcr.io/tensorflow-testing/nosla-cuda11.2-cudnn8.1-ubuntu18.04-manylinux2010-multipython@sha256:5102e2651975df6c131c4f0cb22454b81d509a7be2a3d98351a876d3f85ef2b8 
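
For completeness, here is a minimal sketch of how steps 1 and 2 can be automated so the pull is killed as soon as an incomplete layer appears. The poll interval and the use of SIGKILL are assumptions for illustration, not part of the original report; the default root storage path is assumed.

#!/usr/bin/env bash
# Hypothetical reproduction helper: start the pull in the background, then kill it
# as soon as layers.json records an incomplete layer.
set -euo pipefail

IMAGE='gcr.io/tensorflow-testing/nosla-cuda11.2-cudnn8.1-ubuntu18.04-manylinux2010-multipython@sha256:5102e2651975df6c131c4f0cb22454b81d509a7be2a3d98351a876d3f85ef2b8'
LAYERS_JSON=/var/lib/containers/storage/overlay-layers/layers.json

podman pull "$IMAGE" &
PULL_PID=$!

# Poll layers.json until an incomplete layer shows up, then kill the pull mid-layer.
while kill -0 "$PULL_PID" 2>/dev/null; do
    if grep -q incomplete "$LAYERS_JSON" 2>/dev/null; then
        kill -9 "$PULL_PID"
        echo "killed podman pull ($PULL_PID) with an incomplete layer left on disk"
        break
    fi
    sleep 0.2
done
wait "$PULL_PID" || true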

Describe the results you received:
All three instances of podman pull returned the following error:

WARN[0136] Can't read link "/var/lib/containers/storage/overlay/l/V2OP2CCVMKSOHK2XICC546DUCG" because it does not exist. A storage corruption might have occurred, attempting to recreate the missing symlinks. It might be best wipe the storage to avoid further errors due to storage corruption.
WARN[0136] Can't stat lower layer "/var/lib/containers/storage/overlay/l/V2OP2CCVMKSOHK2XICC546DUCG" because it does not exist. Going through storage to recreate the missing symlinks.
ERRO[0136] Unmounting /var/lib/containers/storage/overlay/a1b212349f0f80e2f88cbb35fe0b22792ee40bb7d9662c56b5bcaf3dc0941708/merged: invalid argument
Error: writing blob: adding layer with blob "sha256:eb957d5dbd82b0e05da5c583db9216df577fc3729e2353e9e04cab2963a71942": creating overlay mount to /var/lib/containers/storage/overlay/a1b212349f0f80e2f88cbb35fe0b22792ee40bb7d9662c56b5bcaf3dc0941708/merged, mount_data=",lowerdir=/var/lib/containers/storage/overlay/l/QXP5KZSENBI5UM6D4W26CEZZSC:/var/lib/containers/storage/overlay/l/XIRSG2LVRQH54TRV25LYRLNQSN:/var/lib/containers/storage/overlay/l/YTYLOHM45UDDY4ZEQAEXZMN22J:/var/lib/containers/storage/overlay/l/QB6N3I6PANPH4US42R5OLFPI7L:/var/lib/containers/storage/overlay/l/BMJKOW5HBXNZAWGHJXE2NUAXSN:/var/lib/containers/storage/overlay/l/SF5CO7EYA6M5UZOKUSTTXDNVYF:/var/lib/containers/storage/overlay/l/ZQS6IZ5T2XF2KB7JBDCQEL7EM3:/var/lib/containers/storage/overlay/l/LYOFE2V2SA4CN4NBEIHIIKTVXN:/var/lib/containers/storage/overlay/l/NR6LZN4ZIUGKVPOTOR72KBH54X:/var/lib/containers/storage/overlay/l/V2OP2CCVMKSOHK2XICC546DUCG:/var/lib/containers/storage/overlay/l/7A6QMKITLXCCE5OCRMU6USROT5:/var/lib/containers/storage/overlay/l/W2KWKL5KN57MNKLUZGULX7WJA7:/var/lib/containers/storage/overlay/l/XIUUN62UCNZ5LSOT4BXPVCTSVS:/var/lib/containers/storage/overlay/l/REG7KO3LB5KRFKDO4V7W53PU5S:/var/lib/containers/storage/overlay/l/RLQS2PCJ6IDAK7FUPOMFN7M5DH,upperdir=/var/lib/containers/storage/overlay/a1b212349f0f80e2f88cbb35fe0b22792ee40bb7d9662c56b5bcaf3dc0941708/diff,workdir=/var/lib/containers/storage/overlay/a1b212349f0f80e2f88cbb35fe0b22792ee40bb7d9662c56b5bcaf3dc0941708/work": no such file or directory

Note: since at this point the image has not been fully downloaded, running podman rmi does not recover the storage. Only podman system reset helps.

In addition, if after step 2 (killing podman pull while a layer is incomplete) I run only a single podman pull, that pull succeeds. However, podman image inspect then fails for this image with Error: layer not known, and podman run fails with a readlink error as described in containers/storage#1136.
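
For diagnosis, the leftover state can be inspected directly. The check below is a sketch that assumes the overlay driver's usual on-disk layout (a per-layer "link" file plus short-name symlinks under overlay/l/); that layout is an assumption about c/storage internals rather than something taken from this report.

# Same check the reproduction uses: is any layer still marked incomplete?
grep incomplete /var/lib/containers/storage/overlay-layers/layers.json

# Verify that the short link name recorded for each layer still exists under overlay/l/;
# a missing entry here matches the "Can't read link" warnings above.
STORE=/var/lib/containers/storage/overlay
for linkfile in "$STORE"/*/link; do
    name=$(cat "$linkfile")
    [ -e "$STORE/l/$name" ] || echo "missing symlink: $STORE/l/$name (recorded in $linkfile)"
done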

Describe the results you expected:
I expected the podman pulls above to succeed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.0.3
API Version:  4.0.3
Go Version:   go1.18
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.3
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: bdb4f6e56cd193d40b75ffc9725d4b74a18cb33c'
  cpus: 8
  distribution:
    codename: bullseye
    distribution: debian
    version: "11"
  eventLogger: file
  hostname: executor-lulu-test-c56675bf5-5fckt
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.0-1061-gke
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 12879020032
  memTotal: 33673482240
  networkBackend: cni
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      commit: v1.0.3-0-gf46b6ba
      spec: 1.0.2-dev
      go: go1.17.8
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.1.12
      commit: unknown
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 0
  swapTotal: 0
  uptime: 18h 6m 38.07s (Approximately 0.75 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: mirror.gcr.io
    Prefix: docker.io
  docker.io/library:
    Blocked: false
    Insecure: false
    Location: quay.io/libpod
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io/library
  search:
  - docker.io
  - quay.io
  - registry.fedoraproject.org
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.3
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.18
  OsArch: linux/amd64
  Version: 4.0.3

Package info (e.g. output of rpm -q podman or apt list podman):

podman/unknown,now 100:4.0.3-1 amd64 [installed]

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

openshift-ci bot added the kind/bug label Apr 25, 2022
luluz66 changed the title from "Storage is corrupted after podman pull is killed" to "Storage gets corrupted after podman pull is killed" Apr 25, 2022
@giuseppe
Member

@mtrmac would your fixes in c/storage also address this issue?

@mtrmac
Collaborator

mtrmac commented Apr 28, 2022

The issue description links to containers/storage#1136, and seems consistent with that at a short glance (I didn’t try to reproduce). The two PRs that are waiting for review target inconsistent overlay driver state, and don’t fix this locking issue.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented May 31, 2022

Since those two PRs were merged and podman has revendored storage, I am assuming this is fixed. Reopen if I am mistaken.

@rhatdan rhatdan closed this as completed May 31, 2022
@mtrmac
Collaborator

mtrmac commented May 31, 2022

The two PRs that are waiting for review target inconsistent overlay driver state, and don’t fix this locking issue.

@mtrmac mtrmac reopened this May 31, 2022
@banool

banool commented Jun 29, 2022

I'm hitting this also; rebooting did not fix the issue.

Update: podman system reset fixed it.

@elasticdotventures

podman system reset

⚠️ does not warn: it WILL DELETE EVERY CONTAINER ON YOUR SYSTEM. Use with caution!

@github-actions

github-actions bot commented Aug 6, 2022

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Aug 6, 2022

@mtrmac Should this still be opened?

@mtrmac
Collaborator

mtrmac commented Aug 9, 2022

AFAIK, containers/storage#1136 is still outstanding (and I’m not working on it).

As for whether Podman needs to track this separately from the c/storage issue, I don’t have a strong opinion.

@rhatdan
Member

rhatdan commented Aug 9, 2022

Ok I am going to close this issue, and we can follow it in containers/storage.

@rhatdan rhatdan closed this as completed Aug 9, 2022
mandre added a commit to shiftstack/machine-config-operator that referenced this issue Aug 31, 2022
In order to avoid a podman issue [1] causing a layer corruption when an
image pull is killed midway, let's move the image pull outside of the
timeout command.

The timeout was recently reduced to 20 seconds with [2] making the issue
more likely to happen.

[1] containers/podman#14003
[2] openshift#3271
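
In shell terms the workaround amounts to something like the sketch below; the image name, command, and flags are placeholders illustrating the commit message, not the actual machine-config-operator script.

# Placeholder image and command, only to illustrate the change described above.
IMAGE=registry.example.com/some/image:tag

# Before: the pull happens implicitly inside the timeout, so a slow pull can be
# killed mid-layer and leave the storage corrupted (this issue).
timeout 20 podman run --rm "$IMAGE" /usr/bin/some-command

# After: pull without a timeout first, then keep the 20-second timeout only around run.
podman pull "$IMAGE"
timeout 20 podman run --rm "$IMAGE" /usr/bin/some-command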
@Ramblurr

Ramblurr commented Oct 6, 2022

I am experiencing this bug on CentOS Stream 9. Is there a way to fix my podman host without wiping everything?

github-actions bot added the locked - please file new issue/PR label Sep 13, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2023