Crash resiliency: deleting incomplete layers doesn’t reliably happen #1136
@nalind @vrothberg This is a locking design issue I’m not immediately sure how to handle. A cheap kludge would be to have the read-only […]. Maybe have […].
It sounds like a new state would help. But I only parsed your summary and did not take a look at the code. I’d easily need a full workday to catch up on this issue.
One concern with this approach is that a single process would still see a layer disappear during its lifetime. Notably, a pull would first see a layer present in […]. That’s not, in principle, a new concern — layers can always disappear while the store is unlocked. But it would now be more likely to be user-noticeable. Alternatively, should every process start by taking a read-write lock and cleaning up in-progress layers, even if the operation is purely read-only? That seems vaguely reminiscent of earlier read-write/read-only discussions like #473 (which, to be explicit, I haven’t now re-read).
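For illustration, a minimal, self-contained Go sketch of that second idea (clean up under a write lock first, then serve read-only lookups). This is not the containers/storage API: `store`, `layerRecord`, and `cleanupIncompleteLayers` are hypothetical names, and an in-process `sync.RWMutex` stands in for the on-disk lock file:

```go
package main

import (
	"fmt"
	"sync"
)

// layerRecord is a stand-in for a layers.json entry.
type layerRecord struct {
	ID         string
	Incomplete bool
}

// store is a toy model; the RWMutex stands in for the on-disk lock file.
type store struct {
	mu     sync.RWMutex
	layers map[string]*layerRecord
}

// cleanupIncompleteLayers briefly takes the write lock and drops incomplete
// layers, even if the caller only intends to read afterwards.
func (s *store) cleanupIncompleteLayers() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, l := range s.layers {
		if l.Incomplete {
			delete(s.layers, id)
		}
	}
}

// layer is a read-only lookup; with the scheme above it can assume that any
// layer it returns is complete.
func (s *store) layer(id string) (*layerRecord, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	l, ok := s.layers[id]
	return l, ok
}

func main() {
	s := &store{layers: map[string]*layerRecord{
		"parentLayer": {ID: "parentLayer", Incomplete: true},
	}}
	s.cleanupIncompleteLayers() // every process would start with this step
	_, ok := s.layer("parentLayer")
	fmt.Println("parentLayer still visible:", ok) // false: the incomplete layer is gone
}
```

The obvious cost is that even purely read-only operations would briefly contend on the write lock.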
... to help diagnose later possible broken references to this layer; compare containers#1136. Signed-off-by: Miloslav Trmač <[email protected]>
#1145 doesn’t fix this.
@nalind what should we do about this?
1. [1] landed after 4.11.1 and was shipped in 4.11.2, adding a 20s timeout to the baremetalRuntimeCfgImage `podman run ...` calls.
2. The baremetalRuntimeCfgImage stuff gets enabled for a number of infrastructure providers [2].
3. A Podman bug in TERM handling means that timeout TERMs can result in corrupted storage [3].
4. That corruption bubbles up with errors like `Can't read link "/var/lib/containers/storage/overlay/..." because it does not exist. A storage corruption might have occurred`, or maybe `Image ... exists in local storage but may be corrupted (remove the image to resolve the issue): layer not known`, or maybe both [4].
5. 4.11.3 and later fix the regression by separating the possibly-slow image pull from the container run [4].

[1]: https://github.com/openshift/machine-config-operator/pull/3287/files#diff-255f8a4599166f31961853ea8626f969ca4231c55aacbc20a5bb3ceb640f911dR48
[2]: https://github.com/openshift/machine-config-operator/blob/d33d8dc3d2cad2247f67dff5989256315000e2d1/pkg/controller/template/render.go#L512
    Commit from: `$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.11.2-x86_64 | grep machine-config-operator`, which gives: machine-config-operator https://github.com/openshift/machine-config-operator d33d8dc3d2cad2247f67dff5989256315000e2d1
[3]: containers/storage#1136
[4]: https://issues.redhat.com/browse/OCPBUGS-631
Note also #1322, about the whole concept of read-only locking the store object.
See #1332 (comment) for a very vague sketch of how this could be fixed.
(Warning: untested.) An originally unappreciated component of the failure is that before the […] step, the […]. That has recently changed in #1351, so the simple reproducers above should now, AFAICS, trigger deleting incomplete layers. That needs testing (and if true, we will need a more complex reproducer). We still need to fix this in case the incomplete layer is created by a concurrent process during the lifetime of our process (e.g. if two processes are concurrently pulling images that share layers, and one crashes).

TBD: We need a different solution for incomplete layers in read-only additional stores. Maybe just remove the incomplete layer from in-memory state completely, and pretend it doesn’t exist; that might allow the layer to be pulled in the primary store. At least assuming correct writers to the additional store (which is not guaranteed as of today), there should be no child layers or images referring to those incomplete layers.
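A rough Go sketch of that “hide it from the in-memory view” idea for read-only additional stores. `loadAdditionalStore`, the record shape, and the flag handling below are illustrative only, not the real containers/storage code:

```go
package main

import "fmt"

// layerRecord models a layers.json entry from a read-only additional store.
type layerRecord struct {
	ID    string
	Flags map[string]interface{}
}

const incompleteFlag = "incomplete" // flag name as used in the discussion above

// loadAdditionalStore simulates loading records from a read-only additional
// store: incomplete layers cannot be deleted on disk, so they are simply not
// exposed in the in-memory view.
func loadAdditionalStore(records []layerRecord) map[string]layerRecord {
	visible := make(map[string]layerRecord, len(records))
	for _, r := range records {
		if incomplete, _ := r.Flags[incompleteFlag].(bool); incomplete {
			// Pretend the layer doesn't exist; the primary store can then
			// pull the same layer again under its normal rules.
			continue
		}
		visible[r.ID] = r
	}
	return visible
}

func main() {
	records := []layerRecord{
		{ID: "complete-layer"},
		{ID: "broken-layer", Flags: map[string]interface{}{incompleteFlag: true}},
	}
	fmt.Println(loadAdditionalStore(records)) // only "complete-layer" remains visible
}
```

Since the additional store can’t be written to, hiding the record is the only “cleanup” available there; actual deletion would remain the primary store’s job.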
Confirmed.

Then it is observable that the incomplete layer is not deleted before a child is created, and the pulled image is corrupt; already the pull complains:

[error output lost in extraction]

A subsequent […], and that causes the original process to pull the image again. The ultimate failure is the same: child layers’ […]
Trying the above reproducer, #1407 indeed seems to fix it: the resumed process, on a first read-only access, notices that the store is corrupt and retries with a write lock, deleting the incomplete layer immediately and creating it from scratch.
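A minimal Go sketch of that notice-and-retry pattern, as I understand the description above; `errInconsistent`, `readOnlyLookup`, and `repairAndLookup` are made-up names for illustration and are not the actual #1407 code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errInconsistent is a made-up sentinel for "the store looks corrupt".
var errInconsistent = errors.New("store is inconsistent")

type store struct {
	mu        sync.RWMutex
	corrupted bool // e.g. an incomplete parent layer is present
}

// readOnlyLookup does the cheap path under a read lock and reports
// inconsistency instead of repairing it.
func (s *store) readOnlyLookup() error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.corrupted {
		return errInconsistent
	}
	return nil
}

// repairAndLookup retries under a write lock, where the incomplete layer can
// actually be deleted and recreated.
func (s *store) repairAndLookup() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.corrupted = false // delete the incomplete layer, recreate from scratch
	return nil
}

// lookup is the notice-and-retry wrapper around the two paths above.
func (s *store) lookup() error {
	err := s.readOnlyLookup()
	if errors.Is(err, errInconsistent) {
		return s.repairAndLookup()
	}
	return err
}

func main() {
	s := &store{corrupted: true}
	fmt.Println(s.lookup()) // <nil>: the corruption was noticed and repaired
}
```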
Consider this sequential sequence of events, with the overlay graph driver.

1. A pull process starts creating `parentLayer`; a record of it is written to `layers.json` with `incompleteFlag` set.
2. Before […]`.ApplyDiff` completes, the pull process is forcibly killed (so that it can’t do its own cleanup).
3. At this point `layers.json` contains a record of the layer, with `incompleteFlag`; the overlay graph driver contains an incomplete/inconsistent layer, but a `$parentLayer/link` file and a `l/$link` symbolic link exist. This is all as expected.
4. A later pull […] calls `Store.Layer(parentLayer)`. This locks the `layerStore` read-only first. Thus, the first `layerStore.ReloadIfChanged` does trigger a `layerStore.Load()`, but that does not clean up incomplete layers. But `layerStore.lockFile.lw` was updated to match the lock file contents. `Store.Layer` reports that `parentLayer` exists.
5. The pull sees that `parentLayer` exists, and starts creating `childLayer`.
6. When creating `childLayer`, the `layerStore` is locked read-write, but because nothing has changed on disk and `layerStore.lockFile.lw` matches (within the same process), `layerStore.ReloadIfChanged` does nothing, does not enter `layerStore.Load()`, and the “delete incomplete layers” code is not reached. Consequently, `parentLayer` continues to exist in incomplete state, allowing the creation of `childLayer` to succeed. (A minimal model of this reload-skipping behavior is sketched after this list.)
7. `$childLayer/lower` is created, and includes the short link from `parentLayer/link`.
8. Eventually something else […] the `layerStore`. That finally triggers `layerStore.Load` to delete incomplete layers — and now `parentLayer` is deleted, resulting in a broken parent link from `childLayer` to `parentLayer`. (A later `podman run theSameImage` works for this purpose. That deletes the layer and fails with `Error: layer not known`, with a currently unclear call stack.)
9. Another `podman run theSameImage` causes the missing layer to be noticed: […] `parentLayer` is missing, and creates it afresh, with a new `$parentLayer/link` value. `childLayer` is not missing, and the previous one is just reused. `$childLayer/lower` continues to contain the old `$parentLayer/link` value.
10. When `childLayer` is then used, this manifests in […]