Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid layer content in rootless mode #2381

Open
lbernail opened this issue Sep 28, 2021 · 17 comments
Open

Invalid layer content in rootless mode #2381

lbernail opened this issue Sep 28, 2021 · 17 comments
Labels
area/rootless rootless mode

Comments

@lbernail
Copy link

Hello,

We are starting to run buildkit in rootless mode and in some cases, we seem to be having issues with the snapshotter.
For instance, with this Dockerfile

FROM golang:latest as builder

# https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/commit/93915e330be4291f209b13dc6df063729bd4d9c4
ENV TAG "93915e330be4291f209b13dc6df063729bd4d9c4"
ENV GOOS "linux"
ENV CGO_ENABLED "0"

WORKDIR src/sigs.k8s.io/sig-storage-local-static-provisioner

RUN git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git .  && ls -l vendor/k8s.io/utils/io/ && find . | wc -l
RUN git checkout $TAG && ls -l vendor/k8s.io/utils/io/ && find | wc -l

RUN echo "Checking content" && ls -l vendor/k8s.io/utils/io/ && find | wc -l

We get this with buildkit in rootless mode:

#6 [3/5] RUN git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git .  && ls -l vendor/k8s.io/utils/io/ && find . | wc -l
#6 sha256:aff0ee315ab635aa458b8e553f083158f525c06590d39b21212716100211e4fa
#6 0.073 Cloning into '.'...
#6 3.166 total 8
#6 3.166 -rw-r--r-- 1 root root  134 Sep 25 09:58 README.md
#6 3.166 -rw-r--r-- 1 root root 2982 Sep 25 09:58 read.go
#6 3.195 6787
#6 DONE 3.4s

#7 [4/5] RUN git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 && ls -l vendor/k8s.io/utils/io/ && find . | wc -l
#7 sha256:712f0ac1353ca45de25e50eb4e17134c9865712a83f756c893ee1fde0750e0c8
#7 1.068 Note: switching to '93915e330be4291f209b13dc6df063729bd4d9c4'.
#7 1.072 total 4
#7 1.072 -rw-r--r-- 1 root root 1748 Sep 25 09:58 consistentread.go
#7 1.106 5573
#7 DONE 1.3s

#8 [5/5] RUN echo "Checking content" && ls -l vendor/k8s.io/utils/io/ && find . | wc -l
#8 sha256:600ad17eb12cb7bd7a2f3154b8eec4ce13dd60bd40b2cabf4b95320e00fd45df
#8 0.120 Checking content
#8 0.121 total 12
#8 0.121 -rw-r--r-- 1 root root  134 Sep 25 09:58 README.md
#8 0.121 -rw-r--r-- 1 root root 1748 Sep 25 09:58 consistentread.go
#8 0.121 -rw-r--r-- 1 root root 2982 Sep 25 09:58 read.go
#8 0.157 5605
#8 DONE 0.2s

Of course we'd expect content of the layer at step #8 to be the same as the content at the end of step #7

We tested building in non-rootless mode and the problem disappeared.
After looking into past issues, we found #1792 which felt a bit similar, so we tried in rootless mode with --oci-worker-snapshotter=native and the problem also disappeared.

So the problem only happens in rootless mode with overlayfs snapshotter.
Here is the configuration where we see this behavior:

  • Ubuntu 20.04 with kernel 5.8
  • image: buildkit:v0.9.0-rootless
  • parameters: --oci-worker-no-process-sandbox
  • non-privileged with seccomp and apparmor set to unconfined

Is there a way to diagnose what is failing when we use the overlayfs snapshotter?
I could not find the impact of using the native snapshotter instead of the overlayfs one. I assume slower build?

Of course, we are more than happy to perform additional tests to gather more data

@tonistiigi
Copy link
Member

tonistiigi commented Sep 28, 2021

@AkihiroSuda I guess this is the fuse-overlay? Seems it doesn't invalidate the fuse cache or writeback is not implemented correctly.

edit: or is it the opaque handling? I initially looked that files where not there after clone but the issue seems to be that layer removes files and adds some new ones but in the next layer both appear.

@AkihiroSuda AkihiroSuda added the area/rootless rootless mode label Sep 28, 2021
@tonistiigi
Copy link
Member

@giuseppe Is this something known/tracked for fuse-overlayfs?

@giuseppe
Copy link

giuseppe commented Oct 4, 2021

It could be fuse-overlayfs.

What version of fuse-overlayfs is it?

What is the underlying file system? Is it running on top of an another overlay mount?

@tonistiigi
Copy link
Member

What version of fuse-overlayfs is it?

docker run -it --rm --entrypoint apk moby/buildkit:v0.9.0-rootless info -v | grep fuse-overlayfs
fuse-overlayfs-1.6-r0

@giuseppe
Copy link

giuseppe commented Oct 5, 2021

I am not sure I've followed the correct steps, but I've tried a simple docker build --no-cache . as rootless on Ubuntu 20.04.2 with kernel 5.4.0-73-generic with the Dockerfile above, and this is what I get:

...

Step 7/8 : RUN git checkout $TAG && ls -l vendor/k8s.io/utils/io/ && find | wc -l
 ---> Running in d62a84ba8eb9
Note: switching to '93915e330be4291f209b13dc6df063729bd4d9c4'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 93915e33 Merge pull request #135 from DataDog/JulienBalestra/readiness
total 4
-rw-r--r-- 1 root root 1748 Oct  5 08:52 consistentread.go
5573
Removing intermediate container d62a84ba8eb9
 ---> c1831ee2992d
Step 8/8 : RUN echo "Checking content" && ls -l vendor/k8s.io/utils/io/ && find | wc -l
 ---> Running in 713d365767c0
Checking content
total 12
-rw-r--r-- 1 root root  134 Oct  5 08:52 README.md
-rw-r--r-- 1 root root 1748 Oct  5 08:52 consistentread.go
-rw-r--r-- 1 root root 2982 Oct  5 08:52 read.go
5605
Removing intermediate container 713d365767c0
 ---> 175956894c15
Successfully built 175956894c15

fuse-overlays is not even installed on the system.

@lbernail
Copy link
Author

lbernail commented Oct 5, 2021

@giuseppe we run the rootless buildkitd in a kubernetes pod with no specific mounts so the underlying fs for builds is the root fs of the container (overlay)
We will try using an ephemeral volume to see if using ext4 as the underlying fs solves the issue

@lbernail
Copy link
Author

lbernail commented Oct 5, 2021

@AkihiroSuda moving to the native snapshotter solved all our problems with invalid layers but we took a huge hit in terms of performance (io throughput is much higher). I assume that's expected and that it is the only impact we should see from using this snapshotter?

@lbernail
Copy link
Author

lbernail commented Oct 8, 2021

We checked on our side and we are not using fuse-overlay (we don't load the fuse module and we don't run the k8s-hostdev-plugin plugin: #1384)
So it seems the problem is really overlay in rootless mode (which confirms what you saw @giuseppe )
We will try with a more recent kernel next week (so far we have tested on 5.8 and we will test on 5.11) and we will keep you updated

@lbernail
Copy link
Author

lbernail commented Oct 10, 2021

I'm not 100% sure but I think I may have found the (or a) problem.

I simulated what happens directly on a linux host:

mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..

At this point, lower has:

ls -l lower/vendor/k8s.io/utils/io
total 8
-rw-rw-r-- 1 ddeng ddeng 2982 Oct 10 15:08 read.go
-rw-rw-r-- 1 ddeng ddeng  134 Oct 10 15:08 README.md

And here is what we get in merged:

ls -l merged/vendor/k8s.io/utils/io
total 4
-rw-rw-r-- 1 ddeng ddeng 1748 Oct 10 15:09 consistentread.go

Which is the expected content.
What about upper?

ls -l upper/vendor/k8s.io/utils/io
total 4
-rw-rw-r-- 1 ddeng ddeng 1748 Oct 10 15:09 consistentread.go

No whiteouts, let's look at the directory:

sudo getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
# file: upper/vendor/k8s.io/utils/io
trusted.overlay.opaque="y"

=> Directory is marked as opaque and the lower directory is ignored

Let's do the same in a user namespace, starting from scratch:

unshare -m -p -f -U -r bash
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..

At this point everything is the same, but

getfattr -R -d -m "" upper/vendor/k8s.io/utils/io

returns nothing in the userns. The directory has the right xattr in the host user namespace (trusted.overlay.opaque="y"), but we don't have the info in the userns so we can't add it when we snapshot.

So, I think the problem is with the extended attribute: given it's part of the Trusted namespace we can't see it in a user namespace.
I suspect this only impacts opaque directories and we can likely create a much simpler reproducer.

@lbernail
Copy link
Author

lbernail commented Oct 10, 2021

There is a solution in 5.11: torvalds/linux@2d2f2d7

And it looks promising:

unshare -m -p -f -U -r bash
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
mount -t overlay overlay -o userxattr -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..

Same as before, with the additional mount option from 5.11: -o userxattr

And as a result:

getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
# file: upper/vendor/k8s.io/utils/io
user.overlay.opaque="y"

Now I wonder if this will work with the overlay snapshotter because the new mount option is explicit and only work on 5.11+ kernels. We'll update our buildkit platform to 5.11 and will let you know what we find.

@lbernail
Copy link
Author

I'm not sure what happens between 2 RUN commands in buildkit. Are we archiving the diff into a tarball before processing to next one? (this seems to rely on github.com/docker/docker/pkg/archive )

Slightly related: #2181 computes diff from the overlayfs upper dir (which requires trusted.overlay.opaque or user.overlay.opaque). Before this PR, diff relied on walking differ (which is slow but should always be right I think). Once this PR goes into a release it may break some setups (kernel < 5.11). It does not seem to be our problem here because 0.9.0 still uses the walking differ (hence my first question on archive)

@tonistiigi
Copy link
Member

Are we archiving the diff into a tarball before processing to next one?

No. Just the previous upper becomes an additional lower.

@AkihiroSuda If the issue is that overlay in rootless mode doesn't work correctly for <5.11 kernels (at least for ubuntu) we should try to detect it and run fuse for these as well. cc @ktock for the overlay differ part

@lbernail
Copy link
Author

No. Just the previous upper becomes an additional lower.

This means we do not go through the Differ or Archiver ? Looks like I have another code path to explore. So we are unmounting and remounting using upper as an additional lower directly? (I wonder because I feel this should work, unless we do a copy, which may lead to loss of the xattr). I can easily check this on a 5.8 host

@tonistiigi
Copy link
Member

Differ only runs at the end of the build if an image result was needed.

@lbernail
Copy link
Author

lbernail commented Oct 10, 2021

OK I confirm this is the problem on 5.8:

unshare -m -p -f -U -r bash
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..

At this point, merged has the right content and xattr does not show in userns:

ls merged/vendor/k8s.io/utils/io
consistentread.go

getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
=> nothing

If we proceed with a new mount including upper as an additional layer:

umount merged
mkdir upper2
mount -t overlay overlay -olowerdir=upper:lower,upperdir=upper2,workdir=work merged

the content becomes wrong, showing the xattr is ignored for the second mount:

ls merged/vendor/k8s.io/utils/io
consistentread.go  read.go  README.md

On 5.11, this works completely fine (as long as -o userxattr is added to mount options)

@lbernail
Copy link
Author

lbernail commented Oct 11, 2021

I confirm that the problem disappears with Ubuntu 20.04.3 and kernel 5.11 , which is great news 🎉
I agree with @tonistiigi I think we should not use overlayfs in kernel <5.11 even on Ubuntu: it will mostly work but will have weird edge cases as the one we encountered

@lbernail
Copy link
Author

I just noticed we have a lot of

[22995.791577] overlayfs: lowerdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.

in our kernel logs.
I assume we don't unmount overlayfs between the different mount operations used for RUN statements? It's likely just a warning and this does not impact our builds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/rootless rootless mode
Projects
None yet
Development

No branches or pull requests

4 participants