[master] Large performance regression exporting layers with containerd worker #2365
Comments
Just a random guess - could this be caused by a lack of buffering between the gzip compressor and the content store writer? I've run into similar issues with I don't know if this is the issue here, though, and don't want to send you down the wrong path. |
I did notice this comment on
That suggests that maybe I'm on to something... |
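A minimal sketch of the guess above, not code from buildkit or containerd: it counts how many `Write` calls reach a sink when gzip output goes to it directly versus through a `bufio.Writer`. The `countingWriter` type, the helper names, the sample data, and the 64 KiB buffer size are all assumptions made for illustration; a real content store writer would add per-call cost that this sketch does not model.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for a content store writer.
// It only records how many times Write is called.
type countingWriter struct{ calls int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// sampleData builds roughly 1 MiB of compressible text.
func sampleData() []byte {
	var b bytes.Buffer
	for i := 0; b.Len() < 1<<20; i++ {
		fmt.Fprintf(&b, "file-%d: some layer content\n", i)
	}
	return b.Bytes()
}

// compressTo gzips data into dst and closes the gzip stream.
func compressTo(dst io.Writer, data []byte) error {
	zw := gzip.NewWriter(dst)
	if _, err := zw.Write(data); err != nil {
		return err
	}
	return zw.Close()
}

// sinkCalls reports how many Write calls reach the sink,
// with or without a bufio.Writer in front of it.
func sinkCalls(buffered bool) (int, error) {
	sink := &countingWriter{}
	data := sampleData()
	if !buffered {
		err := compressTo(sink, data)
		return sink.calls, err
	}
	bw := bufio.NewWriterSize(sink, 64*1024) // assumed buffer size
	if err := compressTo(bw, data); err != nil {
		return 0, err
	}
	err := bw.Flush() // push any remaining buffered bytes to the sink
	return sink.calls, err
}

func main() {
	direct, err := sinkCalls(false)
	if err != nil {
		panic(err)
	}
	viaBufio, err := sinkCalls(true)
	if err != nil {
		panic(err)
	}
	fmt.Println("writes without buffering:", direct)
	fmt.Println("writes with bufio.Writer:", viaBufio)
}
```

If each write to the sink is expensive (for example, a gRPC round trip), the difference in call counts is what would matter, not the total number of bytes.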
@aaronlehmann Thank you for reporting this. I'm working on it.
Could you provide an example build that reproduces this issue? |
This regression seems to occur in the containerd worker.
FROM ghcr.io/stargz-containers/ubuntu:20.04-org
RUN apt-get update -y && apt-get install -y gcc
In containerd worker mode (with containerd v1.5.5 on Ubuntu 20.04), "exporting layers" took 5.8s at 9462a23 (the commit before the overlayfs differ) but 18.2s on master. No regression occurred with the OCI worker. I tried a buffered copy (ktock@cc0ffec) for writing to the content store, but it didn't solve the issue (it still took 17.0s).
Thank you for providing this. Yes, sending the diff tarball to containerd seems costly here. For containerd worker mode, I think we should use containerd's diff service instead of our overlayfs differ. |
I don't think the comment to "use io.CopyBuffer" was correct. That does not actually buffer writes; it just allows reusing the internal copy buffer. So |
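To illustrate the point about `io.CopyBuffer`, here is a small sketch (not taken from buildkit) that feeds a writer from a source returning only 16 bytes per `Read`. The `countingWriter` and `smallReader` types and the buffer sizes are hypothetical. With `io.CopyBuffer`, every short read is forwarded as its own `Write`, however large the supplied buffer is; wrapping the destination in a `bufio.Writer` lets the small chunks accumulate before they are flushed.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingWriter is a hypothetical sink that counts Write calls.
type countingWriter struct{ calls int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// smallReader returns at most 16 bytes per Read. It exposes only Read,
// so io.Copy cannot use a WriterTo shortcut on the source.
type smallReader struct{ r io.Reader }

func (s smallReader) Read(p []byte) (int, error) {
	if len(p) > 16 {
		p = p[:16]
	}
	return s.r.Read(p)
}

func newSource() io.Reader {
	return smallReader{r: bytes.NewReader(make([]byte, 64*1024))}
}

// copyBufferCalls copies with io.CopyBuffer and a 32 KiB scratch buffer.
func copyBufferCalls() (int, error) {
	sink := &countingWriter{}
	buf := make([]byte, 32*1024)
	_, err := io.CopyBuffer(sink, newSource(), buf)
	return sink.calls, err
}

// bufioCalls copies through a 32 KiB bufio.Writer placed in front of the sink.
func bufioCalls() (int, error) {
	sink := &countingWriter{}
	bw := bufio.NewWriterSize(sink, 32*1024)
	if _, err := io.Copy(bw, newSource()); err != nil {
		return 0, err
	}
	err := bw.Flush() // write out whatever is still buffered
	return sink.calls, err
}

func main() {
	a, err := copyBufferCalls()
	if err != nil {
		panic(err)
	}
	b, err := bufioCalls()
	if err != nil {
		panic(err)
	}
	fmt.Println("sink writes with io.CopyBuffer:", a)
	fmt.Println("sink writes with bufio.Writer:", b)
}
```

The buffer passed to `io.CopyBuffer` only saves an allocation per copy; it does not change how many writes the destination sees.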
@tonistiigi Fixed to use I think we should fix the containerd worker so that it does not use the client-side differ implementation (this must be done before v0.10.0; draft PR at #2366), and we should port our overlayfs differ to containerd's differ service (this can be long-term work). WDYT? |
If it is just 5.6 vs 6.5, then it could indeed be just gRPC overhead. @aaronlehmann can you confirm that you were using the containerd worker as well? |
Yes, I am using the containerd worker. I'm not the one that originally set this up, so I'd be curious about the pros/cons of containerd vs. OCI and whether it would be better for me to switch. |
If you are already using containerd outside buildkit, then with the containerd worker you can load images into or from the containerd image store and share the blob storage with other components. Otherwise, there are no benefits, and OCI is probably a bit faster in general because there is no extra gRPC layer. Note that for the differ, the gRPC overhead normally shouldn't apply, because containerd has a special diff API and the slow blob creation happens inside the daemon. But with this change we compute the diff directly in buildkit and then send the blob over gRPC to containerd, so much more traffic was going to the gRPC API. |
@ktock So did I understand correctly that adding bufio, while it didn't make the diff as fast as the OCI worker or the containerd internal differ, still made it much faster? E.g., now it is 10-20% slower, while previously it was 3-6 times slower. |
Yes. |
PR #2181 (Compute diff from the upper directory of overlayfs-based snapshotter) appears to cause a serious performance regression when exporting layers for an image push.
Before this merge commit, the "export layers" span of a relatively straightforward build/push took 8 seconds. On the merge commit, it is taking 69 seconds. containerd and buildkitd come close to maxing out CPU during this interval.
We are seeing a routine set of builds get dramatically slower, going from ~20 minutes end-to-end to about 85 minutes.
I can provide more information for debugging on request.
cc @ktock @tonistiigi @sipsma @coryb