buildkit + gcr.io private repos (credHelpers) do not stack #720

Closed
haizaar opened this issue Nov 12, 2018 · 38 comments · Fixed by #1024

@haizaar

haizaar commented Nov 12, 2018

Docker 18.09-ce here.

I have a FROM directive in my Dockerfile pointing to a private registry:

FROM gcr.io/...

Running DOCKER_BUILDKIT=1 docker build . with this Dockerfile never finishes (after 5 minutes I hit CTRL-C).
Without buildkit it builds fine in seconds.

My ~/.docker/config.json is as follows:

{
  "credHelpers": {
    "us.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud"
  }
}
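
For context, each `credHelpers` entry maps a registry host to a helper binary named `docker-credential-<value>`, so the config above makes the CLI call `docker-credential-gcloud` for the listed gcr.io hosts. The Python sketch below only illustrates the documented lookup order (per-registry `credHelpers` first, then the global `credsStore`). `helper_binary_for` is a hypothetical name used for illustration, not part of any Docker API.

```python
import json

def helper_binary_for(config, registry_host):
    """Return the credential helper binary a client would invoke, or None."""
    # A per-registry helper takes precedence over the global credsStore.
    helper = config.get("credHelpers", {}).get(registry_host)
    if helper is None:
        helper = config.get("credsStore")
    if helper is None:
        # No helper configured: credentials would come from "auths", if present.
        return None
    return "docker-credential-" + helper

# A shortened copy of the config.json shown above.
config = json.loads("""
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "eu.gcr.io": "gcloud"
  }
}
""")

print(helper_binary_for(config, "gcr.io"))     # docker-credential-gcloud
print(helper_binary_for(config, "docker.io"))  # None
```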

After waiting a long time and pressing CTRL-C, the following error is printed (exact image names replaced with ...):

------
 > [stage-1 1/4] FROM gcr.io/...:
------
failed to copy: httpReaderSeeker: failed open: unexpected status code https://gcr.io/v2/...: 403 Forbidden

Bug?

@tonistiigi
Member

@haizaar Yes, I think this is a bug with credHelpers in the CLI. Would it be possible for you to confirm that without credHelpers, private pulls from gcr work fine?

Will try to fix this for 18.09.1

@haizaar
Author

haizaar commented Nov 12, 2018 via email

@tonistiigi
Member

@haizaar I didn't mean that. Does it work with buildkit when you do regular docker login without the credential helpers?

@haizaar
Author

haizaar commented Nov 13, 2018

@tonistiigi
No, apparently it does not:

$ gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://gcr.io
WARNING! Your password will be stored unencrypted in /home/haizaar/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
$ cat Dockerfile
FROM gcr.io/...
$ DOCKER_BUILDKIT=1 docker build .
...

After several minutes I hit CTRL-C and got the same forbidden error as in my original post.

So, can we conclude that buildkit does not work with private registries at all?

@mstevanic

I had the same issue. Adding # syntax=docker/dockerfile:experimental helped in my case.

@haizaar
Author

haizaar commented Nov 15, 2018

Adding # syntax=... indeed helps, but during the build I see the following error:

=> ERROR [internal] load metadata for gcr.io/...

The error is for the private repo image I try to start FROM.
However the command exits with success status and the resulting image is built.

@tonistiigi
Member

Looked into it and it actually seems to be a gcr bug. I thought this was because of credential helpers but this part seems to work fine.

Containerd and buildkit use a concept of shallow pulls, where only parts of an image are pulled. For example, in buildkit this is how the metadata of a base image can be used by the builder before the image has actually been pulled (e.g. to determine cache matches without pulls).

The way authentication works is that the client expects 401 Unauthorized to be returned from the registry and then authorizes the request based on the response headers. In gcr, from looking at the data, it seems that only the unauthenticated manifest requests return 401, while the unauthenticated blob requests return 403, which is not a response that can be used to request credentials.
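
To illustrate the difference, here is a minimal Python sketch of the decision a registry client faces. It is not containerd's or buildkit's actual code: `parse_bearer_challenge` and `next_step` are hypothetical names, and the realm in the usage example is a placeholder. It assumes the usual token flow in which a 401 response carries a `WWW-Authenticate: Bearer ...` header naming the token endpoint.

```python
import re

def parse_bearer_challenge(header_value):
    """Parse 'Bearer realm="...",service="...",scope="..."' into a dict, or None."""
    scheme, _, params = header_value.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None
    # Quoted values may contain commas (e.g. scope="repository:x:pull,push").
    return dict(re.findall(r'(\w+)="([^"]*)"', params))

def next_step(status, headers):
    """Decide what the client can do with a registry response."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        challenge = parse_bearer_challenge(headers.get("WWW-Authenticate", ""))
        if challenge and "realm" in challenge:
            # The challenge tells the client where to obtain a token.
            return "fetch token from " + challenge["realm"] + " and retry"
    # A 403 (or a 401 without a usable challenge) gives the client
    # nothing to authenticate against, so the request simply fails.
    return "fail"
```

With this logic, an unauthenticated manifest request answered with 401 plus a challenge leads to a token fetch and a retry, while an unauthenticated blob request answered with 403 ends in `"fail"`, which matches the behavior described above.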

As a result, private pulls from gcr only work under specific conditions where the manifest and config are pulled by the same action as the layers. The reason it worked with the external implementation was actually a side effect of another bug that failed to pull the config at the proper time; that bug is being fixed in #729.

@dmcgowan @mattmoor

@haizaar
Author

haizaar commented Nov 20, 2018

Thanks for the explanation. So... should we start bugging Google about this?

@mslusarczyk

@tonistiigi I see the same issue with Nexus (OSS 3.14.0-04) and the latest Docker for Mac.

@tonistiigi
Member

@mslusarczyk Do you have the same behavior as this issue (hanging on blob download while manifests are pulled fine) or as #721 (failure when using #syntax with a basic auth registry)? Is this with #syntax or not?

@dmcgowan
Member

It looks like GCR fixed the manifests endpoint which works with the containerd pull flow since we are ALWAYS resolving the manifest endpoint on pull. We could change this in the future though if we did type introspection after a shallow pull, in which case containerd could break with GCR again.

Based on the specification, the blobs endpoint should support 401 as well: https://github.com/opencontainers/distribution-spec/blob/master/spec.md#fetch-blob
@jonjohnsonjr @mattmoor Do you agree this should be changed similar to https://issuetracker.google.com/issues/64463951 ?

@mattmoor

I pinged the team that's actively working on the registry about this, but responses may be delayed due to the holiday week.

(for some historical context)
GCR doesn't currently 401 on blob endpoints for two main reasons:

  1. It enables us to more quickly redirect to the backing storage, which is what's really authenticating users anyhow. We could simply check for a Bearer token and reject the request on 401, but...
  2. The --registry-mirror support doesn't pass any auth headers and a 401 pushback would likely break this path (without special casing).

It was nice to support mirroring with literally zero code changes to the serving path. This change would basically necessitate GCR special-casing the mirroring codepath to avoid exactly the check you want added.

If you don't hear from someone by next week, feel free to ping me again.

@tonistiigi
Member

Temporary workaround for moby: moby/moby#38246

@mslusarczyk

@tonistiigi Sorry for the delay
Build without # syntax=docker/dockerfile:experimental but with DOCKER_BUILDKIT=1

[+] Building 0.4s (4/5)                                                                                                           
 => [internal] load build definition from Dockerfile                                                                         0.0s
 => => transferring dockerfile: 1.45kB                                                                                       0.0s
 => [internal] load .dockerignore                                                                                            0.0s
 => => transferring context: 2B                                                                                              0.0s
 => ERROR [internal] load metadata for nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2                                       0.2s
 => ERROR [1/2] FROM nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2                                                         0.2s
 => => resolve nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2                                                               0.2s
------
 > [internal] load metadata for nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2:
------
------
 > [1/2] FROM nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2:
------
nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2 not found

Build with # syntax=docker/dockerfile:experimental and DOCKER_BUILDKIT=1


[+] Building 5.5s (6/7)                                                                                                           
 => [internal] load .dockerignore                                                                                            0.0s
 => => transferring context: 2B                                                                                              0.0s
 => [internal] load build definition from Dockerfile                                                                         0.0s
 => => transferring dockerfile: 1.49kB                                                                                       0.0s
 => resolve image config for docker.io/docker/dockerfile:experimental                                                        1.5s
 => docker-image://docker.io/docker/dockerfile:experimental@sha256:d2d402b6fa1dae752f8c688d72066a912d7042cc1727213f7990cdb5  2.0s
 => => resolve docker.io/docker/dockerfile:experimental@sha256:d2d402b6fa1dae752f8c688d72066a912d7042cc1727213f7990cdb57f60  0.0s
 => => sha256:d2d402b6fa1dae752f8c688d72066a912d7042cc1727213f7990cdb57f60df0c 2.03kB / 2.03kB                               0.0s
 => => sha256:da42ab0d8238dd86f7669b0ce43bb0a2d0f8bb55147eadc3073760429b703899 527B / 527B                                   0.0s
 => => sha256:0ebbea2400fa95b4fbbef60efc694402c078d6bcab7ee2b476bb17ac9af3dfaa 637B / 637B                                   0.0s
 => => sha256:d3bb83eb682ffd2e30a5c8a89b86d60fd348e09be8301d8c28cf0b58ebf61412 7.54MB / 7.54MB                               1.6s
 => => extracting sha256:d3bb83eb682ffd2e30a5c8a89b86d60fd348e09be8301d8c28cf0b58ebf61412                                    0.3s
 => ERROR [internal] load metadata for nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2                                       0.2s
 => ERROR [1/2] FROM nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2                                                         0.2s
 => => resolve nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2                                                               0.2s
------
 > [internal] load metadata for nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2:
------
------
 > [1/2] FROM nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2:
------
rpc error: code = Unknown desc = nexus.yyy.com:9084/jdk8-gradle2.14:3.0.2 not found

All works fine without DOCKER_BUILDKIT=1

@haizaar
Author

haizaar commented Nov 29, 2018

Ping @mattmoor

Can someone please give an update on the status of this issue? Is it indeed GCR-only, and do we expect them to fix their service? Is there a temporary workaround that is going to land in the 18.09.1 release?

@haizaar
Author

haizaar commented Dec 7, 2018

Looks like the work-around is going to be merged into docker-ce 18.09.1: docker-archive/engine#122

@haizaar
Author

haizaar commented Jan 11, 2019

My previous conclusion that it works was premature. With DOCKER_BUILDKIT=1, Docker 18.09.1 can't pull private images from gcr.io, regardless of whether the # syntax=... preamble is present in the Dockerfile.

It seems to fail when fetching a blob:

$ DOCKER_BUILDKIT=1 docker build -t bk .
[+] Building 346.4s (4/4) FINISHED
 => [internal] load .dockerignore                                                                          0.1s
 => => transferring context: 2B                                                                            0.0s
 => [internal] load build definition from Dockerfile                                                       0.1s
 => => transferring dockerfile: 37B                                                                        0.0s
 => [internal] load metadata for gcr.io/...                                                                2.4s
 => ERROR [1/1] FROM gcr.io/...                                                                          343.9s
 => => resolve gcr.io/...                                                                                  0.0s
 => => sha256:<sum1>                                                           4.33kB / 4.33kB             0.0s
 => => sha256:<sum2>                                                           0B / 2.21MB               343.9s
 => => sha256:<sum3>                                                           0B / 6.49MB               343.9s
 => => sha256:<sum4>                                                           1.99kB / 1.99kB             0.0s
 => => sha256:<sum5>                                                           0B / 1.74MB               343.9s
------
 > [1/1] FROM gcr.io/...:
------
failed to copy: httpReaderSeeker: failed open: unexpected status code https://gcr.io/v2/<project>/<image-name>/blobs/sha256:<sum3>: 403 Forbidden

Without DOCKER_BUILDKIT=1 everything works fine.

@haizaar
Author

haizaar commented Jan 11, 2019

Did another test with a private repository on Docker Hub and everything works great. So this is indeed a gcr.io issue. Will try to persuade Google support.

$ docker system prune -a  # make sure there is no cache
$ cat Dockerfile 
FROM haizaar/private:latest
$ DOCKER_BUILDKIT=1 docker build -t pri .
[+] Building 6.8s (5/5) FINISHED                                                                                                                                                                                                                              
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                     0.1s
 => => transferring dockerfile: 71B                                                                                                                                                                                                                      0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                        0.1s
 => => transferring context: 2B                                                                                                                                                                                                                          0.0s
 => [internal] load metadata for docker.io/haizaar/private:latest                                                                                                                                                                                        3.5s
 => [1/1] FROM docker.io/haizaar/private:latest@sha256:3d2e482b82608d153a374df3357c0291589a61cc194ec4a9ca2381073a17f58e                                                                                                                                  3.1s
 => => resolve docker.io/haizaar/private:latest@sha256:3d2e482b82608d153a374df3357c0291589a61cc194ec4a9ca2381073a17f58e                                                                                                                                  0.0s
 => => sha256:3d2e482b82608d153a374df3357c0291589a61cc194ec4a9ca2381073a17f58e 528B / 528B                                                                                                                                                               0.0s
 => => sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991 1.51kB / 1.51kB                                                                                                                                                           0.0s
 => => sha256:cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3 2.21MB / 2.21MB                                                                                                                                                           2.8s
 => => extracting sha256:cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3                                                                                                                                                                0.2s
 => exporting to image                                                                                                                                                                                                                                   0.0s
 => => exporting layers                                                                                                                                                                                                                                  0.0s
 => => writing image sha256:87211fd91f8ca080f3aae114dfe55e857195853b2968fd7a9be23bfa2ed91521                                                                                                                                                             0.0s
 => => naming to docker.io/library/pri        

@haizaar changed the title from "buildkit + private repos (credHelpers) do not stack" to "buildkit + gcr.io private repos (credHelpers) do not stack" on Jan 14, 2019

@carlosgalvezp

carlosgalvezp commented Jan 14, 2019

Hi,
Related to this topic, I'm also having issues when pulling from private repos. It fails to download the metadata of the parent image. Here's a failing and working case:

  • DOCKER_BUILDKIT=1, # syntax=docker/dockerfile:experimental -> error: "unexpected status code <image_name>: 403 Forbidden".

  • DOCKER_BUILDKIT=1, without any syntax specification -> works fine, it can download the metadata of the parent image.

Can we use a different # syntax tag that works?

EDIT: probably the same issue as #721, but it should already have been fixed in 18.09.1, right?

Thanks!

@haizaar
Author

haizaar commented Jan 15, 2019

@carlosgalvezp Do you experience this with gcr.io or with another private registry provider?

@tonistiigi
Member

tonistiigi commented Jan 15, 2019

> EDIT: probably the same issue as #721, but it should already have been fixed in 18.09.1, right?

Yes. Do you have 18.09.1? The gcr issue is different, and the workaround was not part of 18.09.1.

@haizaar
Author

haizaar commented Jan 15, 2019

> The gcr issue is different, and the workaround was not part of 18.09.1.

Do you have an idea when the workaround will be released?

In the meantime, Google support says "We will continue to investigate to determine if this is a GCR-level issue." I'll update the thread if there is any news.

@carlosgalvezp

@haizaar I'm having the issue with a private docker registry, not gcr.io. Should I move the discussion to #721 even if it was closed?

@tonistiigi Yes I do have 18.09.1 and still I encounter this error.

Thanks!

@carlosgalvezp

carlosgalvezp commented Jan 15, 2019

Actually now that I check it seems that I have 18.09.1 in the client but 18.09.0 in the server...

Client:
 Version:           18.09.1
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        4c52b90
 Built:             Wed Jan  9 19:35:31 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:16:44 2018
  OS/Arch:          linux/amd64
  Experimental:     false

@haizaar
Author

haizaar commented Jan 17, 2019

Got an update from Google: the GCR team has acknowledged the issue and is going to address it, but there is currently no ETA.

@haizaar
Author

haizaar commented Jan 21, 2019

Here is the ticket in Google issue tracker to watch: https://issuetracker.google.com/issues/123043691
(you can star it to subscribe to notifications)

@glerchundi

Facing the same issue while using img, switching to DinD in the meantime.

@AkihiroSuda
Member

Workaround PR moby/moby#38246 got merged into DOCKER_BUILDKIT mode (but not into standalone buildkitd yet)

@michael-gillett

michael-gillett commented May 1, 2019

We are facing this issue as well. After trying a bunch of different authentication mechanisms, we decided to just docker pull the necessary private gcr.io images before running our docker build with DOCKER_BUILDKIT=1. This works since buildkit can pull the manifests from GCR and is then able to use the cached blobs from the previous pull
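
The pre-pull workaround described above could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions, not the commenter's actual script: `base_images` and `prepull_and_build` are hypothetical names, the image `gcr.io/my-project/base:1.0` is a made-up example, and the parser only handles plain `FROM` lines (it skips `scratch`, earlier build stages, and images that use `$` variable substitution).

```python
import os
import subprocess

def base_images(dockerfile_text):
    """Return external images named in FROM lines, in order, without duplicates."""
    stages = set()
    images = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if not words or words[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [w for w in words[1:] if not w.startswith("--")]
        if not args:
            continue
        image = args[0]
        # Only stages declared on earlier lines can be referenced here.
        is_stage = image.lower() in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2].lower())
        if image == "scratch" or is_stage or "$" in image or image in images:
            continue
        images.append(image)
    return images

def prepull_and_build(dockerfile_path="Dockerfile", context="."):
    """Pull base images with the classic puller, then build with BuildKit."""
    with open(dockerfile_path) as f:
        images = base_images(f.read())
    for image in images:
        subprocess.run(["docker", "pull", image], check=True)
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    subprocess.run(["docker", "build", "-f", dockerfile_path, context],
                   check=True, env=env)

sample = """\
FROM gcr.io/my-project/base:1.0 AS build
RUN make
FROM scratch
COPY --from=build /out /out
FROM build AS test
"""
print(base_images(sample))  # ['gcr.io/my-project/base:1.0']
```

Calling `prepull_and_build()` from the directory containing the Dockerfile would run the `docker pull` commands first, so that the BuildKit build can reuse the locally cached blobs as described in the comment.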

@AkihiroSuda
Member

@michael-gillett Can you try Docker v19.03 with moby/moby#38246?

I'm closing this issue because it seems fixed now, but feel free to ping us if you are still hitting it.

I also tested the latest standalone buildkitd and it works fine.

@haizaar
Author

haizaar commented May 8, 2019

@AkihiroSuda Docker v19.03 is still not released though.

@AkihiroSuda
Member

$ docker run -p 127.0.0.1:2375:2375 -d --privileged docker:19.03.0-beta3-dind
$ export DOCKER_HOST=tcp://127.0.0.1:2375

@tonistiigi
Member

@AkihiroSuda

> I also tested the latest standalone buildkitd and it works fine.

So did gcr fix their side or was that a different case?

@AkihiroSuda
Member

> So did gcr fix their side or was that a different case?

I'm not sure 😅

Can somebody else confirm?

@tonistiigi
Member

@AkihiroSuda It did not work for me on buildx with the docker-container driver. I'm OK with putting the ugly workaround into buildkit.

@tonistiigi reopened this on May 12, 2019
@tonistiigi added this to the v0.5.2 milestone on May 12, 2019

@haizaar
Author

haizaar commented May 13, 2019

I tried dind as @AkihiroSuda suggested and it actually worked for me: it was able to pull an image listed in FROM from a private GCR repo. All good!

@haizaar
Author

haizaar commented May 13, 2019

@tonistiigi

> So did gcr fix their side or was that a different case?

The Google issue I listed earlier is still open.

@the21st

the21st commented Jan 9, 2020

@haizaar The Google issue you listed earlier seems to be fixed now: https://issuetracker.google.com/issues/123043691#comment6

Does this mean this could be resolved?
