GHA export cache fails with mode=max #2276

Closed
aluzzardi opened this issue Jul 23, 2021 · 17 comments · Fixed by tonistiigi/go-actions-cache#8

Comments

@aluzzardi
Member

With mode=max, the cache export fails due to some blocking on the GitHub side (I assume?).

This is with BuildKit 0.9.0:

failed to parse error response 429: {"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}: invalid character 'ï' looking for beginning of value

/cc @crazy-max

@tonistiigi
Member

@chrispat Any idea what this means?

@lucacome

I'm seeing this as well

#19 ERROR: error writing layer blob: failed to parse error response 429: {"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}: invalid character 'ï' looking for beginning of value
------
 > exporting cache:
------
error: failed to solve: error writing layer blob: failed to parse error response 429: {"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}: invalid character 'ï' looking for beginning of value
Error: buildx failed with: error: failed to solve: error writing layer blob: failed to parse error response 429: {"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}: invalid character 'ï' looking for beginning of value

@chrispat

Where are you seeing this issue exactly? Are you using the cache action or the API?

@tonistiigi
Member

tonistiigi commented Aug 23, 2021

Via the API. The second error is from a write/reserve/commit request. Not sure about the first one.

Note that, for more precise matching, BuildKit stores an individual cache blob for each filesystem layer, so more requests are expected than with the cache action.

@chrispat

I think what you are hitting is request limits plus some missing handling for the 429 error. There was a similar issue here: satackey/action-docker-layer-caching#12. I think you would need to do some work to control how many concurrent requests you have in flight.

@tonistiigi
Member

As a first step, it would be good to know what the limits are. Then we could follow up on whether they could be increased, or we could add concurrency limits or retries on the BuildKit side. Reliable concurrency limits are tricky, though, as builds can happen in parallel jobs without seeing each other.
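
(For illustration, a retry along these lines might look like the following Go sketch; the helper and backoff policy are hypothetical, not BuildKit's actual code.)

```go
package retry

import (
	"context"
	"errors"
	"net/http"
	"time"
)

// doWithRetry is a hypothetical helper: it retries a throttled (HTTP 429)
// request with exponential backoff. It is illustrative, not part of
// BuildKit or go-actions-cache.
func doWithRetry(ctx context.Context, client *http.Client, req *http.Request) (*http.Response, error) {
	backoff := time.Second
	for attempt := 0; attempt < 5; attempt++ {
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil // success, or a failure that is not throttling
		}
		resp.Body.Close()
		// NOTE: a request with a body must set req.GetBody to be safely replayable.
		select {
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, 4s, 8s, 16s
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, errors.New("still throttled (429) after retries")
}
```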

@tonistiigi
Member

Looks like there are reports of 429s from regular actions/cache as well: actions/cache#467. The issue is closed, but IIUC it was fixed by just ignoring the error (logging it) and leaving the cache in an undefined state.

@dhadka

dhadka commented Aug 23, 2021

@tonistiigi

As a first step would be good to know what the limits are.

There's a throttling limit of approximately 150 cache creations every 5 minutes. Note, though, that if a bunch of requests are sent within a very short time window, throttling can kick in sooner.

In the issue @chrispat linked, satackey/action-docker-layer-caching#12, the action was trying to create caches for 200+ Docker layers within about 0.1 seconds, so it was getting throttled. That was due to unbounded parallelism, but I'm not seeing that here. It looks like each layer is uploaded sequentially, correct?

So this is perhaps a lot of small(er) caches being created within a short time period? It would be great if anyone could share the org/repo where this problem is seen; I can look at our telemetry to see what's going on and whether there's anything we can do on the backend.
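
(For illustration, a client could smooth its own request rate to stay under such a budget. Below is a minimal Go sketch using golang.org/x/time/rate; the numbers simply mirror the approximate limit described above and are not a documented guarantee.)

```go
package ratelimit

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// ~150 cache creations per 5 minutes averages out to one every 2 seconds;
// the burst of 10 is an arbitrary illustrative choice.
var cacheLimiter = rate.NewLimiter(rate.Every(2*time.Second), 10)

// createCacheEntry is a hypothetical wrapper around a cache API call.
func createCacheEntry(ctx context.Context, do func(context.Context) error) error {
	// Wait blocks until a token is available or the context is cancelled.
	if err := cacheLimiter.Wait(ctx); err != nil {
		return err
	}
	return do(ctx)
}
```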

@tonistiigi
Member

but I'm not seeing that here. It looks like each layer is uploaded sequentially, correct?

Yes, currently the upload is sequential, but theoretically multiple builds can run at the same time. Bigger blobs are uploaded in chunks with concurrency 4: https://github.com/tonistiigi/go-actions-cache/blob/master/cache.go#L25. I guess these requests don't count toward this limit? Otherwise we need to stop uploading in chunks to use fewer requests.
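
(A chunked upload bounded at that concurrency might look roughly like the following Go sketch, using errgroup; the real cache.go may be structured differently.)

```go
package upload

import (
	"context"

	"golang.org/x/sync/errgroup"
)

const uploadConcurrency = 4 // mirrors the constant linked above

// uploadChunks is a hypothetical helper: it uploads chunks with at most
// uploadConcurrency requests in flight at once.
func uploadChunks(ctx context.Context, chunks [][]byte, put func(ctx context.Context, index int, data []byte) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(uploadConcurrency)
	for i, c := range chunks {
		i, c := i, c // capture loop variables (needed before Go 1.22)
		g.Go(func() error {
			return put(ctx, i, c)
		})
	}
	return g.Wait() // the first error cancels the remaining uploads
}
```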

It would be great if anyone could share the org/repo where this problem is seen and I can look at our telemetry to see what's going on and if there's anything we can do on the backend.

@aluzzardi @lucacome

@lucacome

@dhadka @tonistiigi https://github.com/nginxinc/kubernetes-ingress, but we've disabled caching for now, since we were having a lot of other issues too, like #2325 and docker/buildx#681.

@dhadka

dhadka commented Aug 23, 2021

@lucacome Thanks! I'll take an in-depth look tomorrow, but it certainly seems like the multiple build jobs targeting different OSes, each creating multiple caches, are hitting the throttling limits.

@tonistiigi Is there a separate issue here with how the client is handling the response from the cache server? Specifically with the invalid character 'ï' looking for beginning of value message? Should it retry, or treat this as a cache miss?

@chrispat

@dhadka given each job has its own token, shouldn't they have independent limits?

@dhadka

dhadka commented Aug 23, 2021

@chrispat

given each job has its own token, shouldn't they have independent limits?

@esirko would know the full details of how the throttling limits are enforced, but my understanding is that it's per repo.

@tonistiigi
Member

tonistiigi commented Aug 23, 2021

Is there a separate issue here with how the client is handling the response from the cache server? Specifically with the invalid character 'ï' looking for beginning of value message?

Yes, not sure what, though. Looks like the error fails to parse as JSON, but it does look like valid JSON after printing. Wonder if it may be something with the encoding, maybe with $id, if the error was not fixed up manually. Comes from https://github.com/tonistiigi/go-actions-cache/blob/master/cache.go#L546-L549

edit: Oh, that's not i but ï 👀, so it is a BOM.
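
(A minimal sketch of tolerating the BOM on the client side before decoding; the actual fix in go-actions-cache may differ.)

```go
package bomfix

import (
	"bytes"
	"encoding/json"
)

// apiError is a trimmed-down, illustrative shape of the error payload.
type apiError struct {
	Message string `json:"message"`
}

func parseErrorBody(body []byte) (*apiError, error) {
	// Strip a UTF-8 byte order mark (EF BB BF) if the server prepended one;
	// json.Unmarshal otherwise fails with "invalid character 'ï'".
	body = bytes.TrimPrefix(body, []byte{0xEF, 0xBB, 0xBF})
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, err
	}
	return &e, nil
}
```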

@dhadka

dhadka commented Aug 24, 2021

🤦‍♂️ Oof... I'll file an issue with our platform team to take a look at why that BOM is showing up. I think it's because exceptions undergo some additional processing that converts them to a string with Encoding.UTF8, which prepends the BOM. Essentially what's described in https://jimmybogard.com/the-curious-case-of-the-json-bom/.

@tonistiigi
Member

Fixed the BOM part in tonistiigi/go-actions-cache#6 on our side.

@dhadka

dhadka commented Aug 24, 2021

@tonistiigi Thank you! I've also filed an issue on our side to not emit the BOM (since that technically violates the JSON RFC), but I would have recommended handling it on the client side anyway, as the turnaround for lower-priority issues can take some time.

I've also started the conversation internally to relax the throttling limits.

PiDelport added a commit to ntls-io/nautilus-wallet that referenced this issue Aug 31, 2021
There seems to be an issue currently where concurrent builds cause the
Docker GHA cache exporter to trip over GitHub throttling limits.

Background reading:

- buildx failed with: error: failed to solve: blob not found #422
docker/build-push-action#422

- Copy from previous stage fails #681
docker/buildx#681

- GHA export cache fails with mode=max #2276
moby/buildkit#2276

Try to work around this by limiting the overall concurrency to 1 for
this workflow.
jose-fully-ported added a commit to porter-dev/porter-archive that referenced this issue Dec 14, 2023
BuildKit builds expose custom cache exporters that can be used to speed up builds on GitHub Actions. The gha exporter uses the GitHub Actions cache API instead of a remote registry for caching.

To enable this, set the following two environment variables.

```
DOCKER_BUILDKIT=1
BUILDKIT_CACHE_EXPORTER=gha
```

Note that the cache mode is set to min by default (no value == min) due to a potential timeout issue within the exporter. See moby/buildkit#2276 for details. To switch the cache mode, set the following environment variable:

```
BUILDKIT_CACHE_MODE=max
```