
Option to set custom layer size via environment variables #374

Closed
jay-dee7 opened this issue Jan 19, 2023 · 13 comments · Fixed by #391

Comments

@jay-dee7

Many storage providers restrict the minimum size of a single chunk (or part) that can be uploaded via something like a multipart upload. For example, AWS S3 requires that each part of a multipart upload be at least 5MB, except for the last part, which can be any size. Most storage providers impose a similar limit (for good reason).

Inside the conformance testing suite, we currently use fixed-length strings as chunks for the blobs. I was wondering whether it would be possible to set the size of the blobs via environment variables. Here's why:

In our (as in opencontainers/distribution-spec) current implementation of the conformance suite, tests use blobs (which are really just layers) as small as 23 bytes, which are then split into even smaller chunks for the chunked layer upload tests. This led to a problem for us: one of our storage providers doesn't support a chunk/part smaller than 5MB, which caused our Push conformance tests to fail. Yet when I tried pushing the hello-world and busybox images (two of the smallest container images I know of), both uploads succeeded.

So I ended up looking at the conformance setup.go file and tweaking it a bit to set the layer/blob size larger than 5MB and split it into two chunks (see the sketch after this list):

  • Use math/rand to fill the layer/blob with random data (I hope that's okay)
  • Part/chunk A, at least 5MB
  • Part/chunk B, rest of the data
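
A minimal sketch of that tweak, assuming a 5MB backend minimum; `makeTestLayer` and the exact sizes are placeholders for illustration, not the actual setup.go code:

```go
package conformance

import "math/rand"

// minLayerChunkSize is the 5MB minimum imposed by the storage backend.
const minLayerChunkSize = 5 * 1024 * 1024

// makeTestLayer fills a layer/blob slightly larger than 5MB with random data
// and splits it into two chunks: part A of exactly the minimum size, and
// part B holding the remainder.
func makeTestLayer() (chunkA, chunkB []byte) {
	layer := make([]byte, minLayerChunkSize+1024) // just over 5MB
	rand.Read(layer)                              // math/rand.Read fills the whole slice
	return layer[:minLayerChunkSize], layer[minLayerChunkSize:]
}
```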

After this change, the conformance tests started passing. That led me to think about two scenarios:

  • One: would it be wise to have an option to configure the size of the layer used in tests via some environment variable? If so, I'd love to submit a PR for this, and I'll make sure the existing behaviour doesn't break.
  • Two: does this break the rules of conformance, and is it up to the specification implementors to take care of nuances like this?

Both options seem reasonable to me. But of course, as a registry developer and maintainer, it's easier for me to just set an environment variable and move on (and I also feel it should be okay to set this size via an environment variable). A sketch of what option one could look like is below.
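
For illustration, here's roughly how option one could work; the variable name `OCI_TEST_LAYER_SIZE` is made up for this sketch, and the 23-byte default just mirrors the tiny blobs the suite uses today:

```go
package conformance

import (
	"os"
	"strconv"
)

// testLayerSize returns the layer size to use in the conformance tests.
// OCI_TEST_LAYER_SIZE is a hypothetical variable name; when it is unset or
// invalid, the suite would keep today's behaviour with a tiny fixed blob.
func testLayerSize() int {
	const defaultSize = 23 // as small as the blobs the suite uses today
	if v := os.Getenv("OCI_TEST_LAYER_SIZE"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultSize
}
```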

Please let me know what you guys think about this :)

@sudo-bmitch
Contributor

This raises two questions for me. How would a client know the minimum size to use per chunk? Does this affect the ability to recover from a partial chunk upload?

@verokarhu

Can a registry implementation claim to be conformant if it sets a minimum size on chunks?

@jay-dee7
Author

@sudo-bmitch, a client can't really know, I guess; it would have to be part of the layer/image upload protocol for clients to make such assumptions. Regardless, it shouldn't affect resuming a failed part upload, because AFAIK a retry re-uploads individual parts, not arbitrary slices. In other words, what changes is how a layer is chunked client-side, not how a failed part is retried.

@jay-dee7
Author

jay-dee7 commented Jan 20, 2023

@verokarhu, this is the big question for me. Personally, I feel this breaks conformance. Defining a minimum chunk size wouldn't actually be all that simple, and it would definitely require changes to the spec, since nothing like this (for or against) exists in the spec today.

@sudo-bmitch
Contributor

@jay-dee7 I'm not opposed to getting a recommended minimum chunk size in the spec since we recently added a recommended maximum manifest size.

I'm leaning towards an inverse of the 413 HTTP status code, with a minimum size passed back to the client in the error, perhaps in a header. How does S3 implement this error? Perhaps there's some logic that can be copied.

I maintain one of the few clients that supports chunked uploads with multiple chunks, so there isn't much to break in this part of the spec. The conformance issues you encountered exist because I only recently added those tests after finding issues with multiple registries, so these are good things to sort out now.
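
To make the idea concrete, a rough registry-side sketch; the 416 status and the `Min-Chunk-Size` header name are placeholders I'm making up here, nothing is settled:

```go
package main

import (
	"net/http"
	"strconv"
)

const minChunkSize = 5 * 1024 * 1024 // backend-imposed minimum, e.g. S3's 5MB

// handleChunkPatch is a hypothetical registry-side handler: when a PATCH
// chunk is below the backend's minimum, reject it and advertise the minimum
// in a response header so the client can re-chunk and retry.
func handleChunkPatch(w http.ResponseWriter, r *http.Request) {
	if r.ContentLength >= 0 && r.ContentLength < minChunkSize {
		// A real implementation must exempt the final chunk, which may be
		// any size; detecting that at PATCH time is its own problem.
		w.Header().Set("Min-Chunk-Size", strconv.Itoa(minChunkSize))
		http.Error(w, "chunk below minimum size", http.StatusRequestedRangeNotSatisfiable)
		return
	}
	// ...otherwise append the chunk to the open upload session as usual...
}
```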

@jay-dee7
Author

jay-dee7 commented Jan 23, 2023

I'm a little glad to hear that.

Looking at the AWS S3 docs (I don't have access to an AWS account to actually test it), the S3 API returns a 400 Bad Request with an EntityTooSmall error code when any part other than the last is smaller than 5MB [Link to S3 Docs]. That seems too generic, TBH.

If we want to follow a path similar to 413, maybe we do something like Expect and 417 Expectation Failed? Though I'm not aware of any existing header that indicates a minimum size. Would the following be an option:

  1. The server sends something like Distribution-Min-Chunk-Size in the response to the POST request that starts the chunked upload (the first request, which opens the upload session for the layer).
  2. The client chunks the layer into sizes that are multiples of this header value, except the last (nth) part, which can be any size (smaller or larger, but usually smaller).
  3. The client then uploads the layer in chunks via PATCH requests.
  4. It finishes as usual with a PUT request at the end, with or without the final chunk.

If the layer is smaller than the advertised header value, the client uploads it via a single POST, or POST then PUT. A client-side sketch of this flow is below.
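
To make steps 1-4 concrete, a hedged client-side sketch in Go; the header name is the one proposed above, while the function name, URL handling (e.g. relative Location headers, existing query strings), and error handling are simplified assumptions:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strconv"
)

// pushBlobChunked sketches the client side of steps 1-4 above.
func pushBlobChunked(registry, repo, digest string, layer []byte) error {
	// 1. Open the upload session and read the advertised minimum chunk size.
	resp, err := http.Post(registry+"/v2/"+repo+"/blobs/uploads/", "application/octet-stream", nil)
	if err != nil {
		return err
	}
	resp.Body.Close()
	location := resp.Header.Get("Location")
	minChunk, _ := strconv.Atoi(resp.Header.Get("Distribution-Min-Chunk-Size"))

	// If no minimum is advertised, or the layer is smaller than it, send
	// everything in the closing PUT (the "POST then PUT" path).
	if minChunk <= 0 || minChunk > len(layer) {
		minChunk = len(layer)
	}

	// 2-3. PATCH every chunk except the last; each chunk here is exactly the
	// advertised minimum, and the last part may be any size.
	offset := 0
	for offset+minChunk < len(layer) {
		end := offset + minChunk
		req, _ := http.NewRequest(http.MethodPatch, location, bytes.NewReader(layer[offset:end]))
		req.Header.Set("Content-Type", "application/octet-stream")
		req.Header.Set("Content-Range", fmt.Sprintf("%d-%d", offset, end-1))
		r, err := http.DefaultClient.Do(req)
		if err != nil {
			return err
		}
		r.Body.Close()
		location = r.Header.Get("Location") // registries may rotate the upload URL
		offset = end
	}

	// 4. Finish with a PUT carrying the final chunk and the digest.
	req, _ := http.NewRequest(http.MethodPut, location+"?digest="+digest, bytes.NewReader(layer[offset:]))
	req.Header.Set("Content-Type", "application/octet-stream")
	r, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	r.Body.Close()
	return nil
}
```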

@jay-dee7
Author

@sudo-bmitch do you think the above approach makes sense?

@sudo-bmitch
Contributor

This feels like the right path. We've been adding headers for other use cases, and adding it to the POST response gives clients the ability to avoid errors.

@mikebrow
Member

mikebrow commented Mar 9, 2023

SGTM

@sajayantony
Member

After hearing others describe the scenario on the calls, +1 on the header. Looking forward to seeing a PR for it.

@sudo-bmitch
Contributor

I'll work on a PR for the header and have that ready to discuss on our call next week.

@sudo-bmitch
Contributor

@jay-dee7 I've got draft PR #391 up for review; however, I don't have a registry available where it can be tested (I need a test for the test). I'm assuming at least some clients will do a final partial PATCH followed by a zero-length PUT. I think working around that on the client side is feasible (requiring the final partial chunk to be sent with the PUT), but I figured I'd start with the most flexible option to minimize the changes to the spec.
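
For reference, the client pattern in question looks roughly like this (a sketch with assumed names, not any client's actual code):

```go
package main

import (
	"bytes"
	"net/http"
)

// finishUpload sketches the pattern mentioned above: the last piece of data
// goes in a final partial PATCH, and the upload is then closed with a
// zero-length PUT that carries only the digest.
func finishUpload(location, digest string, finalChunk []byte) error {
	patch, _ := http.NewRequest(http.MethodPatch, location, bytes.NewReader(finalChunk))
	resp, err := http.DefaultClient.Do(patch)
	if err != nil {
		return err
	}
	resp.Body.Close()
	location = resp.Header.Get("Location") // registries may rotate the upload URL

	put, _ := http.NewRequest(http.MethodPut, location+"?digest="+digest, nil) // empty body
	resp, err = http.DefaultClient.Do(put)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}
```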

@jay-dee7
Author

@sudo-bmitch this is great news. I can add this to OpenRegistry so that we can test it out. I'm super excited about this 🥳 🎉
