feat(lib-storage): use PUT for small uploads #2605
Conversation
Codecov Report

@@           Coverage Diff           @@
##            main   #2605   +/-   ##
=======================================
  Coverage       ?   60.36%
=======================================
  Files          ?      516
  Lines          ?    27475
  Branches       ?     6603
=======================================
  Hits           ?    16585
  Misses         ?    10890
  Partials       ?        0
=======================================

Continue to review full report at Codecov.
Use Put for uploads smaller than part size in lib-storage (feaabe8 to e70f7e0)
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Thank you very much for putting this together! I only have a few comments.
```ts
async __createMultipartUpload() {
  if (!this.createMultiPartPromise) {
    const createCommandParams = { ...this.params, Body: undefined };
    this.createMultiPartPromise = this.client.send(new CreateMultipartUploadCommand(createCommandParams));
```
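For context, a minimal sketch of the memoized-promise pattern this snippet relies on; the `MultipartStarter` class and its names are illustrative, not the PR's actual code:

```ts
import {
  S3Client,
  CreateMultipartUploadCommand,
  CreateMultipartUploadCommandInput,
} from "@aws-sdk/client-s3";

// Illustrative helper: whichever concurrent uploader calls getUploadId() first
// sends CreateMultipartUpload; every later caller awaits the same in-flight
// promise, so the API is only hit once.
class MultipartStarter {
  private createPromise?: Promise<string | undefined>;

  constructor(
    private readonly client: S3Client,
    private readonly params: CreateMultipartUploadCommandInput
  ) {}

  getUploadId(): Promise<string | undefined> {
    if (!this.createPromise) {
      this.createPromise = this.client
        .send(new CreateMultipartUploadCommand(this.params))
        .then((result) => result.UploadId);
    }
    return this.createPromise;
  }
}
```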
You don't need the variable to handle concurrent createMultipart requests if you move the `__createMultipartUpload()` call into `__doMultipartUpload()`.
Maybe I'm missing something, but why not? Let's say uploader A called `CreateMultipartUploadCommand`, and before the command returns, uploader B gets another chunk. Uploader B needs to wait for uploader A to get the `UploadId`, so they need some kind of shared state.
I'm suggesting moving the create-multipart-upload API call out of the uploader workflow. As a result, `CreateMultipartUploadCommand` is called synchronously before firing the concurrent uploaders.
The problem is that we might not actually want to execute the create command, and we only know this after we get the first chunk from the chunker.
We could change the flow so that the first chunk is handled separately, and the uploaders are created only after the first chunk has been yielded (maybe this is what you're suggesting?). If the first chunk is also the last, use PUT; otherwise spin up concurrent uploaders and use multi-part. It would be a much bigger change than what I did, and probably way more complex, but it could work.
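A minimal sketch of that alternative "peek the first chunk" flow, assuming a hypothetical part shape and callback helpers (`putObject`, `createMultipart`, `runUploaders`) rather than the PR's actual methods:

```ts
// Hypothetical sketch; the DataPart shape and the callbacks are assumptions.
interface DataPart {
  data: Uint8Array;
  partNumber: number;
  lastPart?: boolean;
}

async function uploadWithPeek(
  parts: AsyncGenerator<DataPart>,
  putObject: (part: DataPart) => Promise<void>,
  createMultipart: () => Promise<string>,
  runUploaders: (uploadId: string, first: DataPart, rest: AsyncGenerator<DataPart>) => Promise<void>
): Promise<void> {
  const first = await parts.next();
  if (first.done) {
    return; // empty body: nothing to upload here
  }

  if (first.value.lastPart) {
    // The first chunk is also the last: a single PutObject is enough.
    return putObject(first.value);
  }

  // More chunks follow: create the multipart upload once, up front, and only
  // then spin up the concurrent uploaders with the UploadId already known.
  const uploadId = await createMultipart();
  return runUploaders(uploadId, first.value, parts);
}
```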
@Linkgoron I did a little POC and I find it is overkill to support this feature that way. The current solution should work without significant overhead. I will approve your change!
```diff
@@ -132,9 +183,6 @@ export class Upload extends EventEmitter {
   }
 
   async __doMultipartUpload(): Promise<ServiceOutputTypes> {
-    const createMultipartUploadResult = await this.client.send(new CreateMultipartUploadCommand(this.params));
-    this.uploadId = createMultipartUploadResult.UploadId;
-
     // Set up data input chunks.
     const dataFeeder = getChunk(this.params.Body, this.partSize);
```
Yes, it wouldn't be too difficult; as I've said before, the main issue is that the code is more complex IMO. Note that once we go down this route we'd also need to manage the lifetime of the iterator: we'd have to wrap things in a try/finally and call `dataFeeder.return()` to make sure it isn't missed, as otherwise some resources might not get released (the for await loop does this "for free"). If that's fine, I can change the code. I've already implemented something similar locally after you suggested it previously.
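A small sketch of the lifetime concern described here, assuming manual consumption of a generic async generator rather than the PR's actual chunker:

```ts
// Hypothetical sketch: once the chunker is consumed manually instead of via
// for await...of, return() has to be called in a finally block so the
// generator's own cleanup runs and its resources are released.
async function consumeManually<T>(
  parts: AsyncGenerator<T>,
  handle: (part: T) => Promise<void>
): Promise<void> {
  try {
    let next = await parts.next();
    while (!next.done) {
      await handle(next.value);
      next = await parts.next();
    }
  } finally {
    // for await...of does this automatically; manual consumption must not skip it.
    await parts.return(undefined);
  }
}
```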
Thank you a lot for bringing this feature to v3! @Linkgoron
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
Use Put for uploads smaller than part size in lib-storage
Issue
When upgrading from `s3.upload` to the `lib-storage` upload, the performance for small files has degraded and the number of API calls has tripled. This was caused by v2 implementing an optimization for `s3.upload` that uses PUT for uploads that are only one part (which is one API call), while `lib-storage` always uses multi-part uploads (which is at least 3 API calls).

Fixes #2593
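For reference, a typical `lib-storage` upload looks roughly like this (bucket, key, and body are placeholders); with this change, a Body smaller than `partSize` should go through a single PUT instead of the three-call multipart sequence:

```ts
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const client = new S3Client({});

// Placeholder bucket/key/body. Previously, even a tiny Body went through
// CreateMultipartUpload + UploadPart + CompleteMultipartUpload.
const upload = new Upload({
  client,
  params: { Bucket: "my-bucket", Key: "small-file.txt", Body: "hello world" },
  partSize: 5 * 1024 * 1024, // 5 MB, S3's minimum part size
});

await upload.done();
```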
Description
This PR uses PUT for uploads smaller than the part size, instead of always starting a multi-part upload.
Testing
Added multiple tests to Upload.spec.js that cover both "large" multi-part uploads and smaller uploads.
Additional context
The most complex part of this PR (IMO) is delaying the multi-part start command, which should happen only after we're certain that we need it. It now happens in one of the concurrent uploaders, while the other uploaders have to wait for it to finish, since they need the UploadId to upload their own parts.
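A rough sketch of that coordination from the uploader's side, with the shared `getUploadId` and `uploadPart` helpers assumed rather than taken from the PR:

```ts
// Rough sketch of one concurrent uploader; getUploadId is the shared, memoized
// promise so CreateMultipartUpload is sent exactly once, and uploadPart is an
// assumed helper that sends a single UploadPart request.
async function runUploader(
  chunks: AsyncGenerator<Uint8Array>,
  getUploadId: () => Promise<string>,
  uploadPart: (uploadId: string, data: Uint8Array) => Promise<void>
): Promise<void> {
  for await (const data of chunks) {
    const uploadId = await getUploadId(); // first caller triggers the create
    await uploadPart(uploadId, data);     // everyone else just waits for it
  }
}
```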
As an aside, I've seen quite a few buffer copies that I think can be improved or removed completely. Should I open another PR for that, or incorporate them in this PR?

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.