feat(lib-storage): use PUT for small uploads #2605
Conversation
Codecov Report

@@           Coverage Diff           @@
##            main   #2605   +/-   ##
=======================================
  Coverage       ?   60.36%
=======================================
  Files          ?      516
  Lines          ?    27475
  Branches       ?     6603
=======================================
  Hits           ?    16585
  Misses         ?    10890
  Partials       ?        0
=======================================

Continue to review full report at Codecov.
Use Put for uploads smaller than part size in lib-storage (feaabe8 to e70f7e0)
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Thank you very much for putting this together! I only have a few comments.
```ts
async __createMultipartUpload() {
  if (!this.createMultiPartPromise) {
    const createCommandParams = { ...this.params, Body: undefined };
    this.createMultiPartPromise = this.client.send(new CreateMultipartUploadCommand(createCommandParams));
```
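For context, a minimal sketch of the memoized-promise pattern this snippet relies on; the `MultipartStarter` class and its names are illustrative, not the PR's actual code:

```ts
import {
  S3Client,
  CreateMultipartUploadCommand,
  CreateMultipartUploadCommandInput,
} from "@aws-sdk/client-s3";

// Illustrative helper: whichever concurrent uploader calls getUploadId() first
// sends CreateMultipartUpload; every later caller awaits the same in-flight
// promise, so the API is only hit once.
class MultipartStarter {
  private createPromise?: Promise<string | undefined>;

  constructor(
    private readonly client: S3Client,
    private readonly params: CreateMultipartUploadCommandInput
  ) {}

  getUploadId(): Promise<string | undefined> {
    if (!this.createPromise) {
      this.createPromise = this.client
        .send(new CreateMultipartUploadCommand(this.params))
        .then((result) => result.UploadId);
    }
    return this.createPromise;
  }
}
```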
You don't need the variable to handle concurrent createMultipart requests if you move the `__createMultipartUpload()` call into `__doMultipartUpload()`.
Maybe I'm missing something, but why not? Let's say uploader A called `CreateMultipartUploadCommand`, and before the command returns, uploader B gets another chunk. Uploader B needs to wait for uploader A to get the `UploadId`, so they need some kind of shared state.
I'm suggesting moving the create-multipart-upload API call out of the uploader workflow. As a result, `CreateMultipartUploadCommand` is called synchronously before firing the concurrent uploaders.
The problem is that we might not actually want to execute the create command, and we only know this after we get the first chunk from the chunker.
We could change the flow so that the first chunk is handled separately, and the uploaders are created only after the first chunk has been yielded (maybe this is what you're suggesting?). If the first chunk is also the last, use PUT; otherwise spin up concurrent uploaders and use multi-part. It would be a much bigger change than what I did, and probably way more complex, but it could work.
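A minimal sketch of that alternative "peek the first chunk" flow, assuming a hypothetical part shape and callback helpers (`putObject`, `createMultipart`, `runUploaders`) rather than the PR's actual methods:

```ts
// Hypothetical sketch; the DataPart shape and the callbacks are assumptions.
interface DataPart {
  data: Uint8Array;
  partNumber: number;
  lastPart?: boolean;
}

async function uploadWithPeek(
  parts: AsyncGenerator<DataPart>,
  putObject: (part: DataPart) => Promise<void>,
  createMultipart: () => Promise<string>,
  runUploaders: (uploadId: string, first: DataPart, rest: AsyncGenerator<DataPart>) => Promise<void>
): Promise<void> {
  const first = await parts.next();
  if (first.done) {
    return; // empty body: nothing to upload here
  }

  if (first.value.lastPart) {
    // The first chunk is also the last: a single PutObject is enough.
    return putObject(first.value);
  }

  // More chunks follow: create the multipart upload once, up front, and only
  // then spin up the concurrent uploaders with the UploadId already known.
  const uploadId = await createMultipart();
  return runUploaders(uploadId, first.value, parts);
}
```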
@Linkgoron I did a little POC and I find it is overkill to support this feature that way. The current solution should work without significant overhead. I will approve your change!
```diff
@@ -132,9 +183,6 @@ export class Upload extends EventEmitter {
   }
 
   async __doMultipartUpload(): Promise<ServiceOutputTypes> {
-    const createMultipartUploadResult = await this.client.send(new CreateMultipartUploadCommand(this.params));
-    this.uploadId = createMultipartUploadResult.UploadId;
-
     // Set up data input chunks.
     const dataFeeder = getChunk(this.params.Body, this.partSize);
```
Yes, it wouldn't be too difficult; as I've said before, the main issue is that the code is more complex IMO. Note that once we go down this route we'd also need to manage the lifetime of the iterator: we'd have to wrap things in a try/finally and call `dataFeeder.return()` to make sure it isn't missed, as otherwise some resources might not get released (the for await loop does this "for free"). If that's fine, I can change the code. I've already implemented something similar locally after you suggested it previously.
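A small sketch of the lifetime concern described here, assuming manual consumption of a generic async generator rather than the PR's actual chunker:

```ts
// Hypothetical sketch: once the chunker is consumed manually instead of via
// for await...of, return() has to be called in a finally block so the
// generator's own cleanup runs and its resources are released.
async function consumeManually<T>(
  parts: AsyncGenerator<T>,
  handle: (part: T) => Promise<void>
): Promise<void> {
  try {
    let next = await parts.next();
    while (!next.done) {
      await handle(next.value);
      next = await parts.next();
    }
  } finally {
    // for await...of does this automatically; manual consumption must not skip it.
    await parts.return(undefined);
  }
}
```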
Thank you a lot for bringing this feature to v3! @Linkgoron
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
Use Put for uploads smaller than part size in lib-storage
Issue
When upgrading from `s3.upload` to the `lib-storage` upload, the performance for small files has degraded and the number of API calls has tripled. This was caused by v2 implementing an optimization for `s3.upload` that uses PUT for uploads that are only one part (which is one API call), while `lib-storage` always uses multi-part uploads (which is at least 3 API calls).

Fixes #2593
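For reference, a typical `lib-storage` upload looks roughly like this (bucket, key, and body are placeholders); with this change, a Body smaller than `partSize` should go through a single PUT instead of the three-call multipart sequence:

```ts
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const client = new S3Client({});

// Placeholder bucket/key/body. Previously, even a tiny Body went through
// CreateMultipartUpload + UploadPart + CompleteMultipartUpload.
const upload = new Upload({
  client,
  params: { Bucket: "my-bucket", Key: "small-file.txt", Body: "hello world" },
  partSize: 5 * 1024 * 1024, // 5 MB, S3's minimum part size
});

await upload.done();
```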
Description
This PR uses PUT for uploads smaller than the part size, instead of always starting a multi-part upload.
Testing
Added multiple tests to Upload.spec.js that cover both "large" multi-part uploads and smaller uploads.
Additional context
The most complex part of this PR (IMO) is delaying the multi-part start command, which should happen only after we're certain that we need it. It now happens in one of the concurrent uploaders, while the other uploaders have to wait for it to finish, since they need the UploadId to upload their own parts.
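A rough sketch of that coordination from the uploader's side, with the shared `getUploadId` and `uploadPart` helpers assumed rather than taken from the PR:

```ts
// Rough sketch of one concurrent uploader; getUploadId is the shared, memoized
// promise so CreateMultipartUpload is sent exactly once, and uploadPart is an
// assumed helper that sends a single UploadPart request.
async function runUploader(
  chunks: AsyncGenerator<Uint8Array>,
  getUploadId: () => Promise<string>,
  uploadPart: (uploadId: string, data: Uint8Array) => Promise<void>
): Promise<void> {
  for await (const data of chunks) {
    const uploadId = await getUploadId(); // first caller triggers the create
    await uploadPart(uploadId, data);     // everyone else just waits for it
  }
}
```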
As an aside, I've seen quite a few buffer copies that I think can be improved or removed completely. Should I open another PR for that, or incorporate them in this PR?

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.