Multipart enhancements for efficiency #114
A couple of notes on the implementation.
Hmm, there may be cases where someone wants to support resending of parts. Perhaps we should introduce a config option that allows the admin to change the behavior of the S3 API. This would be similar to the 4.2 compatibility option. Do you know if the clients expose options for disabling the resending of parts?
If we don't clean up, then we've effectively lost that memory for the lifetime of the S3 API process. The worst-case scenario is that the network is unstable and multiple users have to restart transfers, which leads to the shared memory being exhausted. I think we need a sweeper task to handle the cleanup. Thoughts?
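To make the sweeper idea concrete, here is a minimal sketch of what one sweep pass could look like. Everything in it is an assumption for illustration: the `upload_entry` struct, the `last_activity` timestamp, and the `sweep_stale_uploads` name are hypothetical, and a plain `std::map` stands in for the shared memory structure.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

using clock_type = std::chrono::steady_clock;

// Hypothetical bookkeeping for one multipart upload. The real entry would
// also carry the part sizes; only the timestamp matters for sweeping.
struct upload_entry {
    clock_type::time_point last_activity;
};

// Removes every upload that has been idle for longer than max_idle and
// returns the number of entries removed.
std::size_t sweep_stale_uploads(std::map<std::string, upload_entry>& uploads,
                                clock_type::time_point now,
                                clock_type::duration max_idle)
{
    std::size_t removed = 0;
    for (auto it = uploads.begin(); it != uploads.end();) {
        if (now - it->second.last_activity > max_idle) {
            it = uploads.erase(it);  // erase returns the next valid iterator
            ++removed;
        }
        else {
            ++it;
        }
    }
    return removed;
}
```

A background task could call this periodically with `clock_type::now()` and a configurable idle limit. Any local part files belonging to a swept upload would need to be removed at the same time.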
Yeah, I'm betting the clients don't support such a feature. However, if we start rejecting all resends of parts, that means we're choosing performance over more coverage of the protocol. Some users may prefer having resend capability over improved performance. Having an option in the config for controlling rejection of parts would cover both use-cases.
Design:

1. When a new part upload is initiated, determine whether all lower-numbered parts have been started. If so, the part offset can be computed from the previous part sizes. In this case, stream the part directly to iRODS by seeking to the offset and writing.
2. To accomplish the above, when a new part upload is initiated, save the part size in a map that translates upload_ids to a list of part numbers and sizes. The following is an example of this map:

       {
         "1234abcd-1234-1234-123456789abc": { 0: 4096000, 1: 4096000, 4: 4096000 },
         "01234abc-0123-0123-0123456789ab": { 0: 5192000, 3: 5192000 }
       }

3. If the part offset is not known, write the bytes to a local part file. When CompleteMultipartUpload is encountered, read all of these local part files and stream them to iRODS. If there is no local part file for a part, that part was streamed directly to iRODS and does not need to be rewritten.
4. The first open to iRODS will remain open until CompleteMultipartUpload is finished. This is done to make sure that the replica_token does not update in the middle of writing parts to iRODS. See the keep_dstream_open_flag flag in putobject.cpp.
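The offset rule in steps 1 and 2 of the design can be sketched as follows. This is an illustration under stated assumptions, not the code in putobject.cpp: the type aliases and the `part_offset` function are hypothetical names, a plain `std::map` stands in for the shared memory map, and parts are assumed to be numbered from 0 as in the example map.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical aliases mirroring the map described in the design.
using part_sizes = std::map<int, std::uint64_t>;          // part number -> part size
using upload_table = std::map<std::string, part_sizes>;   // upload_id -> parts

// Returns the byte offset of part_number if every lower-numbered part has a
// recorded size. Returns std::nullopt otherwise, meaning the offset is not
// known and the part would be written to a local part file instead.
std::optional<std::uint64_t> part_offset(const upload_table& uploads,
                                         const std::string& upload_id,
                                         int part_number,
                                         int first_part_number = 0)  // assumption: numbering as in the example
{
    const auto upload = uploads.find(upload_id);
    std::uint64_t offset = 0;
    for (int n = first_part_number; n < part_number; ++n) {
        if (upload == uploads.end()) {
            return std::nullopt;  // no parts recorded yet for this upload
        }
        const auto part = upload->second.find(n);
        if (part == upload->second.end()) {
            return std::nullopt;  // a lower-numbered part has not been started
        }
        offset += part->second;
    }
    return offset;
}
```

With the first upload in the example map, part 2 would get offset 8192000 because parts 0 and 1 are recorded, while part 4 would get no offset because parts 2 and 3 are missing.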
@JustinKyleJames - Please close if complete. Thanks
This work has been completed. Closing this issue.
Enhance multipart uploads by doing the following:
putobject.cpp
The outer map has the upload_id as the keys. The inner map has the part number as the key and part size as the value.
completemultipartupload.cpp
Only read the parts that have corresponding part files and write them to iRODS. The other parts should have already been written.
Delete the entry for the upload_id in the shared memory once CompleteMultipart returns. (This should also be done when CancelMultipartUpload is implemented.)
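As a rough sketch of the completion step, the following assumes that by the time the upload is completed every part has a recorded size and the part numbers are contiguous. The `pending_write` struct and both function names are hypothetical, and a plain `std::map` stands in for the shared memory map.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// One write still owed to iRODS at completion time (hypothetical type).
struct pending_write {
    int part_number;
    std::uint64_t offset;
    std::uint64_t size;
};

// Walks the parts in order, accumulating offsets, and returns only the parts
// that have a local part file. Parts without a file are assumed to have been
// streamed directly to iRODS already.
std::vector<pending_write> plan_remaining_writes(const std::map<int, std::uint64_t>& sizes,
                                                 const std::set<int>& parts_with_files)
{
    std::vector<pending_write> plan;
    std::uint64_t offset = 0;
    for (const auto& [number, size] : sizes) {
        if (parts_with_files.count(number) != 0) {
            plan.push_back({number, offset, size});
        }
        offset += size;
    }
    return plan;
}

// Drops the bookkeeping for an upload once it is complete (or cancelled).
// Returns true if an entry existed.
bool forget_upload(std::map<std::string, std::map<int, std::uint64_t>>& uploads,
                   const std::string& upload_id)
{
    return uploads.erase(upload_id) > 0;
}
```

For example, with parts 0, 1 and 2 of sizes 10, 20 and 30 where only part 2 has a local part file, the plan contains a single write of 30 bytes at offset 30.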