Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multipart enhancements for efficiency #114

Closed
JustinKyleJames opened this issue Jul 25, 2024 · 6 comments
Closed

Multipart enhancements for efficiency #114

JustinKyleJames opened this issue Jul 25, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@JustinKyleJames
Copy link
Contributor

JustinKyleJames commented Jul 25, 2024

Enhance multipart uploads by doing the following:

putobject.cpp

  1. When a part upload request comes in, store the part size in a data structure that looks like the following:
{                                                                                                                                                                                                                                                                                                                    
  "1234abcd-1234-1234-123456789abc":                                                                                                           
    { 0: 4096000,                                       
      1: 4096000,                                       
      4: 4096000 },                                   
  "01234abc-0123-0123-0123456789ab":                                                                    
    { 0: 5192000,                                                
      3: 5192000 }
}

The outer map has the upload_id as the keys. The inner map has the part number as the key and part size as the value.

  1. Prior to streaming the data for a part, determine if we know all previous part sizes. If so, we know the offset for this part and can stream directly to iRODS instead of creating a local part file. If we don't then write the data to a local part file.

completemultipartupload.cpp

Only read the parts that have corresponding part files and write them to iRODS. The other parts should have already been written.

Delete the entry for the upload_id in the shared memory once CompleteMultipart returns. (This should also be done when CancelMultipartUpload is implemented.)

@JustinKyleJames
Copy link
Contributor Author

A couple of notes on the implementation.

  1. If someone resends a part with a different size before CompleteMultipartUpload is called, this could cause issues. I think we just have to say that this will not be supported. I believe I can detect the part size change and reject it.

  2. The entry in shared memory is cleaned up when CompleteMultipartUpload returns. It will also be cleaned up if CancelMultipartUpload is called once that endpoint is implemented. However, if neither is called it won't be cleaned up until the server restarts. A new call to InitiateMultipartUpload would generate a new upload_id so that upload should behave correctly.

@korydraughn
Copy link
Collaborator

  1. If someone resends a part with a different size before CompleteMultipartUpload is called, this could cause issues. I think we just have to say that this will not be supported. I believe I can detect the part size change and reject it.

Hmm, there may be cases where someone wants to support resending of parts. Perhaps we should introduce a config option that allows the admin to change the behavior of the S3 API. This would be similar to the 4.2 compatibility option.

Do you know if the clients expose options for disabling the resending of parts?

  1. The entry in shared memory is cleaned up when CompleteMultipartUpload returns. It will also be cleaned up if CancelMultipartUpload is called once that endpoint is implemented. However, if neither is called it won't be cleaned up until the server restarts. A new call to InitiateMultipartUpload would generate a new upload_id so that upload should behave correctly.

If CompleteMultipartUpload and CancelMultipartUpload are never called, can we determine if the upload id is dead (i.e. not going to be used)? My guess is no. We'd probably have to rely on some amount of inactivity, which means we'd have to track time.

If we don't clean up, then we've effectively lost that memory for the lifetime of the S3 API process.

Worst case scenario is the network is unstable and multiple users have to restart transfers which lead to the shared memory being exhausted.

I think we need a sweeper task to handle the clean up.

Thoughts?

@trel
Copy link
Member

trel commented Aug 22, 2024

  1. Not sure we've ever seen this... it's a client-side choice... and we have little/no experience in the wild, for now. Shipping a 'reject' seems okay at the beginning, along with a big fat log message that explains the situation to an admin, and tells them to complain to us.

  2. If we can't come up with a better metric/indicator... a sweeper with a timing configuration for the admin seems... reasonable as a backstop.

@korydraughn
Copy link
Collaborator

  1. Not sure we've ever seen this... it's a client-side choice... and we have little/no experience in the wild, for now. Shipping a 'reject' seems okay at the beginning, along with a big fat log message that explains the situation to an admin, and tells them to complain to us.

Yeah, I'm betting the clients don't support such a feature. However, if we start rejecting all resends of parts, that means we're choosing performance over more coverage of the protocol. Some users may prefer having resend capability over improved performance.

Having an option in the config for controlling rejection of parts would cover both use-cases.

@trel trel mentioned this issue Sep 6, 2024
JustinKyleJames added a commit to JustinKyleJames/irods_client_s3_cpp that referenced this issue Oct 3, 2024
Design:

1. When a new part upload is initiated, determine if all lower
numbered parts have been started. If so, you can determine the part
offset via the previous part sizes.  In this case stream the part
directly to iRODS with a seek to the offset and writes.

2. To accomplish the above, when a new part upload is initiated, save
the part size in a map that translates the upload_id's to a list of
parts numbers and sizes. The following is an example of this map:

{
  "1234abcd-1234-1234-123456789abc":
    { 0: 4096000,
      1: 4096000,
      4: 4096000 },
  "01234abc-0123-0123-0123456789ab":
    { 0: 5192000,
      3: 5192000 }
}

3. If the part offset is not known, write the bytes to a local part
file. When CompleteMultipartUpload is encountered, read all of these
local part files and stream these to iRODS. If there is no local part
file, that means that that part was streamed directly to iRODS and does
not need to be rewritten.

4. The first open to iRODS will remain open until
CompleteMultipartUpload is finished. This is done to make sure that the
replica_token does not update in the middle of writing parts to iRODS.
See the keep_dstream_open_flag flag in putobject.cpp.
alanking pushed a commit that referenced this issue Oct 3, 2024
Design:

1. When a new part upload is initiated, determine if all lower
numbered parts have been started. If so, you can determine the part
offset via the previous part sizes.  In this case stream the part
directly to iRODS with a seek to the offset and writes.

2. To accomplish the above, when a new part upload is initiated, save
the part size in a map that translates the upload_id's to a list of
parts numbers and sizes. The following is an example of this map:

{
  "1234abcd-1234-1234-123456789abc":
    { 0: 4096000,
      1: 4096000,
      4: 4096000 },
  "01234abc-0123-0123-0123456789ab":
    { 0: 5192000,
      3: 5192000 }
}

3. If the part offset is not known, write the bytes to a local part
file. When CompleteMultipartUpload is encountered, read all of these
local part files and stream these to iRODS. If there is no local part
file, that means that that part was streamed directly to iRODS and does
not need to be rewritten.

4. The first open to iRODS will remain open until
CompleteMultipartUpload is finished. This is done to make sure that the
replica_token does not update in the middle of writing parts to iRODS.
See the keep_dstream_open_flag flag in putobject.cpp.
@alanking
Copy link

alanking commented Oct 3, 2024

@JustinKyleJames - Please close if complete. Thanks

@JustinKyleJames
Copy link
Contributor Author

This work has been completed. Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants