Add Multipart support #2

Closed
12 tasks done
epsilon-phase opened this issue Jan 17, 2023 · 10 comments

@epsilon-phase
Contributor

epsilon-phase commented Jan 17, 2023

  • Create SQLite plugin
    • Enable ID assignment
    • Threading
    • List parts
    • List uploads
    • Create upload part
    • Abort multipart upload
    • Complete multipart upload
    • Store key value
    • Get key value
    • Wrapper functions
  • Create multipart upload non-plugin code
  • Write route handling for multipart actions
epsilon-phase added a commit to epsilon-phase/irods_client_s3_cpp that referenced this issue Mar 31, 2023
# This is the 1st commit message:

Write the irods-S3-bridge.

This is the initial release, and it is therefore limited in several important ways.
Right now it particularly needs attention to the support of different payload
signature schemes, as it does not handle streaming ones at the moment. It also
does not yet work with the iRODS S3 resource as a client.

# This is the commit message #2:

Add boost::url
Continue working on getObject

# This is the commit message #3:

Initial commit

# This is the commit message #4:

initial goals

# This is the commit message #5:

Add some debug branches to make sure the request
discrimination logic is correct

# This is the commit message #6:

progress

# This is the commit message #7:

Add bucket resolver(placeholder for now)

# This is the commit message #8:

Various fixes. Getobject works properly

# This is the commit message #9:

Write some more of the S3 authentication stuff.

# This is the commit message #10:

write handle_listobjects_v2

# This is the commit message #11:

Working on the authentication stuff

# This is the commit message #12:

Steps towards authentication

# This is the commit message #13:

Big changes
* Move authentication code out of main.cpp
* Write hex_encode function
* Authentication works now
* Use a library that is not openssl for signature verification

# This is the commit message #14:

Progress towards growing a plugin interface
@korydraughn korydraughn added this to the 0.1.0 milestone Sep 18, 2023
@trel trel modified the milestones: 0.1.0, 0.2.0 Sep 29, 2023
@korydraughn
Collaborator

Noting here the potential need for the following ...

  • Ring buffers to avoid memory exhaustion
  • A persistence layer to help with coordination and recovery

@trel trel changed the title Multipart checklist Add Multipart support Jan 4, 2024
@trel
Member

trel commented Jan 4, 2024

initial design... 20240104

sequenceDiagram
participant client as S3 Client
participant s3 as S3 API
participant irods as iRODS

client ->>+ s3: CreateMultipartUpload
activate client
s3 ->>+ irods: initiate_parallel_transfer
irods ->>- s3: replica_token
s3 ->>- client: uploadID
deactivate client

    par Part 1
        client ->>+ s3: UploadPart_1
        activate client
        s3 ->>+ irods: part_1
        irods ->>- s3: response
        s3 ->>- client: response
    and Part 2
        client ->>+ s3: UploadPart_2
        s3 ->>+ irods: part_2
        irods ->>- s3: response
        s3 ->>- client: response
    and Part 3
        client ->>+ s3: UploadPart_3
        s3 ->>+ irods: part_3
        irods ->>- s3: response
        s3 ->>- client: response
        deactivate client
    end

client ->>+ s3: CompleteMultipartUpload
activate client
s3 ->>+ irods: complete_parallel_transfer
irods ->>- s3: response
s3 ->>- client: response
deactivate client


@trel
Member

trel commented Jan 24, 2024

Yesterday we discussed different options (originally listed on UGM2023 slides)...

a. Multiobject - write all parts individually to iRODS, then complete triggers copy/concatenate/whatever

  • pro - relatively simple
  • con - lots of extra policy, could trigger replication to multiple continents (just a config option)
    • requires API plugin for concatenate()
      • may not require an API plugin, logic to read/write/remove could be in the bridge, possibly in parallel
      • but... client cannot tell server to move some bytes from a to b 'in' the server with an offset
    • temporarily 'pollutes' the namespace
    • double the physical space

b. Store-and-forward - write it all down in the bridge, then send it to iRODS

  • pro - simple, no extra policy
  • con - slow/delayed (client sends all, THEN bridge sends all, THEN return to client)
    • need POTENTIALLY HUGE scratch disk
  • *** BEST DECISION FOR NOW *** ... simple, quick, provides the functionality
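
As a rough illustration of option (b), here is a minimal Python sketch, not the bridge's actual C++ code. It assumes each part is written to its own file in a scratch directory and that nothing is sent onward until the upload is completed. `StoreAndForwardUpload`, the `part-NNNNN` file naming, and the `send` callable (a stand-in for writing to iRODS) are all hypothetical names chosen for this example.

```python
import os
import tempfile


class StoreAndForwardUpload:
    """Hypothetical bridge-side state for one multipart upload (option b)."""

    def __init__(self, scratch_dir):
        # The scratch disk must be able to hold every part of the upload.
        self.scratch_dir = scratch_dir

    def _part_path(self, part_number):
        return os.path.join(self.scratch_dir, f"part-{part_number:05d}")

    def upload_part(self, part_number, data):
        # Re-sending a part number simply overwrites the earlier file.
        with open(self._part_path(part_number), "wb") as f:
            f.write(data)

    def complete(self, send):
        # Only now does anything go to the backend: forward the parts in
        # ascending part-number order, then remove the scratch files.
        numbers = sorted(
            int(name.split("-")[1]) for name in os.listdir(self.scratch_dir)
        )
        for number in numbers:
            with open(self._part_path(number), "rb") as f:
                send(f.read())
            os.remove(self._part_path(number))
        return numbers


def demo():
    with tempfile.TemporaryDirectory() as scratch:
        upload = StoreAndForwardUpload(scratch)
        upload.upload_part(2, b"world")
        upload.upload_part(1, b"hello ")
        upload.upload_part(2, b"iRODS")  # re-sent part replaces the earlier one
        forwarded = bytearray()  # stand-in for the data object in iRODS
        order = upload.complete(forwarded.extend)
        return order, bytes(forwarded)


if __name__ == "__main__":
    print(demo())
```

The sketch makes the listed trade-off visible: the client's data sits entirely on the scratch disk until `complete` runs, and the client waits while the whole object is sent a second time.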

c. Efficient store-and-forward - write down / hold non-contiguous parts in bridge - send contiguous parts to iRODS when ready

  • pro - elegant, single write
  • con - more complexity, need biggish disk
    • maybe off the table because a client can re-send the same numbered part and it should overwrite the earlier same part
    • OR... new thread! offset, overwrite, who cares, same size, magic/perfect, just works, don't look at me...
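
A minimal Python sketch of the idea behind option (c), under two loud assumptions: parts are numbered consecutively from 1 (S3 itself does not require this), and the backend only accepts appended bytes. `ContiguousForwarder` and `send` are hypothetical names. The sketch also shows the overwrite problem mentioned above: once a part has been forwarded, a re-sent copy can no longer replace it.

```python
class ContiguousForwarder:
    """Hypothetical sketch: hold out-of-order parts, forward contiguous runs."""

    def __init__(self, send):
        self.send = send      # assumed callable that appends bytes to the backend
        self.held = {}        # part_number -> bytes waiting for earlier parts
        self.next_part = 1    # assumption: parts are numbered 1, 2, 3, ...

    def upload_part(self, part_number, data):
        if part_number < self.next_part:
            # Already forwarded, so a re-sent part cannot overwrite it.
            raise ValueError(f"part {part_number} was already forwarded")
        # A re-send of a part that is still held replaces the earlier copy.
        self.held[part_number] = data
        # Forward every part that is now contiguous with what was sent.
        while self.next_part in self.held:
            self.send(self.held.pop(self.next_part))
            self.next_part += 1


def demo():
    out = bytearray()
    forwarder = ContiguousForwarder(out.extend)
    forwarder.upload_part(2, b"B")   # held: part 1 has not arrived yet
    forwarder.upload_part(3, b"C")   # held
    forwarder.upload_part(2, b"b")   # overwrites the held copy of part 2
    forwarder.upload_part(1, b"A")   # unblocks parts 1, 2 and 3
    try:
        forwarder.upload_part(1, b"x")
        late_overwrite_rejected = False
    except ValueError:
        late_overwrite_rejected = True
    return bytes(out), late_overwrite_rejected


if __name__ == "__main__":
    print(demo())
```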

d. Store-and-register - write it all down where iRODS can see it, then just register it in iRODS

  • pro - simple, fastest
  • con - just reg policy?, adds dependency on co-visibility of bridge and iRODS
    • cannot continue on failure (incomplete writes)
    • iRODS doesn't know what happened
    • Client has no way to recover

e. UGM2023 persistence layer - efficient ring buffers in the middle for use of space and restart...

  • still don't know the part lengths, so limited to being a smart (c)

S3 protocol works for Amazon because concatenate() is FREE - AWS just stores its objects as a 'list' of parts, transparently.

  • We can't do that b/c we're storing 'single files' and POSIX...
  • unless we add 'sub-objects' as a thing to iRODS - then this becomes 'free' like at AWS
    • but all operations would have to know about and honor these sub-objects... sounds hard / a lot.

Zoey/Hao approach

  • next generation iRODS multipart from 2018
  • write down data in-place, at the correct offsets, with header/restart/progress information in a 'footer' at the end of the reserved length.
  • when done, just truncate the 'footer', and the data is in place.
  • THIS required all the offsets/lengths known up front - we don't have that with the S3 protocol

trel added a commit to trel/irods_client_s3_cpp that referenced this issue Mar 6, 2024
@JustinKyleJames
Contributor

JustinKyleJames commented Mar 7, 2024

I don't think the SQLite part is in scope at the moment. As for the multipart checklist, every item has been implemented except ListParts, ListMultipartUploads, and AbortMultipartUpload.

I will bump this to 0.3.0.

@trel
Member

trel commented Mar 7, 2024

or we close it for 0.2.0... and make a new one for any remaining work?

@trel
Member

trel commented Mar 7, 2024

b) store-and-forward has been implemented for 0.2.0

@trel trel added the enhancement New feature or request label Mar 7, 2024
@korydraughn
Collaborator

GitHub gives us a button to export bullet list items as new issues.

@trel
Member

trel commented Mar 7, 2024

We've created new issues for remaining multipart-related tasks. Closing this issue.

@JustinKyleJames
Contributor

Here is my best attempt at a sequence diagram for the current behavior:

sequenceDiagram
participant client as S3 Client
participant s3 as S3 API
participant thread_pool as S3 API Thread Pool
participant irods as iRODS

client ->>+ s3: CreateMultipartUpload
activate client
s3 ->> s3: generate uploadID
s3 ->>- client: uploadID
deactivate client

    par Part 1
        client ->>+ s3: UploadPart_1
        activate client
        s3 ->> s3: write part to disk
        s3 ->>- client: response
    and Part 2
        client ->>+ s3: UploadPart_2
        activate client
        s3 ->> s3: write part to disk
        s3 ->>- client: response
    end 

client ->>+ s3: CompleteMultipartUpload
activate client
s3 ->>+ thread_pool: write part 1 upload task to thread pool
s3 ->>+ thread_pool: write part 2 upload task to thread pool

  par Thread 1
     thread_pool ->>+ irods: stream bytes for part 1
     irods ->>- thread_pool: response
  and Thread 2
     thread_pool ->>+ irods: stream bytes for part 2
     irods ->>- thread_pool: response
  end

thread_pool ->>- s3: complete
s3 ->>- client: response
deactivate client
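
The CompleteMultipartUpload half of the diagram above could look roughly like the following Python sketch. It is only an illustration, not the bridge's C++ implementation. It assumes the parts are already on the bridge's disk, that each part's offset is the sum of the sizes of the parts before it, and that each pool task writes its part at that offset. A local file stands in for the iRODS data object, and `complete_upload` is a hypothetical name.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def complete_upload(part_paths, dest_path, max_workers=4):
    """part_paths: scratch files in part-number order.
    dest_path: a local file standing in for the iRODS data object."""
    # Part sizes are only known now, because every part is on disk.
    sizes = [os.path.getsize(path) for path in part_paths]
    offsets = [sum(sizes[:i]) for i in range(len(sizes))]

    # Create and size the destination so each task can write in place.
    with open(dest_path, "wb") as dest:
        dest.truncate(sum(sizes))

    def stream_part(part_path, offset):
        # Each task uses its own handles and touches a disjoint byte range.
        with open(part_path, "rb") as src, open(dest_path, "r+b") as dest:
            dest.seek(offset)
            while chunk := src.read(64 * 1024):
                dest.write(chunk)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(stream_part, path, offset)
            for path, offset in zip(part_paths, offsets)
        ]
        for future in futures:
            future.result()  # surface any per-part failure before responding
    return offsets


def demo():
    with tempfile.TemporaryDirectory() as scratch:
        part_paths = []
        for number, data in enumerate([b"aaa", b"bb", b"cccc"], start=1):
            path = os.path.join(scratch, f"part-{number}")
            with open(path, "wb") as f:
                f.write(data)
            part_paths.append(path)
        dest_path = os.path.join(scratch, "object")
        offsets = complete_upload(part_paths, dest_path)
        with open(dest_path, "rb") as f:
            return offsets, f.read()


if __name__ == "__main__":
    print(demo())
```

Waiting on every future before returning mirrors the diagram: the response to the client is sent only after the thread pool reports that all parts are complete.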

@trel
Member

trel commented May 17, 2024

The 'current' behavior being 0.2.0 aka "b) store and forward"
