Unable to copy to amazon while adding data #122

Closed · sandrinebedard opened this issue Jul 4, 2022 · 22 comments

@sandrinebedard

Context

To upload new data, I ran the command:

git annex copy --to amazon     

I get the following error:

copy labels/sub-oxfordFmrib02/anat/sub-oxfordFmrib02_acq-MTon_MTS_seg-manual.nii.gz (checking amazon...) (to amazon...)
100%  7.14 KiB          5 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "s4EuiEryZUyBQbJbHAyaa5bu+pcRPg8hG/s0813R3L7b/nG6BBGM3X7zhPYyHQ/VrQjhjbBPog0=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

I added my AWS credentials as stated here: https://github.com/spine-generic/spine-generic/wiki/git-annex#images-niigz
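
i.e., roughly this, assuming the wiki's environment-variable approach (keys redacted):

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
git annex copy --to amazon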


kousu commented Jul 5, 2022

I'm looking into it @sandrinebedard, hopefully it's something simple.


kousu commented Jul 5, 2022

Reviewing Policy

Here is the IAM policy @RignonNoel set up yesterday:

READ_WRITE_S3_data-multi-subject---spine-generic---neuropoly
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutAnalyticsConfiguration",
                "s3:GetObjectVersionTagging",
                "s3:CreateBucket",
                "s3:ReplicateObject",
                "s3:GetObjectAcl",
                "s3:GetBucketObjectLockConfiguration",
                "s3:DeleteBucketWebsite",
                "s3:GetIntelligentTieringConfiguration",
                "s3:PutLifecycleConfiguration",
                "s3:GetObjectVersionAcl",
                "s3:DeleteObject",
                "s3:GetBucketPolicyStatus",
                "s3:GetObjectRetention",
                "s3:GetBucketWebsite",
                "s3:PutReplicationConfiguration",
                "s3:GetObjectAttributes",
                "s3:PutObjectLegalHold",
                "s3:InitiateReplication",
                "s3:GetObjectLegalHold",
                "s3:GetBucketNotification",
                "s3:PutBucketCORS",
                "s3:GetReplicationConfiguration",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObject",
                "s3:PutBucketNotification",
                "s3:PutBucketLogging",
                "s3:GetAnalyticsConfiguration",
                "s3:PutBucketObjectLockConfiguration",
                "s3:GetObjectVersionForReplication",
                "s3:GetLifecycleConfiguration",
                "s3:GetInventoryConfiguration",
                "s3:GetBucketTagging",
                "s3:PutAccelerateConfiguration",
                "s3:DeleteObjectVersion",
                "s3:GetBucketLogging",
                "s3:ListBucketVersions",
                "s3:RestoreObject",
                "s3:ListBucket",
                "s3:GetAccelerateConfiguration",
                "s3:GetObjectVersionAttributes",
                "s3:GetBucketPolicy",
                "s3:PutEncryptionConfiguration",
                "s3:GetEncryptionConfiguration",
                "s3:GetObjectVersionTorrent",
                "s3:AbortMultipartUpload",
                "s3:GetBucketRequestPayment",
                "s3:GetObjectTagging",
                "s3:GetMetricsConfiguration",
                "s3:GetBucketOwnershipControls",
                "s3:DeleteBucket",
                "s3:PutBucketVersioning",
                "s3:GetBucketPublicAccessBlock",
                "s3:ListBucketMultipartUploads",
                "s3:PutIntelligentTieringConfiguration",
                "s3:PutMetricsConfiguration",
                "s3:PutBucketOwnershipControls",
                "s3:GetBucketVersioning",
                "s3:GetBucketAcl",
                "s3:PutInventoryConfiguration",
                "s3:GetObjectTorrent",
                "s3:PutBucketWebsite",
                "s3:PutBucketRequestPayment",
                "s3:PutObjectRetention",
                "s3:GetBucketCORS",
                "s3:GetBucketLocation",
                "s3:ReplicateDelete",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::data-multi-subject---spine-generic---neuropoly",
                "arn:aws:s3:::data-multi-subject---spine-generic---neuropoly/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:ListStorageLensConfigurations",
                "s3:ListAccessPointsForObjectLambda",
                "s3:GetAccessPoint",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAllMyBuckets",
                "s3:ListAccessPoints",
                "s3:ListJobs",
                "s3:PutStorageLensConfiguration",
                "s3:ListMultiRegionAccessPoints",
                "s3:CreateJob"
            ],
            "Resource": "*"
        }
    ]
}

I'm not at all versed in IAM policies, but this looks correct to me. It looks very similar to the one I made for backups. I'll come back to investigating it if I can't find anything else obvious.
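
For due diligence, here is a sketch of how the attached policy could be double-checked from awscli (the user name, account ID, and version id below are placeholders, not something I have run):

aws iam list-attached-user-policies --user-name <iam-user>
aws iam list-user-policies --user-name <iam-user>
aws iam get-policy-version \
    --policy-arn arn:aws:iam::<account-id>:policy/READ_WRITE_S3_data-multi-subject---spine-generic---neuropoly \
    --version-id v1

(aws iam get-policy reports the default version id, in case v1 is not it.)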

Reproducing

I set up a read-only copy of the dataset and made sure the amazon remote got enabled (git annex enableremote amazon):

p115628@joplin:~/datasets$ git clone [email protected]:spine-generic/data-multi-subject.git
Cloning into 'data-multi-subject'...
remote: Enumerating objects: 31328, done.
remote: Counting objects: 100% (2732/2732), done.
remote: Compressing objects: 100% (1991/1991), done.
remote: Total 31328 (delta 410), reused 2003 (delta 379), pack-reused 28596
Receiving objects: 100% (31328/31328), 3.73 MiB | 4.48 MiB/s, done.
Resolving deltas: 100% (8599/8599), done.
p115628@joplin:~/datasets$ cd data-multi-subject/
p115628@joplin:~/datasets/data-multi-subject$ git annex get sub-mniS02/
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get sub-mniS02/anat/sub-mniS02_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mniS02/anat/sub-mniS02_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mniS02/anat/sub-mniS02_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mniS02/anat/sub-mniS02_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-mniS02/anat/sub-mniS02_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-mniS02/anat/sub-mniS02_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-mniS02/dwi/sub-mniS02_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
(recording state in git...)
p115628@joplin:~/datasets/data-multi-subject$ git checkout -b upload-test

Then I went into AWS and created two new users, kousu2 and sandrine2, and I used the handy "Copy Permissions from Existing User" feature to make them emulate what both of us see, and set them up as Access-Key-Only users.

kousu2
p115628@joplin:~/datasets/data-multi-subject$ export AWS_ACCESS_KEY_ID="..." AWS_SECRET_ACCESS_KEY="..."
p115628@joplin:~/datasets/data-multi-subject$ dd if=/dev/urandom of=what.nii.gz count=20 bs=1M
20+0 records in
20+0 records out
20971520 bytes (21 MB, 20 MiB) copied, 0.153876 s, 136 MB/s
p115628@joplin:~/datasets/data-multi-subject$ git add what.nii.gz 
p115628@joplin:~/datasets/data-multi-subject$ git commit -m "test"
[upload-test-kousu2 5be22215] test
 1 file changed, 1 insertion(+)
 create mode 100644 what.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ git annex whereis what.nii.gz 
whereis what.nii.gz (1 copy) 
  	932066fe-3995-451c-b1a6-fa1a9603a9dc -- [email protected]:~/datasets/data-multi-subject [here]
ok
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon
copy sub-mniS02/anat/sub-mniS02_T1w.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_T2star.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_T2w.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_acq-MToff_MTS.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_acq-MTon_MTS.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_acq-T1w_MTS.nii.gz (checking amazon...) ok
copy sub-mniS02/dwi/sub-mniS02_dwi.nii.gz (checking amazon...) ok
copy what.nii.gz (checking amazon...) (to amazon...) 
ok                                
(recording state in git...)
p115628@joplin:~/datasets/data-multi-subject$ git annex whereis what.nii.gz 
whereis what.nii.gz (2 copies) 
  	5a5447a8-a9b8-49bc-8276-01a62632b502 -- [amazon]
   	932066fe-3995-451c-b1a6-fa1a9603a9dc -- [email protected]:~/datasets/data-multi-subject [here]

  amazon: https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
ok
sandrine2
p115628@joplin:~/datasets/data-multi-subject$ export AWS_ACCESS_KEY_ID="..." AWS_SECRET_ACCESS_KEY="..."
p115628@joplin:~/datasets/data-multi-subject$ dd if=/dev/urandom of=what-sandrine.nii.gz count=20 bs=1M
20+0 records in
20+0 records out
20971520 bytes (21 MB, 20 MiB) copied, 0.750586 s, 27.9 MB/s
p115628@joplin:~/datasets/data-multi-subject$ git add what-sandrine.nii.gz 
p115628@joplin:~/datasets/data-multi-subject$ git commit -m "test sandrine's credentials"
[upload-test-kousu2 6d2b9b11] test sandrine's credentials
 1 file changed, 1 insertion(+)
 create mode 100644 what-sandrine.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon
copy sub-mniS02/anat/sub-mniS02_T1w.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_T2star.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_T2w.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_acq-MToff_MTS.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_acq-MTon_MTS.nii.gz (checking amazon...) ok
copy sub-mniS02/anat/sub-mniS02_acq-T1w_MTS.nii.gz (checking amazon...) ok
copy sub-mniS02/dwi/sub-mniS02_dwi.nii.gz (checking amazon...) ok
copy what-sandrine.nii.gz (checking amazon...) (to amazon...) 
39%   7.78 MiB          6 MiB/s 2s
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "9zqKhZszy/4pOwKk9KyequYK6IgSN5w3uda5NnsY9Rb3FsPindWUV/xh7jFM3syOVjbDeHpxzgs=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

33%   6.53 MiB          9 MiB/s 1s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "2UBoKIKPVnPpVCGr4VWxIphyHJhszlewAzIFLaw6WqMAojGKVwJXnvfQu0QaqbGqdFDsi40uYWI=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

39%   7.75 MiB          5 MiB/s 2s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "fTIvEYMx2UEoWY3FCVVlkwd90fx5BBg7hFSDxSqrq0RY4duBtwVsqiZuwBonfe8ScmQTHaUhg3Q=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
copy what.nii.gz (checking amazon...) ok
git-annex: copy: 1 failed

So I can reproduce it, on the first try, which is a good start! Now I'll try to dig in and see if I can figure out what is going on.


kousu commented Jul 5, 2022

Tracing

I turned up debugging

p115628@joplin:~/datasets/data-multi-subject$ git config annex.debug true

And saw:

p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon
[2022-07-05 11:13:41.782990605] process [656836] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2022-07-05 11:13:41.785661304] process [656836] done ExitSuccess
[2022-07-05 11:13:41.786184885] process [656837] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2022-07-05 11:13:41.788449622] process [656837] done ExitSuccess
[2022-07-05 11:13:41.78920187] process [656838] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..236f50371e4c255ed14d4ebaf2dbeb37cd3b63b5","--pretty=%H","-n1"]
[2022-07-05 11:13:41.791476652] process [656838] done ExitSuccess
[2022-07-05 11:13:41.79292082] process [656839] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2022-07-05 11:13:41.793426397] process [656840] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2022-07-05 11:13:41.808420732] process [656841] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2022-07-05 11:13:41.810580688] process [656841] done ExitSuccess
[2022-07-05 11:13:41.810899415] process [656842] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/upload-test-kousu2"]
[2022-07-05 11:13:41.813718254] process [656842] done ExitSuccess
[2022-07-05 11:13:41.814022006] process [656843] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--"]
[2022-07-05 11:13:41.814464755] process [656844] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-07-05 11:13:41.814919649] process [656845] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-07-05 11:13:41.815458503] process [656846] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
copy sub-mniS02/anat/sub-mniS02_T1w.nii.gz (checking amazon...) [2022-07-05 11:13:41.977224789] String to sign: "HEAD\n/SHA256E-s18495908--c57686bd0854118db791fd1accecfc601c12a23fd559f4cc0b4e35790cd24639.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151341Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:41.97735259] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:41.977405572] Path: "/SHA256E-s18495908--c57686bd0854118db791fd1accecfc601c12a23fd559f4cc0b4e35790cd24639.nii.gz"
[2022-07-05 11:13:41.977449777] Query string: ""
[2022-07-05 11:13:41.977488699] Header: [("Date","Tue, 05 Jul 2022 15:13:41 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=ffea1fdc8082840b91eeaec3bb854208f32fed9bde9a0bbb5648059bf4eb7727"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151341Z")]
[2022-07-05 11:13:42.11665903] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.116783035] Response header 'x-amz-id-2': 'nnwHgEYzyEEqZxjng8esTayJXJSG4BOh8GF6vL1q9zfhowoHY4lF74cv3YAReLsDaJlgHQbMtZk='
[2022-07-05 11:13:42.116861378] Response header 'x-amz-request-id': 'NTN9C7DJMPGJFD62'
[2022-07-05 11:13:42.116922504] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.11698014] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:16 GMT'
[2022-07-05 11:13:42.117036966] Response header 'ETag': '"38f77609ecaf89a25267b6da0ed39c21"'
[2022-07-05 11:13:42.117093692] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.117148548] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.117204641] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.117260363] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.117313906] Response header 'Content-Length': '18495908'
[2022-07-05 11:13:42.117486744] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
[2022-07-05 11:13:42.118396154] process [656848] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2022-07-05 11:13:42.119159378] process [656849] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
ok
copy sub-mniS02/anat/sub-mniS02_T2star.nii.gz (checking amazon...) [2022-07-05 11:13:42.122409858] String to sign: "HEAD\n/SHA256E-s2495844--8dd28de851b1b71c57692175ce56223854ecf2426879631d4497f0993e2c1951.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.122596799] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.122676131] Path: "/SHA256E-s2495844--8dd28de851b1b71c57692175ce56223854ecf2426879631d4497f0993e2c1951.nii.gz"
[2022-07-05 11:13:42.122746914] Query string: ""
[2022-07-05 11:13:42.122806903] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=b7eff01802820b5cc79be9ff7105e2544539e25c4060fd8b1faf6a4012610eae"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.133601369] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.133695182] Response header 'x-amz-id-2': 'AN7yTA6JtTY8T78K0dtghPdGpXzH+LwRzlyUyypKxzIfSSJRc0CSh105AW5ZUVLve5RqaIpUzdw='
[2022-07-05 11:13:42.133756189] Response header 'x-amz-request-id': 'NTN9FR72EDPA96VS'
[2022-07-05 11:13:42.133803279] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.133847526] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:25 GMT'
[2022-07-05 11:13:42.133890745] Response header 'ETag': '"974e7164a3926860a4d92509338b4d6e"'
[2022-07-05 11:13:42.133934989] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.133976844] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.134018616] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.134211144] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.134272033] Response header 'Content-Length': '2495844'
[2022-07-05 11:13:42.134440082] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
copy sub-mniS02/anat/sub-mniS02_T2w.nii.gz (checking amazon...) [2022-07-05 11:13:42.134952894] String to sign: "HEAD\n/SHA256E-s9357242--b69103338da6ba0f4c9a5049ad49ac1c5d8115775eb78f066a2dd6e5d661b335.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.135090601] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.135155323] Path: "/SHA256E-s9357242--b69103338da6ba0f4c9a5049ad49ac1c5d8115775eb78f066a2dd6e5d661b335.nii.gz"
[2022-07-05 11:13:42.135214302] Query string: ""
[2022-07-05 11:13:42.135266113] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=01f76d05e75cba0b6dd31b82120c9f487eff698b6b6dc99645dc84267efe4da1"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.149193638] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.149307362] Response header 'x-amz-id-2': 'fGXOckVysHLd7Jc41zndDcCTBwUr0wM3FtQrWIz9A+rjTLR6sVg1EZqiGNPrPcvo6GdCL4ytSmE='
[2022-07-05 11:13:42.14937399] Response header 'x-amz-request-id': 'NTNAQ7Y4SP5MCGQH'
[2022-07-05 11:13:42.149428098] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.149478006] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:32 GMT'
[2022-07-05 11:13:42.149527549] Response header 'ETag': '"b2bcd0f69932c09654f218692f5fd6b4"'
[2022-07-05 11:13:42.149580799] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.149735358] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.14980838] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.14988747] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.14994207] Response header 'Content-Length': '9357242'
[2022-07-05 11:13:42.150067682] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
copy sub-mniS02/anat/sub-mniS02_acq-MToff_MTS.nii.gz (checking amazon...) [2022-07-05 11:13:42.150599953] String to sign: "HEAD\n/SHA256E-s1807831--f3aa693bfa5aa1fbdc6a44eb17ad2bdcf51e8aa2eee6edc27734a0edbbb5c3a6.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.15073794] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.150802105] Path: "/SHA256E-s1807831--f3aa693bfa5aa1fbdc6a44eb17ad2bdcf51e8aa2eee6edc27734a0edbbb5c3a6.nii.gz"
[2022-07-05 11:13:42.150860905] Query string: ""
[2022-07-05 11:13:42.150911767] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=3f0682a4efb7f83db680322f5112fe8c1149d716c834299efbe7a577ef01e392"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.164871309] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.164970374] Response header 'x-amz-id-2': 'V2+gchv7VpLUsR6N4twDwfOGih5hbRb/pfNJxQVqmr2I0dM0JIGia3Lkjf4qQGKWzblUy540H4c='
[2022-07-05 11:13:42.165038893] Response header 'x-amz-request-id': 'NTN5BRED4B21G8GW'
[2022-07-05 11:13:42.165089373] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.165137241] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:34 GMT'
[2022-07-05 11:13:42.165183318] Response header 'ETag': '"7ffc12ee0bdfa82ef7a8ce8a74b239d5"'
[2022-07-05 11:13:42.165331464] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.165419331] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.165475051] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.165523787] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.165569337] Response header 'Content-Length': '1807831'
[2022-07-05 11:13:42.165685677] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
copy sub-mniS02/anat/sub-mniS02_acq-MTon_MTS.nii.gz (checking amazon...) [2022-07-05 11:13:42.166166389] String to sign: "HEAD\n/SHA256E-s1766603--8baac2349786d9e54a444d551c101cdf751fd6b3d801cf2e246096a953d18a25.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.166317053] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.166379596] Path: "/SHA256E-s1766603--8baac2349786d9e54a444d551c101cdf751fd6b3d801cf2e246096a953d18a25.nii.gz"
[2022-07-05 11:13:42.166436101] Query string: ""
[2022-07-05 11:13:42.166484439] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=fceac384aa93fff26896c7d2067079d01783e5444ff151ed2ad32bbe21331fd4"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.17795452] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.178075298] Response header 'x-amz-id-2': 'VANGgOZcopvjSc9oLXX3fS5qppJzN0OenA5iUPCf55sqEWs6n+jyKeJumFL1tEdcsfBday5MfNk='
[2022-07-05 11:13:42.17815254] Response header 'x-amz-request-id': 'NTNF4R3XW070SR11'
[2022-07-05 11:13:42.178224162] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.178274266] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:35 GMT'
[2022-07-05 11:13:42.178428617] Response header 'ETag': '"fc9946f1bed5613ecf96b4fbeb3bf295"'
[2022-07-05 11:13:42.178523921] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.178579429] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.178625949] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.178686117] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.178736588] Response header 'Content-Length': '1766603'
[2022-07-05 11:13:42.178868035] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
copy sub-mniS02/anat/sub-mniS02_acq-T1w_MTS.nii.gz (checking amazon...) [2022-07-05 11:13:42.179407528] String to sign: "HEAD\n/SHA256E-s1802259--47b94ad357745fe870007c6df2e40b36a1eb12f091c1ee971d6767ae8dcb418a.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.179553895] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.179622669] Path: "/SHA256E-s1802259--47b94ad357745fe870007c6df2e40b36a1eb12f091c1ee971d6767ae8dcb418a.nii.gz"
[2022-07-05 11:13:42.179686374] Query string: ""
[2022-07-05 11:13:42.179741233] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=93749cf50a110d147ac2a942b96709cfe3856cae64f4e83bcbe307451f3cad6b"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.193131313] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.193260002] Response header 'x-amz-id-2': '31h1P6vTRDxBJeeufbU//upu0Djy/AVigvSL0D8X3oSG82L4PwCAn+1LWC4L2LEvsckVHtuCOXk='
[2022-07-05 11:13:42.193328018] Response header 'x-amz-request-id': 'NTN1XJ0HVXQGY1WF'
[2022-07-05 11:13:42.193381766] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.193541004] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:37 GMT'
[2022-07-05 11:13:42.193626726] Response header 'ETag': '"d89e5c8be7b1988711839f9da5006659"'
[2022-07-05 11:13:42.193684868] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.193734142] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.193782567] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.193831289] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.19387878] Response header 'Content-Length': '1802259'
[2022-07-05 11:13:42.194002799] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
copy sub-mniS02/dwi/sub-mniS02_dwi.nii.gz (checking amazon...) [2022-07-05 11:13:42.194530664] String to sign: "HEAD\n/SHA256E-s2721702--d7d52f197f0430a870eb772f6b2d6494c5b1e6dd4cd9ceb955ed8c6b5fd86d56.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.194669574] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.194734908] Path: "/SHA256E-s2721702--d7d52f197f0430a870eb772f6b2d6494c5b1e6dd4cd9ceb955ed8c6b5fd86d56.nii.gz"
[2022-07-05 11:13:42.194794864] Query string: ""
[2022-07-05 11:13:42.194846295] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=a37d8bf7e9ec29573eb9ce45686fdb3974cc2334c356dd4b5cb663b53ad2934d"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.207204523] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:42.207298782] Response header 'x-amz-id-2': 'oHtBSPdDH6yihZHerqxkUAcs6xIMwfMI1biSrPHG+bAf88f7VlskB9IMe45vAdmDIHDyfi4PxdI='
[2022-07-05 11:13:42.207365451] Response header 'x-amz-request-id': 'NTN94BD8WNEXCV2C'
[2022-07-05 11:13:42.207515699] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:42.207601588] Response header 'Last-Modified': 'Tue, 04 Aug 2020 20:30:39 GMT'
[2022-07-05 11:13:42.207659815] Response header 'ETag': '"9f025d88575d0f01fa062ca595cea97f"'
[2022-07-05 11:13:42.20771026] Response header 'x-amz-version-id': 'null'
[2022-07-05 11:13:42.207759274] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:42.207806926] Response header 'Content-Type': 'application/gzip'
[2022-07-05 11:13:42.20785485] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.207902285] Response header 'Content-Length': '2721702'
[2022-07-05 11:13:42.208021223] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
copy what-sandrine.nii.gz (checking amazon...) [2022-07-05 11:13:42.208472901] String to sign: "HEAD\n/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151342Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:42.208596852] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.208656619] Path: "/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz"
[2022-07-05 11:13:42.208710698] Query string: ""
[2022-07-05 11:13:42.208758845] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=a6d3e3c80c688f311a2c06346cb7124794a09c8bd55b08ad438c15c4da3de9fe"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151342Z")]
[2022-07-05 11:13:42.220332299] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
[2022-07-05 11:13:42.220454124] Response header 'x-amz-request-id': 'NTN5MMGTWD37WDAZ'
[2022-07-05 11:13:42.220646122] Response header 'x-amz-id-2': '1K7YSZ09GTy8zVNwW0bV7pfRa9eK2K4z/qj4Y5OoIfErzsyFuIcdvu/GbJui51033fWa9cNQ2JA='
[2022-07-05 11:13:42.220751171] Response header 'Content-Type': 'application/xml'
[2022-07-05 11:13:42.220820017] Response header 'Date': 'Tue, 05 Jul 2022 15:13:41 GMT'
[2022-07-05 11:13:42.220879263] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.220936929] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
(to amazon...) 
[2022-07-05 11:13:42.229962779] String to sign: "PUT\n/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz\n\ncontent-type:application/octet-stream\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-acl:public-read\nx-amz-content-sha256:UNSIGNED-PAYLOAD\nx-amz-date:20220705T151342Z\nx-amz-storage-class:STANDARD\n\ncontent-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-storage-class\nUNSIGNED-PAYLOAD"
[2022-07-05 11:13:42.230144737] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.230464257] Path: "/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz"
[2022-07-05 11:13:42.230572642] Query string: ""
[2022-07-05 11:13:42.230647645] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Content-Type","application/octet-stream"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-storage-class,Signature=c661c4bec358b8a516525e01fd7f7e8f97bb19174f83d33be8a454214f3c86fe"),("x-amz-acl","public-read"),("X-Amz-Content-Sha256","UNSIGNED-PAYLOAD"),("X-Amz-Date","20220705T151342Z"),("x-amz-storage-class","STANDARD")]
33%   6.62 MiB         17 MiB/s 0s[2022-07-05 11:13:42.717429111] Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
[2022-07-05 11:13:42.717490433] Response header 'x-amz-request-id': 'NTNEGTAAFPRFQ025'
[2022-07-05 11:13:42.717525757] Response header 'x-amz-id-2': 'wNeEGNkrTHzZqzNNIYReePu2W2ZIeKUF47ut6nvSNv9Bd3ZSQx2qHSy1bb0DbTZKi1V9Nm8Qoio='
[2022-07-05 11:13:42.717558003] Response header 'Content-Type': 'application/xml'
[2022-07-05 11:13:42.71758656] Response header 'Transfer-Encoding': 'chunked'
[2022-07-05 11:13:42.717614501] Response header 'Date': 'Tue, 05 Jul 2022 15:13:41 GMT'
[2022-07-05 11:13:42.71764194] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:42.717669546] Response header 'Connection': 'close'
[2022-07-05 11:13:42.717916772] Response metadata: S3: request ID=NTNEGTAAFPRFQ025, x-amz-id-2=wNeEGNkrTHzZqzNNIYReePu2W2ZIeKUF47ut6nvSNv9Bd3ZSQx2qHSy1bb0DbTZKi1V9Nm8Qoio=

  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "wNeEGNkrTHzZqzNNIYReePu2W2ZIeKUF47ut6nvSNv9Bd3ZSQx2qHSy1bb0DbTZKi1V9Nm8Qoio=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
[2022-07-05 11:13:42.718941556] process [656848] done ExitSuccess
[2022-07-05 11:13:42.719599157] process [656849] done ExitSuccess

[2022-07-05 11:13:42.722289259] String to sign: "PUT\n/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz\n\ncontent-type:application/octet-stream\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-acl:public-read\nx-amz-content-sha256:UNSIGNED-PAYLOAD\nx-amz-date:20220705T151342Z\nx-amz-storage-class:STANDARD\n\ncontent-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-storage-class\nUNSIGNED-PAYLOAD"
[2022-07-05 11:13:42.722482957] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:42.722549082] Path: "/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz"
[2022-07-05 11:13:42.722588527] Query string: ""
[2022-07-05 11:13:42.722623643] Header: [("Date","Tue, 05 Jul 2022 15:13:42 GMT"),("Content-Type","application/octet-stream"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-storage-class,Signature=c661c4bec358b8a516525e01fd7f7e8f97bb19174f83d33be8a454214f3c86fe"),("x-amz-acl","public-read"),("X-Amz-Content-Sha256","UNSIGNED-PAYLOAD"),("X-Amz-Date","20220705T151342Z"),("x-amz-storage-class","STANDARD")]
38%   7.53 MiB         18 MiB/s 0s[2022-07-05 11:13:43.233120929] Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
[2022-07-05 11:13:43.233181684] Response header 'x-amz-request-id': 'RY8Y3AXWXDGZQBWC'
[2022-07-05 11:13:43.233216502] Response header 'x-amz-id-2': 'OExrHc36M7pnlrIeg0jZPMRQQRmHkoahG5soyENxDMLEjpsxsfeBpYIP+L4qRJfro3SPiu8oiy0='
[2022-07-05 11:13:43.233248713] Response header 'Content-Type': 'application/xml'
[2022-07-05 11:13:43.233277218] Response header 'Transfer-Encoding': 'chunked'
[2022-07-05 11:13:43.23330473] Response header 'Date': 'Tue, 05 Jul 2022 15:13:42 GMT'
[2022-07-05 11:13:43.233332345] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:43.233358926] Response header 'Connection': 'close'
[2022-07-05 11:13:43.233532047] Response metadata: S3: request ID=RY8Y3AXWXDGZQBWC, x-amz-id-2=OExrHc36M7pnlrIeg0jZPMRQQRmHkoahG5soyENxDMLEjpsxsfeBpYIP+L4qRJfro3SPiu8oiy0=

  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "OExrHc36M7pnlrIeg0jZPMRQQRmHkoahG5soyENxDMLEjpsxsfeBpYIP+L4qRJfro3SPiu8oiy0=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

[2022-07-05 11:13:43.236748221] String to sign: "PUT\n/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz\n\ncontent-type:application/octet-stream\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-acl:public-read\nx-amz-content-sha256:UNSIGNED-PAYLOAD\nx-amz-date:20220705T151343Z\nx-amz-storage-class:STANDARD\n\ncontent-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-storage-class\nUNSIGNED-PAYLOAD"
[2022-07-05 11:13:43.236994218] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:43.237080739] Path: "/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz"
[2022-07-05 11:13:43.237149355] Query string: ""
[2022-07-05 11:13:43.237206116] Header: [("Date","Tue, 05 Jul 2022 15:13:43 GMT"),("Content-Type","application/octet-stream"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-storage-class,Signature=485b905a3f42a4c4e00312c0eba8d4cc61439afa1084869699a7bcc88471dd9e"),("x-amz-acl","public-read"),("X-Amz-Content-Sha256","UNSIGNED-PAYLOAD"),("X-Amz-Date","20220705T151343Z"),("x-amz-storage-class","STANDARD")]
39%   7.71 MiB         20 MiB/s 0s[2022-07-05 11:13:43.731710076] Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
[2022-07-05 11:13:43.731794905] Response header 'x-amz-request-id': 'RY8TX7Z5W295404W'
[2022-07-05 11:13:43.731845569] Response header 'x-amz-id-2': '7ONyyzYaQPYK/jcoilDA/bR9xXbDIUTIx8rgA8DXG0zMerLmfjqK+jJGngAYsw5EEm3W1PcZvJA='
[2022-07-05 11:13:43.731891864] Response header 'Content-Type': 'application/xml'
[2022-07-05 11:13:43.731933404] Response header 'Transfer-Encoding': 'chunked'
[2022-07-05 11:13:43.731972602] Response header 'Date': 'Tue, 05 Jul 2022 15:13:43 GMT'
[2022-07-05 11:13:43.732013472] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:43.732052264] Response header 'Connection': 'close'
[2022-07-05 11:13:43.732287949] Response metadata: S3: request ID=RY8TX7Z5W295404W, x-amz-id-2=7ONyyzYaQPYK/jcoilDA/bR9xXbDIUTIx8rgA8DXG0zMerLmfjqK+jJGngAYsw5EEm3W1PcZvJA=

  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "7ONyyzYaQPYK/jcoilDA/bR9xXbDIUTIx8rgA8DXG0zMerLmfjqK+jJGngAYsw5EEm3W1PcZvJA=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
copy what.nii.gz (checking amazon...) [2022-07-05 11:13:43.733662094] String to sign: "HEAD\n/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz\n\nhost:data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com\nx-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\nx-amz-date:20220705T151343Z\n\nhost;x-amz-content-sha256;x-amz-date\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
[2022-07-05 11:13:43.733813456] Host: "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com"
[2022-07-05 11:13:43.733871987] Path: "/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz"
[2022-07-05 11:13:43.733921881] Query string: ""
[2022-07-05 11:13:43.73396447] Header: [("Date","Tue, 05 Jul 2022 15:13:43 GMT"),("Authorization","AWS4-HMAC-SHA256 Credential=[redacted]/20220705/ca-central-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=c52770d6894d91e5845462411c9463fb4fd00a92b96418e5354318aaf4897386"),("X-Amz-Content-Sha256","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),("X-Amz-Date","20220705T151343Z")]
[2022-07-05 11:13:43.773743356] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2022-07-05 11:13:43.773866198] Response header 'x-amz-id-2': 'r9ao/8WgfQ+3NiZeWnA5gjWvFnT5pkL0BMwpfxESUnYiSAFWCjQ6OotrnBY6LOqhpzIK/7OP+QA='
[2022-07-05 11:13:43.773942561] Response header 'x-amz-request-id': 'RY8QM0N1WGRVMBT6'
[2022-07-05 11:13:43.774005463] Response header 'Date': 'Tue, 05 Jul 2022 15:13:44 GMT'
[2022-07-05 11:13:43.774064156] Response header 'Last-Modified': 'Tue, 05 Jul 2022 14:59:04 GMT'
[2022-07-05 11:13:43.774121828] Response header 'ETag': '"e32ad309a93cf895004ff012ab4f79dc"'
[2022-07-05 11:13:43.774180241] Response header 'x-amz-version-id': 'yYHon4B_J9_YdscnxjO.MBEnDZ8Jnm9l'
[2022-07-05 11:13:43.77428057] Response header 'Accept-Ranges': 'bytes'
[2022-07-05 11:13:43.774339206] Response header 'Content-Type': 'application/octet-stream'
[2022-07-05 11:13:43.774525961] Response header 'Server': 'AmazonS3'
[2022-07-05 11:13:43.774622839] Response header 'Content-Length': '20971520'
[2022-07-05 11:13:43.774789357] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok
[2022-07-05 11:13:43.775311482] process [656846] done ExitSuccess
[2022-07-05 11:13:43.775447067] process [656845] done ExitSuccess
[2022-07-05 11:13:43.775547275] process [656844] done ExitSuccess
[2022-07-05 11:13:43.775630298] process [656843] done ExitSuccess
git-annex: copy: 1 failed

which tells me that the failure is in the

PUT https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s20971520--f5355e4e10bfcabb5cce9080ea4f4dca56cbc4a2a5a682051770d223d8a1c245.nii.gz

request. That doesn't really tell me anything new, except that it rules out git-annex doing some weird extra step in the background that it would need extra permissions for.


kousu commented Jul 5, 2022

awscli

Our internal servers have awscli installed. I read aws s3 help to remind myself how to use it and, with the sandrine2 credentials loaded, tested deleting the first test file (the one uploaded with my kousu2 credentials). It worked:

p115628@joplin:~/datasets/data-multi-subject$ aws s3 ls s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
2022-07-05 10:59:04   20971520 SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ aws s3 rm s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
delete: s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ aws s3 ls s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ # notice how there was no output this time

(and also this link is now broken: https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz)

That file really is the file that was originally uploaded using the kousu2 credentials:

p115628@joplin:~/datasets/data-multi-subject$ git show HEAD:what.nii.gz
/annex/objects/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
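
For reference, git-annex can also report that key directly (a sketch; it should print the same SHA256E key as the git show above):

git annex lookupkey what.nii.gz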

And I'm not allowed to upload it again:

p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
38%   7.59 MiB         21 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "DybLLAJtTV3vN9NpMyoL0wl+notAPQgiG7g4sQOH0K4eagaooC9gTUSe02t3ChN8DZTG9hcbOuY=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

37%   7.43 MiB         17 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "9k+Ztf9KR6z9pzSFZ8fECaertPwV+hfj539e8eqKwCLfCpfKm5Pf9G0C3e1XuoeJRNi/ueIyLYA=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

39%   7.87 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "a0TmDgiEMYvtFDzi4LpzJSRzS5ZeYAiaPmowcfFc+L5cYhqa3Utu4RZ7OV+nEhfR8Q9Bj9qUeN8=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
git-annex: copy: 1 failed

So that's weird. It seems Sandrine's permissions can delete but not create files. That makes me suspect something is wrong with the policy after all. I'll keep digging.
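
One way I could narrow that down is to ask IAM directly how it evaluates the two actions for her user (a sketch I haven't run; the account ID and object key are placeholders):

aws iam simulate-principal-policy \
    --policy-source-arn arn:aws:iam::<account-id>:user/sandrine2 \
    --action-names s3:PutObject s3:DeleteObject \
    --resource-arns arn:aws:s3:::data-multi-subject---spine-generic---neuropoly/<object-key>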

If I switch over to using the kousu2 credentials, then I can upload the file again:

p115628@joplin:~/datasets/data-multi-subject$ export AWS_ACCESS_KEY_ID="[redacted-kousu2-id]" AWS_SECRET_ACCESS_KEY="..."
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
ok                                

and https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz works again.

Here, I can do it again: as kousu2, I can delete and reupload it multiple times:

p115628@joplin:~/datasets/data-multi-subject$ export AWS_ACCESS_KEY_ID="[redacted-kousu2-id]" AWS_SECRET_ACCESS_KEY="..."
p115628@joplin:~/datasets/data-multi-subject$ aws s3 rm s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
delete: s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
ok                                
p115628@joplin:~/datasets/data-multi-subject$ aws s3 rm s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
delete: s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
ok                                
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) ok

but as sandrine2 I can delete but not reupload:

p115628@joplin:~/datasets/data-multi-subject$ export AWS_ACCESS_KEY_ID="[redacted-sandrine2-id]" AWS_SECRET_ACCESS_KEY="..."
p115628@joplin:~/datasets/data-multi-subject$ aws s3 rm s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
delete: s3://data-multi-subject---spine-generic---neuropoly/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
37%   7.31 MiB         19 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "/lumtwDweE87UQ9jAzto/Fy/O1JPgLUFd5Nq43elSr/Uc0GHzBv4MMLay9n7Nj/M3jDTTMi7tQE=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

38%   7.53 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "bZK5JpkBQ8LTw4msj0uBILaPdXGFyETakYl5RwlXU2hGXlEuUfcYhbMZdZOThFypSzznPc82G3A=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

41%   8.15 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "oKNfJJRgazd7+P7u23FG/KKMrLFR2EgK37VrR/EEXmeQFKvOnPflZ7+CHRvJuff23+qAVcdB4oU=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

40%   7.96 MiB         20 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "IB4nvjm3k+ewa6xh89F/QhZ2dbmotDUHLnOgi/wjGS4MPdmzNgdXWtGE1mje15azDJKx3vkUWY4=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
git-annex: copy: 1 failed


kousu commented Jul 5, 2022

Can I just upload anything, without git-annex?

p115628@joplin:~/datasets/data-multi-subject$ export AWS_ACCESS_KEY_ID="[redacted-sandrine2-id]" AWS_SECRET_ACCESS_KEY="[redacted]"
p115628@joplin:~/datasets/data-multi-subject$ dd if=/dev/urandom of=something.nii.gz count=20 bs=1M
20+0 records in
20+0 records out
20971520 bytes (21 MB, 20 MiB) copied, 0.145976 s, 144 MB/s
p115628@joplin:~/datasets/data-multi-subject$ aws s3 cp something.nii.gz s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz
upload: ./something.nii.gz to s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz

Ah! Yes I can. Sort of. The link doesn't work; it gives 403 (not 404, but 403, so the data is there, it's just locked):

$ curl -v https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/something-sandrine2.nii.gz
*   Trying 52.95.146.32...
* TCP_NODELAY set
* Connected to data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com (52.95.146.32) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.s3.ca-central-1.amazonaws.com
*  start date: Dec 17 00:00:00 2021 GMT
*  expire date: Nov 24 23:59:59 2022 GMT
*  subjectAltName: host "data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com" matched cert's "*.s3.ca-central-1.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> GET /something-sandrine2.nii.gz HTTP/1.1
> Host: data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 403 Forbidden
< x-amz-request-id: 28D0P7QM9HF5HXTW
< x-amz-id-2: YBYVU++EYYmk9Te3YKz01LplxsH+PfWhDhiVEcYS9oEnoiKMDtScyllIXjNx/B1g6w9edBok5v0=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Tue, 05 Jul 2022 15:47:05 GMT
< Server: AmazonS3
< 
<?xml version="1.0" encoding="UTF-8"?>
* Connection #0 to host data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com left intact
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>28D0P7QM9HF5HXTW</RequestId><HostId>YBYVU++EYYmk9Te3YKz01LplxsH+PfWhDhiVEcYS9oEnoiKMDtScyllIXjNx/B1g6w9edBok5v0=</HostId></Error>
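
Another check that fits here, as a sketch (assuming awscli with credentials that are allowed to read ACLs): ask S3 what ACL the object actually ended up with, since a 403 on an anonymous GET suggests the object exists but is not public-readable.

aws s3api get-object-acl \
    --bucket data-multi-subject---spine-generic---neuropoly \
    --key something-sandrine2.nii.gz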

This reminds me of a problem long ago. Because we're using the ca-central-1 datacenter, some assumptions that git-annex makes by default are no bueno. That case was solved by adding the config option signature=v4 to the remote. But this isn't that problem, because:

  1. signature=v4 is still set:

    p115628@joplin:~/datasets/data-multi-subject$ git show git-annex:remote.log
    5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=yes publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1646250479.065173s
    
  2. It still works when I use kousu2.

So maybe that's a red herring.

I wonder if the problem is that Sandrine lacks permissions to mark data as public; this wouldn't show up with the other dataset backups because those aren't trying to make data public, and I only ever access them using Access Keys set up with permissions specifically for the backups.

Indeed, in the trace above you can see git-annex trying to set

x-amz-acl:public-read

which corresponds to public=yes in the annex-remote-s3 settings:

p115628@joplin:~/datasets/data-multi-subject$ git show git-annex:remote.log
5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=yes publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1646250479.065173s

Unfortunately it's not obvious what permissions we're lacking, then, because the 'admin' policy is simply:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}

But there we go, I can reproduce it with awscli by adding --acl public-read:

p115628@joplin:~/datasets/data-multi-subject$ aws s3 rm s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz
delete: s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ aws s3 cp something.nii.gz s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz --acl public-read
upload failed: ./something.nii.gz to s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied

Okay, so now I should be able to google it, and after about 10 minutes of searching I found this SO post, which seems pretty close though not an exact match for our problem: theirs was that they had Block Public Access enabled, which we don't.

We currently have

Screenshot 2022-07-05 at 12-18-49 data-multi-subject---spine-generic---neuropoly - S3 bucket

and

Screenshot 2022-07-05 at 12-17-32 data-multi-subject---spine-generic---neuropoly - S3 bucket

but I wonder if we should set

Screenshot 2022-07-05 at 12-17-41 data-multi-subject---spine-generic---neuropoly - S3 bucket

instead; but then I think we'd need to attach a Policy to the Bucket instead of to the Users, which I'd need to google around to learn how to do.
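(As an aside, the current values of those console toggles can also be read back from the CLI, which is easier to paste into an issue than screenshots; a sketch, with our bucket name and standard s3api subcommands:)

# current Block Public Access configuration for the bucket
aws s3api get-public-access-block --bucket data-multi-subject---spine-generic---neuropoly
# current Object Ownership (ACL) setting for the bucket
aws s3api get-bucket-ownership-controls --bucket data-multi-subject---spine-generic---neuropoly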

@jcohenadad
Copy link
Member

@kousu I'm sure you've looked into this, and sorry if this is a trivial comment, but could it be related to a git-annex version issue?

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

I've isolated it to an S3 policy issue.

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

If I try to turn off ACLs it tells me I need to delete them first:

Screenshot 2022-07-05 at 12-22-44 data-multi-subject---spine-generic---neuropoly - S3 bucket

But that seems way better overall. We shouldn't have "sandrine uploaded this file" and "alex uploaded that file" lurking in the background. So I'm going to figure out how to do that, and maybe that will just clear up the original problem in the process. If not, it'll make it simpler to solve.
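(To see which grants are actually there before deleting anything, something like this should work; a sketch using standard s3api calls:)

# list the bucket-level ACL grants (the "Everyone" grant should show up here)
aws s3api get-bucket-acl --bucket data-multi-subject---spine-generic---neuropoly
# spot-check one object's ACL; the key here is just the test file from above
aws s3api get-object-acl --bucket data-multi-subject---spine-generic---neuropoly --key something-sandrine2.nii.gz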

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

Going to try to follow https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-ownership-migrating-acls-prerequisites.html

Those docs are terrible, but I fumbled my way through creating this bucket policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::data-multi-subject---spine-generic---neuropoly",
                "arn:aws:s3:::data-multi-subject---spine-generic---neuropoly/*"
            ]
        }
    ]
}

and then it turned out I just had to uncheck the "Everyone" ACL:

Screenshot 2022-07-05 at 13-55-22 data-multi-subject---spine-generic---neuropoly - S3 bucket

Then I was able to turn on "BucketOwnerEnforced" which disables ACLs:

Screenshot 2022-07-05 at 13-56-23 data-multi-subject---spine-generic---neuropoly - S3 bucket
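For reference, most of the console steps above could presumably also be done from the CLI, roughly like this (a sketch; policy.json would hold the bucket policy pasted above, and removing the "Everyone" grant still has to happen before the last call succeeds):

# attach the public-read bucket policy
aws s3api put-bucket-policy --bucket data-multi-subject---spine-generic---neuropoly --policy file://policy.json
# switch Object Ownership to BucketOwnerEnforced, which disables ACLs entirely
aws s3api put-bucket-ownership-controls --bucket data-multi-subject---spine-generic---neuropoly \
    --ownership-controls '{"Rules":[{"ObjectOwnership":"BucketOwnerEnforced"}]}'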

Now let's see if that made things better for Sandrine:

p115628@joplin:~/datasets/data-multi-subject$ aws s3 cp something.nii.gz s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz --acl public-read
upload failed: ./something.nii.gz to s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz An error occurred (AccessControlListNotSupported) when calling the CreateMultipartUpload operation: The bucket does not allow ACLs

Okay so no more ACLs, as expected.

p115628@joplin:~/datasets/data-multi-subject$ aws s3 cp something.nii.gz s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz
upload: ./something.nii.gz to s3://data-multi-subject---spine-generic---neuropoly/something-sandrine2.nii.gz

The upload worked:

p115628@joplin:~/datasets/data-multi-subject$ curl -I https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/something-sandrine2.nii.gz
HTTP/1.1 200 OK
x-amz-id-2: JQKwoTl1/e6bl6b9xigg3K8EmyjZoP41H0109J8uo6el5pjkQk6QqMIz30laJpycw7ui5+O2ZGA=
x-amz-request-id: P34SYB5KXX3X68FK
Date: Tue, 05 Jul 2022 17:58:50 GMT
Last-Modified: Tue, 05 Jul 2022 17:57:19 GMT
ETag: "7c773e6269f9634287312812acc4e127-3"
x-amz-version-id: CAh5hh4dNhlaLJ2T9Lt_AyLBYn1hZsmw
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Server: AmazonS3
Content-Length: 20971520

This link also works for me in my browser.

Does it work with git-annex though?

No, not yet:

p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
41%   8.15 MiB         20 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "a6+ieujj4z3Z4P8ooA306DdbGAoxWDiXd6O2ZwjdfapGnuOGPyL5/WQ4UBEytR80FG+5b6xdlsM=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

32%   6.43 MiB         16 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "bFOgMomROCOes9yI6HZHysQGoZaTbsPI5b7rHjcTI0wA8Yx5Dm1JOky9BvXvpcXxzY1kVt48FRQ=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

37%   7.37 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "hqd4HRNk5yp3tKJ6yMhcECEpCjBw8qB6oTpKF3PaOsYFeVG0C+dGI06xq3zgmvnPoFUttI040sY=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

39%   7.81 MiB         21 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "7m7wwG5woSPmICIuXr9QnBOEjUikuyzHSebMLuaNyZMc2Ki2vaqKpU9U+GOTYmR/NzFjOeyxngk=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
git-annex: copy: 1 failed

So, a new error. public=yes is forcing git-annex to use ACLs, even if they're not enabled on the bucket. But I think I can get around this:

p115628@joplin:~/datasets/data-multi-subject$ git annex enableremote amazon public=no^C
p115628@joplin:~/datasets/data-multi-subject$ git log -p git-annex
commit bcc2b437a99ec4a3daa00eef59e98a7af02077b5 (git-annex)
Author: Nick Guenther <[email protected]>
Date:   Tue Jul 5 14:01:15 2022 -0400

    update

diff --git a/remote.log b/remote.log
index 5ab1049d..a7231416 100644
--- a/remote.log
+++ b/remote.log
@@ -1 +1 @@
-5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=yes publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1646250479.065173s
+5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=no publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1657044075.314645917s

p115628@joplin:~/datasets/data-multi-subject$ git annex copy --to amazon what.nii.gz 
copy what.nii.gz (checking amazon...) (to amazon...) 
ok                                
p115628@joplin:~/datasets/data-multi-subject$ 

And the link works:

p115628@joplin:~/datasets/data-multi-subject$ git show HEAD:what.nii.gz  # figure out the filename it would have been uploaded to the bucket as
/annex/objects/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ curl -I https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s20971520--6f0f7f9893cc8c8d01993abf368e59fb4678f2308b756f6f18734862404efb16.nii.gz
HTTP/1.1 200 OK
x-amz-id-2: xmkYCrnRhNfma2mGKN2CO04C9oZXayAJbzCASXFXGoeZhXVSiA4uwj53lZFQ8xUtswZz4Ph9ICM=
x-amz-request-id: WPX16371RNVJVBZP
Date: Tue, 05 Jul 2022 18:04:43 GMT
Last-Modified: Tue, 05 Jul 2022 18:01:30 GMT
ETag: "e32ad309a93cf895004ff012ab4f79dc"
x-amz-version-id: KJdZ9UHsLQk4.IzsWzlCnzBoKBHqiE_f
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 20971520

I've reported this upstream, since Amazon's deprecation of ACLs means git-annex's docs are out of date, or at least aging. Hopefully JoeyH has some time to deal with this without feeling pressured to take shortcuts...

I'll just upload the fix on our end, which again was git annex enableremote amazon public=no:

p115628@joplin:~/datasets/data-multi-subject$ git checkout git-annex
Switched to branch 'git-annex'
p115628@joplin:~/datasets/data-multi-subject$ git rebase -i origin/git-annex  # drop the test commits made by `git add` etc above; only keep the latest commit
Successfully rebased and updated refs/heads/git-annex.
p115628@joplin:~/datasets/data-multi-subject$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 128 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 322 bytes | 322.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To github.com:spine-generic/data-multi-subject.git
   d57e2618..d10114a0  git-annex -> git-annex

@sandrinebedard this should now be working for you. Can you please try again?

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

Er, but you'd better run git annex sync -a --no-content first.

@sandrinebedard
Copy link
Member Author

@kousu I did git annex sync -a --no-content

When running

git annex copy --to amazon 

I get:

copy derivatives/labels/sub-mniS04/anat/sub-mniS04_acq-T1w_MTS_seg-manual.nii.gz (checking amazon...) (to amazon...)
100%  7.1 KiB           5 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "FbFM1/3kL3rgAOVsO8Jgt39Cgf5TXX6O4YryZEt9meEv04fU9CAnJ7lxI7VN3mE+wNX92N3oTss=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

@sandrinebedard, hm, you might need to run git annex enableremote amazon to get the new setting to kick in?

If that doesn't work, git annex enableremote amazon public=no should fix it.
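In other words, the sequence on the contributor's side should be roughly this (a sketch assembled from the commands already used in this thread):

git annex sync -a --no-content            # pull the updated remote.log from origin first
git annex enableremote amazon             # should pick up the new public=no setting
git annex enableremote amazon public=no   # if not, set it explicitly
git annex copy --to amazon                # then retry the upload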

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

I've given data-single-subject the same treatment. To test if it's behaving I checked out a copy and did

p115628@joplin:~/datasets/data-single-subject$ git annex get sub-glen/
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get sub-glen/anat/sub-glen_T1w.nii.gz (from amazon...) 
(checksum...) ok                      
get sub-glen/anat/sub-glen_T2star.nii.gz (from amazon...) 
(checksum...) ok                  
get sub-glen/anat/sub-glen_T2w.nii.gz (from amazon...) 
(checksum...) ok                   
get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from amazon...) 
(checksum...) ok                   
get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from amazon...) 
(checksum...) ok                   
get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from amazon...) 
(checksum...) ok                   
get sub-glen/dwi/sub-glen_dwi.nii.gz (from amazon...) 
(checksum...) ok                  
(recording state in git...)
p115628@joplin:~/datasets/data-single-subject$ dd if=/dev/urandom of=something.nii.gz count=20 bs=1M
20+0 records in
20+0 records out
20971520 bytes (21 MB, 20 MiB) copied, 0.148093 s, 142 MB/s
p115628@joplin:~/datasets/data-single-subject$ git annex add something.nii.gz 
add something.nii.gz 
ok                                
(recording state in git...)

Then tried to upload:

p115628@joplin:~/datasets/data-single-subject$ git annex copy --to amazon
copy something.nii.gz (checking amazon...) (to amazon...) 
35%   6.97 MiB         20 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "a9pJblAyVDHgfZm0xl6iRWoiQ08nXY6JUHXLDdhRvo7FJ9ZTuoIij4ICsGiFv48Te1efpAwfUxQ=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

44%   8.78 MiB         14 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "hrFm+hgp0zNUi1PAvW/FGojt72wkbE4td7BUhHlzVrwY4xx3U2gcr056hzxO7XXJnR7NT4c6zJo=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

37%   7.43 MiB         19 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "qa2GSAN3M1CL6fpsBTPE8eOPexB6C8l10HPwZz/yKNxfgArk5o66SLl7sIbt+vZGPpw8O7jFm50=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

37%   7.31 MiB         19 MiB/s 0s 
  S3Error {s3StatusCode = Status {statusCode = 403, statusMessage = "Forbidden"}, s3ErrorCode = "AccessDenied", s3ErrorMessage = "Access Denied", s3ErrorResource = Nothing, s3ErrorHostId = Just "KlZAbMO1cxQa781w0tznl5Mfs/Nm/V/BvK11uS9xfRa7MD/AqIJYUJhIyQ/8mEuv07maO/c6IB8=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
copy sub-glen/anat/sub-glen_T1w.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_T2star.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_T2w.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (checking amazon...) ok
copy sub-glen/dwi/sub-glen_dwi.nii.gz (checking amazon...) ok
git-annex: copy: 1 failed

which looks good: sandrine2 should not have access to write to data-single-subject.

Then I went into IAM and added the READ_WRITE_S3_data-single-subject---spine-generic---neuropoly policy to the account and tried again:

p115628@joplin:~/datasets/data-single-subject$ 
p115628@joplin:~/datasets/data-single-subject$ git annex copy --to amazon
copy something.nii.gz (checking amazon...) (to amazon...) 
ok                                
copy sub-glen/anat/sub-glen_T1w.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_T2star.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_T2w.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (checking amazon...) ok
copy sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (checking amazon...) ok
copy sub-glen/dwi/sub-glen_dwi.nii.gz (checking amazon...) ok
(recording state in git...)
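(The console step of attaching that policy corresponds roughly to the following CLI call; the account id is omitted and the exact policy ARN is an assumption:)

# attach the per-dataset managed policy to the sandrine2 IAM user
aws iam attach-user-policy \
    --user-name sandrine2 \
    --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/READ_WRITE_S3_data-single-subject---spine-generic---neuropoly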

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

I've also deleted the sandrine2 and kousu2 accounts, since I'm done testing with them.

@sandrinebedard
Copy link
Member Author

This did not work:

git annex enableremote amazon
copy derivatives/labels/sub-balgrist03/anat/sub-balgrist03_acq-MTon_MTS_seg-manual.nii.gz (checking amazon...) (to amazon...)
100%  3.27 KiB          3 MiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "tP83hoh2Tk3mXsk/X+Lt7CUwRE82XrTSAWZAWMVNGcSKeycRGqUCQp261GWcaQ4D/RCUvglZleU=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

100%  3.27 KiB        204 KiB/s 0s
  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AccessControlListNotSupported", s3ErrorMessage = "The bucket does not allow ACLs", s3ErrorResource = Nothing, s3ErrorHostId = Just "D+4sHrpqmliyjikTc6TnXmny/RkhJMyCYG562FGXfnWO3SAtdetFFfyNaVnyL4jJf9I7ZCjZiJI=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed

However,

git annex enableremote amazon public=no

It worked!!! (Still ongoing since there are a lot of images, but no errors!)

@kousu
Copy link
Contributor

kousu commented Jul 5, 2022

Great! I'm pretty confident this is done then.

tl;dr: I'm pretty sure the problem was that the policy created yesterday was missing the PutBucketAcl permission, which git annex initremote s3 public=yes ... needs. However, Amazon now has a better option (possibly added since the last time we uploaded files to this dataset?), which is to not use ACLs at all, and I migrated both of our public datasets to it.

The reason we never ran into this before is that @sandrinebedard is the first non-admin data curator we've onboarded; everyone else has always had full permissions.

@kousu kousu closed this as completed Jul 5, 2022
@kousu kousu reopened this Jul 15, 2022
@kousu
Copy link
Contributor

kousu commented Jul 15, 2022

Setting public=no has caused a new problem: users without AWS credentials can no longer download from the amazon remote.

For example:

p115628@joplin:~/datasets$ git clone [email protected]:spine-generic/data-multi-subject.git
Cloning into 'data-multi-subject'...
remote: Enumerating objects: 112666, done.
remote: Counting objects: 100% (16109/16109), done.
remote: Compressing objects: 100% (12231/12231), done.
remote: Total 112666 (delta 3074), reused 15133 (delta 3040), pack-reused 96557
Receiving objects: 100% (112666/112666), 13.89 MiB | 13.13 MiB/s, done.
Resolving deltas: 100% (43287/43287), done.
p115628@joplin:~/datasets$ cd data-multi-subject/
p115628@joplin:~/datasets/data-multi-subject$ git annex init
init  (merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
git config 
  Unable to parse git config from origin

  Remote origin does not have git-annex installed; setting annex-ignore

  This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
(Auto enabling special remote amazon...)
ok
(recording state in git...)
p115628@joplin:~/datasets/data-multi-subject$ git annex get sub-mgh01/dwi/sub-mgh01_dwi.nii.gz 
get sub-mgh01/dwi/sub-mgh01_dwi.nii.gz (from amazon...) 

  S3 bucket does not allow public access; Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3

  cannot download content

  Unable to access these remotes: amazon

  Maybe add some of these git remotes (git remote add ...):
        5cdba4fc-8d50-4e89-bb0c-a3a4f9449666 -- [email protected]:~/code/spine-generic/data-multi-subject
        9e4d13f3-30e1-4a29-8b86-670879928606 -- [email protected]:~/data/data-multi-subject
        e405e14e-33b2-4a35-b7a7-3eeec054f0d4 -- [email protected]:/mnt/nvme/sebeda/data-multi-subject

  (Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed

However, the actual file does allow public access:

p115628@joplin:~/datasets/data-multi-subject$ git show HEAD:sub-mgh01/dwi/sub-mgh01_dwi.nii.gz 
/annex/objects/SHA256E-s5014657--40fb32f763950b360f64856bea9ea0ee8774f03cd07eab4231522eebaf706c32.nii.gz
p115628@joplin:~/datasets/data-multi-subject$ curl -I https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s5014657--40fb32f763950b360f64856bea9ea0ee8774f03cd07eab4231522eebaf706c32.nii.gz
HTTP/1.1 200 OK
x-amz-id-2: kdN9WzlDz9ohW0MbUpcNLdYzlyzxuM50/NCEtn5GY34EJ57SScHM0jlT1LNUwLaskCmIszo4usbeBdBCmwTQRw==
x-amz-request-id: YCARKJAF675B7QZE
Date: Fri, 15 Jul 2022 13:35:12 GMT
Last-Modified: Tue, 04 Aug 2020 20:23:27 GMT
ETag: "545c0154520e66d82f937048a76b939d"
x-amz-version-id: null
Accept-Ranges: bytes
Content-Type: application/gzip
Server: AmazonS3
Content-Length: 5014657

So the problem is on git-annex's side: because public=no is set and no credentials are loaded, it doesn't even try to download the file. To prove it, note that simply resetting public=yes locally fixes it:

p115628@joplin:~/datasets/data-multi-subject$ git annex enableremote amazon public=yes
enableremote amazon ok
(recording state in git...)
p115628@joplin:~/datasets/data-multi-subject$ git annex get sub-mgh01/dwi/sub-mgh01_dwi.nii.gz
get sub-mgh01/dwi/sub-mgh01_dwi.nii.gz (from amazon...) 

(checksum...) ok                      
(recording state in git...)

So we have a catch-22: we can't disable ACLs and still let contributors upload with public=yes (because public=yes makes git-annex try to set ACLs), but we also can't leave public=no set for people who just want to download the datasets (because public=no makes git-annex insist on AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY being set).

@kousu
Copy link
Contributor

kousu commented Jul 15, 2022

Another problem: somehow we ended up with two copies of the amazon remote:

p115628@joplin:~/datasets/data-multi-subject$ git show origin/git-annex:remote.log
5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=no publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1657044075.314645917s
5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=no publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1657048664.179485s

It looks like maybe it was... a bad merge?

p115628@joplin:~/datasets/data-multi-subject$ git log -p origin/git-annex -- remote.log
commit 6099ff118ae9380f5e3e64cef960864a21b65e08
Merge: 864c44a58 d10114a0c
Author: Sandrine Bedard <[email protected]>
Date:   Wed Jul 6 14:03:00 2022 -0400

    merging origin/git-annex into git-annex

But running enableremote amazon public=yes above has quietly edited the git-annex branch to remove the conflict:

p115628@joplin:~/datasets/data-multi-subject$ git show git-annex:remote.log
5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=yes publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1657892261.433224054s
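A quick way to check whether any other special-remote UUIDs got duplicated by a merge like that (just a sketch):

# remote.log lines start with the remote's UUID; print any UUID that appears more than once
git show origin/git-annex:remote.log | cut -d' ' -f1 | sort | uniq -d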

@kousu
Copy link
Contributor

kousu commented Jul 15, 2022

Reported upstream again at https://git-annex.branchable.com/bugs/S3_ACL_deprecation/, including the note that public=no isn't a full solution.

@mguaypaq
Copy link
Member

Our current understanding:

  • It's important that everyone's git-annex has the amazon remote set to public=yes, so that everyone can download the data without credentials.
  • But it's important to set public=no while uploading new data, so that git-annex uses the credentials to get write access to our S3 bucket.
  • When running git annex sync and/or git push origin git-annex:git-annex, the current public=yes/no setting propagates to everyone else.

So, as long as public=yes is set again before running git annex sync and/or git push origin git-annex:git-annex, the public=no shouldn't "infect" anyone else. I've updated the instructions on the wiki (link) to reflect this.
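In other words, the curator-side workflow from the wiki boils down to something like this (a sketch assembled from the commands used in this issue):

git annex enableremote amazon public=no    # use credentials and skip the ACL header while uploading
git annex copy --to amazon                 # upload the new data
git annex enableremote amazon public=yes   # restore credential-free downloads for everyone else
git annex sync                             # only now sync / push the git-annex branch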

@kousu
Copy link
Contributor

kousu commented Jul 20, 2022

I think the docs are a good enough workaround for now. Thanks a lot for taking the time to write them up! I like all the helpful comments on each line.

@kousu kousu closed this as completed Jul 20, 2022
@mguaypaq
Copy link
Member

A note for the future: according to a git-annex tip, it should be possible to configure two separate remotes for our Amazon S3 bucket, one with public=yes for everyone to read, and one with public=no for only the people with credentials and write access, configured so that git-annex knows that uploading to the writable remote makes a file available also on the world-readable remote. (@kousu, another use-case mentioned is accessing a repository through both https and ssh, I think you've been looking for that?)
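If we go that route, the setup would presumably look something like the following, based on git-annex's httpalso special remote and its --sameas option (the second remote's name and the exact options here are assumptions to be checked against the tip):

# keep the credentialed S3 remote (public=no) for writers, and add a read-only
# twin that serves the same content over plain HTTPS, no AWS keys needed
git annex initremote amazon-public --sameas=amazon type=httpalso \
    url=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/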
