Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UploadInput.ContentMD5 silently ignored for large files #2500

Closed
eriksw opened this issue Mar 10, 2019 · 5 comments
Closed

UploadInput.ContentMD5 silently ignored for large files #2500

eriksw opened this issue Mar 10, 2019 · 5 comments
Labels
closed-for-staleness documentation This is a problem with documentation. guidance Question that needs advice or information.

Comments

@eriksw
Copy link

eriksw commented Mar 10, 2019

Please fill out the sections below to help us address your issue.

Version of AWS SDK for Go?

v1.17.14

Version of Go (go version)?

go version go1.11 darwin/amd64

What issue did you see?

When using s3manager.Uploader to upload files, when an incorrect ContentMD5 is set on the UploadInput, the behavior depends on the size of the file being uploaded:

  • Uploads of small files (single-part) correctly fail.
  • Uploads of larger files (multi-part) silently and incorrectly succeed.

The behavior of uploading using s3manager.Uploader when UploadInput.ContentMD5 does not match the actual Body being uploaded should be consistent regardless of the Body size.

Silently failing to effect my intention as an user (that the object will only appear in the bucket if its overall md5 matches what I set in ContentMD5) is completely unacceptable.

If there's no way for Uploader to ensure that a provided ContentMD5 actually matches what winds up in the bucket, that option needs to be removed, or an error should be thrown to bring the issue to the api user's attention.

Steps to reproduce

package main

import (
    "crypto/md5"
    "encoding/base64"
    "flag"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
    "github.com/aws/aws-sdk-go/service/s3/s3manager/s3manageriface"
    "io"
    "io/ioutil"
    "log"
    "math/rand"
    "os"
)

var uploader s3manageriface.UploaderAPI
var bucket string

func main() {
    flag.StringVar(&bucket, "bucket", "", "s3 bucket to upload to")
    flag.Parse()
    awsConfig := aws.NewConfig()
    awsSession := session.Must(session.NewSession(awsConfig))
    uploader = s3manager.NewUploader(awsSession)

    log.Printf("bucket=%q", bucket)

    log.Println("testing with small upload:")
    withTestFile(s3manager.DefaultUploadPartSize/2, testUpload)
    log.Println("testing with large upload:")
    withTestFile(s3manager.DefaultUploadPartSize*4, testUpload)
}

func testUpload(file *os.File, goodDigest string) {
    if _, err := file.Seek(0, io.SeekStart); err != nil {
        panic(err)
    }
    goodInput := &s3manager.UploadInput{
        Bucket:               &bucket,
        Key:                  aws.String("test-file"),
        ContentMD5:           &goodDigest,
        ServerSideEncryption: aws.String("AES256"),
        Body:                 file,
    }
    if _, err := uploader.Upload(goodInput); err != nil {
        log.Printf("Upload with correct ContentMD5 failed (should not happen): %+v", err)
        return
    }

    badDigest := base64.StdEncoding.EncodeToString(make([]byte, 16))

    if _, err := file.Seek(0, io.SeekStart); err != nil {
        panic(err)
    }
    badInput := &s3manager.UploadInput{
        Bucket:               &bucket,
        Key:                  aws.String("test-file"),
        ContentMD5:           &badDigest,
        ServerSideEncryption: aws.String("AES256"),
        Body:                 file,
    }
    if _, err := uploader.Upload(badInput); err != nil {
        log.Printf("Upload with incorrect ContentMD5 failed (this should happen): %+v", err)
    } else {
        log.Println("Upload with incorrect ContentMD5 succeeded (THIS SHOULD NOT HAPPEN!)")
    }
}

func withTestFile(desiredSize int64, fun func(file *os.File, md5Digest string)) {
    file, err := ioutil.TempFile("", "")
    if err != nil {
        panic(err)
    }
    defer func() {
        name := file.Name()
        if err := file.Close(); err != nil {
            log.Printf("failed to close %q: %s", name, err)
        }
        if err := os.Remove(name); err != nil {
            if !os.IsNotExist(err) {
                log.Printf("failed to clean up %q", name)
            }
        }
    }()

    h := md5.New()
    var written int64
    const bs = 1024
    buf := make([]byte, bs)
    for written < desiredSize {
        _, _ = rand.Read(buf)
        if _, err := h.Write(buf); err != nil {
            panic(err)
        }
        if _, err := file.Write(buf); err != nil {
            panic(err)
        }
        written += bs
    }

    encoded := base64.StdEncoding.EncodeToString(h.Sum(nil))
    fun(file, encoded)
}
@diehlaws diehlaws self-assigned this Jun 11, 2019
@diehlaws diehlaws added guidance Question that needs advice or information. service-api This issue is due to a problem in a service API, not the SDK implementation. labels Jun 11, 2019
@diehlaws
Copy link
Contributor

Hi @eriksw, thanks for reaching out to us and I do apologize for the long delay in response from our end. The behavior you're seeing for Multipart uploads is expected - the ContentMD5 parameter in s3manager's UploadInput is only required for multipart uploads if object lock parameters are specified as mentioned in this struct's documentation. If no object lock parameters are specified and the upload is a multipart upload rather than a single PUT operation, The ContentMD5 parameter in the input is ignored since, as mentioned in S3's Multipart Upload Overview documentation page, the ETag for an object resultant from a successful multipart upload is not necessarily equal to the MD5 hash of the object data. You can see that this parameter is dropped from multipart uploads initiated via s3manager by enabling debug logging in your session config and looking at the post-sign request headers for the CreateMultipartUpload operation.

As a result, you are right in that the uploader cannot ensure file consistency in a multipart upload by way of comparing the ContentMD5 value provided in the UploadInput with the ETag of the resultant object, however this parameter is still used when object lock parameters are involved so it cannot be removed from the UploadInput struct altogether. The topic of multipart upload ETags not matching the MD5 hash of the object has been brought up multiple times in the AWS Forums as well as Premium Support cases, however S3's stance on this remains that it is intentional behavior and there are no plans to change it. One popular workaround for this (at least for files smaller than 5GB) is to copy the object to itself once the multipart upload completes successfully, which results in the ETag value matching the MD5 hash of the object. Other workarounds for different use cases include downloading the resultant object to a different file and comparing the MD5 hash of the newly downloaded file to the MD5 hash of the file that was originally uploaded (not a good solution when bandwidth is a concern) or storing the MD5 hash as metadata for the object (not viable when you want to ensure the contents of the S3 object match the contents of the origin file exactly).

@diehlaws diehlaws added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 11, 2019
@eriksw
Copy link
Author

eriksw commented Jun 12, 2019

Although it's very much required on the server side when Object Lock is in play, I'm pretty sure it's in effect never required when using s3manager.Uploader, because Uploader always promotes the input Body (via uploader.nextReader) such that the underlying S3 client's helper computeBodyHashes populates it if it's missing for a single-part upload. (Just as it does for the parts of multipart uploads.)

Because there's no way for UploadInput.ContentMD5 to have any impact at all on a multipart upload, and it's not required for single-part uploads (via Uploader at least), I'd suggest that the field be explicitly deprecated.

The struct field documentation as it currently exists:

  • States that it's required when it's not.
  • Fails to warn that the field is silently ignored for the common Uploader use case (multipart).
  • Fails to warn that the field is silently ignored even for a subset of the cases where it is "required" (Object Lock + multipart).

@diehlaws diehlaws removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 13, 2019
@github-actions
Copy link

We have noticed this issue has not received attention in 1 year. We will close this issue for now. If you think this is in error, please feel free to comment and reopen the issue.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Aug 25, 2020
@eriksw
Copy link
Author

eriksw commented Aug 25, 2020

This issue should not be closed. It is still present in v1.34.10.

➜  s3-chunk-bug AWS_REGION=us-east-1 go run main.go -bucket [REDACTED]
2020/08/24 17:48:33 bucket="[REDACTED]"
2020/08/24 17:48:33 testing with small upload:
2020/08/24 17:48:34 Upload with incorrect ContentMD5 failed (this should happen): BadDigest: The Content-MD5 you specified did not match what we received.
	status code: 400, request id: 169324C027C5AB85, host id: 3M/bz+qLlKtol+DPU2LU6YtcozTB+/wCeflIF703IV0uwmg4Qa0i/oXF+JfR2HePEH8EmtFGeys=
2020/08/24 17:48:34 testing with large upload:
2020/08/24 17:48:56 Upload with incorrect ContentMD5 succeeded (THIS SHOULD NOT HAPPEN!)
➜  s3-chunk-bug go version
go version go1.14.7 darwin/amd64
➜  s3-chunk-bug ls
go.mod		go.sum		main.go
➜  s3-chunk-bug cat go.mod
module s3-chunk-bug

go 1.14

require github.com/aws/aws-sdk-go v1.34.10 // indirect

@diehlaws diehlaws added no-autoclose This issue should not be auto-closed by stale-issue-cleanup action. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Aug 25, 2020
@diehlaws diehlaws removed their assignment Aug 26, 2020
@jasdel
Copy link
Contributor

jasdel commented Feb 25, 2022

Thanks for reaching out with this issue @eriksw. In the SDK's latest release v1.43.6, we've added documentation to the Uploader and its UploadInput.ContentMD5 parameter to clarify the behavior of the SDK, and its handling of ContentMD5 for multi-part uploads with the Uploader. The ContentMD5 parameter is ignored for files that are larger than the Uploader's ChunkSize. It is ignored because for multi-part uploads the ContentMD5 value of the full object will not be the ContentMd5 of individual part, nor the expected checksum of checksums ContentMD5 for a multi part upload.

In addition, the AWS SDK for Go v2's usage of ContentMD5 has improved and more robust with the latest release, feature/s3/[email protected] supporting checking object integrity via the new ChecksumAlgorithm parameter for auto computing the checksum of the object being uploaded to the bucket.

@jasdel jasdel added closing-soon This issue will automatically close in 4 days unless further comments are made. documentation This is a problem with documentation. and removed service-api This issue is due to a problem in a service API, not the SDK implementation. no-autoclose This issue should not be auto-closed by stale-issue-cleanup action. labels Feb 25, 2022
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Mar 3, 2022
@github-actions github-actions bot closed this as completed Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-for-staleness documentation This is a problem with documentation. guidance Question that needs advice or information.
Projects
None yet
Development

No branches or pull requests

3 participants