
[lib-storage] Upload fails if application provides checksum for >5 MB file #6742

Open · trivikr opened this issue Dec 17, 2024 · 5 comments
Labels: bug (This issue is a bug.) · p3 (This is a minor priority issue) · queued (This issue is on the AWS team's backlog)

trivikr commented Dec 17, 2024

Checkboxes for prior research

Describe the bug

Upload fails if application provides checksum for >5 MB file

Regression Issue

  • Select this option if this issue appears to be a regression.

SDK version number

@aws-sdk/[email protected], @aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

All, verified in v22.11.0

Reproduction Steps

import { createReadStream, createWriteStream } from "fs";
import { createHash } from "crypto";
import { S3 } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const SIZE_IN_MB = 6;
const content = "helloworld";
const Key = `${content}_${SIZE_IN_MB}MB.txt`;

const SIZE_IN_BYTES = SIZE_IN_MB * 1024 * 1024;
const repetitions = Math.floor(SIZE_IN_BYTES / content.length);

const hash = createHash("sha256");
const writeStream = createWriteStream(Key);
for (let i = 0; i < repetitions; i++) {
  writeStream.write(content);
  hash.update(content);
}
writeStream.end();
await new Promise((resolve) => writeStream.on("close", resolve));

const client = new S3();
const Bucket = "test-flexible-checksums"; // Replace with your test bucket name.
const Body = createReadStream(Key);
const ChecksumSHA256 = hash.digest("base64");

const upload = new Upload({
  client,
  params: { Bucket, Key, Body, ChecksumSHA256 },
});
await upload.done();

Observed Behavior

When SIZE_IN_MB is greater than 5, the following error is thrown:

/local/home/trivikr/workspace/test/node_modules/@smithy/smithy-client/dist-cjs/index.js:835
  const response = new exceptionCtor({
                   ^

BadDigest: The SHA256 you specified did not match the calculated checksum.
    at throwDefaultError (/local/home/trivikr/workspace/test/node_modules/@smithy/smithy-client/dist-cjs/index.js:835:20)
    at /local/home/trivikr/workspace/test/node_modules/@smithy/smithy-client/dist-cjs/index.js:844:5
    at de_CommandError (/local/home/trivikr/workspace/test/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4919:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async /local/home/trivikr/workspace/test/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
    at async /local/home/trivikr/workspace/test/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:485:18
    at async /local/home/trivikr/workspace/test/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at async /local/home/trivikr/workspace/test/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:263:18
    at async /local/home/trivikr/workspace/test/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:110:22
    at async /local/home/trivikr/workspace/test/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:138:14 {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: 400,
    requestId: '85PD79WA0QPKN7EW',
    extendedRequestId: '1qhp9msss1ay+fS0glD9F/68M9FOXmzOsoejN/DRw/xZ/ViZs9tu1gqpg2XdglxZgQKp2Rm6uJY=',
    cfId: undefined,
    attempts: 1,
    totalRetryDelay: 0
  },
  Code: 'BadDigest',
  RequestId: '85PD79WA0QPKN7EW',
  HostId: '1qhp9msss1ay+fS0glD9F/68M9FOXmzOsoejN/DRw/xZ/ViZs9tu1gqpg2XdglxZgQKp2Rm6uJY='
}

When SIZE_IN_MB is less than or equal to 5, no error is thrown.

Expected Behavior

No error should be thrown when the application provides a checksum for a >5 MB file.

Possible Solution

No response

Additional Information/Context

No response

trivikr commented Dec 17, 2024

This bug can be fixed once S3 allows passing x-amz-checksum-type, as described in the blog post:

The CreateMultiPartUpload API introduces a new HTTP header, x-amz-checksum-type, which lets you specify the type of checksum to use. You can choose either a full object checksum (calculated by combining the checksums of all individual parts) or a composite checksum.
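
Once the model is updated, passing the new checksum type might look roughly like this (a sketch; the ChecksumType parameter is assumed here to map to the x-amz-checksum-type header, and note that per the S3 docs full-object checksums in multipart uploads are limited to CRC-based algorithms):

import { S3Client, CreateMultipartUploadCommand } from "@aws-sdk/client-s3";

const client = new S3Client();

// Sketch: request a full-object checksum for the multipart upload so that
// a precomputed whole-file checksum can be validated on completion.
const { UploadId } = await client.send(
  new CreateMultipartUploadCommand({
    Bucket: "test-flexible-checksums", // placeholder
    Key: "helloworld_6MB.txt", // placeholder
    ChecksumAlgorithm: "CRC32",
    ChecksumType: "FULL_OBJECT", // assumed to map to x-amz-checksum-type
  })
);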

trivikr commented Dec 17, 2024

While we wait for the model to be updated to allow passing x-amz-checksum-type, there are two workarounds.

Workaround 1: Let SDK compute the checksum

Pass ChecksumAlgorithm="SHA256" instead of the checksum value in ChecksumSHA256 for >5 MB files.
The SDK will then compute a checksum for each part.

Test code

import { createReadStream } from "fs";
import { S3 } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { NodeHttpHandler } from "@smithy/node-http-handler";

// Logs the checksum-related request and response headers for each HTTP call.
class CustomHandler extends NodeHttpHandler {

  printChecksumHeaders(prefix, headers) {
    for (const [header, value] of Object.entries(headers)) {
      if (
        header.startsWith("x-amz-checksum-") ||
        header.startsWith("x-amz-sdk-checksum-")
      ) {
        console.log(`${prefix}['${header}']: '${value}'`);
      }
    }
  }

  async handle(request, options) {
    const response = await super.handle(request, options);
    console.log();
    console.log("------------------");
    this.printChecksumHeaders("request", request.headers);
    this.printChecksumHeaders("response", response.response.headers);
    console.log("------------------");
    console.log();
    return response;
  }
}

const client = new S3({ requestHandler: new CustomHandler() });
const Bucket = "test-flexible-checksums"; // Replace with your test bucket name.
const Body = createReadStream(Key); // Key and the file come from the reproduction above
const ChecksumAlgorithm = "SHA256";

const upload = new Upload({
  client,
  params: { Bucket, Key, Body, ChecksumAlgorithm },
});
await upload.done();

Note that the SDK sends the 6 MB file in two parts and computes a checksum for each part:

------------------
request['x-amz-checksum-algorithm']: 'SHA256'
response['x-amz-checksum-algorithm']: 'SHA256'
response['x-amz-checksum-type']: 'COMPOSITE'
------------------


------------------
request['x-amz-sdk-checksum-algorithm']: 'SHA256'
request['x-amz-checksum-sha256']: 'jl28wIE8B/50cPRez5JaVTYNlvVk39FWKapxP8KEu88='
response['x-amz-checksum-sha256']: 'jl28wIE8B/50cPRez5JaVTYNlvVk39FWKapxP8KEu88='
------------------


------------------
request['x-amz-sdk-checksum-algorithm']: 'SHA256'
request['x-amz-checksum-sha256']: 'JU0DrgdnLiihtY/7GHhqvAmJv50Va9RhaNLVwUUu9NU='
response['x-amz-checksum-sha256']: 'JU0DrgdnLiihtY/7GHhqvAmJv50Va9RhaNLVwUUu9NU='
------------------


------------------
------------------
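
As an aside, the COMPOSITE value can be reproduced locally. To our understanding it is a checksum of checksums: hash the raw (binary) part digests concatenated together, base64-encode the result, and append the part count. A sketch, assuming the 5 MB default part size of lib-storage and the placeholder file name from the reproduction:

import { readFileSync } from "fs";
import { createHash } from "crypto";

const PART_SIZE = 5 * 1024 * 1024; // lib-storage's default part size
const data = readFileSync("helloworld_6MB.txt"); // placeholder file name

// Hash each 5 MB part and keep the raw digests.
const partDigests = [];
for (let offset = 0; offset < data.length; offset += PART_SIZE) {
  partDigests.push(
    createHash("sha256").update(data.subarray(offset, offset + PART_SIZE)).digest()
  );
}

// Composite checksum: hash of the concatenated part digests, plus "-<parts>".
const composite =
  createHash("sha256").update(Buffer.concat(partDigests)).digest("base64") +
  `-${partDigests.length}`;
console.log(composite); // expected to match the ChecksumSHA256 of the completed upload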

Workaround 2: Use PutObject from client-s3 instead of Upload from lib-storage

// CustomHandler is the same as in Workaround 1.

const client = new S3({ requestHandler: new CustomHandler() });
const Bucket = "test-flexible-checksums"; // Replace with your test bucket name.
const Body = createReadStream(Key);

await client.putObject({ Bucket, Key, Body, ChecksumSHA256 }); // Key and ChecksumSHA256 from the reproduction above

The PutObject call sends the provided full-object checksum with the request:

------------------
request['x-amz-checksum-sha256']: 'ZRRKEcEAxGazUzgqh+rSEecXfI27XNZQ8Uv7aMOX64s='
response['x-amz-checksum-sha256']: 'ZRRKEcEAxGazUzgqh+rSEecXfI27XNZQ8Uv7aMOX64s='
response['x-amz-checksum-type']: 'FULL_OBJECT'
------------------

Of the two workarounds, we recommend the first (Upload with ChecksumAlgorithm and no precomputed checksum), since Upload from @aws-sdk/lib-storage is the recommended way to upload large files.

kasir-barati commented Feb 15, 2025

@trivikr I think I've also seen this behavior. I am using CreateMultipartUploadCommand to upload a large file. I specified FULL_OBJECT as my preferred checksum type, but when I changed it to COMPOSITE, S3 returned the same checksum. Now I know that S3's API is in fact the source of this odd behavior.

So how can I do this if I have to? For example, when the source (e.g. an IoT device) sends the data to a backend app, and from there we need to upload it to S3, and the file is HUGE. I need to do this myself and cannot let the SDK do it for me.

BTW, here is a very simplified version of what I am trying to do. I'd really appreciate it if you could take a look and tell me whether there is a solution for me.

And JFYI, I have racked my brain to get this to work; you can actually see my footsteps almost everywhere 😂.
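
For reference, one way to keep the checksums under application control today is a manual multipart upload that passes a per-part ChecksumSHA256. A minimal sketch with placeholder bucket and key names; note that S3 records this as a COMPOSITE checksum, not FULL_OBJECT:

import { readFileSync } from "fs";
import { createHash } from "crypto";
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} from "@aws-sdk/client-s3";

const client = new S3Client();
const Bucket = "test-flexible-checksums"; // placeholder
const Key = "helloworld_6MB.txt"; // placeholder
const PART_SIZE = 5 * 1024 * 1024; // every part except the last must be >= 5 MB
const data = readFileSync(Key); // in practice this could be buffered from the device

const { UploadId } = await client.send(
  new CreateMultipartUploadCommand({ Bucket, Key, ChecksumAlgorithm: "SHA256" })
);

const Parts = [];
for (let offset = 0, PartNumber = 1; offset < data.length; offset += PART_SIZE, PartNumber++) {
  const body = data.subarray(offset, offset + PART_SIZE);
  // Compute and send the checksum for this part explicitly.
  const ChecksumSHA256 = createHash("sha256").update(body).digest("base64");
  const { ETag } = await client.send(
    new UploadPartCommand({ Bucket, Key, UploadId, PartNumber, Body: body, ChecksumSHA256 })
  );
  Parts.push({ PartNumber, ETag, ChecksumSHA256 });
}

await client.send(
  new CompleteMultipartUploadCommand({ Bucket, Key, UploadId, MultipartUpload: { Parts } })
);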

kasir-barati commented

On another note, I've read in the docs that, due to a technical limitation, S3 only supports CRC-based algorithms for FULL_OBJECT:

For full object checksums, you can use CRC-64/NVME (CRC64NVME), CRC-32 (CRC32), or CRC-32C (CRC32C) checksum algorithms in S3. Full object checksums in multipart uploads are only available for CRC-based checksums because they can linearize into a full object checksum. This linearization allows Amazon S3 to parallelize your requests for improved performance. In particular, S3 can compute the checksum of the whole object from the part-level checksums. This type of validation isn’t available for other algorithms, such as SHA and MD5. Because S3 has default integrity protections, if objects are uploaded without a checksum, S3 automatically attaches the recommended full object CRC-64/NVME (CRC64NVME) checksum algorithm to the object.

Ref

But here you've used SHA256, so does that mean the doc is outdated, @trivikr?

kasir-barati commented

Another dirty hack would be: https://github.com/kasir-barati/bugs/blob/1149c73557b939a911ec4ec999c8430aa99124f0/upload1.ts#L36-L56

@trivikr, how can we know when to expect this bug to be fixed in the AWS SDK?
