
Error on uploading more than 1.5Gb file "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." #3497

Closed
3 tasks done
das-sukriti opened this issue Oct 14, 2020 · 8 comments
Labels
closed-for-staleness · guidance (Question that needs advice or information.) · response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.)

Comments

@das-sukriti

das-sukriti commented Oct 14, 2020


Describe the bug
Hi,
Specs I am using:
node: v10.22.0
aws-sdk: 2.718.0
archiver: 3.0.0
json2csv: 4.5.1

I am getting the error "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." while uploading bulk data using a stream with s3.upload. The error does not occur for data below 1.5 GB.
I am trying to upload a zip file to an S3 endpoint using the SDK for JavaScript. The aim is to zip multiple CSV files containing data fetched from a DB.
To do that I am following the steps below (a consolidated sketch of the whole pipeline follows the steps). Is there something wrong with the way I am combining s3.upload and archiver?

  1. Fetch data from the DB, which gives me an object stream.
  2. I use

	objectStream.pipe(stringify)
		.on("finish", () => {
			console.log(); // All the read tables finish successfully
		})
		.on("error", () => {
			console.log(); // This does not catch any error
		});
		
	where stringify is:

	const stringify: Transform = new Transform({ objectMode: true });
	stringify._transform = (chunk, _, callback) => {
		const s = JSON.stringify(chunk);
		const buffer = Buffer.from(s);
		stringify.push(buffer);
		callback();
	};
		
	And I am using the json2csv converter to convert it into CSV:

	const converter = new AsyncParser();
	converter.fromInput(stringify).toOutput(archivePassThrough);

	Also, using Archiver, I append archivePassThrough to the archive:

	const archive: Archiver = create("zip");
	const archivePassThrough: PassThrough = new PassThrough();
	archive.append(archivePassThrough, { name: fName });
  3. For the S3 upload, I am using:

	const s3: AWS.S3 = new AWS.S3({
		accessKeyId: accessKey,
		secretAccessKey: secretKey,
		endpoint,
		httpOptions: {
			timeout: 0, // Request was timing out initially
		},
		// region: "",             // Tried but the error still came
		// s3ForcePathStyle: true, // Tried but the error still came
		// signatureVersion: "v4", // Tried but the error still came
	});
	
	const exportPassThrough: PassThrough = new PassThrough();
	archive.pipe(exportPassThrough);

	const id: string = uuid.v4();
	const uploadParams = { Bucket: bucketName, Key: id + ".zip", Body: exportPassThrough };

	s3.upload(uploadParams)
		.on("httpUploadProgress", (evt: any) => {
			log();
		})
		.promise()
		.then((data: any) => {
			return data.Location;
		})
		.catch((err: any) => {
			return err.message; // This is where the error is getting caught
		});
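For reference, here is a consolidated sketch of the pipeline from the steps above, rewritten as plain Node.js with an "error" listener attached to every stream (stream errors do not propagate across pipe() automatically, so a failure in an intermediate stage is otherwise silent). objectStream, bucketName, endpoint, accessKey, secretKey and the CSV file name are placeholders from the snippets above, and the archive.finalize() call is an addition that archiver requires but that is not shown in the original fragments:

const { PassThrough, Transform } = require("stream");
const { create } = require("archiver");
const { AsyncParser } = require("json2csv");
const AWS = require("aws-sdk");
const { v4: uuidv4 } = require("uuid");

// objectStream: the object-mode stream returned by the DB query (placeholder).
const stringify = new Transform({
    objectMode: true,
    transform(chunk, _enc, callback) {
        callback(null, Buffer.from(JSON.stringify(chunk)));
    },
});

const archive = create("zip");
const archivePassThrough = new PassThrough();
const exportPassThrough = new PassThrough();

archive.append(archivePassThrough, { name: "export.csv" }); // file name is a placeholder
archive.pipe(exportPassThrough);

new AsyncParser().fromInput(stringify).toOutput(archivePassThrough);
objectStream.pipe(stringify);

// Without these listeners, a failure in any stage is swallowed and the only
// symptom is that s3.upload eventually fails.
[objectStream, stringify, archivePassThrough, archive, exportPassThrough].forEach((s) =>
    s.on("error", (err) => console.error("stream error:", err))
);

archive.finalize(); // required by archiver once all entries are appended

const s3 = new AWS.S3({ endpoint, accessKeyId: accessKey, secretAccessKey: secretKey });
s3.upload({ Bucket: bucketName, Key: uuidv4() + ".zip", Body: exportPassThrough })
    .promise()
    .then((data) => console.log("uploaded to", data.Location))
    .catch((err) => console.error("upload failed:", err.message));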
	

Is the issue in the browser/Node.js?
Node.js

If on Node.js, are you running this on AWS Lambda?
No

Details of the browser/Node.js version
v10.22.0

SDK version number
v2.718.0

Expected behavior
Expected data.Location to be returned with a value. A value only comes back for data under 1.5 GB.

@das-sukriti das-sukriti added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 14, 2020
@das-sukriti das-sukriti changed the title Error on loading more than 1.5Gb file "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." Error on uploading more than 1.5Gb file "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." Oct 14, 2020
@ajredniwja
Contributor

Hey @das-sukriti, thank you for opening this issue. I was not able to reproduce this; I used a 5 GB file. I believe what's happening is that one of the parts is not getting uploaded, which produces the error. This might be because of network issues; under the hood, a multipart upload is initiated for larger files when S3.upload() is used.

Multipart errors are usually handled automatically, but for this situation:

One option is to use the low-level API and manually retry a part when it is not uploaded properly.

You can find the information about the APIs here.

Try increasing the maxRetries param: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property

You can also try running this little code snippet, which can give you more visibility:


const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3({
    httpOptions: {
        connectTimeout: 5000,
        timeout: 120000
    },
    maxRetries: 10,
    region: 'us-west-2',
    logger: console,
    retryDelayOptions: {
        customBackoff: (retryCount, err) => {
            console.log(`retry: ${retryCount} :: ${err}`);
        }
    }
});

// var options = {partSize: 10 * 1024 * 1024, queueSize: 1};
s3.upload({
    Bucket: 'bucket',
    Key: 'key ',
    Body: fs.createReadStream('xyz'),
}, (err, data) => {
    if (err) console.log(err, err.stack);
}).on('httpUploadProgress', function (progress) {
    console.log(progress);
});
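Below is a minimal sketch of what "use the low-level API and manually retry a part" could look like, assuming the data for each part is already available as a Buffer. createMultipartUpload, uploadPart, completeMultipartUpload and abortMultipartUpload are the actual v2 SDK operations; the bucket, key, parts array and retry count are placeholders:

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

// Retry a single part a few times before giving up.
async function uploadPartWithRetry(params, attempts = 3) {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await s3.uploadPart(params).promise();
        } catch (err) {
            console.warn(`part ${params.PartNumber} attempt ${attempt} failed:`, err.code);
            if (attempt === attempts) throw err;
        }
    }
}

async function multipartUpload(bucket, key, partBuffers) {
    const { UploadId } = await s3
        .createMultipartUpload({ Bucket: bucket, Key: key })
        .promise();
    try {
        const parts = [];
        for (let i = 0; i < partBuffers.length; i++) {
            const { ETag } = await uploadPartWithRetry({
                Bucket: bucket,
                Key: key,
                UploadId,
                PartNumber: i + 1,
                Body: partBuffers[i],
            });
            parts.push({ ETag, PartNumber: i + 1 });
        }
        return await s3
            .completeMultipartUpload({
                Bucket: bucket,
                Key: key,
                UploadId,
                MultipartUpload: { Parts: parts },
            })
            .promise();
    } catch (err) {
        // Clean up so a failed upload does not keep accumulating stored parts.
        await s3.abortMultipartUpload({ Bucket: bucket, Key: key, UploadId }).promise();
        throw err;
    }
}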

@ajredniwja ajredniwja added guidance Question that needs advice or information. and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 15, 2020
@das-sukriti
Author

@ajredniwja Hi, thank you for the quick response.

I thought S3.upload() was supposed to take care of the multipart retries automatically. In my case, I am streaming the data from a source (which is decided at runtime). I am zipping multiple files into a single zip file and writing it to the S3 location. The data size of the zip contents may vary from a few KB to 8 GB. So, as per my understanding, to opt for the low-level multipart API I would need to know the data size in order to decide the part size and part number and send the multipart requests accordingly. I am refraining from loading the full data into memory to calculate the data size.

I am new to the aws-sdk for Node, so I am not sure whether there are any tweaks or libraries I could use to perform a multipart upload without knowing the data size. Any example of a possible approach would be very helpful.

@ajredniwja
Contributor

The low-level API is used when the size of the data is unknown, as mentioned in the documentation above.

For your questions about part size and number of parts, you might want to do something like:

const partSize = 1024 * 1024 * 5;  //each part of 5mb, except the last

const totalParts = Math.ceil(buffer.length / partSize);
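As a hedged illustration of how those two numbers are used, the buffer can then be cut into slices of partSize bytes (buffer here stands for the in-memory data from the line above; only the last slice may be smaller):

const partSize = 1024 * 1024 * 5;
const totalParts = Math.ceil(buffer.length / partSize);

const parts = [];
for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    const start = (partNumber - 1) * partSize;
    // Buffer.slice clamps the end index, so the last part is simply shorter.
    parts.push({ partNumber, body: buffer.slice(start, start + partSize) });
}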

@ajredniwja ajredniwja added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 15, 2020
@das-sukriti
Author

@ajredniwja The question is: how do I read the file asynchronously and upload it in parallel in multiple parts?

How will I get buffer.length if my files are still being read? And how will I define the number of parts if I do not have the entire data read before calling the multipart API? I do not want to read the full data before starting the multipart upload; it would use a lot of memory. As far as I understand, we need totalParts to call UploadPart in a loop. Can you please provide an example that uses multipart upload asynchronously while the file is still being read?

@github-actions github-actions bot removed the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 17, 2020
@ajredniwja
Contributor

You don't actually need to know the size of the object for a multipart upload; you can see this implementation in V3 of the SDK: aws/aws-sdk-js-v3#1547

And examples in other languages: https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html

If you are not able to come up with a solution, I can co-write the code.
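A minimal sketch of that idea against the v2 SDK, assuming a readable stream of unknown total size and Node's async iteration over streams (the bucket, key and stream names are placeholders, and the abort/cleanup path is omitted for brevity): chunks are buffered until at least 5 MB is available, that part is uploaded, and the upload is completed with however many parts were sent, so the total count is never needed up front.

const AWS = require("aws-sdk");
const s3 = new AWS.S3();
const PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last

async function uploadStream(bucket, key, stream) {
    const { UploadId } = await s3
        .createMultipartUpload({ Bucket: bucket, Key: key })
        .promise();

    const parts = [];
    let pending = [];
    let pendingBytes = 0;
    let partNumber = 1;

    // Upload whatever has accumulated so far as the next part.
    const flush = async () => {
        const body = Buffer.concat(pending);
        pending = [];
        pendingBytes = 0;
        const { ETag } = await s3
            .uploadPart({ Bucket: bucket, Key: key, UploadId, PartNumber: partNumber, Body: body })
            .promise();
        parts.push({ ETag, PartNumber: partNumber });
        partNumber++;
    };

    for await (const chunk of stream) {
        pending.push(chunk);
        pendingBytes += chunk.length;
        if (pendingBytes >= PART_SIZE) await flush(); // part count is never needed up front
    }
    if (pendingBytes > 0) await flush(); // final, possibly smaller, part

    return s3
        .completeMultipartUpload({
            Bucket: bucket,
            Key: key,
            UploadId,
            MultipartUpload: { Parts: parts },
        })
        .promise();
}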

@ajredniwja ajredniwja added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 20, 2020
@das-sukriti
Author

das-sukriti commented Oct 27, 2020

Hi, I will not be able to share the codebase with you, since this is part of a product belonging to my company. As you mentioned before that you tried a 5 GB file and it worked for you, can you please share the code you worked with? Maybe we can proceed with that; I need to check whether I am doing anything differently from you.

Also, you mentioned here that we can try the low-level API. Are you saying that if we use low-level APIs like CreateMultipartUpload, we do not need to know the size of the object? Do you have a real code example in JavaScript that I can follow (not documentation)?

Sorry, I forgot to mention: I have tried the code snippet you provided, and the same error came.


@github-actions github-actions bot removed the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 28, 2020
@ajredniwja
Contributor


Apologies for the late reply. I used the same code snippet; can you share what was printed on the console when you ran it?
The problem might just be that the part is not ready to upload. When using the lower-level API, you only need to wait for a chunk of your chosen part size to be loaded and then upload that. I can work on the example, but it will be pretty similar to what I shared about how it is done in V3.
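For completeness, the commented-out options line in the earlier snippet corresponds to the second argument of s3.upload(), which controls the part size and concurrency of the managed upload. A small hedged example (bucket, key and file path are placeholders):

s3.upload(
    { Bucket: 'bucket', Key: 'key', Body: fs.createReadStream('xyz') },
    { partSize: 10 * 1024 * 1024, queueSize: 1 }, // 10 MB parts, one part in flight at a time
    (err, data) => {
        if (err) console.log(err, err.stack);
        else console.log(data.Location);
    }
);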

@ajredniwja ajredniwja added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Nov 9, 2020
@github-actions

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.

@github-actions github-actions bot added the closing-soon (This issue will automatically close in 4 days unless further comments are made.) and closed-for-staleness labels and removed the closing-soon label Nov 17, 2020