
Error on uploading more than 1.5Gb file "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." #3497

Closed
3 tasks done
das-sukriti opened this issue Oct 14, 2020 · 8 comments
Labels
closed-for-staleness · guidance (Question that needs advice or information.) · response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.)

Comments

@das-sukriti

das-sukriti commented Oct 14, 2020


Describe the bug
Hi,
Specs I am using:
node: v10.22.0
aws-sdk: 2.718.0
archiver: 3.0.0
json2csv: 4.5.1

I am getting the error "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." while uploading bulk data using a stream with s3.upload. The error does not occur for data below 1.5 GB.
I am trying to upload a zip file to an S3 endpoint using the SDK for JavaScript. The aim is to zip multiple CSV files containing data fetched from a DB.
To do that I am following the steps below (a consolidated sketch of the whole pipeline follows the steps). Is there something wrong with the way I am combining s3.upload and archiver?

  1. Fetch data from the DB, which gives me an object stream.
  2. I use

	objectStream.pipe(stringify)
		.on("finish", () => {
			console.log(); // All the read tables finish successfully
		})
		.on("error", () => {
			console.log(); // This does not catch any error
		});
		
	where stringify is:

	const stringify: Transform = new Transform({ objectMode: true });
	stringify._transform = (chunk, _, callback) => {
		const s = JSON.stringify(chunk);
		const buffer = Buffer.from(s);
		stringify.push(buffer);
		callback();
	};
		
	And I am using the json2csv converter to convert it into CSV:

	const converter = new AsyncParser();
	converter.fromInput(stringify).toOutput(archivePassThrough);

	Also, using Archiver, I append archivePassThrough to the archive:

	const archive: Archiver = create("zip");
	const archivePassThrough: PassThrough = new PassThrough();
	archive.append(archivePassThrough, { name: fName });
  3. For the S3 upload, I am using:

	const s3: AWS.S3 = new AWS.S3({
		accessKeyId: accessKey,
		secretAccessKey: secretKey,
		endpoint,
		httpOptions: {
			timeout: 0, // Request was timing out initially
		},
		// region: "",             // Tried but the error still came
		// s3ForcePathStyle: true, // Tried but the error still came
		// signatureVersion: "v4", // Tried but the error still came
	});
	
	const exportPassThrough: PassThrough = new PassThrough();
	archive.pipe(exportPassThrough);

	const id: string = uuid.v4();
	const uploadParams = { Bucket: bucketName, Key: id + ".zip", Body: exportPassThrough };

	s3.upload(uploadParams)
		.on("httpUploadProgress", (evt: any) => {
			log();
		})
		.promise()
		.then((data: any) => {
			return data.Location;
		})
		.catch((err: any) => {
			return err.message; // This is where the error is getting caught
		});
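For reference, here is a consolidated sketch of the pipeline from the steps above, rewritten as plain Node.js with an "error" listener attached to every stream (stream errors do not propagate across pipe() automatically, so a failure in an intermediate stage is otherwise silent). objectStream, bucketName, endpoint, accessKey, secretKey and the CSV file name are placeholders from the snippets above, and the archive.finalize() call is an addition that archiver requires but that is not shown in the original fragments:

const { PassThrough, Transform } = require("stream");
const { create } = require("archiver");
const { AsyncParser } = require("json2csv");
const AWS = require("aws-sdk");
const { v4: uuidv4 } = require("uuid");

// objectStream: the object-mode stream returned by the DB query (placeholder).
const stringify = new Transform({
    objectMode: true,
    transform(chunk, _enc, callback) {
        callback(null, Buffer.from(JSON.stringify(chunk)));
    },
});

const archive = create("zip");
const archivePassThrough = new PassThrough();
const exportPassThrough = new PassThrough();

archive.append(archivePassThrough, { name: "export.csv" }); // file name is a placeholder
archive.pipe(exportPassThrough);

new AsyncParser().fromInput(stringify).toOutput(archivePassThrough);
objectStream.pipe(stringify);

// Without these listeners, a failure in any stage is swallowed and the only
// symptom is that s3.upload eventually fails.
[objectStream, stringify, archivePassThrough, archive, exportPassThrough].forEach((s) =>
    s.on("error", (err) => console.error("stream error:", err))
);

archive.finalize(); // required by archiver once all entries are appended

const s3 = new AWS.S3({ endpoint, accessKeyId: accessKey, secretAccessKey: secretKey });
s3.upload({ Bucket: bucketName, Key: uuidv4() + ".zip", Body: exportPassThrough })
    .promise()
    .then((data) => console.log("uploaded to", data.Location))
    .catch((err) => console.error("upload failed:", err.message));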
	

Is the issue in the browser/Node.js?
Node.js

If on Node.js, are you running this on AWS Lambda?
No

Details of the browser/Node.js version
v10.22.0

SDK version number
v2.718.0

Expected behavior
Expected data.Location to be returned with a value. A value only comes back for data under 1.5 GB.

@das-sukriti das-sukriti added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 14, 2020
@das-sukriti das-sukriti changed the title Error on loading more than 1.5Gb file "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." Error on uploading more than 1.5Gb file "The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed." Oct 14, 2020
@ajredniwja
Contributor

Hey @das-sukriti, thank you for opening this issue. I was not able to reproduce this; I used a 5 GB file. I believe what's happening is that one of the parts is not getting uploaded, which produces the error. This might be because of network issues; under the hood, a multipart upload is initiated for larger files when S3.upload() is used.

Multipart errors are usually handled automatically, but for this situation:

One option is to use the low-level API and manually retry a part when it is not uploaded properly.

You can find the information about the APIs here.

Try increasing the maxRetries param: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property

You can also try running this little code snippet, which can give you more visibility:


const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3({
    httpOptions: {
        connectTimeout: 5000,
        timeout: 120000
    },
    maxRetries: 10,
    region: 'us-west-2',
    logger: console,
    retryDelayOptions: {
        customBackoff: (retryCount, err) => {
            console.log(`retry: ${retryCount} :: ${err}`);
        }
    }
});

// var options = {partSize: 10 * 1024 * 1024, queueSize: 1};
s3.upload({
    Bucket: 'bucket',
    Key: 'key ',
    Body: fs.createReadStream('xyz'),
}, (err, data) => {
    if (err) console.log(err, err.stack);
}).on('httpUploadProgress', function (progress) {
    console.log(progress);
});
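Below is a minimal sketch of what "use the low-level API and manually retry a part" could look like, assuming the data for each part is already available as a Buffer. createMultipartUpload, uploadPart, completeMultipartUpload and abortMultipartUpload are the actual v2 SDK operations; the bucket, key, parts array and retry count are placeholders:

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

// Retry a single part a few times before giving up.
async function uploadPartWithRetry(params, attempts = 3) {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await s3.uploadPart(params).promise();
        } catch (err) {
            console.warn(`part ${params.PartNumber} attempt ${attempt} failed:`, err.code);
            if (attempt === attempts) throw err;
        }
    }
}

async function multipartUpload(bucket, key, partBuffers) {
    const { UploadId } = await s3
        .createMultipartUpload({ Bucket: bucket, Key: key })
        .promise();
    try {
        const parts = [];
        for (let i = 0; i < partBuffers.length; i++) {
            const { ETag } = await uploadPartWithRetry({
                Bucket: bucket,
                Key: key,
                UploadId,
                PartNumber: i + 1,
                Body: partBuffers[i],
            });
            parts.push({ ETag, PartNumber: i + 1 });
        }
        return await s3
            .completeMultipartUpload({
                Bucket: bucket,
                Key: key,
                UploadId,
                MultipartUpload: { Parts: parts },
            })
            .promise();
    } catch (err) {
        // Clean up so a failed upload does not keep accumulating stored parts.
        await s3.abortMultipartUpload({ Bucket: bucket, Key: key, UploadId }).promise();
        throw err;
    }
}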

@ajredniwja ajredniwja added guidance Question that needs advice or information. and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 15, 2020
@das-sukriti
Author

@ajredniwja Hi, thank you for the quick response.

I thought S3.upload() was supposed to take care of the multipart retries automatically. In my case, I am streaming the data from a source (which is decided at runtime). I am zipping multiple files into a single zip file and writing it to the S3 location. The data size of the zip contents may vary from a few KB to 8 GB. So, as per my understanding, to opt for the low-level multipart API I would need to know the data size in order to decide the part size and part number and send the multipart requests accordingly. I am refraining from loading the full data into memory to calculate the data size.

I am new to the aws-sdk for Node, so I am not sure whether there are any tweaks or libraries I could use to perform a multipart upload without knowing the data size. Any example of a possible approach would be very helpful.

@ajredniwja
Contributor

The low-level API is used when the size of the data is unknown, as mentioned in the documentation above.

For your questions about part size and number of parts, you might want to do something like:

const partSize = 1024 * 1024 * 5;  //each part of 5mb, except the last

const totalParts = Math.ceil(buffer.length / partSize);
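As a hedged illustration of how those two numbers are used, the buffer can then be cut into slices of partSize bytes (buffer here stands for the in-memory data from the line above; only the last slice may be smaller):

const partSize = 1024 * 1024 * 5;
const totalParts = Math.ceil(buffer.length / partSize);

const parts = [];
for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    const start = (partNumber - 1) * partSize;
    // Buffer.slice clamps the end index, so the last part is simply shorter.
    parts.push({ partNumber, body: buffer.slice(start, start + partSize) });
}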

@ajredniwja ajredniwja added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 15, 2020
@das-sukriti
Author

@ajredniwja The question is: how do I read the file asynchronously and upload it in parallel in multiple parts?

How will I get buffer.length if my files are still being read? And how will I define the number of parts if I do not have the entire data read before calling the multipart API? I do not want to read the full data before starting the multipart upload; it would use a lot of memory. As far as I understand, we need totalParts to call UploadPart in a loop. Can you please provide an example that uses multipart upload asynchronously while the file is still being read?

@github-actions github-actions bot removed the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 17, 2020
@ajredniwja
Contributor

You don't actually need to know the size of the object for a multipart upload; you can see this implementation in V3 of the SDK: aws/aws-sdk-js-v3#1547

And examples in other languages: https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html

If you are not able to come up with a solution, I can co-write the code.
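A minimal sketch of that idea against the v2 SDK, assuming a readable stream of unknown total size and Node's async iteration over streams (the bucket, key and stream names are placeholders, and the abort/cleanup path is omitted for brevity): chunks are buffered until at least 5 MB is available, that part is uploaded, and the upload is completed with however many parts were sent, so the total count is never needed up front.

const AWS = require("aws-sdk");
const s3 = new AWS.S3();
const PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last

async function uploadStream(bucket, key, stream) {
    const { UploadId } = await s3
        .createMultipartUpload({ Bucket: bucket, Key: key })
        .promise();

    const parts = [];
    let pending = [];
    let pendingBytes = 0;
    let partNumber = 1;

    // Upload whatever has accumulated so far as the next part.
    const flush = async () => {
        const body = Buffer.concat(pending);
        pending = [];
        pendingBytes = 0;
        const { ETag } = await s3
            .uploadPart({ Bucket: bucket, Key: key, UploadId, PartNumber: partNumber, Body: body })
            .promise();
        parts.push({ ETag, PartNumber: partNumber });
        partNumber++;
    };

    for await (const chunk of stream) {
        pending.push(chunk);
        pendingBytes += chunk.length;
        if (pendingBytes >= PART_SIZE) await flush(); // part count is never needed up front
    }
    if (pendingBytes > 0) await flush(); // final, possibly smaller, part

    return s3
        .completeMultipartUpload({
            Bucket: bucket,
            Key: key,
            UploadId,
            MultipartUpload: { Parts: parts },
        })
        .promise();
}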

@ajredniwja ajredniwja added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 20, 2020
@das-sukriti
Author

das-sukriti commented Oct 27, 2020

Hi, I will not be able to share the codebase with you, since this is part of a product belonging to my company. As you mentioned before that you tried a 5 GB file and it worked for you, can you please share the code you worked with? Maybe we can proceed with that; I need to check whether I am doing anything differently from you.

Also, you mentioned here that we can try the low-level API. Are you saying that if we use low-level APIs like CreateMultipartUpload, we do not need to know the size of the object? Do you have a real code example in JavaScript that I can follow (not documentation)?

Sorry, I forgot to mention: I have tried the code snippet you provided, and the same error came.


@github-actions github-actions bot removed the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Oct 28, 2020
@ajredniwja
Contributor


Apologies for the late reply. I used the same code snippet; can you share what was printed on the console when you ran it?
The problem might just be that the part is not ready to upload. When using the lower-level API, you only need to wait for a chunk of your chosen part size to be loaded and then upload that. I can work on the example, but it will be pretty similar to what I shared about how it is done in V3.
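For completeness, the commented-out options line in the earlier snippet corresponds to the second argument of s3.upload(), which controls the part size and concurrency of the managed upload. A small hedged example (bucket, key and file path are placeholders):

s3.upload(
    { Bucket: 'bucket', Key: 'key', Body: fs.createReadStream('xyz') },
    { partSize: 10 * 1024 * 1024, queueSize: 1 }, // 10 MB parts, one part in flight at a time
    (err, data) => {
        if (err) console.log(err, err.stack);
        else console.log(data.Location);
    }
);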

@ajredniwja ajredniwja added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) label Nov 9, 2020
@github-actions

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.

@github-actions github-actions bot added the closing-soon (This issue will automatically close in 4 days unless further comments are made.) and closed-for-staleness labels and removed the closing-soon label Nov 17, 2020