"Large" file uploads results in: MongoServerError: Transaction with { txnNumber: 31 } has been aborted. #5148
Comments
This appears related to #4350.
@zazz-ops Can you try disabling transactions in MongoDB?
https://payloadcms.com/docs/database/mongodb#options

You are connected to a replica set, correct?
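For reference, a minimal sketch of what disabling transactions in the adapter config might look like, assuming Payload 2.x and the `transactionOptions: false` option described in the linked docs (the env var and connection string are placeholders):

```ts
// payload.config.ts
import { buildConfig } from 'payload/config'
import { mongooseAdapter } from '@payloadcms/db-mongodb'

export default buildConfig({
  db: mongooseAdapter({
    url: process.env.DATABASE_URI!, // e.g. an Atlas replica-set connection string
    transactionOptions: false, // disable transactions entirely
  }),
  collections: [],
})
```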
@BrianJM Disabled transactions and still getting the same error. Interestingly enough, the file actually does upload successfully to my S3 bucket. However, I'm still getting the Mongo error and there is no doc for the uploaded file in the DB.
@BrianJM And yes, connecting to a replica set. I'm just using the free-tier MongoDB Atlas M0 deployment.
@zazz-ops I started getting transaction errors after moving from a local instance to an Atlas (M0) replica set as well. It's interesting that disabling transactions still produces a transaction error; in my case, it does resolve the issue but I'm not doing large uploads.
@BrianJM Yes, the error typically fires after ~3 minutes. But the interesting thing is that it always fires (immediately?) following the completion of the file upload. The video file I'm using for testing (~180 MB) takes ~3 minutes to upload on average. So the timing of the error seems to be directly related to the duration of the upload.
@BrianJM I just did some more testing with a 21 MB file and an 80 MB file. The 21 MB file uploaded fine and produced no errors. Payload performed as expected. The 80 MB file produced the same errors as the original 180 MB file. The file itself landed fine in my S3 bucket, but the MongoDB error was thrown, and there's no correlating doc in my `media` collection.
How do we figure out what the timeout actually is?
@zazz-ops You beat me to it. That's part of the problem, I think. What's your upload speed? Are you hosting locally or in a data center? You will need at least 15 Mbps upload to finish a 200 MB upload in 2 minutes.
Could it be the `transactionLifetimeLimitSeconds`? The transaction will initialize at the start of the request, but it won't commit until after the file completes. I think this is the underlying issue and something we need to consider. The upload code is written this way because if you were trying to upload a file and it failed, we wouldn't want it to have a collection document. It sounds like we need to rethink this approach and introduce an intermediate DB commit with a delete cleanup for failing uploads.
I thought it may be related to this, depending on the upload speed. I am testing large uploads with the recent fixes on main, and trying to replicate in 2.11.1. I think you're right about `transactionLifetimeLimitSeconds`.
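For anyone who wants to check the server-side limit themselves, here is a minimal sketch using the Node MongoDB driver. It assumes a connection string in `MONGODB_URI` and privileges to run admin commands, which shared Atlas tiers such as M0 may not grant:

```ts
import { MongoClient } from 'mongodb'

const client = await MongoClient.connect(process.env.MONGODB_URI!)
const admin = client.db().admin()

// Read the current transaction lifetime limit (the server default is 60 seconds).
const { transactionLifetimeLimitSeconds } = await admin.command({
  getParameter: 1,
  transactionLifetimeLimitSeconds: 1,
})
console.log('transactionLifetimeLimitSeconds:', transactionLifetimeLimitSeconds)

// Raising it requires setParameter privileges; Atlas shared tiers may not allow this.
// await admin.command({ setParameter: 1, transactionLifetimeLimitSeconds: 600 })

await client.close()
```

If the limit is the 60-second default, that would be consistent with small uploads succeeding while multi-minute uploads abort the transaction.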
@DanRibbens I can reproduce an error with large uploads, with and without transactions. I think you're right about re-thinking the approach regarding transactions - that will also be necessary - but there is a deeper issue here with S3 timeouts.

Testing

Reproduction

Here is how to reproduce the issue, with or without transactions:

Note: 6 Mbps and 200 MB will not cause an error. The upload will succeed as shown below.

Resolution?

I believe the resolution is to allow the S3 … Allowing the … Changing the … I did not get as far as testing extending or removing the …

Update:
@BrianJM My upload speed is slow, typically about 5 Mbps. So that's likely contributing to the failure of the upload, but it shouldn't matter. The whole point of multipart uploading is resiliency at any upload speed. So Payload absolutely needs to tackle this. It's not optional for a CMS, even one that's so much more than just a CMS.

@DanRibbens I suspected that Payload was starting a DB op upon the init of the upload and then waiting for the upload to complete. And it seems you've already figured out that this is not a tenable strategy for an uploader. A 5 GB file could take hours (or even days) to upload depending on the user's upload speed, so keeping a DB op running for that long is ... you already know.

So ... fun fact: I've built the uploader I need 5 times already for previous projects. So I'll likely reach for the latest bespoke version I wrote and see if I can make Payload work with it by using a …

Here's a quick sketch of how I've implemented a highly-resilient "large" file uploader before:
This is obviously a high-level sketch, but it's the gist of what I've implemented in the past for the bespoke CMS or DAM systems I've built. Payload handles CRUD, ACL, Hooks, Globals, and Plugins better than what I've built, so for this new project I'm working on I'm currently evaluating whether I should modify the codebase I've already written to suit the project, or if I should reach for Payload and leverage all of its amazing capabilities. I was hoping Payload's DAM functionality would Just Work™ as I need it to ... not the case, it seems.
@zazz-ops I don't think this can be resolved without using WebSockets. Network request timeouts (waiting for server response headers) are defined by the browser: Chrome's is 300 seconds, Firefox's is 90 seconds. With the current implementation, the browser imposes an upper limit.
@BrianJM I'm not entirely sure what you mean by "this can be resolved", but the uploader I described in my previous comment very much works. It's in production on 5 systems I've built and has been for years. And it handles "large" file uploads (typically videos) every day.

You're mentioning WebSockets, so I'm assuming that you mean a socket connection between the UI and the Payload API. But that's not how the uploader I've described works. It bypasses Payload entirely and uploads directly to S3 from the browser, or CLI, or any runtime. This requires that Payload create a …

In this case the Payload API gives up all functionality/responsibility for uploads. It's the …

As I mentioned in my last comment, the uploader I sketched is very much a sketch. When you factor in digital asset access/security, transcoding/transformatting, CDN caching/security, licence-based rights access, and more ... it gets pretty hairy pretty fast. At this point, from Payload's perspective, it's very safe for you to assume you're nowhere near where you need to be with uploads and DAM. You actually have a long way to go to catch up to WordPress.

I'm going to work on implementing my uploader as a separate entity in my system that simply fires against the Payload API, and keep evaluating Payload only because the core functionality is so elegant. But I can tell you with certainty right now, the current Upload and DAM functionality of Payload is essentially useless for anything other than a brochure site. Maybe even not that.
@zazz-ops Which uploader is this and what type of connection is established?
That's one way, but I assumed direct uploads from the browser to S3 use a socket connection as well. I believe this is generally how large uploads work.
I understand the concepts. What I do not know is the method the browser uses to upload (with lib-storage). Do you? Is it a socket connection? Can you share a repo that implements direct uploads from the browser using the library?
WordPress media uploads are also bound to server timeouts. Have you been using the WordPress media library to upload and host 5 GB files?
That's an interesting opinion. You are welcome to contribute.
I'm not really aware of any uploaders that do so via a socket connection. The AWS …
See above.
I can't share any of the ones I've built as they are proprietary, but the …
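For context, a minimal sketch of the `@aws-sdk/lib-storage` `Upload` helper, which performs multipart uploads as a series of ordinary HTTPS requests (no WebSocket). The region, bucket, and credential handling below are placeholders:

```ts
import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'

// Placeholder client config; in a browser you would normally use temporary
// credentials (e.g. Cognito/STS) or switch to presigned URLs instead.
const client = new S3Client({ region: 'us-east-1' })

export const uploadFile = async (file: File) => {
  const upload = new Upload({
    client,
    params: { Bucket: 'my-bucket', Key: file.name, Body: file },
    partSize: 10 * 1024 * 1024, // split into 10 MB parts
    queueSize: 4, // upload up to 4 parts in parallel
  })

  upload.on('httpUploadProgress', ({ loaded, total }) => {
    console.log(`uploaded ${loaded} of ${total} bytes`)
  })

  return upload.done()
}
```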
No, the largest files I've had to push up to WordPress capped out at about 500 MB, but it's not beyond reason to think that WP could handle 5 GB with some tweaking of the PHP and Nginx or Apache config. But generally WP offers DAM functionality that is far superior to where Payload is at right now. The point being: even WP is doing a better job at this than Payload, so perhaps it should be an area of special focus to get this sorted out, given that DAM is a critical part of a CMS.
@zazz-ops I believe the AWS SDK uses a socket connection for multipart uploads. I agree that it is logical for S3 uploads to bypass Payload / Node. I planned to develop a component or plugin to do this (to save egress fees), but this limitation may result in core logic changes (so I won't need to do that). The WP Media Library is subject to the same limitation you're seeing today in Payload. The maximum upload is bound by the server configuration and the browser timeout. Do you have open discussions or issues regarding the areas you feel are lacking in DAM?
Hey @zazz-ops, it sounds like you have a pretty good idea of the problem you're facing. We'd be open to a PR that would handle these scenarios if you feel up to it. Let me know. The alternative is to create a custom adapter that has your desired functionality and pass that in. Depending on your needs, this might be a viable option.
I second the suggestion of uploading directly to S3 and have built file uploads in a similar way before as well. When initiating a file upload, the API would call S3 to create a presigned URL for the upload and return that to the client, who would then do a multipart request directly against it. Neither a request to Payload nor a DB operation should need to stay open during that time; rather, as @zazz-ops pointed out, a separate call at the end should mark the upload as completed.

Besides handling large file sizes and avoiding a roundtrip through the API, another main factor would be to keep the Payload part of this stateless for serverless or otherwise distributed environments. I think this pattern of requesting a file upload and returning a URL to upload to should be the default implementation for file uploads and not specific to S3, even for basic storage options where the same host that receives the upload request also handles the actual upload, so that we have said flexibility baked into the Admin UI and API.
If I'm not missing something, just to make sure we're on the same page, the client/browser-facing part shouldn't have any requirement for an AWS-specific (or other) library to handle the upload, and usually would not need to know about the implementation details of the storage provider, although potentially there might be other storage backends where that could matter.
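To make the pattern concrete, here is a minimal server-side sketch of the presigned multipart flow using `@aws-sdk/client-s3` and `@aws-sdk/s3-request-presigner`. The region, bucket, keys, and function boundaries are placeholders for illustration, not Payload's actual implementation:

```ts
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const s3 = new S3Client({ region: 'us-east-1' }) // placeholder region

// 1. Start a multipart upload and hand the client the uploadId.
export const startUpload = async (bucket: string, key: string): Promise<string> => {
  const { UploadId } = await s3.send(
    new CreateMultipartUploadCommand({ Bucket: bucket, Key: key }),
  )
  return UploadId!
}

// 2. Presign a URL per part; the browser PUTs each chunk straight to S3.
export const presignPart = (bucket: string, key: string, uploadId: string, partNumber: number) =>
  getSignedUrl(
    s3,
    new UploadPartCommand({ Bucket: bucket, Key: key, UploadId: uploadId, PartNumber: partNumber }),
    { expiresIn: 3600 },
  )

// 3. Once every part reports back its ETag, finish the upload. Only after this
//    succeeds would the corresponding collection document be created.
export const finishUpload = (
  bucket: string,
  key: string,
  uploadId: string,
  parts: { ETag: string; PartNumber: number }[],
) =>
  s3.send(
    new CompleteMultipartUploadCommand({
      Bucket: bucket,
      Key: key,
      UploadId: uploadId,
      MultipartUpload: { Parts: parts },
    }),
  )
```

The browser uploads each part to its presigned URL with a plain `PUT`, collects the returned `ETag` headers, and a final API call then completes the upload and creates the document, so no Payload request or DB transaction stays open for the duration of the transfer.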
Creating a pre-signed upload URL could be built on top of our current uploads implementation under a new feature flag, or perhaps as an alternate adapter in the plugin. I see this as a valid request. It would be of great help if this were built by the community. If the upload request could be redirected from Payload to the pre-signed address, then a feature flag wouldn't be necessary; it would just happen automatically from the browser to the S3 URL. I would have to do more digging on this pattern to know the details of what's possible here.
Happy to help with this, just need to find the time to get more familiar with the Payload plugin structure, but that's on my to-do list anyway :)

Could you elaborate further on that? I'm not yet sure I understand under which circumstances a feature flag would be required and when not. But yes, the implementation I'd intend to build would let the API call S3 to create a presigned URL and return that to the browser, which would upload the files directly to it. Depending on whether the existing behaviour should also be kept intact, a feature flag might still make sense. However, I guess that will all be clearer once we have a PR ready, and a decision could be made at that point.
A big +1 for the ability to use AWS S3 pre-signed upload URLs - bypassing the API and server for the actual upload. We have a fast connection to our server, and a reverse proxy behind Nginx. Large uploads are working for us at the moment (with a serverless Atlas MongoDB). But... if we wanted to deploy to Vercel, we would hit their hard upload size limit of about 4 MB (IIRC). In either case, a direct client upload via a signed URL would be more efficient (with API/DB calls only responsible for updating the record/document). We'd like to deploy to Vercel but keep our CDN on AWS - and so AWS S3 pre-signed URLs are pretty high on our list of 'would be VERY nice to have'.
I get a similar issue when uploading videos of around 70 MB to the video collection using cloudStorage in combination with Azure Blob Storage. Videos under 15 MB are working fine.
+1, I had to build an ugly monstrosity with a custom component just to upload large files using multipart upload with a presigned URL to S3.
This issue was automatically closed due to lack of activity.
Link to reproduction
No response
Describe the Bug
I've successfully created, configured, and tested a `media` collection that has no problem uploading images, resizing them, and forwarding them to my S3 bucket using the `@payload/plugin-cloud-storage` package. There's no issue with how the collection functions, unless I try to upload a video with a filesize of ~180 MB.

The `nova` function is just a thin sugar-wrapper on my `fetch` request.

The request hangs for about 3 minutes and then I see the following in the Payload console logs:
And the very helpful error message in the response from the request:
This is obviously some sort of MongoDB timeout, but I'm confused as to why a MongoDB timeout would have any impact on a file upload at all. Is there a MongoDB transaction that starts at the onset of the file upload and doesn't finish until the upload is done?
Nonetheless, I tried to increase the `upload.limits.filesize` value in my `payload.config.ts` file, but the issue persists.

I'm looking to use Payload to handle uploads of up to 5 GB. Is this possible with Payload?
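For what it's worth, a sketch of what raising the server-side size limit can look like, assuming Payload 2.x where the `upload` options are passed through to `express-fileupload`; the value is a placeholder, and this does not address the transaction timeout itself:

```ts
// payload.config.ts
import { buildConfig } from 'payload/config'

export default buildConfig({
  collections: [
    // ...the media collection, etc.
  ],
  upload: {
    limits: {
      fileSize: 5 * 1024 * 1024 * 1024, // 5 GB, in bytes
    },
  },
})
```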
To Reproduce
Create a media collection and attempt to upload a large file via the REST API interface.
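As a concrete illustration of that step, a REST upload via `fetch` might look like the following; the collection slug `media` and the local URL are assumptions:

```ts
// Browser-side sketch: POST the file as multipart form data to the collection's REST endpoint.
const fileInput = document.querySelector<HTMLInputElement>('input[type="file"]')!
const file = fileInput.files![0] // e.g. a ~180 MB video

const form = new FormData()
form.append('file', file)

const res = await fetch('http://localhost:3000/api/media', {
  method: 'POST',
  credentials: 'include', // send the Payload auth cookie
  body: form,
})

const result = await res.json()
console.log(res.status, result)
```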
Payload Version
2.8.1
Adapters and Plugins
No response