
Mongoose is throwing Unhandled Quiesce mode error that the Mongo driver doesn't throw #14834

Closed
navkard opened this issue Aug 27, 2024 · 10 comments
Labels
help This issue can likely be resolved in GitHub issues. No bug fixes, features, or docs necessary Stale

Comments

@navkard

navkard commented Aug 27, 2024

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Mongoose version

8.3.2

Node.js version

20.x

MongoDB server version

7.x (Atlas)

Typescript version (if applicable)

5.3.3

Description

We are getting errors whenever our MongoDB cluster changes its primary during upgrades.

{
  "errorType":"Runtime.UnhandledPromiseRejection",
  "errorMessage":"MongoServerError: The server is in quiesce mode and will shut down"
}

We’ve tried running the same code under load testing in a staging environment with Mongoose and with just the base MongoDB driver. With just the base driver we see this error in roughly one in a million requests; with Mongoose, nearly 10% of those million requests fail with this error. It looks like this has [been fixed before](#11661), but is there possibly a regression?

Using Mongo v7 on Atlas

[Node] mongoose version: 8.3.2

[Node] mongo driver version (for testing): 6.6.2

Our system uses Node.js AWS Lambdas (without callbacks; we use async/await) for our APIs. We are using all of the default settings except for { autoIndex: false, bufferCommands: false }. After investigating, we believe that Mongoose is somehow throwing this error outside of areas that we can catch: we wrapped all of the Mongoose code inside a try/catch statement, and it still surfaces as an unhandled rejection.
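As an aside, a process-level `'unhandledRejection'` listener is the only place rejections that escape every try/catch can be observed. A minimal sketch (illustrative, not from the original report; it mitigates the Lambda crash but does not fix the underlying source of the rejection):

```javascript
// Last-resort handler for promise rejections that no try/catch reaches,
// e.g. rejections from background driver activity. Without a listener,
// Node's default behavior (and the Lambda runtime) treats these as fatal.
let lastUnhandledRejection = null;

process.on('unhandledRejection', (reason) => {
  lastUnhandledRejection = reason;
  console.error('Unhandled rejection:', reason);
});
```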

Our Lambdas have an extra layer that creates an Express app so we can take advantage of a tool called Fern, which validates API request and response shapes. The layer creates an Express server and wraps it with the serverless-http package, which should preserve the async/await pattern. We thought that might be the issue, so we added an Express error-handling middleware, but it didn’t catch the errors either. And we are seeing these errors on APIs that don’t have that Express wrapper.
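For context, the error-catching middleware pattern described above looks like the sketch below (the handler shape is standard Express convention; the app wiring with serverless-http is omitted). Note that Express error middleware only sees errors thrown synchronously in a handler or passed to `next()`, which may be why rejections from background driver activity bypass it:

```javascript
// Standard Express error-handling middleware: identified by its
// 4-argument signature (err, req, res, next). Registered after all
// routes with app.use(errorHandler).
function errorHandler(err, req, res, next) {
  console.error('API error:', err.message);
  res.status(500).json({ error: 'Internal server error' });
}
```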

We were only able to replicate the issue after scaling our Atlas server up to M60/M80 and hitting it with heavy load. We’re not sure exactly why heavy load was needed to replicate it, but perhaps under heavier load quiesce mode lasts longer, trending up toward its maximum of 15 seconds.

We also ran tests with a longer serverSelectionTimeoutMS and completely removed the timeout on our Lambdas so they would have extra time to catch up, in case it was just very long timing or a query-performance issue, but no matter how much time we gave it (far beyond a reasonable runtime for an API) the problem persisted.

We don’t see this issue on the few APIs that we run in AWS containers; only in Lambdas.

Steps to Reproduce

While load testing an endpoint run a resilience test in MongoDB Atlas. More details of our system above.

Expected Behavior

Mongoose and the mongo driver together should handle Quiesce mode without errors.

@vkarpov15
Collaborator

Do you happen to use change streams?

@navkard
Author

navkard commented Aug 28, 2024

Thank you for your quick response. Yes, we do use change streams.
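For reference, change streams report server errors through an `'error'` event on the stream rather than through an awaited promise, so a stream without an `'error'` listener can turn a quiesce-mode shutdown into an unhandled rejection. A minimal sketch of defensive wiring (illustrative, not the reporter's code; `watchWithErrorHandling` is a hypothetical helper name):

```javascript
// Attach both 'change' and 'error' listeners to a change stream so server
// errors (like code 91, ShutdownInProgress during quiesce mode) are handled
// instead of escaping as unhandled rejections. Callers can re-watch after close.
function watchWithErrorHandling(model, onChange) {
  const stream = model.watch();
  stream.on('change', onChange);
  stream.on('error', (err) => {
    console.error('Change stream error:', err.code, err.message);
    stream.close();
  });
  return stream;
}
```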

@navkard
Author

navkard commented Aug 30, 2024

@vkarpov15 Just checking in—do you have any updates on what we can do to resolve this issue?

@vkarpov15
Collaborator

Another potential option would be to disable the autoCreate option. autoCreate is enabled by default, which means Mongoose will send a createCollection() in the background when you create a new model. Replace { autoIndex: false, bufferCommands: false } with { autoCreate: false, autoIndex: false, bufferCommands: false } to disable autoCreate and see if that helps.
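The suggested options can be written as a plain object and passed to mongoose.connect(uri, connectOptions) (a sketch; comments reflect the behavior described above):

```javascript
// Connection options that disable Mongoose's background activity.
const connectOptions = {
  autoCreate: false,     // no background createCollection() when a model is compiled
  autoIndex: false,      // no background index builds
  bufferCommands: false  // fail fast instead of buffering ops while disconnected
};
```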

@vkarpov15 vkarpov15 added the help This issue can likely be resolved in GitHub issues. No bug fixes, features, or docs necessary label Sep 10, 2024
@navkard
Author

navkard commented Sep 11, 2024

We tried with { autoCreate: false, autoIndex: false, bufferCommands: false } but we are still seeing the same issue.

@vkarpov15
Collaborator

Do you have any ideas as to which operation may be causing this? Do you have a stack trace you can share for this error?

@navkard
Author

navkard commented Sep 16, 2024

Here is the stack trace:

{
  "errorType": "Runtime.UnhandledPromiseRejection",
  "errorMessage": "MongoServerError: The server is in quiesce mode and will shut down",
  "reason": {
    "errorType": "MongoServerError",
    "errorMessage": "The server is in quiesce mode and will shut down",
    "code": 91,
    "errorResponse": {
      "topologyVersion": { "processId": "66e84f96d850bbe7d9c9ed4a", "counter": 10 },
      "ok": 0,
      "errmsg": "The server is in quiesce mode and will shut down",
      "code": 91,
      "codeName": "ShutdownInProgress",
      "remainingQuiesceTimeMillis": 0,
      "$clusterTime": {
        "clusterTime": { "$timestamp": "7415273955805626369" },
        "signature": {
          "hash": { "type": "Buffer", "data": [78, 126, 111, 204, 93, 201, 64, 183, 21, 78, 13, 109, 164, 142, 186, 44, 146, 177, 32, 172] },
          "keyId": { "low": 6, "high": 1718312451, "unsigned": false }
        }
      },
      "operationTime": { "$timestamp": "7415273955805626369" }
    },
    "topologyVersion": { "processId": "66e84f96d850bbe7d9c9ed4a", "counter": 10 },
    "ok": 0,
    "codeName": "ShutdownInProgress",
    "remainingQuiesceTimeMillis": 0,
    "$clusterTime": {
      "clusterTime": { "$timestamp": "7415273955805626369" },
      "signature": {
        "hash": { "type": "Buffer", "data": [78, 126, 111, 204, 93, 201, 64, 183, 21, 78, 13, 109, 164, 142, 186, 44, 146, 177, 32, 172] },
        "keyId": { "low": 6, "high": 1718312451, "unsigned": false }
      }
    },
    "operationTime": { "$timestamp": "7415273955805626369" },
    "stack": [
      "MongoServerError: The server is in quiesce mode and will shut down",
      "    at Connection.sendCommand (/var/task/node_modules/mongodb/lib/cmap/connection.js:297:27)",
      "    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)",
      "    at async Connection.command (/var/task/node_modules/mongodb/lib/cmap/connection.js:325:26)"
    ]
  },
  "promise": {},
  "stack": [
    "Runtime.UnhandledPromiseRejection: MongoServerError: The server is in quiesce mode and will shut down",
    "    at process.<anonymous> (file:///var/runtime/index.mjs:1276:17)",
    "    at process.emit (node:events:519:28)",
    "    at emitUnhandledRejection (node:internal/process/promises:250:13)",
    "    at throwUnhandledRejectionsMode (node:internal/process/promises:385:19)",
    "    at processPromiseRejections (node:internal/process/promises:470:17)",
    "    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)"
  ]
}

@vkarpov15 vkarpov15 modified the milestones: 8.6.4, 8.6.5 Sep 19, 2024
@vkarpov15 vkarpov15 modified the milestones: 8.7.1, 8.7.2 Oct 8, 2024
@vkarpov15 vkarpov15 modified the milestones: 8.7.2, 8.7.3 Oct 17, 2024
@vkarpov15
Collaborator

What is file:///var/runtime/index.mjs:1276:17 in your code? That's the only suspicious thing I see in the stack trace.

You may be able to get more information using the monitorCommands option, which is one way the MongoDB Node driver allows you to handle errors. You'll need to upgrade to Mongoose 8.5 for that, though; the feature was added in #14681.

await mongoose.connect(uri, { monitorCommands: true });

mongoose.connection.on('commandFailed', data => {
  console.log('Command failed', data);
});

@vkarpov15 vkarpov15 removed this from the 8.7.3 milestone Oct 18, 2024

github-actions bot commented Nov 2, 2024

This issue is stale because it has been open 14 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 2, 2024

github-actions bot commented Nov 8, 2024

This issue was closed because it has been inactive for 19 days and has been marked as stale.

@github-actions github-actions bot closed this as not planned Nov 8, 2024