mongoose.syncIndexes() creates indexes even if they exist #12250
… exist on mongodb server re #12250
refactor(model): allow optionally passing indexes to createIndexes and cleanIndexes
fix(model): prevent index creation on syncIndexes if not necessary
EDIT: the issue was resolved and this comment is no longer relevant. Although syncIndexes() triggered a catastrophic issue, the database was already not in the correct, expected state by the time it happened.

This bug should be considered critical, as our example confirms it can cause downtime and a lot of headaches. We have an When Meanwhile all queries to the affected data got downgraded to The temporary fix to bring the server back online was to create a non-unique index, and the full solution was then a manual collection migration with a custom deduplication script. Mongoose version in that case was
@Industrialice I believe this is already fixed in 6.8.1 by #12785. Can you come up with a repro script? Did you also try using the latest version (7.4.3)?
Do you think that's because of that bug, or due to a fundamental issue with the design?
@AbdelrahmanHafez I could do more testing on the issue if required, although reproducing the exact catastrophic issue could be difficult: it occurred on the live server specifically because of its high load (which was required for a duplicate document to get inserted in the brief window while the unique index was unavailable). But it'd be easy to check for other signs of an index getting recreated (such as MongoDB index usage stats getting reset). I haven't tried running versions past

The design question depends on what goal the method pursues. Our understanding is that the method drops unused indexes and creates the indexes defined by schemas only if they don't already exist. That sounds very safe, so we just added running syncIndexes() on all collections as part of our maintenance. If it were working as described, we wouldn't have ended up with a database in an invalid state; in that case it's just an implementation issue. Design-wise, the "team work" with MongoDB made the issue much worse: mongoose recreating the unique index led to a duplicate value getting inserted, and MongoDB refusing to handle such cases by itself (the only native tool

Addition: quickly testing it on a local environment with the same database setup and the same
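One way to check for the index-recreation symptom mentioned above is to compare `$indexStats` output taken before and after a `syncIndexes()` run. The sketch below uses a hypothetical helper, `findRebuiltIndexes`, which is not part of Mongoose. It assumes a single `mongod` (on replica sets and sharded clusters `$indexStats` reports one entry per host) and relies on `accesses.since` being reset when an index is rebuilt. A server restart also resets that timestamp, so a hit is a hint rather than proof.

```javascript
'use strict';

// Hypothetical helper: report indexes that look (re)built between two
// $indexStats snapshots. Assumes a single mongod, so each index name
// appears once per snapshot. `accesses.since` is reset when an index is
// rebuilt, but also when mongod restarts.
function findRebuiltIndexes(before, after) {
  const sinceByName = new Map(
    before.map((stat) => [stat.name, new Date(stat.accesses.since).getTime()])
  );

  const rebuilt = [];
  for (const stat of after) {
    const previous = sinceByName.get(stat.name);
    const current = new Date(stat.accesses.since).getTime();
    // New index, or its statistics window started after the first snapshot.
    if (previous === undefined || current > previous) {
      rebuilt.push(stat.name);
    }
  }
  return rebuilt;
}

// Possible usage (needs a live connection and a model such as `User`):
//   const before = await User.aggregate([{ $indexStats: {} }]);
//   await User.syncIndexes();
//   const after = await User.aggregate([{ $indexStats: {} }]);
//   console.log(findRebuiltIndexes(before, after));
```

If `syncIndexes()` only creates missing indexes, an unchanged schema should produce an empty list here.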
Okay, I have now reconstructed the full picture of what happened, and syncIndexes() itself got incorrectly blamed, even though it did trigger the issue.

The schema with an index on the field in question already existed, but it was not marked Due to how mongoose works, it saw that the index on the field already existed and just ignored the new

That is, until syncIndexes() was run. It dropped the existing non-unique index and tried to create a unique index, but by that time there were already a few duplicate values, which prevented that from happening.

The only possible improvement I see is mongoose emitting a warning that index changes were ignored. syncIndexes() did what it was supposed to do, even if it ended up turning a minor issue into a catastrophic one. I'd note that if index management were done manually, the first step would have been to create a unique index, and to drop the non-unique one only after confirming the unique one was created successfully. I'm unsure whether handling an edge case like this should be expected from a library function, though.

So sorry for the misleading previous messages; the full circumstances turned out to be tricky to trace back. I updated my initial comment to state that it is no longer relevant because it contained a wrong assumption (that syncIndexes() dropped a unique index).
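The warning suggested above could be approximated outside Mongoose by comparing `schema.indexes()` with `Model.listIndexes()` before deploying. The helper below, `findUniqueMismatches`, is hypothetical: it only compares the ordered key and the `unique` flag, and it ignores text, wildcard and collation options, so treat it as a sketch rather than a full equivalent of Mongoose's own index comparison.

```javascript
'use strict';

// Hypothetical check: find schema indexes whose key already exists on the
// server with a different `unique` flag. `schemaIndexes` is assumed to be
// schema.indexes() output ([keys, options] pairs) and `dbIndexes` the index
// documents from Model.listIndexes() (`key`, `name`, optional `unique`).
function keyId(key) {
  // Key order matters for compound indexes, so compare ordered entries.
  return JSON.stringify(Object.entries(key));
}

function findUniqueMismatches(schemaIndexes, dbIndexes) {
  const dbByKey = new Map(dbIndexes.map((idx) => [keyId(idx.key), idx]));
  const mismatches = [];
  for (const [keys, options = {}] of schemaIndexes) {
    const existing = dbByKey.get(keyId(keys));
    if (existing === undefined) continue; // missing index: a plain create, not a conflict
    const wantUnique = Boolean(options.unique);
    const haveUnique = Boolean(existing.unique);
    if (wantUnique !== haveUnique) {
      mismatches.push({ name: existing.name, keys, wantUnique, haveUnique });
    }
  }
  return mismatches;
}
```

A possible use is to call `findUniqueMismatches(User.schema.indexes(), await User.listIndexes())` at startup and log each result, so that adding `unique: true` to an existing index shows up instead of being ignored.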
It sounds like the problem is that 1) you deployed indexes at the same time as your application, and 2) you made a backwards-incompatible index change. We also run Mongoose in production, and this is a bit of a sharp edge because it's so easy to define your index along with the rest of your schema, even though index changes really need to be carefully reviewed and deployed separately.
Not reviewing the index change was definitely a mistake on our end. Unfortunately, instead of hitting any safety nets, that mistake spiraled according to Murphy's Law.
A couple of comments:
I've done some testing, and it seems the only way to avoid index downtime is a little complicated. Let's say you have a schema with a You start with the following initial state:

```javascript
const userSchema = new Schema({
  customId: String
});
userSchema.index({ customId: 1 });
const User = mongoose.model('User', userSchema);
await User.syncIndexes();
```

Then if you just add So what you would need to do is add another index with a different name, and wait until it's actually built:

```javascript
userSchema.index({ customId: 1 }); // keep this as is for now
// add the following index; it has to have a different name to avoid MongoDB throwing an error
userSchema.index({ customId: 1 }, { unique: true, name: 'customId_unique_1' });
// later
await User.syncIndexes();
```

Then you can delete the initial index and keep only the unique index:

```javascript
userSchema.index({ customId: 1 }, { unique: true, name: 'customId_unique_1' });
// later
await User.syncIndexes();
```

You could leave it at that if you don't mind the custom index name. If you want to get rid of the custom name, you will need two extra steps:

1- Build a unique index with the default name alongside the custom-named one.

```javascript
userSchema.index({ customId: 1 }, { unique: true }); // will be built with the default name
userSchema.index({ customId: 1 }, { unique: true, name: 'customId_unique_1' }); // existing in db
// later
await User.syncIndexes();
```

2- Get rid of the index with the custom name.

```javascript
userSchema.index({ customId: 1 }, { unique: true });
// later
await User.syncIndexes();
```

I agree that this is a lengthy flow for applying indexes, but I don't think there's a way around it, because you cannot have multiple indexes with the same name in MongoDB, and AFAIK you cannot "update" an index: you have to drop and re-create it with the new options. So to guarantee you always have some index, you need to apply these steps carefully. In my experience this is not a problem as long as your collection has fewer than 100k documents; once it grows past 100k into the millions, index builds take a considerable amount of time.

@vkarpov15 Can you think of any way to improve this flow for changing an index from non-unique to unique while maintaining some kind of index at all times and avoiding COLLSCAN?
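To see why the one-step change leaves a gap while the multi-step flow does not, the flow can be simulated in memory. This sketch assumes, as described in this thread, that each `syncIndexes()` call first drops indexes missing from the schema and then builds the new ones. It models only MongoDB's one-name-per-index rule and the default `field_direction` naming, not MongoDB's other restrictions on duplicate key patterns. All function names are hypothetical.

```javascript
'use strict';

// Index specs are plain objects { key, unique?, name? }, not Mongoose types.
// MongoDB's default index name joins each field and its direction with "_",
// e.g. { customId: 1 } -> "customId_1".
function defaultIndexName(key) {
  return Object.entries(key).map(([field, dir]) => `${field}_${dir}`).join('_');
}

function normalize(spec) {
  return { key: spec.key, unique: Boolean(spec.unique), name: spec.name || defaultIndexName(spec.key) };
}

function sameIndex(a, b) {
  return a.name === b.name && a.unique === b.unique &&
    JSON.stringify(Object.entries(a.key)) === JSON.stringify(Object.entries(b.key));
}

// One simulated syncIndexes() call: drop phase, then create phase.
function simulateSync(existing, desiredSpecs) {
  const desired = desiredSpecs.map(normalize);
  const afterDrop = existing.filter((idx) => desired.some((d) => sameIndex(idx, d)));
  const afterCreate = [...afterDrop];
  for (const d of desired) {
    if (afterCreate.some((idx) => sameIndex(idx, d))) continue; // already built
    if (afterCreate.some((idx) => idx.name === d.name)) {
      // MongoDB does not allow two indexes with the same name.
      throw new Error(`index name conflict: ${d.name}`);
    }
    afterCreate.push(d);
  }
  return { afterDrop, afterCreate };
}

// True if some index has `field` as its leading key, so queries on it
// can use an index instead of a COLLSCAN.
function coversField(indexes, field) {
  return indexes.some((idx) => Object.keys(idx.key)[0] === field);
}

// For each schema state, reports whether `field` stayed indexed between
// the drop phase and the create phase.
function checkMigration(initialSpecs, steps, field) {
  let current = initialSpecs.map(normalize);
  return steps.map((desiredSpecs) => {
    const { afterDrop, afterCreate } = simulateSync(current, desiredSpecs);
    current = afterCreate;
    return coversField(afterDrop, field);
  });
}

const initial = [{ key: { customId: 1 } }];
const custom = { key: { customId: 1 }, unique: true, name: 'customId_unique_1' };
const uniqueDefault = { key: { customId: 1 }, unique: true };

// One-step change: the non-unique index is dropped before the unique one exists.
console.log(checkMigration(initial, [[uniqueDefault]], 'customId')); // prints [ false ]
// The four-step flow from the comment above.
console.log(checkMigration(initial, [
  [initial[0], custom],
  [custom],
  [uniqueDefault, custom],
  [uniqueDefault],
], 'customId')); // prints [ true, true, true, true ]
```

Trying to declare both `{ customId: 1 }` and `{ customId: 1 }, { unique: true }` with default names makes `simulateSync` throw a name conflict, which is why the temporary `customId_unique_1` name is needed.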
I'd clarify that the issues have been resolved and there's nothing else on our side that needs to be done.

Your method of creating a unique index on top of a non-unique one wouldn't have worked in our case, as duplicate entries would have appeared while the unique index was being built (since it is built in the background), meaning it would always have failed. We resolved the issue by creating a temporary collection with a unique index to store new entries, deduplicating the old collection and creating a unique index on it, and then merging the collections together. This worked without any client-visible downtime, although there was some (in this particular case rather irrelevant) data unavailability while the collections were being migrated. We haven't yet switched past

And my thoughts on avoiding the The described approach would have completely prevented the issue we experienced, and it doesn't seem to have any obvious downsides compared to your current implementation.
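The deduplication script used in this migration isn't shown, so the following is only a sketch of the core step of such a pass. It assumes the documents are read oldest-first (for example with `.sort({ _id: 1 })`) and that the oldest copy of each value should survive. `duplicateIdsToRemove` is a hypothetical helper; it reads everything into memory, so it only suits modest collection sizes, and it treats a missing field like `null`, because that is how a non-sparse unique index sees it.

```javascript
'use strict';

// Sketch of a dedup pass: `docs` must be ordered oldest-first (for example
// `.find().sort({ _id: 1 }).lean()`); the first document for each value of
// `field` is kept and the _ids of later duplicates are returned.
function duplicateIdsToRemove(docs, field) {
  const seen = new Set();
  const toRemove = [];
  for (const doc of docs) {
    // A missing value counts as null, as it would for a non-sparse unique index.
    const value = JSON.stringify(doc[field] ?? null);
    if (seen.has(value)) {
      toRemove.push(doc._id);
    } else {
      seen.add(value);
    }
  }
  return toRemove;
}

// Possible usage:
//   const docs = await User.find().sort({ _id: 1 }).lean();
//   await User.deleteMany({ _id: { $in: duplicateIdsToRemove(docs, 'name') } });
```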
Please re-read my comment; I explained why we cannot do that. The TL;DR is index-name conflicts: you cannot have two indexes with the same auto-generated name in MongoDB. That said, an argument could be made that we could try create-then-drop, with a special case of drop-then-create for indexes with conflicting names. We could even have an option

@vkarpov15 I'll leave this thread open for the nice discussion; this is a scenario that I suspect might happen to more people, but the OP issue is fixed already. Feel free to close this issue if you feel no action is needed regarding the discussion.
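The create-then-drop idea, with a drop-then-create fallback for conflicting names, could look roughly like the planner below. `planIndexOps` is hypothetical and not a Mongoose API. It takes the existing index names, the names to drop, and the indexes to create (each with a resolved `name`), and only orders the operations; creates flagged `afterDrop` are the ones that still leave a window without that index.

```javascript
'use strict';

// Hypothetical planner: order index operations as "create first, drop later",
// falling back to "drop, then create" only when the new index's name is still
// held by an index that is about to be dropped.
function planIndexOps(existingNames, toDrop, toCreate) {
  const existing = new Set(existingNames);
  const dropping = new Set(toDrop);
  const safeCreates = [];
  const deferredCreates = [];

  for (const index of toCreate) {
    if (!existing.has(index.name)) {
      safeCreates.push(index); // no name clash: build before anything is dropped
    } else if (dropping.has(index.name)) {
      deferredCreates.push(index); // name clash: must wait for the drop
    } else {
      throw new Error(`index ${index.name} already exists and is not being dropped`);
    }
  }

  return [
    ...safeCreates.map((index) => ({ op: 'create', index, afterDrop: false })),
    ...toDrop.map((name) => ({ op: 'drop', name })),
    ...deferredCreates.map((index) => ({ op: 'create', index, afterDrop: true })),
  ];
}

// Example: customId_1 changes from non-unique to unique, email_1 is new.
const plan = planIndexOps(
  ['_id_', 'customId_1'],
  ['customId_1'],
  [
    { name: 'customId_1', key: { customId: 1 }, unique: true },
    { name: 'email_1', key: { email: 1 } },
  ]
);
for (const step of plan) {
  console.log(step.op, step.op === 'drop' ? step.name : step.index.name, step.afterDrop ? '(gap)' : '');
}
```

In this example `email_1` is built before anything is dropped, while the unique `customId_1` still has to follow the drop, which is exactly the non-unique to unique case discussed here.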
I reread your comment; the first time, I didn't realize you meant it as the reason you aren't doing it this way inside syncIndexes(). As for the names, I guess you are trying to avoid violating the default index name expectation? Explicit names would also need special handling. Having to leave the terminal open with the command running for hours doesn't sound optimal, that is true.

The only operation that seems dangerous is when there's both a drop and a create affecting the same field. So the high-level question is whether it makes sense to handle this case explicitly, or whether you'd consider it too much of an edge case. Other than adding or removing
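Detecting the dangerous case described here (a drop and a create touching the same field in one run) could be done before calling `syncIndexes()`. The sketch assumes `Model.diffIndexes()` returns `{ toDrop, toCreate }` with index names in `toDrop` and key objects in `toCreate`; please check that shape against your Mongoose version. `fieldsTouchedByDropAndCreate` itself is hypothetical.

```javascript
'use strict';

// Hypothetical check: which fields lose an index and gain a new one in the
// same syncIndexes() run? `dbIndexes` is assumed to be Model.listIndexes()
// output (documents with `name` and `key`), and `diff` the assumed
// { toDrop: [names], toCreate: [key objects] } shape of Model.diffIndexes().
function fieldsTouchedByDropAndCreate(dbIndexes, diff) {
  const keysByName = new Map(dbIndexes.map((idx) => [idx.name, idx.key]));

  const droppedFields = new Set();
  for (const name of diff.toDrop) {
    const key = keysByName.get(name);
    if (key) Object.keys(key).forEach((field) => droppedFields.add(field));
  }

  const overlap = new Set();
  for (const key of diff.toCreate) {
    for (const field of Object.keys(key)) {
      if (droppedFields.has(field)) overlap.add(field);
    }
  }
  return [...overlap].sort();
}

// Possible usage before a deploy:
//   const diff = await User.diffIndexes();
//   const risky = fieldsTouchedByDropAndCreate(await User.listIndexes(), diff);
//   if (risky.length > 0) throw new Error(`manual index migration needed for: ${risky.join(', ')}`);
```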
Adding Here are a few thoughts on how this issue can be avoided in the future:
```javascript
'use strict';

const mongoose = require('mongoose');
const { Schema } = mongoose;

run().catch(console.error);

async function run() {
  // Arrange
  await mongoose.connect('mongodb://localhost:27017/test');
  await mongoose.connection.dropDatabase();

  // autoIndex: false, so indexes are only built when syncIndexes() is called
  const userSchema = new Schema({
    name: { type: String }
  }, { autoIndex: false });
  userSchema.index({ name: 1 }, { unique: true });
  const User = mongoose.model('User', userSchema);

  await User.create({ name: 'test' });
  await User.create({ name: 'test' });

  // Retry until the unique index builds: each E11000 error reports one
  // conflicting key, so several duplicate groups need several passes.
  for (;;) {
    try {
      await User.syncIndexes();
      break;
    } catch (err) {
      // Only handle duplicate-key errors; anything else is a real failure.
      if (err.code !== 11000 || err.keyValue == null) throw err;
      // Clean up duplicates: give every conflicting document a unique value
      const conflicting = await User.find(err.keyValue);
      for (const user of conflicting) {
        user.name = user.name + '::' + new mongoose.Types.ObjectId();
        await user.save();
      }
    }
  }

  console.log('Done');
  await mongoose.disconnect();
}
```
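Because each duplicate-key error from the index build reports only one conflicting value, it can be cheaper to list every duplicate group up front. The pipeline below is a sketch using standard `$group` and `$match` stages; `duplicateGroupsPipeline` is a hypothetical name, and for large collections you may want to call `.allowDiskUse(true)` on the aggregate. Like the script above, this does not help if new duplicates keep arriving while you clean up.

```javascript
'use strict';

// Sketch: list every value of `field` shared by more than one document,
// together with the _ids involved. Documents missing the field are grouped
// under null, which matches how a non-sparse unique index treats them.
function duplicateGroupsPipeline(field) {
  return [
    { $group: { _id: `$${field}`, count: { $sum: 1 }, ids: { $push: '$_id' } } },
    { $match: { count: { $gt: 1 } } },
  ];
}

// Possible usage with the script above:
//   const groups = await User.aggregate(duplicateGroupsPipeline('name')).allowDiskUse(true);
//   // each group looks like { _id: 'test', count: 2, ids: [ ... ] }
console.log(JSON.stringify(duplicateGroupsPipeline('name')));
```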
Your suggested fix for the database doesn't work on a busy live database, as new duplicate entries can be created after you search for conflicts and before you call syncIndexes(). As I mentioned, that was the issue in our case, which is why we couldn't have done that. The other suggestions on how to use the API to avoid issues sound reasonable, as long as they're reflected in the documentation.
That's fair; if new duplicate entries are being inserted and you aren't able to patch your code to either disable the code that causes dupes or use
Prerequisites
Mongoose version
6.5.2
Node.js version
16.x
MongoDB server version
4.x
Description
mongoose.syncIndexes(...) currently sends a createIndex command to the database even if an index already exists. It should only create indexes that do not already exist on the MongoDB server.
Steps to Reproduce
12250.js
Output
Expected Behavior
mongoose should only create indexes that do not already exist on the MongoDB server.
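The expected behaviour amounts to filtering the schema's indexes against what the server already has before sending any createIndex command. The helper below is a simplified sketch, not Mongoose's actual fix: it treats two indexes as equivalent when their ordered keys and `unique` flags match, whereas Mongoose compares more options. Its inputs are assumed to come from `schema.indexes()` and `Model.listIndexes()`, and the function names are hypothetical.

```javascript
'use strict';

// Simplified sketch: keep only schema indexes with no equivalent on the
// server. `schemaIndexes` is assumed to be schema.indexes() output
// ([keys, options] pairs) and `dbIndexes` Model.listIndexes() output.
function indexSignature(key, unique) {
  // Key order matters for compound indexes, so compare ordered entries.
  return JSON.stringify([Object.entries(key), Boolean(unique)]);
}

function indexesToCreate(schemaIndexes, dbIndexes) {
  const existing = new Set(dbIndexes.map((idx) => indexSignature(idx.key, idx.unique)));
  return schemaIndexes.filter(
    ([key, options = {}]) => !existing.has(indexSignature(key, options.unique))
  );
}
```

With this filter, calling syncIndexes() on an unchanged schema would send no createIndex commands at all.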