Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for partially denormalized subdocs, including populating denormalized subdocs #10856

Closed
adamreisnz opened this issue Oct 8, 2021 · 11 comments
Labels
priority Automatically set for Mongoose Pro subscribers
Milestone

Comments

@adamreisnz
Copy link
Contributor

adamreisnz commented Oct 8, 2021

Do you want to request a feature or report a bug?
Request a feature

What is the current behavior?
When a schema is set as strict, it is not possible to append different fields to the model OR save those fields in the database.
When a schema is set as not strict, it is possible to append different fields AND save those fields in the database

What is the proposed behavior?
It would be nice if these two features could be split into two distinct features/flags. I have a use case where I would like to append/set additional properties to a model, and have this returned to the client when using toJSON, but I don't want these fields to be saved to the database.

Currently, this doesn't seem possible unless I disable strict for the schema altogether, and create a manual pre-save hook to clean up data that I don't want when saving. Needless to say this adds overhead and is prone to bugs if new fields are added.

Use case
The main use case is being able to populate denormalised data easily, which currently is not possible if using strict schemas.

For example:

//Bar model, has a name property and some other properties that we are interested in
const BarSchema = new Schema({
  name: String,
  more: String,
  another: Number,
})
mongoose.model('Bar', BarSchema)

//Denormalised Bar schema with just the name, for use on the Foo model
const BarNameSchema = new Schema({
  _id: {
    type: Schema.Types.ObjectId,
    ref: 'Bar',
  },
  name: String,
})

//Foo model, which contains denormalized bar data (just the name
const FooSchema = new Schema({
  something: String,
  other: Number,
  bar: BarNameSchema,
})
mongoose.model('Foo', FooSchema)

//Now for some routes I want to be able to load the full `Bar` model, and set it on the `bar` property of a loaded `foo` document
const _id = ...
const foo = mongoose.model('Foo').findOne({_id})
const bar = mongoose.model('Bar').findOne({foo.bar._id})

//This won't work, mongoose will drop the additional properties because the schema is set to `strict`
foo.bar = bar 

//I could make the schema not-strict, but then saving foo will save redundant properties in the database
await foo.save() //don't want the full bar data to be saved on the model

So ideally, these two behaviours of strict could be separated somehow into a separate flag/setting. However, I am also open to other suggestions that would solve our problem.

To be honest this has been a pet peeve of mine for a few years of working with Mongoose now. It's so easy to populate data, and it's so easy to make and manage denormalized data in your database, but it seems the combination of the two still causes unnecessary friction.

Maybe there's a use case for defining a new schema type, which is a "Denormalized" type, which can refer to another model and which will store a select number of fields alongside the parent document, but which can also easily be populated with additional fields for output as JSON/object, while not saving those fields in the database. 🤔

@adamreisnz
Copy link
Contributor Author

adamreisnz commented Oct 8, 2021

Current workaround I've found:

FooSchema
  .virtual('$bar', {
    ref: 'Bar',
    localField: 'bar._id',
    foreignField: '_id',
    justOne: true,
  })

FooSchema.options.toJSON = {
  transform(doc, ret) {
    if (ret.$bar) {
      ret.bar = ret.$bar
      delete ret.$bar
    }
  },
}

But this feels kind of hacky

@vkarpov15 vkarpov15 added this to the 6.0.11 milestone Oct 11, 2021
@IslandRhythms IslandRhythms added the new feature This change adds new functionality, like a new method or class label Oct 11, 2021
@vkarpov15
Copy link
Collaborator

I think this comes down to two separate issues:

  1. How to add fields that end up in toJSON() output, but not in the database. That's exactly what virtuals are for. You just need to make sure virtuals end up in your toJSON output by adding toJSON: { virtuals: true } to your output. Below is an example:
const mongoose = require('mongoose')
const { Schema } = mongoose

//Bar model, has a name property and some other properties that we are interested in
const BarSchema = new Schema({
  name: String,
  more: String,
  another: Number,
})
mongoose.model('Bar', BarSchema)

//Denormalised Bar schema with just the name, for use on the Foo model
const BarNameSchema = new Schema({
  _id: {
    type: Schema.Types.ObjectId,
    ref: 'Bar',
  },
  name: String,
})

BarNameSchema.virtual('more').get(function() { return this.$locals.more }).set(function(v) { this.$locals.more = v })
BarNameSchema.virtual('another').get(function() { return this.$locals.another }).set(function(v) { this.$locals.another = v })

//Foo model, which contains denormalized bar data (just the name
const FooSchema = new Schema({
  something: String,
  bar: BarNameSchema,
})
mongoose.model('Foo', FooSchema)

;(async function() {
  await mongoose.connect('mongodb://localhost:27017/test')
  const barId = new mongoose.Types.ObjectId()
  await mongoose.model('Bar').create({ _id: barId, name: 'test', more: 'foo', another: 42 })
  const { _id } = await mongoose.model('Foo').create({ something: 'test', bar: { _id: barId, name: 'test' } })

  const foo = await mongoose.model('Foo').findOne({_id})
  const bar = await mongoose.model('Bar').findOne({_id: foo.bar._id})

  foo.bar = bar
  // includes `more` and `another`
  console.log(foo.bar.toObject({ virtuals: true }))
})()
  1. How to populate a partially denormalized doc. This is a more complex problem, above is a workaround, but it could admittedly be much easier. Doing something like Temporary (virtual) properties on model instance #2642 (see Temporary (virtual) properties on model instance #2642 (comment)) would make it much cleaner, but I think better support for denormalized subdocs is a bigger feature.

What do you think @adamreisnz ?

@vkarpov15 vkarpov15 changed the title Separate "strict" behaviour for setting properties not defined on the schema vs saving those properties in database Improve support for partially denormalized subdocs, including populating denormalized subdocs Oct 14, 2021
@vkarpov15 vkarpov15 modified the milestones: 6.0.11, 6.2.0 Oct 14, 2021
@adamreisnz
Copy link
Contributor Author

adamreisnz commented Oct 19, 2021

Hi @vkarpov15 thanks for looking into this.

  1. Yes, I am aware of virtual of course and we use them extensively. These fields don't show up in the database and that is great. However, I am not sure if it is a suitable solution for our use case, as our original model schema is quite elaborate with many fields, and the denormalized schema is just a small subset of that. To have to define all the fields from the base model on the denormalized schema as virtuals just to tackle this problem seems like too much effort and hard to maintain.

  2. Yes, agreed. The workaround I've found is the best I can come up with for now, but regrettably it means a new virtual property has to be defined, which we don't really want to see/use, and therefore have to clean up in the toJSON/toObject functions. Another problem of using this virtual is that we now have two properties which kind of point to the same object, bar and $bar, and for programmers it will be not as clear to figure out which one to use. But it's necessary, because one will contain the full populated object ($bar) and the other is merely a sub schema of some denormalised paths.

Ideally, it'd be great if we can work towards better support for denormalised subdocs.

The way I would imagine it working is that you could simply populate/inflate the denormalized sub document with additional properties from the original doc, without having to define virtuals or use differently named helper properties.

For example, if we have a simple document like this, which has some denormalized data:

const doc = {
  _id: 1,
  name: 'Test',
  denormalized: {
    _id: 2,
    foo: 'Foo'
  }
}

It'd be great if we can utilise population to "fill this up" like so:

await mongoose.model('Something').populate(doc, {
  path: 'denormalized',
  select: 'foo bar baz',
})

Which would result in:

const doc = {
  _id: 1,
  name: 'Test',
  denormalized: {
    _id: 2,
    foo: 'Foo',
    bar: 'Bar',
    baz: 'Baz,
  }
}

Even though bar and baz are not defined on the denormalized schema, they are not dropped, because Mongoose would recognise these as valid properties of the parent model.

So the additional populated properties are appended, and basically the denormalized sub doc schema is replaced with a copy of the full document. I imagine this might be the simplest implementation, rather than trying to append the additional properties onto the schema in real time.

Looking at it another way, essentially I am asking for a way to populate something which is already an object (of denormalized data) rather than just an ObjectId.

Does that make sense?

@vkarpov15
Copy link
Collaborator

This does make sense. I'm thinking the syntax will look like this:

const FooSchema = new Schema({
  something: String,
  other: Number,
  bar: {
    type: BarNameSchema,
    ref: 'Bar'
  }
})

So if you put a ref on a single nested subdoc, Mongoose will be smart enough to replace bar with a populated doc if you call populate('bar').

The tricky part is what happens if you overwrite certain properties of bar? For example, if bar is populated, what should doc.bar.name = 'new name'; await doc.save(); do? Should it just set doc.bar.name, should it update both doc.bar.name and the populated doc's name?

@adamreisnz
Copy link
Contributor Author

I like that syntax. Yes I figured there'd be some edge cases to figure out.

I am inclined to say that an update should not update the original model, only the denormalized schema (and only the properties that are on that schema). After all, you're calling .save() on the doc with the denormalized schema, not on the actual parent model.

But happy to get more feedback on that.

@adamreisnz
Copy link
Contributor Author

@vkarpov15 I'm running into the following error while trying out this new feature. Is that something on our end I'm doing wrong?

TypeError: value.validate is not a function
    at node_modules/mongoose/lib/schema/SubdocumentPath.js:257:11
    at SubdocumentPath.SchemaType.doValidate (node_modules/mongoose/lib/schematype.js:1206:12)
    at SubdocumentPath.doValidate (node_modules/mongoose/lib/schema/SubdocumentPath.js:249:35)
    at node_modules/mongoose/lib/document.js:2704:18
    at processTicksAndRejections (node:internal/process/task_queues:78:11)

Unfortunately this is all I get out of the stack trace, so not 100% sure whether this is triggered when saving a new document but I'll try to dig deeper.

@adamreisnz
Copy link
Contributor Author

Does this also work when we're creating a new document as such:

const FooSchema = new Schema({
  something: String,
  other: Number,
  bar: {
    type: BarNameSchema,
    ref: 'Bar'
  }
})

const data = {
  something: 'test',
  other: 2,
  bar: {
    //fully populated bar item here
  }
}

const foo = new Foo(data)

It appears any data not present in the BarNameSchema is stripped when we create a new foo object like this.

Also setting it manually after document creation appears to have no effect, e.g.:

foo.bar = bar //fully populated doc
console.log(foo.bar) //only the data in the BarNameSchema appears

Is this expected?

@adamreisnz
Copy link
Contributor Author

Here is a full test script to see the above 2 cases in action (nothing in here is throwing an error though, so I'm still digging into that):

const mongoose = require('mongoose')
const {Schema} = mongoose
const uri = `mongodb://localhost:27017/test`
console.log(`Mongoose version: ${mongoose.version}`)

//Bar model, has a name property and some other properties that we are interested in
const BarSchema = new Schema({
  name: String,
  more: String,
  another: Number,
})
const Bar = mongoose.model('Bar', BarSchema)

//Denormalised Bar schema with just the name, for use on the Foo model
const BarNameSchema = new Schema({
  _id: {
    type: Schema.Types.ObjectId,
    ref: 'Bar',
  },
  name: String,
})

//Foo model, which contains denormalized bar data (just the name)
const FooSchema = new Schema({
  something: String,
  other: Number,
  bar: {
    type: BarNameSchema,
    ref: 'Bar',
  },
})
const Foo = mongoose.model('Foo', FooSchema)

//Now for some routes I want to be able to load the full `Bar` model, and set it on the `bar` property of a loaded `foo` document
async function test() {

  //Connect to database
  await mongoose.connect(uri)

  const bar = await Bar.create({
    name: 'I am Bar',
    more: 'With more data',
    another: 2,
  })
  const foo = await Foo.create({
    something: 'I am Foo',
    other: 1,
    bar,
  })

  console.log(foo)

  const fooLookup = await Foo
    .findOne({_id: foo._id})
    .populate({
      path: 'bar',
      select: 'name more another',
    })

  console.log(fooLookup)

  const bar2 = await Bar.create({
    name: 'I am another Bar',
    more: 'With even more data',
    another: 3,
  })
  const foo2 = await Foo.create({
    something: 'I am another Foo',
    other: 4,
  })

  foo2.bar = bar2
  console.log(foo2)
}

test()

Output is:

{
  something: 'I am Foo',
  other: 1,
  bar: { _id: new ObjectId("61b8ee8665e7a15864d0d4fa"), name: 'I am Bar' },
  _id: new ObjectId("61b8ee8665e7a15864d0d4fc"),
  __v: 0
}
{
  _id: new ObjectId("61b8ee8665e7a15864d0d4fc"),
  something: 'I am Foo',
  other: 1,
  bar: {
    _id: new ObjectId("61b8ee8665e7a15864d0d4fa"),
    name: 'I am Bar',
    more: 'With more data',
    another: 2
  },
  __v: 0
}
{
  something: 'I am another Foo',
  other: 4,
  _id: new ObjectId("61b8ee8665e7a15864d0d502"),
  __v: 0,
  bar: {
    _id: new ObjectId("61b8ee8665e7a15864d0d500"),
    name: 'I am another Bar'
  }
}

So neither setting via foo.bar = bar or by creating with a fully populated object yields the expected result that the populated object will be present on foo. This means this feature currently only works if populating and reading out of the database. However, it'd be great if we can also at least set a pre-populated item onto a doc and for mongoose to recognise that it should retain the full properties, not just those present on the schema.

Thoughts?

@vkarpov15 vkarpov15 reopened this Dec 20, 2021
@vkarpov15 vkarpov15 modified the milestones: 6.1.0, 6.1.4 Dec 20, 2021
@vkarpov15 vkarpov15 added priority Automatically set for Mongoose Pro subscribers and removed new feature This change adds new functionality, like a new method or class labels Dec 20, 2021
@vkarpov15
Copy link
Collaborator

@adamreisnz I got the foo2.bar = bar2 case working, working on getting the first case const foo = await Foo.create({ bar }) right now.

Re: the value.validate is not a function stack trace, I'll need more info to debug that. All that stack trace tells me is that somehow Mongoose didn't cast a POJO to a subdocument.

vkarpov15 added a commit that referenced this issue Apr 16, 2022
@adamreisnz
Copy link
Contributor Author

@vkarpov15 I think I can confirm that all cases are working now, thanks! 👍🏼

@vkarpov15
Copy link
Collaborator

Thanks @adamreisnz , let me know if you run into any related issues 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority Automatically set for Mongoose Pro subscribers
Projects
None yet
Development

No branches or pull requests

3 participants