Skip to content
This repository has been archived by the owner on Mar 1, 2022. It is now read-only.

Mongo JSON utility #2

Closed
4 tasks done
kirrg001 opened this issue Nov 7, 2018 · 36 comments
Closed
4 tasks done

Mongo JSON utility #2

kirrg001 opened this issue Nov 7, 2018 · 36 comments
Assignees
Labels

Comments

@kirrg001
Copy link
Contributor

kirrg001 commented Nov 7, 2018

Requirement

NQL should offer a new interface utility to handle multiple filters.

Currently, you can only use NQL like this:

nql('filter=a+b'). querySQL() (see)
You can pass a filter and the NQL interface will parse this filter into mongo JSON and then passes mongo JSON into mongo-knex.

But what if you want to combine multiple filters? It would be great if NQL could offer a utility set.

e.g. i have two filters

filter=a:b (custom)
filter=a:c (enforced)

IMO the challenge here is to combine mongo JSON.

Tasks

  • collect use cases in Ghost:
    1. enforcedFilters:
      • post model: options.context.public ? 'status:published' : null;
      • user model: options.context.public ? 'status:[inactive, locked]' : null
    2. defaultFilters:
      • post model: options.context.public ? 'page:false' : 'page:false+status:published'
      • user model: options.context.public ? null : 'status:[inactive, locked]'
    3. ideally we need to make this suite compatible with Mongo filters
  • come up with a plan how the utility should look like
  • implement (implementation based on mongo json object manipulation)
  • move implementation from Ghost into nql
@kirrg001 kirrg001 changed the title Merge statements/filters Merge statements/filters - utility Nov 7, 2018
@kirrg001
Copy link
Contributor Author

kirrg001 commented Nov 7, 2018

Inspirtation can be found in GQL: https://github.com/TryGhost/GQL/blob/master/lib/lodash-stmt.js

@naz naz self-assigned this Nov 9, 2018
@naz
Copy link
Contributor

naz commented Nov 9, 2018

While doing research for these filters came out to a limitation of NQL api. After default/custom fields were applied to a filter object self._filters, there's no way to actually use these filter when extending knex'es QueryBuilder object here. When we substitue GQL with NQL this code:

                this.query(function (qb) {
                    gql.knexify(qb, self._filters);
                });

would become:

                this.query(function (qb) {
                    nql.querySQL(qb, self._filters);
                });

but at the moment NQL's querySQL only accepts single parameter and uses the filter string that it was initialized with.

@kirrg001
Copy link
Contributor Author

kirrg001 commented Nov 9, 2018

@gargol Yeah. We need to design this 👍I am wondering why Ghost combines (mongo) JSON and not filters as strings 🤔

nql('filter=a+b'). querySQL() (see)
You can pass a filter and the NQL interface will parse this filter into mongo JSON and then passes > mongo JSON into mongo-knex.

@naz
Copy link
Contributor

naz commented Nov 9, 2018

It would be hard to achieve appropriate merging of default/enforced/custom filters on a string level hence the need to manipulate already parsed objects

@kirrg001
Copy link
Contributor Author

kirrg001 commented Nov 9, 2018

hence the need to manipulate already parsed objects

What do you mean by that?

@kirrg001
Copy link
Contributor Author

kirrg001 commented Nov 9, 2018

We have three use cases in Ghost:

  • pass custom filter string
  • model defines default/enforced filter string
  • model has a deprecated logic using custom statements (objects), which IMO should be removed and replaced with filter strings

I am already getting a mongo JSON when i parse a filter. That's why i am wondering if it's easier to combine filter strings.


BTW: Did you create an issue for researching existing mongo json utility (merge/reject mongo json objects)?

@naz
Copy link
Contributor

naz commented Nov 9, 2018

Haven't created a specific issue as there's no real need so far. Regular merging with _.merge(obj1, obj2) on the mongo doc is enough (when those are flat structures). I'm going through test cases one by one and looking if the outcomes are good. For example, have these simple ones passing already:

it('should merge enforced and default filters if both are provided', function () {
    // combineFilters('status:published', 'page:false').should.eql({
    //     statements: [
    //         {prop: 'status', op: '=', value: 'published'},
    //         {prop: 'page', op: '=', value: false, func: 'and'}
    //     ]
    // });
    combineFilters('status:published', 'page:false').should.eql({
        status: 'published',
        page: false
    });
    // parseSpy.calledTwice.should.be.true();
    // mergeSpy.calledTwice.should.be.true();
    // findSpy.called.should.be.false();
    // rejectSpy.called.should.be.false();
});

it.only('should merge custom filters if more than one is provided', function () {
    // combineFilters(null, null, 'tag:photo', 'featured:true').should.eql({
    //     statements: [
    //         {prop: 'tag', op: '=', value: 'photo'},
    //         {prop: 'featured', op: '=', value: true, func: 'and'}
    //     ]
    // });
    combineFilters(null, null, 'tag:photo', 'featured:true').should.eql({
        tag: 'photo',
        featured: true
    });
    // parseSpy.calledTwice.should.be.true();
    // mergeSpy.calledOnce.should.be.true();
    // findSpy.called.should.be.false();
    // rejectSpy.called.should.be.false();
});

@naz
Copy link
Contributor

naz commented Nov 9, 2018

What do you mean by that?

It's impossible to detect which filters overlap with enforced ones for example this would be rather hard to do with strings only - https://github.com/TryGhost/Ghost/blob/master/core/test/unit/models/plugins/filter_spec.js#L415

@kirrg001
Copy link
Contributor Author

kirrg001 commented Nov 9, 2018

Haven't created a specific issue as there's no real need so far. Regular merging with _.merge(obj1, obj2) on the mongo doc is enough (when those are flat structures). I'm going through test cases one by one and looking if the outcomes are good. For example, have these simple ones passing already:

Using lodash does not work. It only works if you want to merge JSON objects.

As soon as i pass a custom filter featured:true+status:draft, mongo json results in

{$and: [{featured: false}, {status: 'status'}]}

This is a default use case.

I don't say we have to parse strings 🙊, but i'd suggest to collect in an issue:

  1. Which use cases do we have in Ghost?
  2. Which mongo json utils are available? e.g. look at mongoose or similar
  3. Come to a conclusion: short term -> Our first goal is to use NQL in our model layer with 1-2 use cases. What do we have to do to achieve this?
  4. Come to a conclusion: long term -> What to do? Use existing tool? Write our own utility? Parse filter strings?

@naz
Copy link
Contributor

naz commented Nov 14, 2018

  1. At the moment we have to support enforced, default, custom, and extra filters. They are all being bassed as query strings in Ghost (being migrated from statements in extra filter case).
  2. Out of checked moongose, camo, mingo libraries and general searches on github/npm nothing that suites our specific case was found.
  3. Short term we agreed to ty straight string parsing as our base cases seem rather simple to cover.
  4. Long term we would probably need to plan out a utility that would do merging logic based on mongo-json statements rather than raw strings. But that depends on our future use cases and if current ones are achievable with simple string parsing.

@kirrg001 kirrg001 changed the title Merge statements/filters - utility Mongo JSON utility Nov 14, 2018
@ErisDS
Copy link
Member

ErisDS commented Nov 26, 2018

Sorry to be the bearer of bad news. As I mentioned might be the case, we definitely cannot go live with a regex parser in place of this utility.

At a fundamental level, the concept of using regex to parse a language we already have a proper lexer & parser for is pretty darned dirty. At a conceptual level, any manipulation of NQL should really be happening at the JSON level, or as part of the parser itself.

NQL-Lang is a language for humans, Mongo JSON is a language for machines. We parse NQL-Lang into Mongo JSON so that it's easier to work with programatically. We parse, rather than using regex, because NQL-Lang is optimised for humans, flexible, and is a complete language on its own. Regex cannot do a good job of parsing it.

Concating strings was definitely a way to get us a first-pass implementation to focus on other aspects, but it has severe limitations:

  1. it is possible to pass invalid NQL strings as a filter param and have them return results
    E.g. statusstatus::[draft,published] - should return a validation error, but returns results
  2. it is possible to have valid NQL strings be processed and result in invalid NQL
    E.g. status:[draft,published] - incorrectly results in an error
  3. it is possible to bypass enforced filters by using weird patterns in the NQL string
    E.g. page:true,statusstatus::5Bdraft%2Cpublished%5D - incorrectly returns drafts

To be clear, I had concerns about the approach, but it's actually far easier to exploit this than I even imagined.


I don't like coming at you with nothing but problems, but I wanted to get this written down as I'm already a little delayed getting around to testing. I will try a couple of implementation ideas I have now and update this issue ASAP.

@naz
Copy link
Contributor

naz commented Nov 26, 2018

@ErisDS would passing the query string through NQL parser as a first stage validation and improving current set of regexes be enough of the improvement for now? Just thinking out loud about the solutions that are less work intensive to be able to meet our target goals 🤔 I'll get to a more in-depth analysis of this issue, first thing tomorrow.

@ErisDS
Copy link
Member

ErisDS commented Nov 27, 2018

As far as I am aware, it is categorically not possible to solve this problem safely with Regex.

It seems like one of two fundamental pieces of information are missing here: first that NQL supports arbitrary nesting, and second that arbitrary nesting is usually accepted as the cut off point for regex.

https://stackoverflow.com/questions/11905506/regular-expression-vs-string-parsing

@naz
Copy link
Contributor

naz commented Nov 27, 2018

Was trying to find a good argument//examine te problem by coveing the holes listed above. Was able to easily cover up the first case by running the filter through NQL lexer (didn't catch any errors for the second case). But the last point (as it is a vliad NQL) made it rather clear that not properly parsing and matching on the keys themselves is just too dirty... I can make up a regex that will match similar cases but then why do all the work making up unreadable regexp code 🤷‍♂️

Current solution seemed good enough for the limited usecases we have, but going forward it's definitelly better to invest time in proper Mongo JSON util.

@ErisDS did you have time to:

I will try a couple of implementation ideas I have now

@ErisDS
Copy link
Member

ErisDS commented Nov 27, 2018

So far, I've only had chance to re-familiarise myself with the existing code and use cases, so that I can understand what problems we really need to solve, and whether any of the approaches are viable.

Conceptual approaches:

  • Iterator - Build an iterator for mongo statements, so that we know how to apply iterative map/reduce/merge type functions.
  • Parser - Build logic into the parser, or build a new parser
  • Naive logic - Naive merging of Mongo JSON (e.g. logically enforced should always work with outer nested AND groups)

Use cases:

There are 3 types of filter, in terms of priority / levels:

  1. enforced - MUST be applied
  2. anything else (custom and extra) - SHOULD be applied if it doesn't conflict with level 1
  3. default SHOULD be applied if it doesn't conflict with level 1 & 2

The rules here are a kind of cascade.

The cascade is how it works now:

  • First, all the different level 2 custom/extra filters are merged so that we have a single level 2
  • Second, level 2 is reduced to remove any statements that match level 1
  • Third, level 3 is reduced to remove any statements that match the pre-prepared level 2
  • Finally, the remaining level 2 and 3 are combined with outer nested AND groups.

Therefore, the existing approach involves both an iterator and naive logic.

I was wondering if logically speaking, defaults should work with outer nested OR groups 🤔* as a way to reduce the workload*. However, yu have to iterate & reduce the level 2 statements first anyway for enforced logic to work.

To go through this in a quick natural example:

  • enforced: status:published
  • custom: status:draft+page:false
  • default: page:true

The result must be status:published+page:false

Potential quick solution without iteration and reducing:

If you use outer nested logical groups without reducing, you end up with:

(status:published)+(status:draft+page:false)+page:true

This isn't insecure - but it does result in no results. Maybe that should be expected behaviour for a "bad" filter like this, however it would be a potential breaking change to suddenly stop ignoring bad filters.

A variation on this theme might be to validate and reject filters that conflict with enforced filters, but again, we fall into making breaking changes.

Having looked quickly at the parser code, I don't see an obvious quick path to a solution there.

This leaves us with needing to try to implement an iterator, I think.


I've written this up fairly quickly. Let me know if there's anything here that doesn't make sense.


🤔* This is unquestionably wrong LOL

@naz
Copy link
Contributor

naz commented Nov 27, 2018

As discussed, we agreed on going for Naive merging of Mongo JSON. The current filter merging test suite should be expanded with cases from GQL util

@naz
Copy link
Contributor

naz commented Nov 27, 2018

Poked around with util implementation based on mongo json, and some initial work can be found here - kirrg001/Ghost@replace-gql...gargol:mongo-json-utils. Currently took a very naive approach for filter reduction just to see how things work together. One of the changes that need to be done in NQL though is accepting full mongo json object instead of nql string.

Can see already that there will be small challenges reducing filters that are nested under multiple groups like $or:[]. $and:[], but that just needs a bit more thinking through.

Also the processFilters method that replaces aliased properties like 'primary_tag' will need to be rewritten into mong json, but that doesn't seem to be too involved :)

@ErisDS
Copy link
Member

ErisDS commented Nov 28, 2018

One of the changes that need to be done in NQL though is accepting full mongo json object instead of nql string.

Can you walk me through this assertion please?

@naz
Copy link
Contributor

naz commented Nov 28, 2018

At the moment NQL only accepts a query string and options object, which we used this way on the branch where we are substituting GQL. Now that we decided to merge filters as Mongo JSON objects, we will no longer have a string available ( processedFilter variable on the example) but a Mongo JSON object.

@ErisDS
Copy link
Member

ErisDS commented Nov 28, 2018

We control NQL entirely, but I feel like there are a bunch of assumptions about the API for NQL here 😬

I think you're suggesting the API be something like

z = nql.util.merge(x, y)
nql(z, options).querySQL(qb);

But that's by far and away not the only option!

Also note that the const RELATIONS there is actually, options with a relations key: {relations: []}

What's to prevent NQL from understanding the concept of "default" and "override" much the same as nconf does? 🤔

@naz
Copy link
Contributor

naz commented Nov 28, 2018

Yes, the example above is exactly what I've meant. In our case, a fuller picture is something like this:

// step 1
filter = nql.util.merge({enforced, custom, extra, default}) // at this point filter becomes an Mongo JSON object

// step 2
filter = nql.util.replace(filter, aliases) // the name 'aliases' might be a bit confusing, maybe 'expansions' would explain better what it actually does, but this has to operate on Mongo JSON object as well and 'expand' keys defined in the config into statements

// tada
nql(filter, options).querySQL(qb); 

Ideally the logic from step 1 should be moved into the nql package itself. We will still have to pass query strings as an object so nql knows how to 'prioritize' filters for merging, e.g.:

// this can be same concatination that is done on replace-gql branch 
// https://github.com/TryGhost/Ghost/pull/10159/files#diff-f359249f88688ade047760ef9bca4f6bR110
filter = `${custom}+${extra}`

options =  Object.assign(options, {filters: {enforced, default})

nql(filter,  options)...

Similar should happen with logic from step 2 and aliases could be a part of the current options object (maybe naming it as expansions would be better). E.g.:

options =  Object.assign(options, {expansions: ALIASES}) // following current vairable naming

With this approach the interface doesn't change at all. But currently I'm just looking for a way to tell nql "take this pre-processed mongo json object and build up sql upon that". This is needed to be able to test current utilities work with all the test suites in Ghost and be able to change things in case they don't without going back and forth doing updates in nql package.


Rewrote this message about 1000 times, hopefully it's clear 😃

@ErisDS
Copy link
Member

ErisDS commented Nov 29, 2018

This is needed to be able to test current utilities work with all the test suites in Ghost and be able to change things in case they don't without going back and forth doing updates in nql package.

Ah, this makes more sense!

@ErisDS
Copy link
Member

ErisDS commented Nov 29, 2018

aliases could be a part of the current options object (maybe naming it as expansions would be better)

Aliases to me are simple table name aliases, completely agree with renaming this concept to expansions 👍

@naz
Copy link
Contributor

naz commented Nov 29, 2018

To better track the progress of utilities replacement created a PR to @kirrg001's branch with replace-gql code - kirrg001/Ghost#1

naz added a commit to naz/Ghost that referenced this issue Nov 29, 2018
@naz
Copy link
Contributor

naz commented Dec 4, 2018

Initial go for mongo json utils merged into running replace-gql branch TryGhost/Ghost#10159

@naz
Copy link
Contributor

naz commented Dec 5, 2018

As current implementation of utilities has passed through initial testing and 100% works with functional test coverage in Ghost the next step is moving it to nql.

As discussed above, there could be multiple approaches to pass the query information to nql. The one that looks optimal to me would stay with current interface and expand what's being possible to pass in through options object. In Ghost the code would looke like:

const RELATIONS = const RELATIONS = {
    relations: {
        tags: {
            tableName: 'tags',
            type: 'manyToMany',
            join_table: 'posts_tags',
            join_from: 'post_id',
            join_to: 'tag_id'
        },
        ....
};
// the notation should probably change to the one used in RELATIONS
// so we have a map instead of an array, because we can't expand the same key 
// multiple times anyway
const EXPANSIONS = [{
    key: 'primary_tag',
    replacement: 'tags.slug',
    expansion: 'posts_tags.sort_order:0+tags.visibility:public'
},
...
];

nql(options.filter, {
    relations: RELATIONS,
    expansions: EXPANSIONS,
    filters: {
        enforced: options.enforced,
        extra: options.extra,
        default: options.default
    }
});

This approach leaves Ghost with only knowledge of nql queries (no tiyng up with with mongo-knex) and the interface of nql only changes in the options object.

@kirrg001 @ErisDS early feedback appreciated 👍

@kirrg001
Copy link
Contributor Author

kirrg001 commented Dec 5, 2018

Looks good in general 👍


Two things i have spotted

const EXPANSIONS = [{
    key: 'primary_tag',
    replacement: 'tags.slug',
    expansion: 'posts_tags.sort_order:0+tags.visibility:public'
},

Naming suggestion: The expansion key could be named filter? So we have EXPANSIONS and each expansion can contain an optional filter.


nql(options.filter, {
    relations: RELATIONS,
    expansions: EXPANSIONS,
    filters: {
        enforced: options.enforced,
        extra: options.extra,
        default: options.default
    }
});

Isomehow find it weird that we pass a single filter as first argument nql(filter...) and then some more filters as options.

Futhermore, it's sad that we still need to support this "extra" filter, which is a result of some older API query options (e.g. ?status=).

An alternative approach could be to allow either passing a single filter as string or an array of filter strings. The order in the array defines the priority. By that we don't need to tell NQL at all which filters we have.

nql([enforced, custom, extra, default], {
    relations: RELATIONS,
    expansions: EXPANSIONS
});
nql(String|Array, Object)

That should not break the API and is IMO more flexible? 🤔

@naz
Copy link
Contributor

naz commented Dec 5, 2018

@ErisDS, believe we'd also need a new repo in NexesJS for mongo-knex-utils? We haven't discussed the name yet, just throwing in what seems logical :)

@naz
Copy link
Contributor

naz commented Dec 5, 2018

@kirrg001 regarding 1st point, is this the proposed form? :

const EXPANSIONS = [{
    key: 'primary_tag',
    replacement: 'tags.slug',
    filter: 'posts_tags.sort_order:0+tags.visibility:public'
},

To me the name of the key isn't 100% suitable tbh, because it's not just a filter but something that the original key has to be glued together with. On the other hand, when naming it as suggest, it lets us know that it's an 'nql string' that is being passed and not a vague 'expansion' thing 🤔 ... maybe expansionFilter? 🤷‍♂️

I'd go one step further and apply the comment I left in the proposal:

// the notation should probably change to the one used in RELATIONS
// so we have a map instead of an array, because we can't expand the same key
// multiple times anyway

so it would become:

const EXPANSIONS = {
    primary_tag: {
        replacement: 'tags.slug',
        expansionFilter: 'posts_tags.sort_order:0+tags.visibility:public'
    },
    {...},

Regarding 2nd point. Don't think passing an array would be possible, because then we loose the priority of filters. For example we have combine -> custom+extra but reject defaults filter strings. But! :) we can pass in an object like so:

nql({enforced, custom, extra, default}, {
    relations: RELATIONS,
    expansions: EXPANSIONS
});

but that's just moving stuff from options to main parameter which I think is fine.

@ErisDS
Copy link
Member

ErisDS commented Dec 6, 2018

const EXPANSIONS = [{
    key: 'primary_tag',
    replacement: 'tags.slug',
    expansion: 'posts_tags.sort_order:0+tags.visibility:public'
},

Simpler is better, how about from and to instead?

we'd also need a new repo in NexesJS for mongo-knex-utils? We haven't discussed the name yet, just throwing in what seems logical :)

Maybe just mongo-utils? New projects can be started with slimer, you have to pass the --org flag slimer new mongo-utils --org=nexes

@naz
Copy link
Contributor

naz commented Dec 6, 2018

@ErisDS mongo-utils sounds great, the shorter the better 👍

@naz
Copy link
Contributor

naz commented Dec 6, 2018

@ErisDS do you have any specific preference around nql interface as discussed above:

nql({enforced, custom, extra, default}, {
    relations: RELATIONS,
    expansions: EXPANSIONS
});

vs

nql([enforced, custom, extra, default], {
    relations: RELATIONS,
    expansions: EXPANSIONS
});

vs

nql(options.filter, {
    relations: RELATIONS,
    expansions: EXPANSIONS,
    filters: {
        enforced: options.enforced,
        extra: options.extra,
        default: options.default
    }
});

@ErisDS
Copy link
Member

ErisDS commented Dec 6, 2018

With expansions defined as something separate, what's the use case for extra?

With simplicity and ref material in mind, I'd use overrides and defaults (same as nconf!) and keep them flat.

nql(options.filter, {
    relations: RELATIONS,
    expansions: EXPANSIONS,
    overrides: options.enforced,
    defaults: options.defaults
});

Assuming lots of queries in a similar env...

nql = new NQL({relations, expansions, overrides, defaults});
nql(filter).toQuery();
nql(differentFilter).toQuery();

Or

nql = new NQL();

nql.defaults(stuff);
nql.overrides(stuff);

nql(filter).toQuery();
nql(differentFilter).toQuery();

Built whatever feels comfortable with how we're using it in Ghost for now, but I can imagine all of these being useful.

@naz
Copy link
Contributor

naz commented Dec 6, 2018

With expansions defined as something separate, what's the use case for extra?

The extra parameter is a leftover we, unfortunately, have to keep due to support of cases like this (e.g.: filter=status:all).

Expansions don't help in this situation as there's quite a lot of logic involved. What we could do to simplify things, is combining custom and extra filters before passing to nql like ${options.filter}+${this.extraFilters(options)} for a case when extraFilter is present. We can do this because it ends up being combined into $and group this following this, so doing simple + concatination is equivalent.

@naz
Copy link
Contributor

naz commented Dec 6, 2018

We don't do multiple queries on the same nql instance in Ghost atm. Don't think we'd benefit right away from having a separate contructor. So, this approach should be sufficient for current use cases:

nql(options.filter, {
    relations: RELATIONS,
    expansions: EXPANSIONS,
    overrides: options.enforced,
    defaults: options.defaults
});

But still need to figure out where the extra goes 🤔

@kirrg001 what are your throughs?

@kirrg001
Copy link
Contributor Author

kirrg001 commented Dec 6, 2018

@gargol Just go with any suggested solution, which works for now 👍

kirrg001 referenced this issue in TryGhost/mongo-utils Dec 6, 2018
closes #1

- combineFilters
- findStatement
- rejectStatements
- expandFilters
kirrg001 pushed a commit to kirrg001/Ghost that referenced this issue Dec 9, 2018
kirrg001 pushed a commit to kirrg001/Ghost that referenced this issue Dec 11, 2018
daniellockyer pushed a commit to TryGhost/NQL that referenced this issue Mar 1, 2022
closes TryGhost/NQL-old#2, refs TryGhost/mongo-utils#4

- Extended `nql.parse` with new overrides & defaults option parameters
- Forwarded merge action to mongo-utils
- Added test coverage
- Improved readability
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants