Duplicate objects in results #5278

vpetrovykh · 2017-12-09T00:59:24Z · Maintainer

One of the recent updates makes it so that UNION allows duplicate objects. We also use UNION as the implicit operator that joins the results of element-wise functions.

Consider the following setup:

WITH MODULE test
INSERT User {
    name := 'Elvis'
};

WITH MODULE test
INSERT Issue {
    owner := (SELECT User FILTER User.name = 'Elvis'),
    title := 'Issue1',
};

WITH MODULE test
INSERT Issue {
    owner := (SELECT User FILTER User.name = 'Elvis'),
    title := 'Issue2',
};

WITH MODULE test
INSERT Issue {
    owner := (SELECT User FILTER User.name = 'Elvis'),
    title := 'Issue3',
};

Now consider the following 2 queries:

WITH MODULE test
SELECT Issue.owner;
# -> {Elvis}; the graph produces only one node, representing Elvis

WITH MODULE test
SELECT Issue.owner
FILTER Issue.title != "";  # the filter happens to be easily satisfied
# -> {Elvis, Elvis, Elvis}

The second query should produce a multiset of 3 identical User objects (all of them Elvis). The reason is that the query itself is a function that is element-wise w.r.t. the SELECT clause and SET OF w.r.t. the FILTER clause. Since both clauses refer to related sets (they have symbolically identical prefixes), we need to apply our rules for refactoring the longest common prefix, which happens to be Issue. As a result, we evaluate the SELECT clause for each Issue, get the same User object every time, and include it in the result (via an implicit UNION), because the FILTER also evaluates to true for each Issue.

One odd feature of this is that, counter-intuitively, applying a FILTER increased the cardinality of the result compared to having no FILTER at all.
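To make the two results concrete, here is a small Python model of the setup above. It is a sketch under stated assumptions, not how EdgeDB evaluates queries: objects are plain Python objects compared by identity, a path step such as Issue.owner yields each distinct target once, and the FILTER query iterates over the common prefix Issue and concatenates the per-Issue results as a multiset UNION. The helper names `select_owner` and `select_owner_filtered` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity equality and hashing, like database objects
class User:
    name: str


@dataclass(eq=False)
class Issue:
    title: str
    owner: User


def select_owner(issues):
    """SELECT Issue.owner: a path step yields each distinct target object once."""
    return list(dict.fromkeys(issue.owner for issue in issues))


def select_owner_filtered(issues):
    """SELECT Issue.owner FILTER Issue.title != "" with the longest common
    prefix Issue factored out: both clauses are evaluated once per Issue and
    the per-Issue results are combined by a multiset UNION (duplicates kept)."""
    result = []
    for issue in issues:
        if issue.title != "":
            result.append(issue.owner)
    return result


elvis = User("Elvis")
issues = [Issue(f"Issue{n}", elvis) for n in (1, 2, 3)]

print([u.name for u in select_owner(issues)])           # ['Elvis']
print([u.name for u in select_owner_filtered(issues)])  # ['Elvis', 'Elvis', 'Elvis']
```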

vpetrovykh · 2017-12-09T01:18:20Z · Maintainer Author

To clarify, the current implementation will return a single User object for both of the above queries. The question is about the correct understanding of the updated model that we're supposed to be implementing.


vpetrovykh · 2017-12-12T02:21:00Z · Maintainer Author

After extended discussion, we concluded that the solution is to detect ambiguous usage of the same symbolic name. According to our design, the same symbolic name must always mean the same thing everywhere in the expression. The problematic examples above in fact break that rule.

A symbolic name is unambiguous if:

this symbolic name has not been previously explicitly used in outer scopes
the symbolic name is used only as a SET OF argument in all visible scopes
this symbolic name has been used as a whole path and at least once as a strictly element-wise (not OPTIONAL or SET OF) argument

These rules guarantee that the elements of the set denoted by the symbol are well-defined and can be iterated over in the context of the expression. The original problems arose from the attempt to resolve these kinds of ambiguous definitions in an element-wise context (FILTER should be a meaningful expression for every element of the set produced in the SELECT clause).

The two kinds of examples that illustrate the problem involve FILTER or shapes. Both rely on defining some expression for every element of some base set without modifying the base set. Consider:

SELECT Issue.owner {
    foo := Issue.id
};

We cannot unambiguously determine whether this means "produce duplicate User results, each with a different single Issue.id as the computable foo" (element-wise on Issue) or "produce User results with multiple Issue.ids as the computable foo" (element-wise on Issue.owner). So we don't have a unique interpretation of the element-wise function application.
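The two readings can be contrasted with a hypothetical Python model of the three Issues owned by Elvis. The functions `elementwise_on_issue` and `elementwise_on_owner` are illustrative names, small integers stand in for real ids, and the second reading assumes foo collects the ids of the Issues linking to each owner (with a single owner, that is every Issue.id).

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity semantics, like database objects
class User:
    name: str


@dataclass(eq=False)
class Issue:
    id: int  # hypothetical small ints stand in for real object ids
    owner: User


def elementwise_on_issue(issues):
    """Reading 1: iterate over Issue. Each result is an owner (possibly a
    duplicate) whose computable foo is a single Issue.id."""
    return [(issue.owner, issue.id) for issue in issues]


def elementwise_on_owner(issues):
    """Reading 2: iterate over the distinct Issue.owner objects. Each owner
    appears once and foo holds several Issue.id values (assumed here to be
    the ids of the Issues linking to that owner)."""
    owners = dict.fromkeys(issue.owner for issue in issues)
    return [(owner, [i.id for i in issues if i.owner is owner]) for owner in owners]


elvis = User("Elvis")
issues = [Issue(n, elvis) for n in (1, 2, 3)]

print([(u.name, foo) for u, foo in elementwise_on_issue(issues)])
# [('Elvis', 1), ('Elvis', 2), ('Elvis', 3)]
print([(u.name, foo) for u, foo in elementwise_on_owner(issues)])
# [('Elvis', [1, 2, 3])]
```

Both are internally consistent, which is exactly the problem: nothing in the query text selects one over the other.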


vpetrovykh · 2017-12-12T02:26:28Z · Maintainer Author

It occurs to me that we're on the right track here, but we may need to look more closely at the element-wise functions, specifically at whether their arguments are defined at the same scope level or in nested scopes. At the same scope level (e.g. tuples) we are happy enough to produce cross-products and let one element-wise argument affect the interpretation of the other arguments. When the arguments are defined at different scope levels (shapes, FILTER), the refactoring and cross-product rules come into conflict with the requirement that the set in the outer scope cannot be affected or modified by any inner scope.

We have different expectations for what the following element-wise functions should do:

SELECT (Issue.owner, Issue.id);  # Issue is in the same scope level
# vs
SELECT Issue.owner{ foo := Issue.id };   # Issue is in different scope levels
# vs
SELECT Issue.owner FILTER Issue.id > 0;  # Issue is in different scope levels
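For the tuple, both elements live at the same scope level, so the shared prefix Issue is factored out and the cross product is taken per Issue. Here is a minimal Python sketch of that reading, using hypothetical data with a second user (Alice) so the effect is visible; the function names are illustrative, and the unrelated-sets cross product is shown only for contrast.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(eq=False)  # identity semantics, like database objects
class User:
    name: str


@dataclass(eq=False)
class Issue:
    id: int  # hypothetical ids
    owner: User


def tuple_common_prefix(issues):
    """SELECT (Issue.owner, Issue.id) with the shared prefix Issue factored out:
    for each Issue, take the cross product of its owner and its id, so every
    owner stays paired with the id of its own Issue."""
    return [
        (owner.name, issue_id)
        for issue in issues
        for owner, issue_id in product([issue.owner], [issue.id])
    ]


def independent_cross_product(issues):
    """For contrast only: treat Issue.owner and Issue.id as unrelated sets."""
    owners = dict.fromkeys(issue.owner for issue in issues)
    ids = [issue.id for issue in issues]
    return [(owner.name, issue_id) for owner, issue_id in product(owners, ids)]


elvis, alice = User("Elvis"), User("Alice")
issues = [Issue(1, elvis), Issue(2, alice)]

print(tuple_common_prefix(issues))        # [('Elvis', 1), ('Alice', 2)]
print(independent_cross_product(issues))  # [('Elvis', 1), ('Elvis', 2), ('Alice', 1), ('Alice', 2)]
```

In the tuple case the binding of Issue in one element legitimately constrains the other; in the shape and FILTER cases the same mechanism would reach back into the outer scope.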


elprans · 2017-12-12T13:01:22Z · Maintainer

I think the new rule is a sufficient guarantee that a nested scope does not affect the interpretation of the outer scope. In the last example, only the first expression would compile.


vpetrovykh · 2017-12-12T19:13:30Z · Maintainer Author

OK, so the general rule is that the interpretation of a symbol in a scope cannot be altered by any nested scope. If a use of a symbol in a nested scope is not compatible with its use in the current scope, that is an error.

How do we detect that a symbol is compatible? We use the following criteria:

this symbolic name has not been previously explicitly used in outer scopes
the symbolic name is used only as a SET OF argument in all visible scopes
this symbolic name or its prefix has been used as a whole path and at least once as a strictly element-wise (not OPTIONAL or SET OF) argument

In the future we may relax these requirements slightly by noting that required 1-1 links generate a new symbol that is equivalent to its prefix as far as element-wise usage is concerned. This would allow something like this:

SELECT Issue.id FILTER Issue.desc != "";
# currently, the valid form that satisfies the third condition is:
SELECT (SELECT Issue FILTER Issue.desc != "").id;
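A sketch of why the relaxation is plausible, in Python with hypothetical data and helper names (`valid_form`, `relaxed_form`): when id is a required 1-1 link, each id value identifies exactly one Issue, so iterating over Issue.id and looking up its Issue gives the same result as filtering Issue first. The model raises an error if ids are not unique, since the relaxation would not apply then.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Issue:
    id: int  # hypothetical ids; assumed to be a required 1-1 link
    desc: str


def valid_form(issues):
    """SELECT (SELECT Issue FILTER Issue.desc != "").id: filter the Issue
    objects first, then take .id of the survivors."""
    kept = [issue for issue in issues if issue.desc != ""]
    return [issue.id for issue in kept]


def relaxed_form(issues):
    """SELECT Issue.id FILTER Issue.desc != "" under the proposed relaxation:
    iterate over the Issue.id values, recovering the single Issue behind each
    one, which is only sound because id is 1-1."""
    by_id = {issue.id: issue for issue in issues}
    if len(by_id) != len(issues):
        raise ValueError("relaxation requires id to be a 1-1 link")
    return [issue_id for issue_id, issue in by_id.items() if issue.desc != ""]


issues = [Issue(1, "crash on start"), Issue(2, ""), Issue(3, "typo in docs")]
print(valid_form(issues))    # [1, 3]
print(relaxed_form(issues))  # [1, 3]
```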

It is important to note that := assignments no longer generate a new scope in this interpretation. They only define a new symbol, taking the current scope into account.

Note also that the link operator is special because it is the only operator that takes both the graph and its contents into account. It can behave either as an element-wise operator (for "11" and "1*" links) or as a SET OF operator.


vpetrovykh · 2018-01-24T18:45:20Z · Maintainer Author

There's an update on when the above rule applies: it's only necessary for clauses. Mathematically, we're perfectly happy to refactor the longest common prefix out of any combination of functions. However, clauses are meant to behave like data pipelines, where the preceding data is not altered by what is written in the next clause (so a FILTER should not alter the interpretation of the SELECT). So this is a clause-specific minimal requirement, similar to other clause requirements such as ORDER BY needing the second argument to be a singleton.
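The pipeline view of clauses can be sketched in Python as follows, as an illustrative model rather than EdgeDB internals (the names `select_clause` and `filter_clause` are hypothetical). The FILTER stage only receives elements already produced by the SELECT stage, so it can keep or drop them but cannot multiply them, which rules out the cardinality increase seen in the first example.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(eq=False)  # identity semantics, like database objects
class User:
    name: str


@dataclass(eq=False)
class Issue:
    title: str
    owner: User


def select_clause(issues: List[Issue]) -> List[User]:
    """SELECT Issue.owner: the path step yields each distinct owner once."""
    return list(dict.fromkeys(issue.owner for issue in issues))


def filter_clause(selected: List[T], predicate: Callable[[T], bool]) -> List[T]:
    """FILTER as a pipeline stage: it sees only the SELECT output and may keep
    or drop each element, but cannot re-derive or duplicate it."""
    return [item for item in selected if predicate(item)]


elvis = User("Elvis")
issues = [Issue(f"Issue{n}", elvis) for n in (1, 2, 3)]

selected = select_clause(issues)
# The predicate plays the role of Issue.owner.name != "", evaluated per owner.
filtered = filter_clause(selected, lambda user: user.name != "")
assert len(filtered) <= len(selected)  # a FILTER stage never increases cardinality
print([u.name for u in filtered])  # ['Elvis']
```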


vpetrovykh · 2018-01-24T18:51:37Z · Maintainer Author

Shapes also have this type of behavior (the innards of the shape should not affect the shape's root interpretation), so the rule applies to them as well.
