- Short name: lateral
- SEP Number: SEP-0006
- Author: Andy Seaborne
Discussion: SPARQL-dev Issue #100 (and elsewhere)
Based on previous work : Oxigraph: oxigraph/issues/267
See also apache/jena#1615
A LATERAL
join is like a for-each loop, looping on the results from the left-hand side (LHS),
the pattern before the LATERAL
keyword, and executing the right-hand side (RHS) query pattern
once for each row, with the variables from the RHS in-scope during each RHS evaluation.
A regular join only executes the RHS once, and the variables from the LHS are only used for the join condition after evaluation of the left and right sub-patterns.
Another way to think of a lateral join is as a flatmap
.
Examples:
## Get exactly one label for each subject in a row.
SELECT * {
?s ?p ?o
LATERAL {
SELECT * { ?s rdfs:label ?label } LIMIT 1
}
}
## Get zero or one labels for each subject.
SELECT * {
?s ?p ?o
LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1} }
}
{ OPTIONAL ...
is the same as writing { {} OPTIONAL ...
.
{ }
evaluates to the join identity, a table of one row of zero columns.
The LATERAL
keyword which has the graph pattern so far (from the {
starting the current
block) and a { }
block afterwards.
A sub-select may have variables of the same name that are not lateral-joined to a variable of that name from the LHS.
SELECT * {
?s ?p ?o
LATERAL {
SELECT ?label { ?s rdfs:label ?label } LIMIT 1
}
}
The inner ?s
in the SELECT ?label
is not the outer ?s
, because the SELECT ?label
does not pass out ?s
. As a sub-query, the ?s
could be any name except ?label
for
the same results.
This is the same situation as a sub-query in other situations.
There needs to be a new syntax restriction: there can be no variable introduced by AS
(BIND
, or sub-query) or VALUES
in-scope at the top level
scope of the LATERAL
RHS, that has the same name as any
in-scope variable from the LHS.
## ** Illegal **
SELECT * {
?s ?p ?o
LATERAL { BIND( 123 AS ?o) }
}
In Apache Jena ARQ, LET
would work.
LET
for a variable that is bound acts like a filter
.
Substituting variables from the LHS into the RHS (with the same
restrictions), then executing the pattern, gives the evaluation of LATERAL
.
There is a similarity to filter
NOT EXISTS
/EXISTS
expressed as the not-legal
FILTER ( ASK { pattern } )
, where the variables of the row being filtered are available to
"pattern". This is similar to SQL
correlated subquery.
- #100
- Apache Jena's SERVICE loop:
- Oxigraph: oxigraph/issues/267, oxigraph/pull/274
Work on lateral joins and correlated subqueries:
- https://docs.stardog.com/query-stardog/stored-query-service#correlated-subqueries
- https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL
- https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html
- https://en.wikipedia.org/wiki/Correlated_subquery
LATERAL
is added to the SPARQL grammar at rule
[[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples)
.
As a syntax form, it is similar to OPTIONAL
.
[56] GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ...
[57] OptionalGraphPattern ::= 'OPTIONAL' GroupGraphPattern
[ ] LateralGraphPattern ::= 'LATERAL' GroupGraphPattern
The new algebra operator is lateral
, which takes two expressions:
SELECT * {
?s ?p ?o
LATERAL
{ ?a ?b ?c }
}
is translated to:
(lateral
(bgp (triple ?s ?p ?o))
(bgp (triple ?a ?b ?c)))
To evaluate lateral
:
- Evaluate the first argument (left-hand side from syntax) to get a multiset of solution mappings.
- For each solution mapping ("row"), a. Inject variable bindings into the second argument b. Evaluate this pattern c. Add to results
Outline:
Definition: Lateral
Let Ω be a multiset of solution mappings. We define:
Lateral(Ω, P) = { μ | union of Ω1 where
foreach μ1 in Ω:
pattern2 = inject(pattern, μ1)
Ω1 = eval(D(G), pattern2)
result Ω1
}
where inject
is the corrected substitute
operation.
An alternative style is to define Lateral
more like evaluate P such that μ is in-scope
in some
way, rather than rely on inject
which is a mechanism.
Definition: Evaluation of Lateral
eval(D(G), Lateral(P1, P2) = Lateral(eval(D(G), P1), P2)
This is a new feature with a new syntax.
Queries not using the keyword are unaffected.
Apache Jena 4.7.0: https://jena.apache.org/documentation/query/lateral-join.html
Tests: