Skip to content

Latest commit

 

History

History
191 lines (141 loc) · 5.66 KB

sep-0006.md

File metadata and controls

191 lines (141 loc) · 5.66 KB

LATERAL Join

  • Short name: lateral
  • SEP Number: SEP-0006
  • Author: Andy Seaborne

Discussion: SPARQL-dev Issue #100 (and elsewhere)

Based on previous work : Oxigraph: oxigraph/issues/267

See also apache/jena#1615

Motivation

A LATERAL join is like a for-each loop, looping on the results from the left-hand side (LHS), the pattern before the LATERAL keyword, and executing the right-hand side (RHS) query pattern once for each row, with the variables from the RHS in-scope during each RHS evaluation.

A regular join only executes the RHS once, and the variables from the LHS are only used for the join condition after evaluation of the left and right sub-patterns.

Another way to think of a lateral join is as a flatmap.

Examples:

## Get exactly one label for each subject in a row.
SELECT * {
   ?s ?p ?o
   LATERAL {
     SELECT * { ?s rdfs:label ?label } LIMIT 1
   }
}
## Get zero or one labels for each subject.
SELECT * {
   ?s ?p ?o
   LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1} }
}

{ OPTIONAL ... is the same as writing { {} OPTIONAL .... { } evaluates to the join identity, a table of one row of zero columns.

Syntax

The LATERAL keyword which has the graph pattern so far (from the { starting the current block) and a { } block afterwards.

Scope

A sub-select may have variables of the same name that are not lateral-joined to a variable of that name from the LHS.

SELECT * {
   ?s ?p ?o
   LATERAL {
     SELECT ?label { ?s rdfs:label ?label } LIMIT 1
   }
}

The inner ?s in the SELECT ?label is not the outer ?s, because the SELECT ?label does not pass out ?s. As a sub-query, the ?s could be any name except ?label for the same results.

This is the same situation as a sub-query in other situations.

There needs to be a new syntax restriction: there can be no variable introduced by AS (BIND, or sub-query) or VALUES in-scope at the top level scope of the LATERAL RHS, that has the same name as any in-scope variable from the LHS.

## ** Illegal **
SELECT * {
   ?s ?p ?o
   LATERAL { BIND( 123 AS ?o) }
}

See SPARQL Grammar note 12.

In Apache Jena ARQ, LET would work. LET for a variable that is bound acts like a filter.

Evaluation

Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL.

Notes

There is a similarity to filter NOT EXISTS/EXISTS expressed as the not-legal FILTER ( ASK { pattern } ), where the variables of the row being filtered are available to "pattern". This is similar to SQL correlated subquery.

Elsewhere

Work on lateral joins and correlated subqueries:

Spec updates

Syntax

LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples). As a syntax form, it is similar to OPTIONAL.

[56]  	GraphPatternNotTriples	  ::=  	GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ...
[57]  	OptionalGraphPattern	  ::=  	'OPTIONAL' GroupGraphPattern
[  ]  	LateralGraphPattern	  ::=  	'LATERAL' GroupGraphPattern

Algebra

The new algebra operator is lateral, which takes two expressions:

  SELECT * {
    ?s  ?p  ?o
    LATERAL
      { ?a  ?b  ?c }
}

is translated to:

  (lateral
    (bgp (triple ?s ?p ?o))
    (bgp (triple ?a ?b ?c)))

Evaluation

To evaluate lateral:

  1. Evaluate the first argument (left-hand side from syntax) to get a multiset of solution mappings.
  2. For each solution mapping ("row"), a. Inject variable bindings into the second argument b. Evaluate this pattern c. Add to results

Outline:

Definition: Lateral

Let Ω be a multiset of solution mappings. We define:

Lateral(Ω, P) = { μ | union of Ω1 where 
           foreach μ1 in Ω:
               pattern2 = inject(pattern, μ1)
               Ω1 = eval(D(G), pattern2)
	       result Ω1
	   }

where inject is the corrected substitute operation.

An alternative style is to define Lateral more like evaluate P such that μ is in-scope in some way, rather than rely on inject which is a mechanism.

Definition: Evaluation of Lateral

eval(D(G), Lateral(P1, P2) = Lateral(eval(D(G), P1), P2)

Backwards Compatibility

This is a new feature with a new syntax.

Queries not using the keyword are unaffected.

Tests and implementations

Apache Jena 4.7.0: https://jena.apache.org/documentation/query/lateral-join.html

Tests: