Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds custom SPARQL operator OX_LATERAL #274

Closed
wants to merge 4 commits into from
Closed

Adds custom SPARQL operator OX_LATERAL #274

wants to merge 4 commits into from

Conversation

Tpt
Copy link
Collaborator

@Tpt Tpt commented Oct 31, 2022

"FOO OX_LATERAL(?v1 ?vn) SUBSELECT" means that for all bindings emitted from FOO SUBSELECT is going to be called with ?v1 ?vn already set from the binding from FOO

For example:

SELECT ?s ?o WHERE {
    ?s a ex:T .
    OX_LATERAL(?s) {SELECT ?o WHERE { ?s ex:p ?o } ORDER BY ?o LIMIT 2}
}

Closes #267

"FOO OX_LATERAL(?v1 ?vn) SUBSELECT" means that for all bindings emitted from FOO SUBSELECT is going to be called with ?v1 ?vn already set from the binding from FOO
@hobofan
Copy link
Contributor

hobofan commented Oct 31, 2022

I'm not sure what the currently assumed state of stability of a query structure is when serializing and deserializing it.

Until now we've constructed the query as a spargebra Query during our higher-level query planning layer, serialized it as a String in the end, and then parsed the query again when passing it into Store::query. So essentially a roundtrip of query->string->query.

When testing OX_LATERAL with a query where I constructed a GraphPattern::Sequence, that is gone and replaced by a GraphPattern::Join after the roundtrip from what I can tell, resulting in a different query result. If you want I can try to produce a reduced test for that.

As a "workaround"(/ more desirable solution 😅 ) I've eliminated the roundtrip (#276).


With that workaround in place, I was able to integrate the feature in the first places, and it's looking good so far! :)

@Tpt
Copy link
Collaborator Author

Tpt commented Oct 31, 2022

I'm not sure what the currently assumed state of stability of a query structure is when serializing and deserializing it.

It's bad. I have just merged #270 that makes sure the serialization output can be parsed. But in the future we should definitely ensure that serialization -> deserialization ends up with the exact same algebra, at least in common cases.

As a "workaround"(/ more desirable solution 😅 ) I've eliminated the roundtrip (#276).

Thank you!

With that workaround in place, I was able to integrate the feature in the first places, and it's looking good so far! :)

Amazing!

@hobofan
Copy link
Contributor

hobofan commented Nov 7, 2022

As promised, further feedback:

We've now integrated it into all places we wanted for now in our codebase, and seems to be working without issues! :)

Generally, we feel like composing lateral queries via spargebra puts quite a bit more stress on the tools available for composing queries, and required us to write a few utilities for that (e.g. for correctly creating an intermediate GraphPattern::Projection for any GraphPattern, so that we can convert that into a lateral join). I'll try to see that I can generalize some of those into generic utilities for query building to upstream them (see #268).

@Tpt
Copy link
Collaborator Author

Tpt commented Nov 7, 2022

Hi! Thank you very much for the feedback. It's great!

About the creation of intermediate GraphPattern::Projection, if you know that the composition is safe (i.e., no possible variable conflict), you might just use GraphPattern::Sequence without a nested Projection. It will avoid you the hassle of generating the projection. But beware, it is likely that the serialization back to SPARQL of such queries does not work properly.

@hobofan
Copy link
Contributor

hobofan commented Nov 7, 2022

One thing that I've been stubbing my toe on now is the join behaviour of GraphPattern::Sequence. As far as I can tell it will internally generate a ForLoopJoin, which joins the resulting records via a normal join, with no option of turning it into a left join. Is that correct?

In most situations there was a way to work around that (or the alternative way of constructing the query was more optimized anyway), but I seem to be encountering situation where a LeftSequence would be the only way to get the logic to work correctly AFAIK.


A workaround seems to be:

let graph_pattern = GraphPattern::Sequence {
    left: Box::new(left),
    right: Box::new(GraphPattern::LeftJoin {
        left: Box::new(GraphPattern::Extend {
            inner: Box::new(GraphPattern::Bgp { patterns: vec![] }),
            variable: variable_on_left.clone(),
            expression: Expression::Variable(variable_on_left.clone()),
        }),
        right: Box::new(right),
        expression: None,
    }),
};

@Tpt
Copy link
Collaborator Author

Tpt commented Nov 8, 2022

One thing that I've been stubbing my toe on now is the join behaviour of GraphPattern::Sequence. As far as I can tell it will internally generate a ForLoopJoin, which joins the resulting records via a normal join, with no option of turning it into a left join. Is that correct?

Yes.

In most situations there was a way to work around that (or the alternative way of constructing the query was more optimized anyway), but I seem to be encountering situation where a LeftSequence would be the only way to get the logic to work correctly AFAIK.

That's a great point. This make me realise I have made a huge mistake in the current implementation of OX_LATERAL with optionals: if we write

SELECT * WHERE {
  VALUES ?s { ex:s }
  OPTIONAL {
    FILTER(!BOUND(?s))
    VALUES ?o { ex:o }
  }
}

it returns properly (?s = ex:s, ?o = ex:o) because in the left joins the two sides should be evaluated independently.
But in

SELECT * WHERE {
  VALUES ?s { ex:s }
  OPTIONAL {
    OX_LATERAL(?s) {SELECT * WHERE {
    FILTER(!BOUND(?s))
    VALUES ?o { ex:o }
  }}
}

it returns (?s = ex:s, ?o = UND) because ?s is wrongly captured inside the optional, breaking the independence.

To fix that properly I believe we should introduce as you said a LeftSequence that would properly support the semantic of running the optional for each tuple outside of it and surface it into the SPARQL syntax with something like OX_LATERAL_OPTIONAL { O } or OX_LATERAL OPTIONAL { O }.

This way:

SELECT * WHERE {
  VALUES ?s { ex:s }
  OX_LATERAL OPTIONAL {
    FILTER(!BOUND(?s))
    VALUES ?o { ex:o }
  }
}

would return (?s = ex:s, ?o = UND) as expected and would allow the usage of LATERAL subqueries inside it.

Side note: it is likely we would also need "LATERAL" support for the "GRAPH" and "SERVICE" keywords.

What do you think about it?

@Tpt
Copy link
Collaborator Author

Tpt commented Nov 10, 2022

@hobofan I have updated this pull request. The OX_LATERAL subqueries inside OPTIONAL are now properly independant from the outside code. I have introduced OX_LATERAL OPTIONAL and a LeftSequence operator that should provide the behavior you are looking for.

Side note: The python failure in the CI is independant of this PR.

@hobofan
Copy link
Contributor

hobofan commented Nov 16, 2022

Thanks! I've updated our usage to those new changes, and they work great! With this we can now fully express the lateral join intentions in serialized SPARQL queries (which previously only were possible via in-memory construction of GraphPattern) and I was able to remove the workaround I outlined above! :)

The only remaining thing that feels a bit weird is having the repetition in constructs like these (which will/would feel less verbose with a stabilized LATERAL keyword), but all things considered that's not a big deal.

            OX_LATERAL OPTIONAL {
                OX_LATERAL ( ?AgileTeam ?FromDate ?ToDate )
                {
                    SELECT ?AgileTeam (count(?deployment) AS ?countDeploymentsOnDate)

Sorry for not being more responsive when it comes to designing the syntax, but I feel like you (and the other people commenting in related issues) have a very good grasp on it, and I don't think I have much to add on that front. What I can do is giving feedback from the "consumer" perspective, and from that point of view, the feature feels great! :)

@Tpt
Copy link
Collaborator Author

Tpt commented Nov 16, 2022

@hobofan Thank you! Great to hear it works for you and you avoided the algebra hacks. Indeed having two nested "LATERAL" is quite verbose but not having them would mean that everything inside "LATERAL" is going to be "LATERAL" so we would have to build a way to "escape" LATERAL to express some queries...

What I can do is giving feedback from the "consumer" perspective, and from that point of view, the feature feels great! :)

Thank you! "Customer" feedbacks are amazing to check that the features are actually usable and understandable.

@Tpt
Copy link
Collaborator Author

Tpt commented Dec 28, 2022

I have opened #336 with an implementation following SPARQL 1.2 SEP-0006 and compatible with Apache Jena.

@Tpt
Copy link
Collaborator Author

Tpt commented Feb 6, 2023

The other implementation is now merged and released

@Tpt Tpt closed this Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants