[Query Planning Refactor]: SimpleExecute #1008
Conversation
This looks quite good, thank you for putting it together!
Since this is still a draft and we'll want a couple more iterations to finalize everything, it might be good to try splitting it up into a few parts. For example, the following parts have straightforward mutual dependencies and could in principle merge separately, shrinking the code size and making iteration and reviews easier:
- the namedtuple -> dataclass part (see the sketch after this list)
- the "move existing planning functionality to a new module" part
- the new typedefs
- the code that uses the new typedefs to do query planning
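To make the first item concrete, here is a minimal sketch of what the namedtuple -> dataclass conversion might look like, using `QueryStringWithParameters` as the example type; the actual fields and types in this PR may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, NamedTuple


# Before: a NamedTuple, constructible positionally and usable as a plain tuple.
class QueryStringWithParametersTuple(NamedTuple):
    query_string: str
    parameters: Dict[str, Any]


# After: a frozen dataclass keeps the immutability but drops the accidental
# tuple behaviors (indexing, iteration, comparison against plain tuples).
@dataclass(frozen=True)
class QueryStringWithParameters:
    query_string: str
    parameters: Dict[str, Any]
```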
query_and_parameters: QueryStringWithParameters,
schema_info: Union[CommonSchemaInfo, SQLAlchemySchemaInfo],
provider_id: str,
backend_type: Optional[str],
I'm not in love with the fact that this is `Optional`, but I don't have a concrete suggestion in mind. Maybe we can work together to figure something out?
I switched this to use the `BackendType` enum, so I think it makes sense for it to not be optional. It is still a bit weird because, when `schema_info` is of type `SQLAlchemySchemaInfo`, I currently check that `backend_type` is either MSSQL or PostgreSQL, but I don't check that it matches the dialect in the schema info. If I add that check, we could run into the problem that you mentioned:

> a future SQLAlchemy version adds a new dialect for MSSQL that doesn't inherit from `MSDialect` — our code immediately becomes wrong in yet another place

so I'm not sure what the best solution is...
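For discussion purposes, here is a minimal sketch of the kind of strict consistency check being debated; the `BackendType` values and the helper function are assumptions, not code from this PR. Note that it still relies on `isinstance`, so a future MSSQL dialect that doesn't inherit from `MSDialect` would defeat it, which is exactly the concern above.

```python
from enum import Enum, unique

from sqlalchemy.dialects.mssql.base import MSDialect


@unique
class BackendType(Enum):
    MSSQL = "mssql"
    POSTGRESQL = "postgresql"


def ensure_backend_type_matches_dialect(schema_info, backend_type: BackendType) -> None:
    """Raise if the declared backend type contradicts the schema info's dialect."""
    # The stricter (currently unimplemented) check: instead of only verifying
    # that backend_type is one of the SQL values, compare it to the dialect.
    if isinstance(schema_info.dialect, MSDialect) and backend_type is not BackendType.MSSQL:
        raise AssertionError(
            f"Schema info has an MSSQL dialect, but backend_type is {backend_type}."
        )
```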
provider_metadata = ProviderMetadata(
    backend_type=schema_info.dialect.name,
    requires_fold_postprocessing=isinstance(schema_info.dialect, MSDialect),
)
One of the reasons why I'd love `backend_type` to be non-optional and an enum is so that we can make this check stricter. Here's what I'm worried about: imagine a future SQLAlchemy version adds a new dialect for MSSQL that doesn't inherit from `MSDialect` — our code immediately becomes wrong in yet another place.

Another (and perhaps even better) alternative would be to add `requires_fold_postprocessing` or something like it to the `SQLAlchemySchemaInfo`. After all, that object knows the dialect (and therefore the emitted SQL) better than anything else, so realistically it should be the one specifying whether post-processing is required or not. From that perspective, `ProviderMetadata` would be viewed as simply representing the parts of `SQLAlchemySchemaInfo` that the client may need to know as part of executing the query, which is appealing.
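A rough sketch of that alternative, with hypothetical simplified shapes for both classes (the real `SQLAlchemySchemaInfo` holds much more than a dialect):

```python
from dataclasses import dataclass
from typing import Any

from sqlalchemy.dialects.mssql.base import MSDialect


# Hypothetical, simplified shape, just to show where the flag could live.
@dataclass(frozen=True)
class SQLAlchemySchemaInfo:
    dialect: Any  # in reality, a SQLAlchemy Dialect instance

    @property
    def requires_fold_postprocessing(self) -> bool:
        # The schema info owns the dialect (and therefore the emitted SQL),
        # so it decides whether fold post-processing is needed.
        return isinstance(self.dialect, MSDialect)


@dataclass(frozen=True)
class ProviderMetadata:
    backend_type: str
    requires_fold_postprocessing: bool


def provider_metadata_from(schema_info: SQLAlchemySchemaInfo) -> ProviderMetadata:
    # ProviderMetadata becomes a plain projection of the schema info parts
    # that the client needs in order to execute the query.
    return ProviderMetadata(
        backend_type=schema_info.dialect.name,
        requires_fold_postprocessing=schema_info.requires_fold_postprocessing,
    )
```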
I like this idea! I will work on a separate PR to incorporate it.
query_and_parameters: QueryStringWithParameters,
provider_id: str,
*,
desired_page_size: Optional[int] = 5000,
I'm kind of on the fence about setting a default here. The right number will depend on each user's statistics collection configuration — aside from some special cases, we can't generate pages that are more fine-grained than the statistics backing them. Within Kensho, we always know and control what that configuration might be. But in general, we might not.
On the flip side, not setting a number would make pagination disabled by default, and that's also a default value — and not a good one. Even if the statistics aren't fine-grained enough, the compiler will emit advisories when paginating, which should be an obvious and "soft" way to communicate the issue. And 5000 is a pretty decent compromise between large page sizes (for high throughput) and not blowing up the server or client.
What do you think?
I guess the alternative is that we make this a required param? Do you think it would be better to force the user to make a conscious decision here, since it does depend on the user's statistics?
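To make the trade-off concrete, here is a sketch of the two signatures under discussion; the function names and the untyped first parameter are placeholders:

```python
from typing import Any, Optional


# Option A (current draft): pagination enabled by default with a compromise value.
def execute_query_with_default(
    query_and_parameters: Any,
    provider_id: str,
    *,
    desired_page_size: Optional[int] = 5000,
) -> None:
    ...


# Option B: no default, so every caller makes a conscious choice; passing
# None disables pagination explicitly rather than by omission.
def execute_query_with_required_arg(
    query_and_parameters: Any,
    provider_id: str,
    *,
    desired_page_size: Optional[int],
) -> None:
    ...
```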
compilation_result = compile_graphql_to_match(
    schema_info, query_and_parameters.query_string
)
# TODO(selene): add InterpreterAdapter based backends
I'm unsure of how to deal with InterpreterAdapters. The public methods are `interpret_ir` and `interpret_query`. `SimpleExecute` is set up to contain a query string, which is the input of `interpret_query`, but that function also requires the schema to be passed. It seems like we'd want to use the schema to compile the query in this function, but then our output (the equivalent of the `query` in `SimpleExecute`) would be `IrAndMetadata` rather than a query string. Making the type of `query` be `Union[str, IrAndMetadata]` doesn't really seem to make sense, but maybe it would be more tolerable if we renamed `query` to `backend_input` or something similar? Thoughts?
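A sketch of what that rename could look like, with `IrAndMetadata` stubbed out and the field set reduced to just the parts under discussion:

```python
from dataclasses import dataclass
from typing import Union


class IrAndMetadata:
    """Stand-in for the compiler's IR-plus-metadata bundle."""


@dataclass(frozen=True)
class SimpleExecute:
    provider_id: str
    # With the rename, the Union reads more naturally: string-based backends
    # receive their query text, interpreter-based backends receive compiled IR.
    backend_input: Union[str, IrAndMetadata]
```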
This is part of a larger refactor to streamline the way we generate query plans. This is the simplest form of query plan where the given query is executed against the database.
Future work includes adding paginated queries and cross-database queries.
There are some outstanding TODOs, mainly around how best to use `provider_id` to look up both the corresponding schema and the corresponding database query execution function.
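As a discussion aid for that TODO, here is one hypothetical shape for the `provider_id` lookup: a registry mapping each provider to its schema info and execution function. None of these names exist in the codebase; this is just a sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


# Hypothetical registry: a single lookup from provider_id to everything
# needed to plan and execute queries against that provider.
@dataclass(frozen=True)
class ProviderRegistration:
    schema_info: Any  # a CommonSchemaInfo or SQLAlchemySchemaInfo
    execute_query: Callable[..., Any]  # the provider's query execution function


_PROVIDER_REGISTRY: Dict[str, ProviderRegistration] = {}


def register_provider(provider_id: str, registration: ProviderRegistration) -> None:
    if provider_id in _PROVIDER_REGISTRY:
        raise AssertionError(f"Provider {provider_id!r} is already registered.")
    _PROVIDER_REGISTRY[provider_id] = registration


def lookup_provider(provider_id: str) -> ProviderRegistration:
    return _PROVIDER_REGISTRY[provider_id]
```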