Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PREPARE statements without explicit parameters #4683

Closed
alamb opened this issue Dec 20, 2022 · 2 comments · Fixed by #4701
Closed

Support PREPARE statements without explicit parameters #4683

alamb opened this issue Dec 20, 2022 · 2 comments · Fixed by #4701

Comments

@alamb
Copy link
Contributor

alamb commented Dec 20, 2022

Currently, DataFusion has support for Prepared statements (thanks @NGA-TRAN !) https://github.com/apache/arrow-datafusion/blob/fddb3d3651041f41d66a801f10e27387e84374f7/datafusion/expr/src/logical_plan/plan.rs#L119-L121

However, currently the types of the parameters must be supplied

@avantgardnerio noted that the types of the parameters in the postgres PREPARE statement https://www.postgresql.org/docs/current/sql-prepare.html#:~:text=A%20prepared%20statement%20is%20a,statement%20is%20planned%20and%20executed. is optional

A corresponding list of parameter data types can optionally be specified. When a parameter's data type is not specified or is declared as unknown, the type is inferred from the context in which the parameter is first referenced (if possible).

So for example

SELECT id, age  FROM person WHERE age = $1

And the planner should be able to and infer the types (in this case, I assume create table person (int id, int age)) so the parameter $1 must be of type int and return that to the FlightSQL Client

@alamb alamb changed the title Support EXECUTE statement for prepared statement Support PREPARE statements without explicit parameters Dec 20, 2022
@avantgardnerio
Copy link
Contributor

avantgardnerio commented Dec 20, 2022

Thanks @alamb for writing up a ticket, and thank you @NGA-TRAN for building out an implementation!

My personal backstory is that I'm trying to run https://github.com/AgilData/tpcc which defines a number of prepared statements: https://gist.github.com/avantgardnerio/5bdf77297956add71cbf664d3366224b

When the JDBC client calls the server, it doesn't use the prepare ... syntax - it just passes the query verbatim to do_action_create_prepared_statement() which accepts a request:

pub struct ActionCreatePreparedStatementRequest {
    /// The valid SQL string to create a prepared statement for.
    #[prost(string, tag = "1")]
    pub query: ::prost::alloc::string::String,
}

and returns a response:

pub struct ActionCreatePreparedStatementResult {
    /// Opaque handle for the prepared statement on the server.
    #[prost(bytes = "vec", tag = "1")]
    pub prepared_statement_handle: ::prost::alloc::vec::Vec<u8>,
    /// If a result set generating query was provided, dataset_schema contains the
    /// schema of the dataset as described in Schema.fbs::Schema, it is serialized as an IPC message.
    #[prost(bytes = "vec", tag = "2")]
    pub dataset_schema: ::prost::alloc::vec::Vec<u8>,
    /// If the query provided contained parameters, parameter_schema contains the
    /// schema of the expected parameters as described in Schema.fbs::Schema, it is serialized as an IPC message.
    #[prost(bytes = "vec", tag = "3")]
    pub parameter_schema: ::prost::alloc::vec::Vec<u8>,
}

This leads me to believe that in order to be compatible with the FlightSQL JDBC client and apps that use it, the type inference has to happen in the DataFusion planner.

That being said, implementation might not be too difficult: based upon the statements in the git gist above, types always seem to be:

  1. in a values clause and infer-able from what they are being inserted into
  2. or in an assignment clause, and infer-able from the LHS

I think it would be possible to take:

    Placeholder {
        /// The identifier of the parameter (e.g, $1 or $foo)
        id: String,
        /// The type the parameter will be filled in with
        data_type: DataType,
    },

and make data_type: Option<DataType>.

This way the placeholders could serve for a 1st pass, then a second inference pass could walk the tree, infer the types and set them if possible or error if it can't figure them out.

Just a suggestion - let me know if I can help out more.

@alamb
Copy link
Contributor Author

alamb commented Dec 21, 2022

This way the placeholders could serve for a 1st pass, then a second inference pass could walk the tree, infer the types and set them if possible or error if it can't figure them out.

I think this is a good high level plan (and one that has been sketched out in #4701 )

That being said, implementation might not be too difficult: based upon the statements in the git gist above, types always seem to be:

I think starting with simple rules (aka use the type of the other side of the operator) would be good and then we can extend the rules to be more sophisticated over time as needs evolve (like NOT $1 probably could infer $1 to be boolean, for example)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants