Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Query with a more fitting name #713

Open
piodul opened this issue Apr 25, 2023 · 7 comments
Open

Replace Query with a more fitting name #713

piodul opened this issue Apr 25, 2023 · 7 comments
Assignees
Labels
API-breaking This might introduce incompatible API changes API-stability Part of the effort to stabilize the API area/documentation Improvements or additions to documentation area/statement-execution
Milestone

Comments

@piodul
Copy link
Collaborator

piodul commented Apr 25, 2023

Confusingly, the driver calls unprepared statement a Query (contrary to PreparedStatement). It's not a correct term and can lead to confusion. "Queries" are statements that ask for data and receive some data in return. "Statement" is a broader term and encompasses items that don't return data but rather modify data or schema, e.g. INSERT or CREATE. The Query object has no problems with statements that are not queries.

I suggest to rename Query to something that fits its purpose better, e.g. UnpreparedStatement or SimpleStatement - need to decide upon something, it's probably the best to look at the naming in other drivers. To ease transition to the new name, we will re-introduce Query as a type alias, then deprecate and eventually remove at some point.

The documentation will have to be adjusted as well as it seems to use "query" and "statement" interchangeably, which is wrong due to the reasons described above but was probably caused by confusion resulting from the existing Query name.

@piodul piodul added this to the 1.0.0 milestone Apr 25, 2023
@piodul piodul added the area/documentation Improvements or additions to documentation label Apr 25, 2023
@wprzytula wprzytula added the API-stability Part of the effort to stabilize the API label Jul 30, 2023
@nyh
Copy link

nyh commented Sep 12, 2023

By the way, the same logic also applies to the names of the Session methods - query() and execute(). When I started using the Rust driver (yesterday ;-)) it wasn't immediately obvious why the unprepared version of execute() is called "query" and not execute_unprepared, or something like it. It's not a huge problem (I just read the whole Session documentation and found everything) but it was strange.

Sadly, Rust doesn't seem to have function overloading, so it's not possible to allow the same name execute() for both prepared and unprepared versions, as done in Python for example, but at least the names could be closer, and perhaps also physically closer in the sorted documentation.

@piodul
Copy link
Collaborator Author

piodul commented Sep 12, 2023

By the way, the same logic also applies to the names of the Session methods - query() and execute(). When I started using the Rust driver (yesterday ;-)) it wasn't immediately obvious why the unprepared version of execute() is called "query" and not execute_unprepared, or something like it. It's not a huge problem (I just read the whole Session documentation and found everything) but it was strange.

Sadly, Rust doesn't seem to have function overloading, so it's not possible to allow the same name execute() for both prepared and unprepared versions, as done in Python for example, but at least the names could be closer, and perhaps also physically closer in the sorted documentation.

Function overloading can be simulated to some extent via traits. For example, PreparedStatement and UnpreparedStatement could implement a trait (let's say it's called Executable), and then you could have:

pub async fn execute(&self, stmt: &impl Executable) {
    // ...
}

Maybe this could be extended to batches as well.

@cvybhu
Copy link
Contributor

cvybhu commented Sep 12, 2023

Function overloading can be simulated to some extent via traits. For example, PreparedStatement and UnpreparedStatement could implement a trait (let's say it's called Executable), and then you could have.

I'm not a big fan of stuff like this. I feel like it's going to devolve into an unreadable web of traits, Executables, Executors etc. Kind of like what has happened in cdrs:
image

A separate function for each type of query is fine, it's easy to read and easy to use, even if we have to use a bunch of different names. KISS.

@Lorak-mmk
Copy link
Collaborator

Lorak-mmk commented Feb 27, 2024

Query is a bit weird after serialization refactor. Due to the need to know column types in order to serialize values, we can't just send Query with values, so the driver internally prepares it - but only if values passed are non-empty.
Hiding this preparation step may be surprising, even though there are warnings in the documentation about it. I think that preparing should be explicit - for implicit preparing there is CachingSession.
It is a problematic footgun for batches, where each query needs to be prepared.

There is also potential use case of BoundStatement (#941).

Batch interface is also a bit weird. Even right now it requires empty elements in values iterator on the positions corresponding to Query, it would be the same with BoundStatement.
It also always seemed to me to be a bit clunky - Rust is all about type-level safety and making illegal states irrepresentable, but here we have 2 vectors, one for pairs and one for values,
instead of one vector of statements.

I see 2 main paths forward., correct me if I missed any.

KISS, @cvybhu's approach

If we decide to do so, remove values arguments from query, query_paged, query_iter.
For BoundStatement introduce another set of functions - like execute_bound, execute_bound_paged, execute_bound_iter.
We would also need to add new variant to BatchStatement enum so that BoundStatements can be used there.

Advantages:

  • Simple interface without weird traits
  • Probably easy straightforward implementation without many changes

Disadvantages:

  • A lot of variants of basically the same methods
  • Hard / impossible to write user code that takes arbitrary statement (unprepared / prepared / bound), potentially with values, and executes it.
  • Still need to pass empty elements in batches for Query / BoundStatement

Executable trait

Create some trait, let's call it Executable as @piodul proposed. execute / execute_paged / execute_iter would accept impl Executable instead of statement + values.
batch would accept iterator of Executable.
For PreparedStatement executable would be implemented for (PreparedStatement, impl SerializeRow) to allow passing values.
BoundStatement would just implement Executable itself.
Query would do either, depends on wether we decide to allow it to take values.
Batch would also implement Executable which would eliminate need for a separate method. There would be just execute / execute_paged / execute_iter.

Advantages:

  • Safer batch interface, without possibilities of query / value mismatches and no need to pass artificial empty values
  • Small Session interface, without separate method set for each query type and batch.
  • Simplified code, with a part of logic moved from Session / Connection to trait implementations.
  • Easier to write code generic for arbitrary statements

Disadvantages:

  • Harder implementation (I don't see how would Executable trait look right now), more API-breaking changes
  • Executing prepared statement with values would require additional parentheses (session.execute((statement, values))).

I like Executable because of simplification and improved batch interface, but I'm not yet fully convinced it's the best way to go.
I also have no idea what methods should this trait have, I'd have to think about it.

WDYT @piodul ?

@wprzytula
Copy link
Collaborator

I like the idea of Executable trait, too. Further analysis is needed to design an API of such trait and then probably some try-on implementation.

@roydahan
Copy link
Collaborator

roydahan commented Feb 2, 2025

@muzarski was this one addressed by the merged PRs or need another PR?

@muzarski
Copy link
Contributor

muzarski commented Feb 2, 2025

Today we had a small discussion about it on Slack. There are different opinions on whether we should actually rename the struct or not. I decided to see the naming conventions in our other drivers for reference. So, here it is - the breakdown of execution APIs throughout the drivers

Rust driver

We distinguish three kinds of statements: Query (unprepared statement), PreparedStatement and Batch. The session object exposes query_[*], execute_[*] and batch methods for each of them respectively - i.e., query_[*] methods accept impl Into<Query> (something that we can construct Query object from, for example Query itself, or a String), execute_[*] accepts PreparedStatement, etc.

C/C++ driver

Cpp driver has three different types for statements, but only two session methods for execution. I will describe them briefly. Statement types:

  • CassPrepared - a prepared statement (this cannot be yet used in execution method!!!)
  • CassBatch - a batch statement
  • CassStatement - a bound statement - it can be either unprepared (via cass_statement_new), or prepared (via cass_prepared_bind) statement. It exposes public API to bind the values to the statement. The cass_prepared_bind method takes a CassPrepared and creates a CassStatement object, which references the original prepared statement.

Execution methods:

  • cass_session_execution_batch - self-explanatory, takes a CassBatch instance and executes it
  • cass_session_execute - accepts CassStatement instance. This means that it can be used for both unprepared and prepared statements.

Python driver

Here we will start with execution methods. Because there is only one - Session::execute(). Since it is a python, the argument to this can be technically of any type. But the documentation mentions: "query" may be a query string or an instance of :class:"cassandra.query.Statement". This method accepts the statement parameters as well.

Statement types (from query.py):

  • Statement - an abstract class. Instance of this class (or its subclass) can be passed to Session::execute().
  • PreparedStatement - returned from Session::prepare. It cannot be executed, since it does not derive from Statement
  • SimpleStatement - an unprepared statement. It derives from Statement, thus can be executed
  • BatchStatement - batch statement - derives from Statement
  • BoundStatement - a bound statement - can be obtained from PreparedStatement::bind(). This is a prepared statement with some bound set of values. It derives from Statement.

Java driver (4.x)

There are a lot of things going on, specific to Java - i.e. a lot of interfaces, subclassing etc. I think the easiest way to tackle this is to focus on SyncCqlSession::execute() methods. This method is overloaded - all overloads accept something that represents a statement/query - it can be a String, or an implementor of Statement interface. Some overloads accept bound values as well.

Now let's focus on Statement interface. Again, we can distinguish three main implementors:

  • BatchStatement
  • BoundStatement - a prepared statement with bound values. Obtained from PreparedStatement::bind method
  • SimpleStatement - an unprepared statement

We can see that API is very similar to Python driver's API (even the same names).

Gocql

Now this one is a bit confusing to me, and differs by a lot from different drivers. So if I misunderstood something, please correct me. There is only one object representing either unprepared or prepared statement - namely Query. To create a Query instance, user can run Session::Query or Session::Bind method - AFAIU, the difference is subtle, with Bind accepting some custom function/lambda to bind the values based on provided QueryInfo. What's important, is that such Query is automatically prepared - the documentation mentions Query is automatically prepared if it has not previously been executed.. The Query can then be executed/run using different methods such as Session::executeQuery or Query::execute.

There is a separate set of public API methods/structs for batches.

Summary

Most of the drivers stick to (Simple)Statement naming conventions for unprepared statements. The only exception is gocql, which does not distinguish between unprepared and prepared statements. Whenever "query" is mentioned in the documentation, it's used in two contexts:

  • a string content of the statement - for example documentation of SimpleStatement from Java driver: A one-off CQL statement consisting of a query string with optional placeholders, and a set of values for these placeholders.
  • something that can be executed, but it is used interchangeably with Statement in such context. For example the documentation of SyncCqlSession::execute: statement - the CQL query to execute (that can be any [Statement]).

My proposition is to go with (Simple)Statement naming convention for the unprepared statements. I think that UnpreparedStatement is too verbose (and too long). cc: @Lorak-mmk @wprzytula

@Lorak-mmk Lorak-mmk modified the milestones: 0.16.0, 1.0.0 Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-breaking This might introduce incompatible API changes API-stability Part of the effort to stabilize the API area/documentation Improvements or additions to documentation area/statement-execution
Projects
None yet
Development

No branches or pull requests

8 participants