Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature suggestion - explicit prepared statements #527

Closed
Lexcon opened this issue Feb 23, 2019 · 8 comments
Closed

Feature suggestion - explicit prepared statements #527

Lexcon opened this issue Feb 23, 2019 · 8 comments

Comments

@Lexcon
Copy link

Lexcon commented Feb 23, 2019

Not sure how difficult this would be to make but it would be cool to be able to work with prepared statements. Suppose you need to run a query very frequently, this query has a thousand bytes of query text and it returns nothing. Then by just running the prepared the statement you could decrease network traffic by a factor 100 or so. I know the likes of SQL Server are pretty smart in re-using optimization plans but that doesn't reduce network traffic. Especially useful if you don't want to have the overhead to make a stored procedure of every query:

prep = myconnection.prepareStatement('.....')
prep.run(parameters)

@mkleehammer
Copy link
Owner

When you use Cursor.execute with parameters, the SQL is prepared automatically and submitted with the parameters you give it. The prepared statement is cached and if you execute the same SQL again it is reused. As long as you keep executing the same SQL, you are using a prepared statement.

cursor = cnxn.cursor()
for i in range(5):
    cursor.execute("insert into t1(n) values (?)", i)

In this example, the statement is only prepared the first time it is executed, for value 0. The existing prepared statement is then reused for values 1-4.

A prepared statement in ODBC requires an HSTMT, which is a Cursor in pyodbc, so the following two things would be identical:

SQL1 = "insert into t1(n) values (?)"
SQL2 = "insert into t2(n) values (?)"
# A hypothetical prepare function:
prep1 = cnxn.prepare(SQL1)
prep2 = cnxn.prepare(SQL2)
prep1.run(1)
prep1.run(100)
prep2.run(2)
prep2,run(200)

# Identical code with current design:
cursor1 = cnxn.cursor()
cursor2 = cnxn.cursor()
cursor1.execute(SQL1, 1)
cursor1.execute(SQL1, 100)
cursor1.execute(SQL2, 2)
cursor1.execute(SQL2, 200)

We can see that we already get the performance benefit of prepared statements today, but that doesn't mean it wouldn't be a good idea to introduce a PreparedStatement to make it clearer how it works. At the moment I'd prefer not to, but I would like to hear what others like @gordthompson think about it.

One danger might be that novices would be more likely to create multiple prepared statements and not close them immediately, creating multiple open statements on the same connection. In some libraries like early JDBC, it is the only way to pass parameters, so novices might create prepared statements all the time when they don't need to. Some drivers simply do not support this. (There is a way to ask the driver.). Using a separate connection under the covers is a non-starter due to transaction isolation issues. Even with multiple statements on a connection, I don't know if there are possible problems with isolation.

Hopefully the first part of the answer gets you the performance you need. I'm open to discussion of whether it is a useful interface construct.

@mkleehammer mkleehammer changed the title Feature suggestion Feature suggestion - explicit prepared statements Feb 23, 2019
@Lexcon
Copy link
Author

Lexcon commented Feb 23, 2019 via email

@mkleehammer
Copy link
Owner

To expand on one of my previous examples, you can easily do this today by putting Cursors into a dictionary. Allocate a cursor for each SQL statement you want prepared. The code would be almost identical to what you would expect the cnxn.prepare() code to be.

For each cursor, as long as the current query you are executing matches the previous, it will reuse the prepared statement. Here is an example:

cache = {}

def execute(sql, *params):
    cursor = cache.get(sql)
    if not cursor:
        cursor = get_connection().cursor()
        cache[sql] = cursor
    cursor.execute(sql, *params)
    cursor.commit()
    return cursor

def save():
    execute("update session ...", user_id, session_data)

This creates a single cursor for each of the statements you want to prepare and reuses it. This is very simplistic, but you get the idea. The important point is it would be almost identical to the code you would use if you had a prepared statement object. You would need to check the cache, create one if you didn't have it, get the correct connection when you create one, then execute and commit.

To turn this into real code, you'd probably want to put a retry loop into execute to retry at least once. That will automatically handle stale connections. Adding a round trip test like "select 1" first would eliminate your performance benefits. After all, the DB server is already caching query plans whether you use a prepared statement or not.

@mkleehammer
Copy link
Owner

For those that find this in the future, I'll also expand on how it works:

Under the covers, pyodbc.Cursor has a Prepare function. Anytime you call Cursor.execute and provide parameters, it will use a prepared statement. That's simply a requirement of passing parameters with ODBC.

When preparing, the Cursor also keeps a copy of the SQL (pPreparedSQL in cursor.h). When executing it first compares the SQL with the previous SQL. If it is the same, the cursor is already prepared and it can simply forward the parameters to the server. If it is not the same, the new SQL is prepared and a copy of it is kept.

@Lexcon
Copy link
Author

Lexcon commented Feb 23, 2019 via email

@mkleehammer
Copy link
Owner

Glad to help. After thinking about it some more, I'll close this for now. I think the best solution is to add some documentation to the wiki about how prepared statements work.

@Lexcon
Copy link
Author

Lexcon commented Feb 26, 2019 via email

@jkyeung
Copy link

jkyeung commented Jan 27, 2023

Not sure where the best place to ask this is:

What if we purposely want to throw away a cached SQL statement? My use case is that, due to aliases, an identical SQL statement can be used on tables with same-named but different-sized columns. For example, say one table has an 80-character column and another table has a 92-character column of the same name. If I happen to retrieve the shorter one first, then I'll only get the first 80 characters when I try to retrieve the longer one later.

Apologies if there is already a way to achieve what I am trying to do. Just point me to the relevant documentation.

Related: #771, #214

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants