Feature suggestion - explicit prepared statements #527

Lexcon · 2019-02-23T12:35:44Z

Not sure how difficult this would be to implement, but it would be cool to be able to work with prepared statements. Suppose you need to run a query very frequently, the query has a thousand bytes of query text, and it returns nothing. By running only the prepared statement, you could cut network traffic by a factor of 100 or so. I know the likes of SQL Server are pretty smart about reusing query plans, but that doesn't reduce network traffic. This would be especially useful if you don't want the overhead of making a stored procedure for every query:

prep = myconnection.prepareStatement('.....')
prep.run(parameters)

mkleehammer · 2019-02-23T17:43:30Z

When you use Cursor.execute with parameters, the SQL is prepared automatically and submitted with the parameters you give it. The prepared statement is cached and if you execute the same SQL again it is reused. As long as you keep executing the same SQL, you are using a prepared statement.

cursor = cnxn.cursor()
for i in range(5):
    cursor.execute("insert into t1(n) values (?)", i)

In this example, the statement is only prepared the first time it is executed, for value 0. The existing prepared statement is then reused for values 1-4.

A prepared statement in ODBC requires an HSTMT, which is a Cursor in pyodbc, so the following two things would be identical:

SQL1 = "insert into t1(n) values (?)"
SQL2 = "insert into t2(n) values (?)"
# A hypothetical prepare function:
prep1 = cnxn.prepare(SQL1)
prep2 = cnxn.prepare(SQL2)
prep1.run(1)
prep1.run(100)
prep2.run(2)
prep2.run(200)

# Equivalent code with the current design, one cursor per statement:
cursor1 = cnxn.cursor()
cursor2 = cnxn.cursor()
cursor1.execute(SQL1, 1)
cursor1.execute(SQL1, 100)
cursor2.execute(SQL2, 2)
cursor2.execute(SQL2, 200)

We can see that we already get the performance benefit of prepared statements today, but that doesn't mean it wouldn't be a good idea to introduce a PreparedStatement to make it clearer how it works. At the moment I'd prefer not to, but I would like to hear what others like @gordthompson think about it.
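For anyone who prefers the `prepare`/`run` shape of the hypothetical API above, here is a minimal sketch of such a wrapper built on the current design: it simply dedicates one cursor to one SQL string. `PreparedStatement` and `prepare` are hypothetical names, not part of pyodbc; the only pyodbc calls assumed are `Connection.cursor()`, `Cursor.execute()` and `Cursor.close()`.

```python
class PreparedStatement:
    """Hypothetical wrapper: one dedicated cursor per SQL string.

    Because the cursor only ever executes `sql`, it prepares the statement
    once and reuses that prepared statement on every later run().
    """

    def __init__(self, cnxn, sql):
        self.sql = sql
        self._cursor = cnxn.cursor()

    def run(self, *params):
        # Same SQL every time, so only the parameters change between calls.
        return self._cursor.execute(self.sql, *params)

    def close(self):
        # Release the underlying statement handle promptly.
        self._cursor.close()

    # Context-manager support makes it harder to forget close().
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False  # do not swallow exceptions


def prepare(cnxn, sql):
    """Hypothetical stand-in for the proposed cnxn.prepare(sql)."""
    return PreparedStatement(cnxn, sql)
```

Used as `with prepare(cnxn, SQL1) as prep1: prep1.run(1)`, the statement is closed as soon as the block ends, which keeps the number of open statements on the connection small.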

One danger might be that novices would be more likely to create multiple prepared statements and not close them promptly, leaving multiple open statements on the same connection. In some libraries, like early JDBC, it is the only way to pass parameters, so novices might create prepared statements all the time when they don't need to. Some drivers simply do not support multiple active statements on one connection (there is a way to ask the driver). Using a separate connection under the covers is a non-starter due to transaction isolation issues. Even with multiple statements on a single connection, I don't know whether there could be isolation problems.
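The way to ask the driver is most likely the ODBC `SQLGetInfo` call with the `SQL_MAX_CONCURRENT_ACTIVITIES` info type, which pyodbc surfaces through `Connection.getinfo()`. A minimal sketch, assuming that info type is what is meant here; `allows_multiple_statements` is a hypothetical helper name, and the constant is spelled out numerically only so the snippet needs no import.

```python
# Numeric ODBC info type from sql.h; pyodbc also exposes it as
# pyodbc.SQL_MAX_CONCURRENT_ACTIVITIES.
SQL_MAX_CONCURRENT_ACTIVITIES = 1


def allows_multiple_statements(cnxn):
    """Hypothetical helper: may this connection have several active statements?

    Per the ODBC spec, the driver reports the maximum number of active
    statements per connection, or 0 if there is no limit or it is unknown.
    """
    limit = cnxn.getinfo(SQL_MAX_CONCURRENT_ACTIVITIES)
    # 0 is treated optimistically as "no limit"; 1 means one statement at a time.
    return limit == 0 or limit > 1
```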

Hopefully the first part of the answer gets you the performance you need. I'm open to discussion of whether it is a useful interface construct.

Lexcon · 2019-02-23T18:07:20Z

In my case, I'm running a long-running process servicing a web site. There are about a dozen queries used over and over again, e.g. to save and get sessions. It would be nice to be able to pool a few prepared statements. I don't think my use case is unique; it's rather typical. A pool of prepared statements could be much like a dict of hashes made from the statement SQL, pointing to the HSTMT values. They could have a keepalive time so they are discarded when not used.
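A minimal sketch of the pool described above, assuming one long-lived connection shared by the pool. Python's dict already hashes the SQL text, so the pool is keyed directly by the string and maps to a cursor plus its last-use time; cursors idle longer than the keepalive are closed. `StatementPool`, `keepalive` and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time


class StatementPool:
    """Hypothetical pool of per-SQL cursors that expire after an idle period."""

    def __init__(self, cnxn, keepalive=300.0, clock=time.monotonic):
        self._cnxn = cnxn
        self._keepalive = keepalive  # seconds a cursor may sit unused
        self._clock = clock
        self._entries = {}           # sql -> [cursor, last_used]

    def execute(self, sql, *params):
        self.expire()
        entry = self._entries.get(sql)
        if entry is None:
            entry = self._entries[sql] = [self._cnxn.cursor(), None]
        entry[1] = self._clock()  # record the last use
        # Same SQL on the same cursor, so its prepared statement is reused.
        return entry[0].execute(sql, *params)

    def expire(self):
        """Close and forget cursors that have been idle past the keepalive."""
        now = self._clock()
        stale = [sql for sql, (_, last) in self._entries.items()
                 if now - last > self._keepalive]
        for sql in stale:
            cursor, _ = self._entries.pop(sql)
            cursor.close()

    def __len__(self):
        return len(self._entries)
```

Expiry only runs when `execute` or `expire` is called, which avoids a background thread; a web process could also call `expire()` periodically.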


mkleehammer · 2019-02-23T19:11:09Z

To expand on one of my previous examples, you can easily do this today by putting Cursors into a dictionary. Allocate a cursor for each SQL statement you want prepared. The code would be almost identical to what you would expect the cnxn.prepare() code to be.

For each cursor, as long as the current query you are executing matches the previous, it will reuse the prepared statement. Here is an example:

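cache = {}

def execute(sql, *params):
    # One cursor per distinct SQL string; reusing it reuses the prepared statement.
    cursor = cache.get(sql)
    if cursor is None:
        # get_connection() is a placeholder for however your app obtains its connection.
        cursor = get_connection().cursor()
        cache[sql] = cursor
    cursor.execute(sql, *params)
    cursor.commit()
    return cursor

def save(user_id, session_data):
    execute("update session ...", user_id, session_data)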

This creates a single cursor for each of the statements you want to prepare and reuses it. This is very simplistic, but you get the idea. The important point is it would be almost identical to the code you would use if you had a prepared statement object. You would need to check the cache, create one if you didn't have it, get the correct connection when you create one, then execute and commit.

To turn this into real code, you'd probably want to put a retry loop into execute to retry at least once. That will automatically handle stale connections. Adding a round trip test like "select 1" first would eliminate your performance benefits. After all, the DB server is already caching query plans whether you use a prepared statement or not.
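A minimal sketch of the retry idea, assuming a hypothetical `connect` callable that returns a new pyodbc connection (the role `get_connection()` plays in the example above). On failure the cached cursors and the connection are discarded, since they may be stale, and the statement is retried once on a fresh connection. With pyodbc you would typically pass `retry_errors=(pyodbc.Error,)`; the `Exception` default only keeps the sketch dependency-free, and because it retries any failure, a real version would probably narrow the check.

```python
class StatementCache:
    """Hypothetical per-SQL cursor cache that retries once on failure."""

    def __init__(self, connect, retry_errors=(Exception,)):
        self._connect = connect            # zero-argument connection factory
        self._retry_errors = retry_errors  # e.g. (pyodbc.Error,) in real code
        self._cnxn = None
        self._cursors = {}                 # sql -> cursor

    def _cursor_for(self, sql):
        if self._cnxn is None:
            self._cnxn = self._connect()
        cursor = self._cursors.get(sql)
        if cursor is None:
            cursor = self._cnxn.cursor()
            self._cursors[sql] = cursor
        return cursor

    def _reset(self):
        # After a failure the connection may be stale: discard everything.
        for cursor in self._cursors.values():
            try:
                cursor.close()
            except Exception:
                pass
        self._cursors.clear()
        if self._cnxn is not None:
            try:
                self._cnxn.close()
            except Exception:
                pass
            self._cnxn = None

    def execute(self, sql, *params):
        for attempt in range(2):  # first try plus one retry
            cursor = self._cursor_for(sql)
            try:
                cursor.execute(sql, *params)
                cursor.commit()
                return cursor
            except self._retry_errors:
                self._reset()
                if attempt == 1:
                    raise  # second failure: give up
```

No "select 1" probe is issued, so the happy path still costs a single round trip per statement.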

mkleehammer · 2019-02-23T19:17:17Z

For those that find this in the future, I'll also expand on how it works:

Under the covers, pyodbc.Cursor has a Prepare function. Anytime you call Cursor.execute and provide parameters, it will use a prepared statement. That's simply a requirement of passing parameters with ODBC.

When preparing, the Cursor also keeps a copy of the SQL (pPreparedSQL in cursor.h). When executing, it first compares the SQL with the previous SQL. If it is the same, the cursor is already prepared and it can simply forward the parameters to the server. If it is not the same, the new SQL is prepared and a copy of it is kept.
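To make the mechanism concrete, here is a toy Python model of the behaviour just described. It is not pyodbc's C++ code: `ToyCursor` and `prepare_count` are hypothetical, the real preparing happens in the driver through ODBC calls, and only the parameterized path is modelled. It shows the comparison against the previously prepared SQL, and why alternating between two SQL strings on one cursor loses the benefit.

```python
class ToyCursor:
    """Toy model of a cursor that remembers the last SQL it prepared."""

    def __init__(self):
        self.prepared_sql = None  # plays the role of pPreparedSQL
        self.prepare_count = 0

    def execute(self, sql, *params):
        if sql != self.prepared_sql:
            # Different SQL: prepare it and keep a copy of the text.
            self.prepare_count += 1
            self.prepared_sql = sql
        # Already prepared: only the parameters would go to the server.
        return params


same = ToyCursor()
for i in range(5):
    same.execute("insert into t1(n) values (?)", i)
print(same.prepare_count)  # 1: prepared once, reused for values 1-4

mixed = ToyCursor()
for i in range(2):
    mixed.execute("insert into t1(n) values (?)", i)
    mixed.execute("insert into t2(n) values (?)", i)
print(mixed.prepare_count)  # 4: alternating SQL re-prepares every time
```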

Lexcon · 2019-02-23T21:25:55Z

Ah, I get it. That's ... pretty brilliant. Thanks. Robert


mkleehammer · 2019-02-26T02:53:14Z

Glad to help. After thinking about it some more, I'll close this for now. I think the best solution is to add some documentation to the wiki about how prepared statements work.

Lexcon · 2019-02-26T08:21:04Z

Thanks. Your package is fabulous. Robert


jkyeung · 2023-01-27T20:20:37Z

Not sure where the best place to ask this is:

What if we purposely want to throw away a cached SQL statement? My use case is that, due to aliases, an identical SQL statement can be used on tables with same-named but different-sized columns. For example, say one table has an 80-character column and another table has a 92-character column of the same name. If I happen to retrieve the shorter one first, then I'll only get the first 80 characters when I try to retrieve the longer one later.

Apologies if there is already a way to achieve what I am trying to do. Just point me to the relevant documentation.

Related: #771, #214
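Going by the mechanism described earlier in the thread (a cursor only re-prepares when the SQL text differs from the last SQL it prepared), one possible workaround, not a documented pyodbc feature, is to stop reusing the cursor that prepared the statement: close it and run the query on a new cursor, which has nothing prepared yet. A minimal sketch against a per-SQL cursor cache like the `cache` dict in the earlier example; `evict` is a hypothetical helper. Whether this fully avoids the truncated-column problem may depend on the driver, so treat it as something to try rather than a confirmed fix.

```python
def evict(cache, sql):
    """Hypothetical helper: close and forget the cached cursor for `sql`.

    `cache` is a dict mapping SQL text to a cursor. After eviction, the next
    lookup creates a brand-new cursor, so the statement is prepared again
    instead of reusing what was prepared before.
    Returns True if a cursor was evicted.
    """
    cursor = cache.pop(sql, None)
    if cursor is None:
        return False
    cursor.close()  # release the old statement and its prepared SQL
    return True
```

For a one-off query without any cache, executing on a fresh cursor (`cnxn.cursor().execute(sql, *params)`) should have the same effect.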

mkleehammer changed the title from "Feature suggestion" to "Feature suggestion - explicit prepared statements" on Feb 23, 2019

mkleehammer added the Investigating and Request labels on Feb 23, 2019

mkleehammer closed this as completed on Feb 26, 2019

gordthompson mentioned this issue in "Question: ODBC API #771" (closed) on May 27, 2020
