Feature suggestion - explicit prepared statements #527
When you use Cursor.execute with parameters, the SQL is prepared automatically and submitted with the parameters you give it. The prepared statement is cached, and if you execute the same SQL again it is reused. As long as you keep executing the same SQL, you are using a prepared statement.

```python
cursor = cnxn.cursor()
for i in range(5):
    cursor.execute("insert into t1(n) values (?)", i)
```

In this example, the statement is only prepared the first time it is executed, for value 0. The existing prepared statement is then reused for values 1-4.

A prepared statement in ODBC requires an HSTMT, which is a Cursor in pyodbc, so the following two things would be identical:

```python
SQL1 = "insert into t1(n) values (?)"
SQL2 = "insert into t2(n) values (?)"

# A hypothetical prepare function:
prep1 = cnxn.prepare(SQL1)
prep2 = cnxn.prepare(SQL2)
prep1.run(1)
prep1.run(100)
prep2.run(2)
prep2.run(200)

# Identical code with the current design:
cursor1 = cnxn.cursor()
cursor2 = cnxn.cursor()
cursor1.execute(SQL1, 1)
cursor1.execute(SQL1, 100)
cursor2.execute(SQL2, 2)
cursor2.execute(SQL2, 200)
```

We can see that we already get the performance benefit of prepared statements today, but that doesn't mean it wouldn't be a good idea to introduce a PreparedStatement class to make it clearer how this works. At the moment I'd prefer not to, but I would like to hear what others like @gordthompson think about it.

One danger is that novices would be more likely to create multiple prepared statements and not close them immediately, leaving multiple open statements on the same connection. In some libraries, like early JDBC, a prepared statement is the only way to pass parameters, so novices might create them all the time even when they don't need to. Some drivers simply do not support multiple open statements (there is a way to ask the driver). Using a separate connection under the covers is a non-starter due to transaction isolation issues; even with multiple statements on one connection, I don't know whether there are isolation problems.

Hopefully the first part of this answer gets you the performance you need. I'm open to discussion of whether a prepared statement object would be a useful interface construct.
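The hypothetical prepare()/run() API above can already be written in user code as a thin wrapper that dedicates one cursor to each statement. The sketch below is illustrative only, not a proposed pyodbc API: the `Prepared` class name is invented, and sqlite3 stands in for pyodbc so the example runs standalone (the wrapper itself relies only on the shared DB-API cursor interface).

```python
import sqlite3

class Prepared:
    """One cursor per statement, so the driver-side prepare can be reused."""
    def __init__(self, cnxn, sql):
        self.cursor = cnxn.cursor()  # a dedicated HSTMT, in pyodbc terms
        self.sql = sql

    def run(self, *params):
        # Re-executing the same SQL on the same cursor is exactly what lets
        # pyodbc skip the prepare step on every call after the first.
        self.cursor.execute(self.sql, params)
        return self.cursor

cnxn = sqlite3.connect(":memory:")
cnxn.execute("create table t1 (n int)")
cnxn.execute("create table t2 (n int)")

prep1 = Prepared(cnxn, "insert into t1(n) values (?)")
prep2 = Prepared(cnxn, "insert into t2(n) values (?)")
prep1.run(1)
prep1.run(100)
prep2.run(2)
prep2.run(200)

print(sorted(n for (n,) in cnxn.execute("select n from t1")))  # [1, 100]
```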
In my case, I'm running a long-running process servicing a web site. There are about a dozen queries used over and over again, e.g. to save and get sessions. It would be nice to be able to pool a few prepared statements; I don't think my use case is unique, it's rather typical.

A pool of prepared statements could be much like a dict keyed by a hash of the statement SQL, pointing to the HSTMT values. Entries could have a keepalive time so they are discarded when not used.
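The pool described above could be sketched as a dict from SQL text to (cursor, last-used) pairs, with idle entries evicted after a keepalive window. This is a hedged sketch of the idea, not a real pyodbc feature: the names (`StatementPool`, `keepalive`) are invented, and sqlite3 stands in for pyodbc so it runs standalone.

```python
import sqlite3
import time

class StatementPool:
    def __init__(self, cnxn, keepalive=300.0):
        self.cnxn = cnxn
        self.keepalive = keepalive    # seconds a statement may sit idle
        self.pool = {}                # sql -> (cursor, last_used_timestamp)

    def execute(self, sql, *params):
        self._evict_idle()
        entry = self.pool.get(sql)
        cursor = entry[0] if entry else self.cnxn.cursor()
        cursor.execute(sql, params)
        self.pool[sql] = (cursor, time.monotonic())
        return cursor

    def _evict_idle(self):
        # Discard statements that have not been used within the keepalive
        # window; closing the cursor frees the underlying handle.
        now = time.monotonic()
        for sql, (cursor, last_used) in list(self.pool.items()):
            if now - last_used > self.keepalive:
                cursor.close()
                del self.pool[sql]

cnxn = sqlite3.connect(":memory:")
cnxn.execute("create table session (user_id int, data text)")
pool = StatementPool(cnxn, keepalive=300.0)
for i in range(3):
    pool.execute("insert into session values (?, ?)", i, "blob")
assert len(pool.pool) == 1   # one cursor served all three executes
```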
To expand on one of my previous examples, you can easily do this today by putting Cursors into a dictionary. Allocate a cursor for each SQL statement you want prepared. The code would be almost identical to what you would expect the cnxn.prepare() code to be.

For each cursor, as long as the current query you are executing matches the previous one, it will reuse the prepared statement. Here is an example:

```python
cache = {}

def execute(sql, *params):
    cursor = cache.get(sql)
    if not cursor:
        cursor = get_connection().cursor()
        cache[sql] = cursor
    cursor.execute(sql, *params)
    cursor.commit()
    return cursor

def save():
    execute("update session ...", user_id, session_data)
```

This creates a single cursor for each of the statements you want to prepare and reuses it. This is very simplistic, but you get the idea. The important point is that it would be almost identical to the code you would use if you had a prepared statement object: you would need to check the cache, create a cursor if you didn't have one, get the correct connection when you create it, then execute and commit.

To turn this into real code, you'd probably want to put a retry loop into execute to retry at least once. That will automatically handle stale connections. Adding a round-trip test like "select 1" first would eliminate your performance benefits; after all, the DB server is already caching query plans whether you use a prepared statement or not.
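The retry loop suggested above might look like the following. This is a sketch under stated assumptions, not pyodbc code: sqlite3 stands in for pyodbc (so commit is called on the connection rather than the cursor), and `get_connection()` is the same hypothetical helper used in the comment, extended with a `reset` flag to simulate reconnecting.

```python
import sqlite3

cache = {}
_cnxn = None

def get_connection(reset=False):
    # Hypothetical helper: returns the process-wide connection,
    # creating (or recreating) it on demand.
    global _cnxn
    if _cnxn is None or reset:
        _cnxn = sqlite3.connect(":memory:")
        _cnxn.execute("create table session (user_id int, data text)")
    return _cnxn

def execute(sql, *params):
    for attempt in (1, 2):
        cursor = cache.get(sql)
        if not cursor:
            cursor = get_connection().cursor()
            cache[sql] = cursor
        try:
            cursor.execute(sql, params)
            cursor.connection.commit()
            return cursor
        except sqlite3.Error:
            # Possibly a stale cursor or connection: forget the cached
            # cursor, reconnect, and retry exactly once.
            cache.pop(sql, None)
            if attempt == 2:
                raise
            get_connection(reset=True)

sql = "insert into session values (?, ?)"
c1 = execute(sql, 1, "a")
c2 = execute(sql, 2, "b")
assert c1 is c2   # the cached cursor (and its prepared plan) was reused
```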
For those that find this in the future, I'll also expand on how it works: under the covers, each pyodbc.Cursor wraps an ODBC statement handle (HSTMT). When preparing, the Cursor also keeps a copy of the SQL it prepared; on the next execute it compares the incoming SQL to that copy and, when they match, skips the prepare step and executes the already-prepared statement directly.
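That reuse check can be sketched in pure Python with call counters standing in for the ODBC SQLPrepare/SQLExecute pair, just to make the behavior observable. None of this is pyodbc's actual source; it only models the compare-and-skip logic described above.

```python
class SketchCursor:
    def __init__(self):
        self.prepared_sql = None   # copy of the most recently prepared SQL
        self.prepare_calls = 0
        self.execute_calls = 0

    def _sqlprepare(self, sql):
        # Stand-in for ODBC SQLPrepare: builds (here, records) the plan.
        self.prepare_calls += 1
        self.prepared_sql = sql

    def execute(self, sql, *params):
        if sql != self.prepared_sql:
            self._sqlprepare(sql)  # new SQL replaces the old prepared plan
        # Stand-in for binding the parameters and calling ODBC SQLExecute.
        self.execute_calls += 1

cursor = SketchCursor()
for i in range(5):
    cursor.execute("insert into t1(n) values (?)", i)
assert cursor.prepare_calls == 1 and cursor.execute_calls == 5

cursor.execute("select * from t1")   # different SQL: prepared again
assert cursor.prepare_calls == 2
```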
Ah, I get it. That's ... pretty brilliant.
Thanks.
Robert
Glad to help. After thinking about it some more, I'll close this for now. I think the best solution is to add some documentation to the wiki about how prepared statements work.
Thanks.
Your package is fabulous.
Robert
Not sure where the best place to ask this is: what if we purposely want to throw away a cached SQL statement?

My use case is that, due to aliases, an identical SQL statement can be used on tables with same-named but different-sized columns. For example, say one table has an 80-character column and another table has a 92-character column of the same name. If I happen to retrieve the shorter one first, then I only get the first 80 characters when I later retrieve the longer one.

Apologies if there is already a way to achieve what I am trying to do; just point me to the relevant documentation.
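With the dict-of-cursors pattern from earlier in the thread, one way to throw a cached statement away is to close and discard its cursor, which forces a fresh prepare (and a fresh description of the result columns) on the next execute. A hedged sketch follows: sqlite3 stands in for pyodbc, `invalidate` is an invented helper, and this only demonstrates the eviction mechanics, not the column-size behavior described above.

```python
import sqlite3

cnxn = sqlite3.connect(":memory:")
cnxn.execute("create table t (name text)")
cache = {}

def execute(sql, *params):
    cursor = cache.get(sql)
    if cursor is None:
        cursor = cnxn.cursor()
        cache[sql] = cursor
    cursor.execute(sql, params)
    return cursor

def invalidate(sql):
    # Drop the cached cursor so the next execute prepares from scratch.
    cursor = cache.pop(sql, None)
    if cursor is not None:
        cursor.close()

sql = "select name from t"
c1 = execute(sql)
invalidate(sql)
c2 = execute(sql)
assert c1 is not c2   # a fresh cursor, hence a fresh prepare
```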
Not sure how difficult this would be to make, but it would be cool to be able to work with prepared statements. Suppose you need to run a query very frequently, the query text is a thousand bytes, and it returns nothing. By just running the prepared statement you could decrease network traffic by a factor of 100 or so. I know the likes of SQL Server are pretty smart about re-using optimization plans, but that doesn't reduce network traffic. It would be especially useful if you don't want the overhead of making a stored procedure out of every query:

```python
prep = myconnection.prepareStatement('.....')
prep.run(parameters)
```