Can we optimize out preparing? #214
A similar discussion recently took place here regarding the SQL Server JDBC driver. The fix that came out of that discussion will call An explicit |
Thanks @gordthompson, I agree: for SQL Server it sounds like following the pattern we used for JDBC should work well here too. If pyodbc requires prepare in order to use parameters, the two cases sound analogous. BTW, for .NET we are discussing following the pattern even for an explicit call to Prepare (even though prepare is not needed for parameter binding). The reasoning is twofold: a) calling sp_executesql on the first call and falling back to sp_prepexec on the next has negligible performance impact, and b) there is a fairly high risk that developers call Prepare() even when they don't need it, which would hurt performance by introducing additional round trips or unnecessary memory consumption on the server. Thoughts? /Tobias (SQL Server database engine) |
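The execution policy described above can be sketched as a small state machine. This is a minimal model, not driver code: `LazyPrepareStatement` and its methods are hypothetical, and `execute()` only reports which SQL Server system procedure a driver following this policy would send instead of contacting a server.

```python
class LazyPrepareStatement:
    """Hypothetical model of the 'prepare on second execution' policy.

    No server is contacted; execute() returns the name of the system
    procedure a driver using this policy would send for that call.
    """

    def __init__(self, sql):
        self.sql = sql
        self.execution_count = 0
        self.handle = None  # server-side prepared handle, once one exists

    def execute(self, params):
        self.execution_count += 1
        if self.handle is not None:
            # Already prepared: reuse the handle, the SQL text is not resent.
            return "sp_execute"
        if self.execution_count == 1:
            # First call: one-shot parameterized execution, no handle kept.
            return "sp_executesql"
        # Second call: the statement is being reused, so prepare and run it.
        self.handle = 1  # placeholder for the handle the server would return
        return "sp_prepexec"


stmt = LazyPrepareStatement("SELECT name FROM users WHERE id = ?")
calls = [stmt.execute((i,)) for i in range(4)]
print(calls)  # ['sp_executesql', 'sp_prepexec', 'sp_execute', 'sp_execute']
```

Under this policy a statement executed only once costs a single round trip and leaves no handle behind; only statements that are actually reused pay for preparation.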
+@meet-bhagdev who is looking out for Python in SQL Server :-) |
While I can appreciate how a subtle performance optimization could accrue significant benefits as we scale up, I also have some mild reservations based on the Principle of least astonishment: If I called |
That makes sense. What about sp_unprepare? I get this from a "want to understand what happens" point of view, but I am wondering what would actually give most apps the best performance :-) |
So that we are all on the same page: what happens today is that any statement with parameters is prepared and its parameters are bound. If the next call to execute is for the same SQL statement (compared by identity, not content, at this time), then the previously prepared statement is reused. If I understood the Java discussion correctly, that is what was proposed. Did I miss anything? I believe this issue is saying that binding parameters causes new execution plans to be created because the parameters have different lengths. For example, if the same prepared statement is used for a 10-character string and then for a 20-character string, the parameter would be bound as SQL_WVARCHAR(10) the first time and SQL_WVARCHAR(20) the second time. I have not verified that this would actually cause a different execution plan. I believe the prepare part works; I have not looked at the binding yet. Did I misunderstand any of these? |
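To make the suspected binding issue concrete, here is a minimal sketch of how per-value string lengths change the parameter declaration a driver would send. The `declare_params` helper is hypothetical and simplified (only `str` and `int`, no `nvarchar(max)` handling); whether SQL Server then compiles a separate plan for each declaration is exactly the unverified part noted above.

```python
def declare_params(params):
    """Build an sp_prepexec-style parameter declaration list.

    Hypothetical approximation: each string is declared with its own length,
    mirroring the SQL_WVARCHAR(n) binding described above. The 4000-character
    nvarchar limit and other types are ignored for brevity.
    """
    decls = []
    for i, value in enumerate(params, start=1):
        if isinstance(value, str):
            decls.append(f"@P{i} nvarchar({max(len(value), 1)})")
        elif isinstance(value, int):
            decls.append(f"@P{i} int")
        else:
            raise TypeError(f"unsupported parameter type: {type(value).__name__}")
    return ",".join(decls)


short = declare_params(("a" * 10,))
longer = declare_params(("a" * 20,))
print(short)            # @P1 nvarchar(10)
print(longer)           # @P1 nvarchar(20)
print(short == longer)  # False: same SQL, different declaration text
```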
Please ignore the second half of my previous comment: I followed a link from email and thought this was the related issue #213. Someone above mentioned that pyodbc might require a prepare for binding. pyodbc does prepare before binding today, but only because I originally thought ODBC required it; the SQL Server docs say otherwise. We should be able to bind parameters and call SQLExecDirectW without preparing. I suspect many drivers will not support this, but we can default to it and use a flag for those that don't. One issue to be aware of is that ODBC only allows one prepared statement per HSTMT, which is what we represent with a cursor. If you need to switch between two prepared statements, you'd want to create two cursors. |
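The one-prepared-statement-per-HSTMT constraint can be illustrated with a small model. `ModelCursor` is hypothetical: it only counts how often it would call SQLPrepare, and it assumes the reuse-by-identity check described earlier in the thread.

```python
class ModelCursor:
    """Hypothetical model of a cursor wrapping one ODBC HSTMT.

    The HSTMT holds at most one prepared statement, so preparing a different
    SQL string replaces whatever was prepared before.
    """

    def __init__(self):
        self.prepared_sql = None
        self.prepare_calls = 0

    def execute(self, sql, params):
        # Reuse only when the same SQL object (identity, not equality)
        # was the last one prepared on this cursor.
        if params and sql is not self.prepared_sql:
            self.prepare_calls += 1  # stands in for SQLPrepare
            self.prepared_sql = sql
        # Binding parameters and calling SQLExecute would happen here.


insert_sql = "INSERT INTO log (msg) VALUES (?)"
select_sql = "SELECT msg FROM log WHERE id = ?"

# Alternating two statements on one cursor re-prepares every time.
shared = ModelCursor()
for i in range(3):
    shared.execute(insert_sql, ("hello",))
    shared.execute(select_sql, (i,))
print(shared.prepare_calls)  # 6

# One cursor per statement keeps each prepared statement alive.
inserter, selector = ModelCursor(), ModelCursor()
for i in range(3):
    inserter.execute(insert_sql, ("hello",))
    selector.execute(select_sql, (i,))
print(inserter.prepare_calls + selector.prepare_calls)  # 2
```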
re: "what about sp_unprepare?"
To be honest, I haven't worried too much about sp_unprepare. Over the last while I had noticed that several reputable implementations did not send an sp_unprepare for every sp_prepexec they sent (e.g., pyodbc ref: here, and .NET SqlClient ref: here). I suspected that they were leaving never-unprepared entries in a list somewhere on the server (and consuming numeric handles), but I took it on faith that the lack of cleanup wasn't likely to cause significant problems.
re: binding without preparing
If it can be done, then I'd definitely support that. Again, it would be like SqlClient, where .Execute without .Prepare does sp_executesql (with binding if necessary) and multiple .Executes on the same SqlCommand after a .Prepare will do an sp_prepexec followed by one or more sp_execute calls. That way,
|
@meet-bhagdev and I did a bit of playing around with pyodbc, and this is what we found out, at least as it relates to SQL Server :-) Hopefully it helps. For more background on prepared statements in SQL Server, you can take a look at my first reply to the JDBC issue that @gordthompson mentioned. After testing pyodbc a bit with SQL Server, it seems like pyodbc never reuses a server-side prepared statement handle (please correct me if I'm wrong). That is, you either call cursor.execute("my SQL code", ..) with params or without them. There seems to be no way to re-issue the same "prepared statement" without sending the SQL code to the server again (something like cursor.executePrepared(statementHandle, ), if that is not what cursor.executemany does...). Right now the following seems to be the case:
Looking at these results it seems like:
BTW, I think this is pretty much @gordthompson's proposal above in ODBC terminology :-) |
The last time I checked the SQL Server ODBC driver using a network sniffer, it didn't actually use prepared queries at all; it kept sending the entire query over the wire. Maybe Microsoft considers it a non-optimisation. That said, for other database systems, it sounds like a keepprepared parameter might be useful when running a query. Consider this alternating sequence of queries:
Because of the alternation, the query plans will not remain available, since any prepared query is dropped the moment the next query needs to be prepared. To support a 'keepprepared' parameter, it might be smart to have a dict-like structure that maps a hash of each query to its compiled plan, and then checks whether a prepared query already exists before running the query. I think that's how SQL Server does it server side anyway, so this might be duplicating existing functionality: I think preparing a query is just a lookup on a highly optimized database server. Anyway, 'keepprepared' should maybe take a time value in seconds to avoid memory buildup in long-running programs, though that makes it a bit bloated. It risks being a senseless optimisation, but one can only know that from a benchmark, which means implementing the functionality first. |
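A minimal sketch of the proposed 'keepprepared' structure, assuming a driver hook that prepares SQL and returns a handle. `KeepPreparedCache`, its `prepare` callable, and the time-to-live eviction are all hypothetical; the clock is injectable only so the example runs deterministically. A plain dict keyed on the SQL text already hashes each query, which is the lookup the comment describes.

```python
import time


class KeepPreparedCache:
    """Hypothetical client-side cache for the 'keepprepared' idea.

    Maps SQL text to a prepared-statement handle and drops entries that
    have not been used for `ttl_seconds`.
    """

    def __init__(self, prepare, ttl_seconds=300.0, clock=time.monotonic):
        self._prepare = prepare  # assumed callable: sql -> handle
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # sql -> (handle, last_used)

    def get(self, sql):
        now = self._clock()
        self._evict_expired(now)
        entry = self._entries.get(sql)
        handle = entry[0] if entry is not None else self._prepare(sql)
        self._entries[sql] = (handle, now)  # refresh the last-used time
        return handle

    def _evict_expired(self, now):
        # A real driver would also release each evicted server-side handle.
        expired = [s for s, (_, used) in self._entries.items() if now - used > self._ttl]
        for sql in expired:
            del self._entries[sql]


prepared = []


def fake_prepare(sql):
    prepared.append(sql)
    return len(prepared)  # pretend server handle


now = [0.0]
cache = KeepPreparedCache(fake_prepare, ttl_seconds=60, clock=lambda: now[0])
for sql in ["SELECT a FROM t WHERE x = ?", "SELECT b FROM u WHERE y = ?"] * 3:
    cache.get(sql)
print(len(prepared))  # 2: alternating queries are prepared once each

now[0] = 120.0  # both entries are now older than the TTL
cache.get("SELECT a FROM t WHERE x = ?")
print(len(prepared))  # 3: the expired entry had to be prepared again
```

Whether this beats the server's own plan-cache lookup is, as noted above, something only a benchmark could show.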
I've written up some thoughts on parameter binding. I'm looking for input from anyone interested. In particular, the API looks very messy but I'm looking to start a discussion. |
Currently pyodbc always prepares statements if there are parameters. If the statement is identical to the previously prepared one, the previously prepared statement is reused. However, the SQL Server docs do not recommend this:
SQL Server 2016 Prepared Execution
Consider a 5.0 API that requires
cursor.prepare(sql)
before using prepared statements, to see if we get any performance benefit.
I'm pretty sure many drivers are going to choke if we don't call prepare before binding, though. We'll need a flag to determine if it is allowed.
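The proposal above can be sketched as a model that records which ODBC functions a cursor would call. `ProposedCursor`, its `prepare()` method as modeled here, and the `driver_requires_prepare` flag are hypothetical illustrations of this comment's idea, not an existing pyodbc API; real drivers may also keep bindings between executions rather than rebinding each time.

```python
class ProposedCursor:
    """Hypothetical model of the proposed 5.0 behaviour (not pyodbc's API).

    Parameterized statements are bound and sent with SQLExecDirect unless the
    caller explicitly prepared the same SQL first. `driver_requires_prepare`
    stands in for the flag needed by drivers that cannot bind without SQLPrepare.
    """

    def __init__(self, driver_requires_prepare=False):
        self.driver_requires_prepare = driver_requires_prepare
        self.prepared_sql = None  # one prepared statement per HSTMT
        self.odbc_calls = []      # ODBC functions this model would call, in order

    def prepare(self, sql):
        self.odbc_calls.append("SQLPrepare")
        self.prepared_sql = sql

    def execute(self, sql, params=()):
        if params and self.driver_requires_prepare and sql != self.prepared_sql:
            # Fallback for drivers that must prepare before binding.
            self.prepare(sql)
        # Simplification: parameters are rebound on every call.
        self.odbc_calls.extend(["SQLBindParameter"] * len(params))
        if sql == self.prepared_sql:
            self.odbc_calls.append("SQLExecute")
        else:
            # Direct execution replaces any statement prepared on this HSTMT.
            self.odbc_calls.append("SQLExecDirect")
            self.prepared_sql = None


sql = "UPDATE t SET v = ? WHERE id = ?"

default = ProposedCursor()
default.execute(sql, (1, 2))
print(default.odbc_calls)  # ['SQLBindParameter', 'SQLBindParameter', 'SQLExecDirect']

explicit = ProposedCursor()
explicit.prepare(sql)
explicit.execute(sql, (1, 2))
explicit.execute(sql, (3, 4))
print(explicit.odbc_calls.count("SQLPrepare"))  # 1

legacy = ProposedCursor(driver_requires_prepare=True)
legacy.execute(sql, (1, 2))
print(legacy.odbc_calls)  # ['SQLPrepare', 'SQLBindParameter', 'SQLBindParameter', 'SQLExecute']
```

With this split, the default path avoids an extra prepare for one-off statements, while callers who expect reuse opt in explicitly.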