Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we optimize out preparing? #214

Open
mkleehammer opened this issue Mar 14, 2017 · 11 comments
Open

Can we optimize out preparing? #214

mkleehammer opened this issue Mar 14, 2017 · 11 comments

Comments

@mkleehammer
Copy link
Owner

mkleehammer commented Mar 14, 2017

Currently pyodbc always prepares statements if there are parameters. If the statement is identical to the previously prepared, then the previous prepared statement is reused. However, the SQL Server docs do not recommend this:

SQL Server 2016 Prepared Execution

Some early ODBC applications used SQLPrepare any time SQLBindParameter was used.
SQLBindParameter does not require the use of SQLPrepare, it can be used with SQLExecDirect. For
example, use SQLExecDirect with SQLBindParameter to retrieve the return code or output
parameters from a stored procedure that is only executed one time. Do not use SQLPrepare with
SQLBindParameter unless the same statement will be executed multiple times.

Consider a 5.0 API that requires cursor.prepare(sql) before using prepared statements to see if we get any performance benefit.

ps = cursor.prepare(sql)
row = cursor.execute(ps, var1, var2)

I'm pretty sure many drivers are going to choke if we don't call prepare before binding, though. We'll need a flag to determine if it is allowed.

@gordthompson
Copy link
Collaborator

A similar discussion recently took place here regarding the SQL Server JDBC driver. The fix that came out of that discussion will call sp_executesql (with parameter substitution) the first time around, and automatically fall back to sp_prepexec if the prepared statement is executed a second time. However, that evolved from a specific use-case regarding JDBC PreparedStatement objects where an explicit "prepare" is implied by the prepareStatement method that is used to create the object.

An explicit .prepare() method for a pyodbc Cursor would actually make it more like the .NET OdbcCommand objects that have an explicit .Prepare() method. (ADO.NET decided to go with a single ...Command object instead of separate Statement and PreparedStatement objects as JDBC did.)

@TobiasSQL
Copy link

Thanks @gordthompson, I agree, for SQL Server it sounds like following the pattern that we used for JDBC should work well here too. If prepare is required in pyodbc for the use of parameters they sound analog to each other.

BTW, for .NET we are discussing following the pattern even for an explicit call to Prepare (even though prepare is not needed for parameter binding). The thought is that the combination of a) it having negligible performance impact to call sp_executesql on the first call and next falling back to sp_prepexec, and b) there is a fairly high risk that developers call Prepare() even when they don't need it which would hurt performance by introducing additional roundtrips or unnecessary memory consumption on the server.

Thoughts?

/Tobias (SQL Server database engine)

@TobiasSQL
Copy link

+@meet-bhagdev who is looking out for Python in SQL Server :-)

@gordthompson
Copy link
Collaborator

@TobiasSQL

While I can appreciate how a subtle performance optimization could accrue significant benefits as we scale up, I also have some mild reservations based on the Principle of least astonishment: If I called .prepare followed by .execute and then looked at SQL Profiler to see the result I would expect to see sp_prepare followed by sp_execute. Since most of the Python data-access layers that follow the DB-API don't have an explicit .prepare method I was able to accept repeated sp_prepexec calls as a reasonable alternative, but an explicit .prepare resulting in an initial sp_executesql and then a call to sp_prepexec (for the second .execute) might leave me scratching my head.

@TobiasSQL
Copy link

That makes sense, what about sp_unprepare? I get this from a "want to understand what happens" POV but I am wondering what would actually help most apps get the best perf experience :-)

@mkleehammer
Copy link
Owner Author

So we are all on the same page, what happens today is that any statement with parameters is prepared and parameters are bound. If the next call to execute is for the same SQL statement (by identity, not content at this time), then the previously prepared statement is used. If I understood the Java discussion correctly, that is what was proposed. Did I miss anything?

I believe the current issue is stating that binding parameters is causing new execution plans to be created because the parameters have different lengths. For example, if the same prepared statement is used for a 10-character string and then for a 20-character string, it would be bound as SQL_WVARCHAR(10) the first time and SQL_WVARCHAR(20) the second time. I have not verified that this would actually cause a different execution plan.

I believe the prepare part works. I have not looked at the binding yet.

Did I misunderstand any of these?

@mkleehammer
Copy link
Owner Author

Please ignore the 2nd half of my previous comment - I followed a link from email and thought this was the related issue #213.

Someone above mentioned that pyodbc might require a prepare for binding - it is doing that now because I originally thought ODBC required it but the SQL Server docs say otherwise. We should be able to call SQLExecuteW and bind parameters. I suspect many drivers will not, but we can default to this and use a flag for those that don't.

One issue to be aware of is ODBC only allows one prepared statement per HSTMT, which is what we represent with a cursor. If you need to switch between two prepared statements you'd want to create two cursors.

@gordthompson
Copy link
Collaborator

re: "what about sp_unprepare?"

To be honest, I haven't worried too much about sp_unprepare. Over the last while I had noticed that several reputable implementations did not send an sp_unprepare for every sp_prepexec they sent (e.g., pyodbc ref: here, and .NET SqlClient ref: here). I suspected that they were leaving un-unprepared entries in a list somewhere on the server (and consuming numeric handles) but I took it on faith that the lack of cleanup wasn't likely to cause significant problems.

re: binding without preparing

If it can be done then I'd definitely support that. Again, it would be like SqlClient where .Execute without .Prepare does sp_executesql (with binding if necessary) and multiple .Executes on the same SqlCommand after a .Prepare will do an sp_prepexec followed by one or more sp_execute calls. That way,

  • an explicit .Prepare could still be used to reduce traffic by not sending the command text each time (which can be a significant bottleneck for .executemany as in Slow Handling of executemany()  #120, especially if the command text is lengthy), but

  • without an explicit .Prepare we could get the more direct sp_executesql approach (instead of sp_prepexec) and still benefit from the execution plan cache without the prepared statement "cleanup" issues mentioned above.

@TobiasSQL
Copy link

TobiasSQL commented Mar 15, 2017

@meet-bhagdev and myself did a bit of playing around with pyodbc and this is what we found out, at least as related to SQL Server :-) Hopefully it can help.

For more background on prepared statements in SQL Server you can take a look at my first reply to the JDBC issue that @gordthompson mentioned.

After testing pyodbc a bit with SQL Server it seems like pyodbc never re-uses a server-side prepared statement handle (please correct if wrong). I.e. you either do cursor.execute("my SQL code", ..) with or without params. There seems to be no way to re-issue the same "prepared statement" without sending the SQL code to the server again (something like cursor.executePrepared(statementHandle, ) if that is not what cusor.executeMany does...).

Right now the following seems to be the case:

  1. cursor.execute("SELECT ...") - Execute without parameters. Calls ODBC SQLExecDirect which for SQL Server will generate a language event/SQL batch call.
  2. cursor.execute("SELECT ... WHERE col = ?", 42) - Execute with parameters. Calls ODBC SQLPrepare to bind parameters followed by SQLExecute at which time the SQL Server driver will generate a TDS RPC request to sp_prepexec in turn preparing a handle that can be re-used on the server.
  3. On connection close ODBC SQLFreeStmt seems to be called to unprepare the last prepared statement (sp_unprepare for SQL Server).

Here is my SQL Server trace:
pyodbc_screenshot

Looking at these results it seems like:

  1. Is great :-)
  2. Should be changed to call SQLExecDirect instead of SQLExecute, at least for SQL Server. You still call SQLPrepare first since that should not issue a call to the server and I think that is required for parameter binding as mentioned earlier in the thread.
  3. If other RDBMSs require the use of SQLPrepare+SQLExecute, SQLFreeStmt would need to be called after each statement.

BTW, I think this is pretty much @gordthompson's proposal above in ODBC terminology :-)
Thanks!
/Tobias

@Lexcon
Copy link

Lexcon commented Mar 22, 2017

The last time I checked the odbc driver of sql server using a network sniffer, it didn't actually use prepared queries at all, it kept sending the entire query over the wire. Maybe msft considers it a non-optimisation.

That said, for other db systems: it sounds like a keepprepared parameter might be useful when running a query. Consider this alternating sequence of queries:

  • fetch user
  • update user
  • fetch next user
  • update that user

Because of the alternation, the queries plan will not remain available since any prepared query gets dropped the moment the next query plan needs to be prepared.

To facilitate a 'keepprepared' parameter, maybe it would be smart to have a dict like structure that keeps hashes of a query mapped to the compiled plan, then before running the query first checks if a prepared query already exists.

I think that's how sql server does it server side anyway, so that might be duping existing funtionality: I think preparing a query is just a lookup on a highly optimized database server system.

Anyway 'keepprepared' should maybe have a time value in seconds to avoid memory buildup in long running programs. That makes it a bit bloated. It has some risk of being a senseless optimisation, but one can only know that after there's a benchmark, so after such functionality would be created.

@mkleehammer
Copy link
Owner Author

@gordthompson

I've written up some thoughts on parameter binding. I'm looking for input from anyone interested.

In particular, the API looks very messy but I'm looking to start a discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants