Fix prepared statement handling #242
FYI @mdesmet - note that this code is incomplete. For now the commit only cleans up the HTTP portion (client.py). I'm still working on the dbapi.py changes but opened this PR to get some early feedback. In a future PR I plan to add a parameter for users who want prepared statements to be re-used.
b2ff5ff
to
b529df9
Compare
@@ -881,21 +881,6 @@ def __call__(self, *args, **kwargs):
        return http_response


def test_trino_result_response_headers():
This is redundant now, since the actual user-facing functionality is that additional_http_headers works by merging the passed headers into the existing headers, which is already tested in test_trino_query_response_headers.
The purpose of this test was to ensure that headers were travelling between TrinoQuery and TrinoResult, since the older prepared statement code basically passed the prepare headers through all layers. That was a very simple solution, but unclean: it couples things together and is also buggy, as Trino 398 shows.
b529df9
to
cf6af76
Compare
trino/dbapi.py
Outdated
@@ -295,58 +295,42 @@ def warnings(self):
            return self._query.warnings
        return None

    def _new_request_with_session_from(self, request):
This is better than copy.deepcopy but I still hate it.
I think the way we try to make prepared statements work needs improvement.
Currently it's modelled as:
- A Connection creates a ClientSession - makes total sense - all persistent information for a connection is maintained using the session (roles, prepared statements, session properties, transactions).
- A Cursor is created from the Connection, which either gets its own TrinoRequest or re-uses one from an existing transaction. The TrinoRequest inherits a reference to the Connection's ClientSession. This doesn't make too much sense. It assumes a Cursor will only execute one query at a time (reasonable) but also leaks HTTP protocol information to the Cursor.
- Cursor.execute(sql, params) is called, which creates a TrinoQuery using the Cursor's TrinoRequest.
- Any operation on the cursor then affects the TrinoRequest - which means we should not share TrinoRequest objects.
These are the current rules, and we break them very badly:
- In Cursor.execute(sql, params), if params are provided we need to execute PREPARE statement FROM sql.
- To execute something we need a TrinoRequest - so we re-use the one from the current Cursor. That is bad, because any other future or running query would be affected.
- We work around this with deepcopy. This copies too much and means that the ClientSession is no longer shared.
- We then work around that by creating a new request but re-using the ClientSession from the old one (to persist headers etc.). What if we miss other things that need to be re-used? Things will break.
- We then execute our PREPARE with the copied request, EXECUTE using the original request, and then DEALLOCATE using the copied request.
I'll create an issue with a pseudocode class outline describing the refactoring I want to do to improve this. We can discuss other possible improvements there and improve what is honestly very organically grown code, which does not have enough structure to safeguard us from making mistakes or to let us introduce changes with confidence that we only affect what we changed.
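The sharing problem described above can be illustrated with a minimal sketch. The classes below are simplified stand-ins invented for illustration (they are not the actual trino-python-client classes, and the URL is a placeholder); they only show why copy.deepcopy detaches the ClientSession, while a fresh request built around the same session keeps it shared and starts with clean query state.

```python
import copy


class ClientSession:
    """Hypothetical stand-in: holds state that must persist per connection."""

    def __init__(self):
        self.prepared_statements = {}


class TrinoRequest:
    """Hypothetical stand-in: per-query HTTP state plus a session reference."""

    def __init__(self, client_session):
        self.client_session = client_session
        self.next_uri = None  # per-query state; should not leak across queries


session = ClientSession()
original = TrinoRequest(session)
original.next_uri = "http://coordinator.invalid/v1/statement/executing/1"

# deepcopy duplicates everything: the session AND the per-query state.
copied = copy.deepcopy(original)
copied.client_session.prepared_statements["st_1"] = "SELECT ?"

# A fresh request around the same session shares state but starts clean.
fresh = TrinoRequest(original.client_session)
fresh.client_session.prepared_statements["st_2"] = "SELECT ?"

print(copied.client_session is session)  # False: the copy has its own session
print(fresh.client_session is session)   # True: the session is still shared
print(session.prepared_statements)       # only st_2 reached the real session
```

The statement registered through the deep copy never reaches the connection's session, which is the kind of silent divergence the comment above warns about.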
A Cursor is created from Connection which either gets its own TrinoRequest or re-uses one from an existing transaction
Does this mean that a new TrinoRequest is created for each cursor object? Should it be reused somehow? You mentioned that it re-uses one from an existing transaction, but where is that handled?
Why not simply call self.connection._create_request()?
The ClientSession is already shared and the prepared statement headers are also persisted in the ClientSession.
Does this mean that a new TrinoRequest is created for each cursor object? Should it be reused somehow? You mentioned that it re-uses one from an existing transaction, but where is that handled?
Creating a new TrinoRequest per cursor makes sense, since the TrinoRequest tracks query state (nextUri, stats etc.) - it must not be re-used across queries where possible. Transactions re-use it here:
trino-python-client/trino/dbapi.py
Lines 212 to 215 in efb6680
if self.transaction is not None:
    request = self.transaction.request
else:
    request = self._create_request()
Why not simply call self.connection._create_request()?
The ClientSession is already shared and the prepared statement headers are also persisted in the ClientSession.
Good catch. Much simpler. Let me check whether that works (it should).
BTW the existing code was buggy even before the Trino change because it didn't get to inspect the initial response from the coordinator because that response's headers were not being set to |
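A rough sketch of the suggestion discussed in this thread, using simplified hypothetical classes rather than the real dbapi.py code: each statement in the PREPARE / EXECUTE / DEALLOCATE sequence gets its own request from the connection, and all of them see the same ClientSession. The `run` method, the `execute_with_params` helper and the statement name are made up for illustration.

```python
class ClientSession:
    """Hypothetical stand-in for the per-connection session."""

    def __init__(self):
        self.prepared_statements = {}


class Request:
    """Hypothetical stand-in for a per-query request object."""

    def __init__(self, session):
        self.session = session
        self.statements = []  # statements sent through this request

    def run(self, sql):
        # A real request would POST the statement; here we only record it.
        self.statements.append(sql)
        return sql


class Connection:
    def __init__(self):
        self.session = ClientSession()
        self.requests = []

    def _create_request(self):
        # Every call returns a new request bound to the shared session.
        request = Request(self.session)
        self.requests.append(request)
        return request


def execute_with_params(connection, sql, params, name="st_example"):
    """Run PREPARE, EXECUTE and DEALLOCATE, each on its own request."""
    connection._create_request().run(f"PREPARE {name} FROM {sql}")
    # repr() is only a placeholder for real SQL literal formatting.
    using = ", ".join(repr(p) for p in params)
    try:
        return connection._create_request().run(f"EXECUTE {name} USING {using}")
    finally:
        # Clean up even if EXECUTE raises.
        connection._create_request().run(f"DEALLOCATE PREPARE {name}")


conn = Connection()
result = execute_with_params(conn, "SELECT * FROM t WHERE id = ?", [42])
```

Because no request object is shared between the three statements, the query state of one cannot leak into another, and nothing needs to be copied.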
49e051e
to
e6d588f
Compare
Nice simplifications in this PR.
cc9199d
to
1b47a3c
Compare
1b47a3c
to
fcfb321
Compare
That's why I quite like the pre-commit checks to be present as git hooks. Note that if you haven't explicitly installed the pre-commit git hook (running |
fcfb321
to
7e6edff
Compare
Description
The prepared statement handling code assumed that for each query we'll
always receive some non-empty response even after the initial response
which is not a valid assumption.
This assumption worked because earlier Trino used to send empty fake
results even for queries which don't return results (like PREPARE and
DEALLOCATE) but is now invalid with
trinodb/trino@bc794cd.
The other problem with the code was that it leaked HTTP protocol details
into dbapi.py and worked around it by keeping a deep copy of the request
object from the PREPARE execution and re-using it for the actual query
execution.
The new code fixes both issues by processing the prepared statement
headers as they are received and storing the resulting set of active
prepared statements on the ClientSession object. The ClientSession's set
of prepared statements is then rendered into the prepared statement
request header in TrinoRequest. Since the ClientSession is created and
reused for the entire Connection this also means that we can now
actually implement re-use of prepared statements within a single
Connection.
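As an illustration of the approach described above, here is a minimal sketch with hypothetical class and function names; it is not the code from this PR. It assumes the Trino client protocol headers X-Trino-Added-Prepare and X-Trino-Deallocated-Prepare on responses and X-Trino-Prepared-Statement on requests, with URL-encoded values; the exact format should be checked against the protocol documentation.

```python
from urllib.parse import quote_plus, unquote_plus


class ClientSession:
    """Hypothetical stand-in: one instance lives for the whole Connection."""

    def __init__(self):
        self.prepared_statements = {}  # statement name -> SQL text


def process_response_headers(session, headers):
    """Apply prepared-statement headers from ANY response, including the first.

    `headers` is a plain dict here; real HTTP headers are case-insensitive.
    """
    added = headers.get("X-Trino-Added-Prepare")
    if added:
        name, _, encoded_sql = added.partition("=")
        session.prepared_statements[unquote_plus(name)] = unquote_plus(encoded_sql)
    deallocated = headers.get("X-Trino-Deallocated-Prepare")
    if deallocated:
        session.prepared_statements.pop(unquote_plus(deallocated), None)


def render_request_headers(session):
    """Render the session's active prepared statements into a request header."""
    if not session.prepared_statements:
        return {}
    value = ",".join(
        f"{quote_plus(name)}={quote_plus(sql)}"
        for name, sql in session.prepared_statements.items()
    )
    return {"X-Trino-Prepared-Statement": value}


session = ClientSession()
process_response_headers(session, {"X-Trino-Added-Prepare": "st1=SELECT+%3F"})
headers_after_prepare = render_request_headers(session)
process_response_headers(session, {"X-Trino-Deallocated-Prepare": "st1"})
headers_after_deallocate = render_request_headers(session)
```

Since the headers are applied to the session as each response arrives, the logic does not depend on the server sending any further (possibly empty) result pages after the initial response.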
Non-technical explanation
Fix errors when using prepared statements with Trino >= 398.
Release notes
( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
* Fix errors when using prepared statements with Trino servers version 398 or newer.