-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acknowledge reception of data in TrinoResult
#220
Conversation
The CI issues can be simulated by following code. Seems that although the result has returned the Trino API still provides one or more next_uri's to fetch in a minority of cases.
Some questions:
|
e35d986
to
5d59610
Compare
It's not valid. It's a hack that BI tools and clients use instead of limiting their query to pull only what is needed.
You know that Change your experiment to a query which returns arbitrary number of rows and then you can't know anymore whether query is finished or not. Special casing the client protocol for queries which return single row doesn't seem useful. |
The alternative is to have a client protocol which is based on persistent TCP connections instead of HTTP long polling - which brings it's own set of problems. |
Also, specifically on why you cannot assume an empty rows being returned from API as proof that query has finished is the pipelined execution model. In queries it's possible for Trino to perform both table scans and output results at the same time. e.g. If you have a long chain of UNION ALL statements then Trino can start returning results as soon as the first UNION ALL query is done while other parts of the query are still executing . This means that the client might observe periods of time where there is no data returned but a nextUri is still included. If the client were to assume no data == query finished then it'll drop any upcoming rows that will be produced. |
@hashhar would you agree that to address the original issue we should add a fetch call after |
@nineinchnick I don't agree 100%. It's still useful to make the client as smart as the JDBC driver where the implementation detail of the Trino REST protocol isn't visible to users. But yes, since this might take more time than the quickfix I think we should go with the quickfix for now and then think about how to stop the protocol from leaking into user code. |
This is exactly what sqlalchemy does when you |
5d59610
to
4447b2e
Compare
4447b2e
to
9d898a8
Compare
d206e80
to
d358954
Compare
2bec10e
to
fb66f9c
Compare
Can you explain why you think this is a breaking change? IMHO the only way to break existing usage is if users didn't catch exceptions on Note that sqlalchemy doesn't call
IMHO this is already decoupled. The The double buffering as in the java Trino client, is implemented in this PR. Note that also the Java client doesn't use threading at this moment. I think it is a good idea to investigate but can be done independent from this PR. I don't see why cursor implementations are impacted, cursors would take the |
Because the API is now blocking.
I don't mean that this PR leaks the API details. I meant the opposite. Now we are one step closer to hide the API details within TrinoQuery and TrinoResult. An example of what this PR allows to do (but probably doesn't make sense) is to have a different impl of TrinoQuery which can probably use a different fictional transport mechanism to talk to Trino (instead of the REST API).
True
Exactly what I said above.
Again exactly what I said your PR allows us to do in future. |
Newer Trino versions will include trinodb/trino#14122 which can mean CI can be green without this change. I think we should add one more entry to matrix with 395 as the version being tested for the meantime. |
a03c5a9
to
29fb5fa
Compare
I added the entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledge reception of data in TrinoResult fails without Execute should block until at least one row is received
So it seems we need to squash those.
LGTM otherwise.
A query only transitions to a FINISHED state when the results are fully consumed. The reception of the data is acknowledged by calling the next_uri before exposing the data through dbapi. `dbapi.execute()` will now block until the first result is received. This will ensure the result set is exhausted for DML queries and as such remove the need to for calling `dbapi.fetchall()`.
96e7c68
to
4858d6b
Compare
@hashhar: Squashed those commits. Are we good to go? |
Thanks. Good to go. |
Description
Fixes #232, #95
Ensures the received data is properly acknowledged by calling the next_uri. This will avoid seeing failed queries in the query log when executing scalar queries as in the following example.
Release notes
( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: