Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for polling. #18

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

matthewwardrop
Copy link
Member

@matthewwardrop matthewwardrop commented Mar 31, 2020

This PR was originally put up against the original prestodb version of this library, which now seems to be defunct; and adds support for polling the status of presto queries, decoupling it from collection of results (fixing #63 on that repository). This is important especially for long-running queries.

Changes:

  • Removed PrestoResult and merged its functionality with PrestoQuery. Keeping both led to some weird logic loops as each tried to keep the other up to date once polling was added.
  • Made PrestoQuery.fetch a private method PrestoQuery._fetch, since calling it directly leads to the iterator not being in sync.
  • Made the PrestoQuery object cache the query results, otherwise there is some weird behaviour whereby if you stop iterating over the rows, you throw away all of the rows left over in the last retrieved chunk.
  • Switched the fetchone / fetchmany relationship so that fetchone is a special case of fetchmany (because the changes above allow multiple iterators to be created over time, so long as there is only one at time, and in this way we don't create a new generator for every row).

@cla-bot
Copy link

cla-bot bot commented Mar 31, 2020

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/prestosql/cla.

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm the CI failure?

@cla-bot
Copy link

cla-bot bot commented Apr 1, 2020

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/prestosql/cla.

@matthewwardrop
Copy link
Member Author

matthewwardrop commented Apr 1, 2020

@ebyhr There was a trivial oversight in keeping track of progress when requesting one row at a time. I've pushed the fix, and signed the CLA.

@martint
Copy link
Member

martint commented Apr 1, 2020

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Apr 1, 2020
@cla-bot
Copy link

cla-bot bot commented Apr 1, 2020

The cla-bot has been summoned, and re-checked this pull request!

@matthewwardrop matthewwardrop requested a review from ebyhr April 3, 2020 17:19
@matthewwardrop
Copy link
Member Author

@ebyhr @martint Any updates here?

@ebyhr ebyhr requested a review from ggreg June 7, 2020 15:00
@mathieuseguin
Copy link

We're running into the same issue with long pulling queries.

Any chance we could see this being merged in a near future?

@martint martint requested review from electrum and findepi July 9, 2020 14:02
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bunch of backward incompatible changes here

  • removal of result()
  • removal of rownumber()
  • removal of public fetch() (it's now internal)
  • fetchone used to be mean "fetch first", now its "fetch only"

Are these changes intentional?
If there is a good reason for making them, let's make introduce
them as separate commits, with explanation in the messages.

If the goal is to add only poll(), i think some of the changes can
be avoided.

@amitds1997
Copy link

@matthewwardrop Are you planning to work on this soon?

@matthewwardrop
Copy link
Member Author

matthewwardrop commented Apr 18, 2021

@amitds1997 I'm not sure. My original PRs to the presto-python-client and then this one have "enjoyed" very slow review times, making it challenging to make much headway (by the time a review occurs, I've not thought much about the patch for many months). @findepi's comments were valid ones, but as far as I can tell I already answered these concerns in the original PR description. I'm happy to restage the PR as a series of commits if that is helpful, but note that it won't be entirely straightforward: these changes hang together, and each standalone commit won't make a tonne of sense without the context of the others. I'm also happy if someone else closer to the project wants to reimplement things in a different way, but after a review of this patch I still feel this is a pretty clean implementation. It does make several "unnecessary" changes, but only in the sense that you although you could achieve polling without the other changes, the extra changes prevent weird epicycles in the code and the risk of losing data depending on the order in which you call methods on the class instances.

I originally wrote this patch so I could migrate to this more official client from pyhive, since I thought things would wind down there.... but things on that side have continued kicking along, and their Presto client already features this polling behaviour, so I haven't been too motivated to push hard on this.

@ebyhr ebyhr removed their request for review March 25, 2022 01:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants