Add option for limiting rows of retrieved results #102
Adding the […]
Oh, I didn't know about datalab. Would work on an IPython magic be done in this repository, or in a separate standalone one? It would introduce a dependency on IPython.
You can do it in this repo; it's a separate piece, so the dep is fine (and only impacts the magic piece).
Yeah, could be an optional dependency. @bburky: My coworker @alixhami has started some work on making a `%%bigquery` magic.
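A minimal sketch of that optional-dependency pattern, where a guarded import means only the magic piece needs IPython; the `_register_magics` helper name is hypothetical:

```python
# The import is guarded so that installing the package without IPython
# never fails at import time; only the magic itself becomes unavailable.
try:
    from IPython.core.magic import register_cell_magic
except ImportError:
    register_cell_magic = None


def _register_magics():
    """Hook up the cell magic only when IPython is actually present."""
    if register_cell_magic is None:
        raise ImportError("The %%bigquery magic requires IPython to be installed.")
    # ... define and register the magic here ...
```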
Follow-up for magics: the […]
I think we can close this - it's open because of the magics follow-up. Re limiting rows - that's very easy to do with a `LIMIT` clause. Reopen if anyone disagrees.
The issue was opened with the thought that you could do a query with a lot of results and write to a destination table, but only want to sample the results.
Limiting the maximum results via […]. Or @bburky, did you not want a representative sample, more just a preview?
Ah, preview makes sense. Sorry for being overzealous.
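For reference, a minimal sketch of the `LIMIT` approach discussed above, assuming `pandas_gbq.read_gbq()` with placeholder project and table names; note that a `LIMIT` also caps what gets written to any destination table, which is why a preview was wanted instead:

```python
import pandas_gbq

# Capping results in the query itself: simple, but it limits the query's
# output everywhere, including the destination table, not just the download.
df = pandas_gbq.read_gbq(
    "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 100",  # placeholders
    project_id="my-project",  # placeholder
    dialect="standard",
)
```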
Yes. I was running a query that returned many, many results and saving the results with `query.destinationTable`. To sanity-check the query I ran, I wanted to see a 100-row preview or so.

A possible implementation is just `itertools.islice()` of the query results. I don't mind if this isn't completely built into pandas-gbq, but there wasn't any way for me to hook or modify anything except copy-pasting and modifying `run_query()`.
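A minimal sketch of that `itertools.islice()` idea, where `rows` stands in for the iterator that `run_query()` currently converts to a list; the `take_preview` and `max_results` names are hypothetical:

```python
import itertools


def take_preview(rows, max_results=None):
    """Materialize at most max_results rows from an iterator.

    With max_results=None this behaves like list(rows), i.e. the
    current download-everything behavior.
    """
    if max_results is None:
        return list(rows)
    return list(itertools.islice(rows, max_results))


# Example with a stand-in iterator in place of real query results:
preview = take_preview(iter(range(1000)), max_results=100)
assert len(preview) == 100
```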
I found the pandas-gbq interface easy to use and wanted to also use it for creating tables in BigQuery, not just downloading all the results at once. The existing capabilities of `read_gbq()` are actually already sufficient to do this, because you can just set `query.destinationTable` in the job configuration. However, I would like to limit the number of retrieved rows to a small sample of the whole table that was created, instead of downloading the many thousands of rows that were created.

I've already played with making the changes myself in a project I'm working on:
http://nbviewer.jupyter.org/github/bburky/subredditgenderratios/blob/master/Subreddit%20Gender%20Ratios.ipynb
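For context, a minimal sketch of the destination-table setup described above, assuming a pandas-gbq version whose `read_gbq()` accepts a `configuration` dict; the project, dataset, and table names are placeholders:

```python
import pandas_gbq

# Job configuration mirroring the BigQuery REST API: write the full query
# results to a destination table rather than only streaming them back.
config = {
    "query": {
        "destinationTable": {
            "projectId": "my-project",   # placeholder
            "datasetId": "my_dataset",   # placeholder
            "tableId": "query_results",  # placeholder
        },
        "writeDisposition": "WRITE_TRUNCATE",
    }
}

df = pandas_gbq.read_gbq(
    "SELECT * FROM `my-project.my_dataset.source_table`",  # placeholder query
    project_id="my-project",  # placeholder
    dialect="standard",
    configuration=config,
)
```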
In the current code for `run_query()`, you read all rows from the table by converting the iterator into a list. Instead, you could pass the iterator to `itertools.islice()` first to cap it at a configurable limit. You can look at my code to see how it could be done.

Also, if you're interested, I could contribute the IPython `%%bigquery` cell magic I am using in that project. It should be a very simple wrapper around `read_gbq()`.
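A rough sketch of what such a wrapper might look like, to be run inside an IPython or Jupyter session; this is a guess at the shape, not the notebook's actual code:

```python
from IPython.core.magic import register_cell_magic

import pandas_gbq


@register_cell_magic
def bigquery(line, cell):
    """Run the cell body as a BigQuery query and return a DataFrame.

    An optional project id can be given on the magic line, e.g.:

        %%bigquery my-project
        SELECT 1
    """
    project_id = line.strip() or None
    return pandas_gbq.read_gbq(cell, project_id=project_id)
```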