MemoryError while reading nearly 1.8M rows of data from Bigquery table #205

Closed
HarshCHhz opened this issue Aug 29, 2018 · 6 comments

Comments

@HarshCHhz

Hi, while trying to fetch about 1.8M rows from BigQuery to a local machine, we are getting a MemoryError.

import pandas_gbq as pgbq
# Pass dialect by keyword: the second positional argument of read_gbq is project_id.
df = pgbq.read_gbq(sql, dialect=dialect)

@max-sixty
Contributor

Thanks for posting the issue.

This is a known one - ref #133 & #167

You can try installing the latest master. It has a material improvement, even if it is not yet where we want it to be. Depending on the size of your rows, 1.8M rows is stretching the current implementation on a moderately sized machine, but it is within range.
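To make "depending on the size of your rows" a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes purely numeric columns at 8 bytes per value (as for int64/float64); the column counts are hypothetical, and the helper name `estimate_dataframe_bytes` is made up for illustration. String columns and the intermediate copies made while downloading can push real peak usage well above this figure.

```python
def estimate_dataframe_bytes(n_rows, n_cols, bytes_per_value=8, overhead_factor=1.0):
    """Rough size of a purely numeric table, in bytes.

    bytes_per_value=8 matches int64/float64 columns. overhead_factor is a
    hypothetical knob for extra copies held in memory during the download.
    """
    return int(n_rows * n_cols * bytes_per_value * overhead_factor)


rows = 1_800_000
# Hypothetical column counts; the issue does not say how wide the table is.
for cols in (10, 50, 200):
    gib = estimate_dataframe_bytes(rows, cols) / 2**30
    print(f"{cols} columns: ~{gib:.2f} GiB for the final numeric data alone")
```

Treat the result as a floor rather than a prediction: it only covers the finished numeric data, not whatever the client library holds while building it.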

I'll close this as a dupe, but please post back here or in another issue if needed.

@HarshCHhz
Author

@max-sixty will this solve the memory error issue?

@max-sixty
Contributor

Upgrading to master will reduce the memory requirement, and so may solve the memory issue.

@HarshCHhz
Author

HarshCHhz commented Aug 29, 2018 via email

@max-sixty
Contributor

You need to install the code in this repo (on the master branch), rather than using the version you have installed. To install this version you can run

pip install git+https://github.com/pydata/pandas-gbq.git

...and then try your workflow again.
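Before re-running, it can help to confirm which version Python actually picks up. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for illustration, and the exact version string a git install reports is not something this thread specifies.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version of pandas-gbq visible to this interpreter, or None.
print(installed_version("pandas-gbq"))
```

If this prints the same version as before the `pip install`, the new code is probably going into a different environment than the one running your script.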

If that's a steep learning curve, have a think whether you can do some aggregation on your data in BQ before you download it to pandas, as a shortcut through trying to download so much data.
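As a sketch of what aggregating in BQ before downloading could look like: the query below groups and sums on the BigQuery side, so only one row per group comes back to pandas. The table, column, and project names are hypothetical, as are the helper names `build_aggregate_query` and `fetch_aggregated`; the only library call used is `pandas_gbq.read_gbq` with its `project_id` and `dialect` keyword arguments.

```python
def build_aggregate_query(table, group_col, value_col):
    """Build a standard-SQL query that aggregates inside BigQuery.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    return (
        f"SELECT {group_col}, COUNT(*) AS n_rows, SUM({value_col}) AS total "
        f"FROM `{table}` "
        f"GROUP BY {group_col}"
    )


def fetch_aggregated(table, group_col, value_col, project_id):
    """Run the aggregate query and return the (much smaller) result as a DataFrame."""
    import pandas_gbq  # imported lazily so the query builder works without it

    sql = build_aggregate_query(table, group_col, value_col)
    return pandas_gbq.read_gbq(sql, project_id=project_id, dialect="standard")


# Hypothetical names, for illustration only.
print(build_aggregate_query("my-project.my_dataset.events", "country", "amount"))
```

Calling `fetch_aggregated(...)` needs BigQuery credentials and a real table, so only the query builder is exercised here.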

@tswast
Collaborator

tswast commented Aug 30, 2018

I'm not sure on your use-case for getting a pandas DataFrame for a large BQ table, but if complex SQL is a problem, you may consider Ibis for aggregations. I've helped a bit on that project and BigQuery support in Ibis is pretty solid. It's a pandas-like interface for constructing SQL.
