MemoryError while reading nearly 1.8M rows of data from Bigquery table #205

HarshCHhz · 2018-08-29T15:52:40Z

Hi, while trying to fetch about 1.8M rows from BigQuery to a local machine, we are getting a MemoryError.

import pandas_gbq as pgbq
df = pgbq.read_gbq(sql, dialect=dialect)

max-sixty · 2018-08-29T16:33:26Z

Thanks for posting the issue.

This is a known one - ref #133 & #167

You can try installing the latest master. It has a material improvement, even if it's not yet where we want it to be. Depending on the size of your rows, 1.8M rows is stretching the current implementation on a moderately sized machine, but it is within range.

I'll close this as a dupe but please post back here / another issue if needed

HarshCHhz · 2018-08-29T17:54:40Z

@max-sixty will this solve the memory error issue?

max-sixty · 2018-08-29T18:08:35Z

Upgrading to master will reduce the memory requirement, and so may solve the memory issue.

HarshCHhz · 2018-08-29T18:15:20Z

Can you explain the term "master"? I am new to Python programming.

max-sixty · 2018-08-29T18:24:39Z

You need to install the code in this repo (on the master branch), rather than using the version you have installed. To install this version you can run

pip install git+https://github.com/pydata/pandas-gbq.git

...and then try your workflow again.
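Before rerunning, it may help to confirm which version Python actually sees. This is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, not part of pandas-gbq.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None if pandas-gbq is not installed in this environment.
    print("pandas-gbq:", installed_version("pandas-gbq"))
```

If the printed version is the same as before the `pip install`, the new code was probably installed into a different environment than the one running the script.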

If that's a steep learning curve, have a think about whether you can do some aggregation on your data in BQ before you download it to pandas, as a shortcut that avoids downloading so much data.
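The aggregation idea could look roughly like the sketch below: the `GROUP BY` runs inside BigQuery, so only the summary rows are downloaded. The table `my_dataset.events`, the columns `event_date` and `amount`, and both helper names are hypothetical placeholders, not anything taken from this issue.

```python
def build_daily_totals_sql(table, date_col, value_col):
    """Build a standard-SQL query that returns one summary row per day."""
    return (
        f"SELECT {date_col} AS day, "
        f"COUNT(*) AS n_rows, SUM({value_col}) AS total "
        f"FROM `{table}` "
        f"GROUP BY day ORDER BY day"
    )


def fetch_daily_totals(project_id, reader=None):
    """Run the aggregated query and return whatever the reader returns.

    `reader` defaults to pandas_gbq.read_gbq; passing another callable
    makes it possible to try the function without BigQuery credentials.
    """
    if reader is None:
        import pandas_gbq  # imported lazily so the sketch loads without it
        reader = pandas_gbq.read_gbq
    # Hypothetical table and column names; replace with your own.
    sql = build_daily_totals_sql("my_dataset.events", "event_date", "amount")
    return reader(sql, project_id=project_id, dialect="standard")
```

Calling `fetch_daily_totals("my-project")` (a placeholder project ID) would then return one row per day instead of 1.8M raw rows, assuming the data can be summarised this way for your use case.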

tswast · 2018-08-30T00:57:10Z

I'm not sure what your use case is for getting a pandas DataFrame from a large BQ table, but if complex SQL is a problem, you may consider Ibis for aggregations. I've helped a bit on that project and BigQuery support in Ibis is pretty solid. It's a pandas-like interface for constructing SQL.

max-sixty closed this as completed Aug 29, 2018
