
slow download #2927

Closed
pdudnik opened this issue Jan 9, 2017 · 8 comments
Labels

- api: storage (Issues related to the Cloud Storage API)
- performance
- priority: p2 (Moderately-important priority; fix may not be included in next release)
- status: blocked (Resolving the issue is dependent on other work)
- type: question (Request for information or clarification; not an issue)

Comments


pdudnik commented Jan 9, 2017

I am trying to download a 400 MB GCS file. I am using https://github.com/GoogleCloudPlatform/google-cloud-python/blob/ce6756fbe3633c74fd742567654565147628f4ba/storage/google/cloud/storage/blob.py. I noticed that by default my download is chunked because of this setting:

https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/core/google/cloud/streaming/transfer.py#L46

As a result, downloading the file takes roughly 400 requests to GCS, which significantly slows the download.

Is there some clean way I can override that when using blob.download_to_file?
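The arithmetic behind the complaint can be sketched as follows, assuming the 1 MB default chunk size implied by the 400-request count (the `request_count` helper here is illustrative, not library code):

```python
import math

MB = 1024 * 1024

def request_count(object_size: int, chunk_size: int) -> int:
    """Number of HTTP range requests a chunked download issues."""
    return math.ceil(object_size / chunk_size)

# With the 1 MB default, a 400 MB object needs ~400 round trips.
print(request_count(400 * MB, 1 * MB))   # 400
# Raising the chunk size to 10 MB cuts that to 40.
print(request_count(400 * MB, 10 * MB))  # 40
```

Each round trip pays request latency on top of transfer time, which is why the per-chunk overhead dominates for large objects.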

@daspecster added the "api: storage" label Jan 10, 2017

dhermes commented Jan 12, 2017

Thanks for reporting. This is actually deeper than it seems. The "correct" fix is for us to remove the (non-public) google.cloud.streaming code that this relies on and build a better chunking story that doesn't depend on httplib2.

For a fix that works right now, you can duplicate the source but pass chunksize to Download. Also, gsutil (the CLI tool) has a highly optimized strategy for fast downloads.
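In later google-cloud-storage releases, Blob exposes a settable chunk_size, so the workaround no longer requires duplicating the source. A hedged sketch of the idea follows; the stub bucket and blob classes are stand-ins so the example runs without GCS credentials, and in the real API chunk_size must be a multiple of 256 KB:

```python
import io

MB = 1024 * 1024

def download_with_chunk_size(bucket, blob_name, file_obj, chunk_size=10 * MB):
    """Download a blob using a larger chunk size than the 1 MB default.

    `bucket` is anything with a .blob(name) method whose result honors
    .chunk_size and .download_to_file(fh), e.g. a google.cloud.storage Bucket.
    """
    blob = bucket.blob(blob_name)
    blob.chunk_size = chunk_size  # real API requires a multiple of 256 KB
    blob.download_to_file(file_obj)
    return blob

# Stub stand-ins (hypothetical, for illustration only).
class _StubBlob:
    def __init__(self, name):
        self.name = name
        self.chunk_size = None
    def download_to_file(self, fh):
        fh.write(b"fake bytes")

class _StubBucket:
    def blob(self, name):
        return _StubBlob(name)

buf = io.BytesIO()
blob = download_with_chunk_size(_StubBucket(), "big-object", buf, chunk_size=10 * MB)
print(blob.chunk_size == 10 * MB)  # True
```

With a real Bucket in place of the stub, the same helper would issue ~40 requests for a 400 MB object instead of ~400.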

@danoscarmike added the "Status: Acknowledged", "priority: p2", and "type: question" labels Feb 28, 2017
@bjwatson added the "status: blocked" label Feb 28, 2017
bjwatson commented

@lukesneeringer says this is blocked on httplib2 work.


lukesneeringer commented Mar 17, 2017

The correct solution is blocked on #1998.

What would be the benefits and drawbacks of increasing the default in the meantime, though? Would it be okay to make the default chunk size 10 MB instead of 1 MB?

Alternatively, could we find out the size of the file in advance and make the chunk size into some reasonable fragment (say, 2% or 5% of file size, with a 1 MB lower limit)?
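The alternative proposed above could be sketched like this (a hypothetical helper, not library code, with the 2% fraction and 1 MB floor from the comment):

```python
MB = 1024 * 1024

def adaptive_chunk_size(file_size: int, fraction: float = 0.02,
                        floor: int = 1 * MB) -> int:
    """Chunk size as a fraction of the object size, with a 1 MB lower limit."""
    return max(floor, int(file_size * fraction))

print(adaptive_chunk_size(400 * MB) // MB)  # 8  (2% of 400 MB)
print(adaptive_chunk_size(10 * MB) // MB)   # 1  (the 1 MB floor applies)
```

At 2%, any file would be fetched in at most 50 chunks regardless of size, while small files would still use the 1 MB minimum.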


pdudnik commented Mar 17, 2017 via email


evanj commented Jun 3, 2017

I think this is basically a duplicate of #2222

lukesneeringer commented

@dhermes Is this easier now that #1998 is done?


dhermes commented Aug 10, 2017

Not easier or harder. AFAIK there is no perfect magic chunking answer. @thobrla has said before that downloading in a single request (vs. in chunks) is almost always the right answer.


tseaver commented Jan 8, 2018

Duplicate: #2222.


8 participants