slow download #2927
Thanks for reporting. This is actually deeper than it seems. The "correct" fix is for us to remove the (non-public) For a fix that works right now, you can duplicate the source but pass
@lukesneeringer says this is blocked on httplib2 work.
The correct solution is blocked on #1998. What would be the benefits and drawbacks of increasing the default in the meantime, though? Would it be okay to make the default chunk size 10 MB instead of 1 MB? Alternatively, could we find out the size of the file in advance and make the chunk size into some reasonable fragment (say, 2% or 5% of file size, with a 1 MB lower limit)?
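The size-proportional idea above can be sketched as a small helper. This is a hypothetical illustration of the proposal (not code from the library); the function name and the 2% default are assumptions:

```python
ONE_MB = 1024 * 1024

def adaptive_chunk_size(file_size, fraction=0.02, floor=ONE_MB):
    """Return a chunk size of roughly ``fraction`` of ``file_size``,
    but never smaller than ``floor`` (1 MB by default)."""
    return max(floor, int(file_size * fraction))

# With the 2% rule, a 400 MB file gets 8 MB chunks, so the download
# takes ~50 requests instead of the ~400 made at a fixed 1 MB.
print(adaptive_chunk_size(400 * ONE_MB) // ONE_MB)  # -> 8
```

Any such heuristic still trades request count against memory per chunk, which is why the thread leans toward a single unchunked request where possible.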
I think GCS has a way of auto-detecting the best chunk size, so I'd ask them. I know they have solutions for this issue.
I think this is basically a duplicate of #2222
Not easier or harder. AFAIK there is no perfect magic chunking answer; @thobrla has said before that downloading in a single request (vs. chunks) is almost always the right answer.
Duplicate: #2222.
I am trying to download a 400 MB GCS file. I am using https://github.com/GoogleCloudPlatform/google-cloud-python/blob/ce6756fbe3633c74fd742567654565147628f4ba/storage/google/cloud/storage/blob.py. I noticed that by default my download is chunked due to this setting here:
https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/core/google/cloud/streaming/transfer.py#L46
As a result, the download makes roughly 400 requests to GCS, which significantly slows it down.
Is there some clean way I can override that when using blob.download_to_file?
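One workaround discussed in threads like this is to set a larger `chunk_size` on the blob before downloading. This is a hedged sketch, not an endorsed fix: it assumes the `Blob` constructor's `chunk_size` parameter, and the bucket and object names are placeholders. It requires GCS credentials, so it is not runnable as-is:

```python
from google.cloud import storage

client = storage.Client()  # assumes application default credentials
bucket = client.bucket("my-bucket")  # hypothetical bucket name

# Request 10 MB chunks instead of the 1 MB default, cutting the
# number of GCS requests for a 400 MB object from ~400 to ~40.
blob = bucket.blob("big-file.bin", chunk_size=10 * 1024 * 1024)

with open("big-file.bin", "wb") as fh:
    blob.download_to_file(fh)
```

Per the comments above, a single unchunked request is usually fastest when memory allows, so an even larger chunk size (or no chunking at all, where the library supports it) may be preferable.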