Out of memory when running az storage blob upload #1105

Closed
sebasmannem opened this issue Oct 20, 2016 · 9 comments
@sebasmannem

sebasmannem commented Oct 20, 2016

I'm using az to upload an image to Azure.
I'm using Python 3, but have seen the same issue with Python 2.
The image is 30 GB (although only 1.5 GB when stored sparsely).
Eventually I managed to upload it using --max-connections 1.
I traced the issue down to this little piece of code:
file: storage/blob/_upload_chunking.py, line: 70

    if max_connections > 1:
        import concurrent.futures
        executor = concurrent.futures.ThreadPoolExecutor(max_connections)
        range_ids = list(executor.map(uploader.process_chunk, uploader.get_chunk_streams()))
    else:
        range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]

The problem is with this line:

        range_ids = list(executor.map(uploader.process_chunk, uploader.get_chunk_streams()))

executor.map does not consume uploader.get_chunk_streams() lazily: it collects every element the generator yields and queues it for the thread pool immediately.
The queued chunks can end up holding all 30 GB of file data in memory.

So, if you want to upload with --max-connections > 1, you basically need (memory + swap) to be at least as large as the file you wish to upload...
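
The eager behaviour described above can be reproduced without the storage SDK. The following is only a sketch under stated assumptions: `chunk_stream` is a hypothetical stand-in for `uploader.get_chunk_streams()`, and `bounded_map` is one possible way to keep a limited number of chunks in flight. It is not the fix that was applied in azure-storage-python.

```python
import collections
import concurrent.futures
import itertools


def chunk_stream(total_chunks, produced):
    """Hypothetical stand-in for uploader.get_chunk_streams()."""
    for index in range(total_chunks):
        produced.append(index)  # record the moment each chunk is generated
        yield index


def bounded_map(executor, func, iterable, max_pending):
    """Yield func(item) in order while keeping at most max_pending items submitted."""
    iterator = iter(iterable)
    # Prime the window with the first max_pending chunks only.
    pending = collections.deque(
        executor.submit(func, item)
        for item in itertools.islice(iterator, max_pending)
    )
    while pending:
        result = pending.popleft().result()
        # Top the window back up with one more chunk before handing out the result.
        for item in itertools.islice(iterator, 1):
            pending.append(executor.submit(func, item))
        yield result


if __name__ == "__main__":
    eager, lazy = [], []
    with concurrent.futures.ThreadPoolExecutor(2) as executor:
        # executor.map drains the whole generator as soon as it is called.
        executor.map(str, chunk_stream(10, eager))
        # bounded_map only pulls chunks as earlier ones finish.
        results = bounded_map(executor, str, chunk_stream(10, lazy), max_pending=2)
        next(results)
        print(len(eager), len(lazy))
```

With real chunk data, the difference is that `executor.map` has already pulled every chunk out of the generator before the first result is requested, while `bounded_map` has pulled only a small window.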

@tjprescott tjprescott self-assigned this Oct 20, 2016
@tjprescott tjprescott added bug This issue requires a change to an existing behavior in the product in order to be resolved. Storage az storage labels Oct 20, 2016
@tjprescott
Member

Hi @sebasmannem, thank you for the issue report!

@tjprescott
Member

It appears this is a bug in the Python storage SDK: Azure/azure-storage-python#190

I've referenced your issue in that thread.

@tjprescott tjprescott added the Service Attention This issue is responsible by Azure service team. label Oct 20, 2016
@tjprescott tjprescott assigned troydai and unassigned tjprescott Nov 18, 2016
@tjprescott
Member

@troydai since you are working on the upload command, we should probably implement a workaround until this is fixed in the service. Basically set max_connections to 1 if the file being uploaded is large. Possibly log a warning when we do so.
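
A workaround along these lines might look like the sketch below. The function name `effective_max_connections` and the 64 MiB threshold are hypothetical, since the thread does not say what counts as "large" or how azure-cli actually implemented it.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical cut-off: the thread does not specify what counts as a large file.
LARGE_FILE_THRESHOLD_BYTES = 64 * 1024 * 1024


def effective_max_connections(file_path, requested, threshold=LARGE_FILE_THRESHOLD_BYTES):
    """Return the connection count to use, falling back to 1 for large files."""
    size = os.path.getsize(file_path)
    if requested > 1 and size > threshold:
        # Warn so the user knows why the upload is slower than requested.
        logger.warning(
            "File is %d bytes; using 1 connection instead of %d to limit memory use.",
            size,
            requested,
        )
        return 1
    return requested
```

Note that `os.path.getsize` reports the apparent size of a sparse file, which is the amount of data the chunked upload would read.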

@mayurid mayurid modified the milestones: Sprint 7, Backlog Nov 19, 2016
@mayurid
Member

mayurid commented Nov 19, 2016

Updating sprint

@tjprescott
Member

Workaround is in, so I'm moving this to the backlog. When the SDK is fixed, the workaround can be removed and this can be closed.

@matthchr
Member

matthchr commented Jan 6, 2017

@tjprescott Any ETA on when the fix for this will be public on pypi? Also, won't it negatively impact performance significantly to only use max_connections = 1 for large files?

edit: I just realized I confused myself a bit -- the workaround is in azure-cli - I am actually curious when the real fix is going into the python client. I'll ask my question in that repo.

@tjprescott
Member

@matthchr the fix will be in for the January 14th release. It will slow performance for large files, but that's better than crashing due to running out of memory. When the storage SDK is fixed, we will remove the workaround.

@williexu williexu assigned williexu and unassigned troydai and tjprescott Oct 17, 2017
@williexu
Contributor

This should be fixed; I'll take a look to make sure.

@williexu
Contributor

Was fixed with Azure/azure-storage-python#190

@haroldrandom haroldrandom added bug This issue requires a change to an existing behavior in the product in order to be resolved. ServiceAttn Storage az storage labels Oct 25, 2019