Out of memory when running az storage blob upload #1105

sebasmannem · 2016-10-20T13:29:45Z

I'm using az to upload an image to Azure.
I'm using python3, but have seen the same issue with python2.
The image is 30G (although sparse, with only 1.5G of actual data).
Eventually I managed to upload it using --max-connections 1.
I traced the issue down to this little piece of code:
file: storage/blob/_upload_chunking.py, line: 70

    if max_connections > 1:
        import concurrent.futures
        executor = concurrent.futures.ThreadPoolExecutor(max_connections)
        range_ids = list(executor.map(uploader.process_chunk, uploader.get_chunk_streams()))
    else:
        range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]

The problem is with this line:

        range_ids = list(executor.map(uploader.process_chunk, uploader.get_chunk_streams()))

Calling executor.map with uploader.get_chunk_streams() as its argument consumes the whole generator up front: executor.map collects its iterables immediately rather than lazily and submits one task per element.
Chunks are read from disk far faster than they can be uploaded, so the queued tasks can end up holding close to all 30G of file data in memory.
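
The eager consumption can be reproduced without the storage SDK. The following is a minimal sketch: `chunk_stream`, `process_chunk`, and `demo_eager_map` are hypothetical stand-ins for the uploader's methods, not SDK code. Each task blocks until released, so the count shows how many chunks were pulled from the generator before any task could complete.

```python
import concurrent.futures
import threading


def chunk_stream(count, produced):
    """Stand-in for a chunk generator; records each chunk it yields."""
    for index in range(count):
        produced.append(index)
        yield index


def demo_eager_map(count=100, workers=2):
    """Return (chunks pulled before any task finished, ordered results)."""
    produced = []
    release = threading.Event()

    def process_chunk(chunk):
        # Block until released so that no task can finish early.
        release.wait()
        return chunk

    with concurrent.futures.ThreadPoolExecutor(workers) as executor:
        results = executor.map(process_chunk, chunk_stream(count, produced))
        # map() has returned, and the generator is already exhausted,
        # although every task is still blocked.
        pulled_before_any_result = len(produced)
        release.set()
        range_ids = list(results)
    return pulled_before_any_result, range_ids
```

With real chunk data instead of integers, each of those queued items would be a block of file content waiting in memory for a free worker.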

So if you want to upload with max_connections > 1, you basically need (memory + swap) to be larger than the file you wish to upload...
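
One way to avoid that memory requirement is to cap how many chunks are in flight at once. The sketch below is an illustration only, not the fix that was applied in the SDK: `map_bounded` and its `max_pending` parameter are hypothetical names. It pulls a new item from the iterable only after the oldest outstanding task has finished, and returns results in input order like `executor.map`.

```python
import collections
import concurrent.futures


def map_bounded(func, iterable, max_workers, max_pending=None):
    """Apply func to each item using a thread pool while keeping at most
    max_pending items submitted and not yet collected."""
    if max_pending is None:
        max_pending = max_workers * 2  # assumed default, tune as needed
    results = []
    pending = collections.deque()
    with concurrent.futures.ThreadPoolExecutor(max_workers) as executor:
        for item in iterable:
            pending.append(executor.submit(func, item))
            if len(pending) >= max_pending:
                # Wait for the oldest task before pulling another item,
                # so the generator is never read far ahead of the workers.
                results.append(pending.popleft().result())
        # Drain the tasks that are still outstanding.
        while pending:
            results.append(pending.popleft().result())
    return results
```

Under these assumptions, peak memory is bounded by roughly `max_pending` chunks rather than by the file size. Waiting on the oldest task keeps the ordering simple at the cost of some idle time when one chunk is slow.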

tjprescott · 2016-10-20T15:04:43Z

Hi @sebasmannem thank you for the issue report!

tjprescott · 2016-10-20T17:09:07Z

It appears this is a bug in the python storage SDK. Azure/azure-storage-python#190

I've referenced your issue in that thread.

tjprescott · 2016-11-18T19:02:09Z

@troydai, since you are working on the upload command, we should probably implement a workaround until this is fixed in the SDK. Basically, set max_connections to 1 if the file being uploaded is large, and possibly log a warning when we do so.
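
A rough sketch of what such a workaround could look like follows. It is not the code that was merged into azure-cli: the function name `effective_max_connections` and the threshold value are assumptions made for illustration, since the thread does not say what size counts as "large".

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical cut-off; the thread does not specify a value.
LARGE_FILE_THRESHOLD_BYTES = 64 * 1024 * 1024


def effective_max_connections(file_path, requested,
                              threshold=LARGE_FILE_THRESHOLD_BYTES):
    """Return the connection count to use for uploading file_path.

    Falls back to a single connection for large files so that chunks
    are processed one at a time instead of being queued in memory.
    """
    size = os.path.getsize(file_path)
    if requested > 1 and size > threshold:
        logger.warning(
            "File is %d bytes; using max_connections=1 instead of %d "
            "to avoid running out of memory.", size, requested)
        return 1
    return requested
```

Note that `os.path.getsize` reports the apparent size of a sparse file, so a 30G sparse image would be treated as large even if little of it is allocated on disk.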

mayurid · 2016-11-19T00:19:22Z

Updating sprint

tjprescott · 2016-12-15T19:13:31Z

Workaround is in, so I'm moving this to the backlog. When the SDK is fixed, the workaround can be removed and this can be closed.

matthchr · 2017-01-06T17:59:34Z

@tjprescott Any ETA on when the fix for this will be public on pypi? Also, won't it negatively impact performance significantly to only use max_connections = 1 for large files?

edit: I just realized I confused myself a bit -- the workaround is in azure-cli - I am actually curious when the real fix is going into the python client. I'll ask my question in that repo.

tjprescott · 2017-01-06T18:02:30Z

@matthchr the fix will be in for the January 14th release. It will slow performance for large files, but that's better than crashing due to running out of memory. When the storage SDK is fixed, we will remove the workaround.

williexu · 2017-10-17T23:09:22Z

This should be fixed; I'll take a look to make sure.

williexu · 2017-10-18T20:57:33Z

Was fixed with Azure/azure-storage-python#190

tjprescott self-assigned this Oct 20, 2016

tjprescott added the bug and Storage labels Oct 20, 2016

tjprescott added this to the Sprint 5 (MVP Summit) milestone Oct 20, 2016

tjprescott mentioned this issue Oct 20, 2016

create_blob_from_path hangs if file is larger than MAX_SINGLE_PUT_SIZE Azure/azure-storage-python#190

Closed

tjprescott modified the milestones: Backlog, Sprint 5 (MVP Summit) Oct 20, 2016

tjprescott added the Service Attention label Oct 20, 2016

tjprescott assigned troydai and unassigned tjprescott Nov 18, 2016

mayurid modified the milestones: Sprint 7, Backlog Nov 19, 2016

tjprescott mentioned this issue Nov 23, 2016

Introduce batch upload and download for blob #1428

Merged

tjprescott modified the milestones: Sprint 9 (Internal Release), Sprint 7 Dec 15, 2016

tjprescott self-assigned this Dec 15, 2016

tjprescott mentioned this issue Dec 15, 2016

[Storage] Workaround for blob upload SDK bug #1580

Merged

tjprescott modified the milestones: Backlog, Sprint 9 (Internal Release) Dec 15, 2016

williexu assigned williexu and unassigned troydai and tjprescott Oct 17, 2017

williexu closed this as completed Oct 18, 2017

haroldrandom added the bug, ServiceAttn, and Storage labels Oct 25, 2019
