create_blob_from_path hangs if file is larger than MAX_SINGLE_PUT_SIZE #190
Comments
I tried the same experiment reading the file into a text string and then using create_blob_from_text and experienced the same behavior.
Setting max_connections = 1 seems to work around what I'm seeing, so it may just be a problem with importing concurrent.futures with Python 2.7?
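For reference, a minimal sketch of that workaround, assuming the azure-storage 0.33 API; account, container, and path names are placeholders. Passing max_connections=1 to create_blob_from_path keeps the upload on the serial code path instead of the concurrent.futures-based parallel one:

from azure.storage.blob import BlockBlobService

blob_service = BlockBlobService(account_name='MY_ACCOUNT_NAME', account_key='MY_KEY')
blob_service.create_blob_from_path(
    'MY_CONTAINER_NAME',
    'large_blob.csv',
    'path/to/large.csv',
    max_connections=1  # a single connection avoids the parallel upload path
)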
I'm not able to repro this. Each time we release we run all of our tests in 2.7 and we have explicit tests for every API in both parallel and non-parallel mode. The tests for this particular API are here and we actually use the same trick you did to make them run faster -- reducing put size and block size. I just tried them in both 2.7 and 3.5 and they pass. I also validated in Fiddler to confirm that they were indeed running in parallel and saw multiple requests, as expected.
I have a problem and I'm not sure if it is related, but when I upload a large file (18 GB) using create_blob_from_path(), my memory usage goes through the roof. Eventually I run out of RAM and Linux kills my process. I upload multiple files concurrently in 16 threads. I'm using Python 2.7.6 with azure-storage 0.33.0.
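For context, a minimal sketch of that kind of workload (not the reporter's actual code): several local files uploaded concurrently with create_blob_from_path. It assumes azure-storage 0.33 and, on Python 2.7, the futures backport package for concurrent.futures; names and paths are placeholders.

from concurrent.futures import ThreadPoolExecutor
from azure.storage.blob import BlockBlobService

blob_service = BlockBlobService(account_name='MY_ACCOUNT_NAME', account_key='MY_KEY')

def upload(local_path, blob_name):
    # Each worker thread drives one create_blob_from_path call; files larger than
    # MAX_SINGLE_PUT_SIZE take the chunked upload path, which is where the memory
    # growth was reported.
    blob_service.create_blob_from_path('MY_CONTAINER_NAME', blob_name, local_path)

files = [('path/to/file1.bin', 'file1.bin'), ('path/to/file2.bin', 'file2.bin')]
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(lambda args: upload(*args), files))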
And we just got this bug report, which seems to be exactly related to this:
I think this is indeed the issue that I'm experiencing as well.
Thanks guys, I will investigate these RAM issues further, but our upcoming release will resolve this issue as we are reworking the upload strategy. @marcelvb, have you tried reducing your …
I set …
@rambo-msft @tjprescott Any update on this? Updating to use …
@matthchr @marcelvb @tjprescott Feel free to open a new issue if you run into any problems with the new version. Thanks!
Steps to reproduce:
from azure.storage.blob import BlockBlobService

class BlobInteraction:
    def __init__(self, ACCOUNT_NAME, ACCOUNT_KEY):
        self.account_name = ACCOUNT_NAME
        self.account_key = ACCOUNT_KEY
        self.blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)

    def put(self, container_name, blob_name, local_filename):
        if self.blob_service is None:
            self.blob_service = BlockBlobService(account_name=self.account_name, account_key=self.account_key)
        self.blob_service.create_blob_from_path(
            container_name,
            blob_name,
            local_filename
        )

BLOB_ACCOUNT_NAME = "MY_ACCOUNT_NAME"
BLOB_CONTAINER_NAME = "MY_CONTAINER_NAME"
BLOB_ACCOUNT_KEY = "MY_KEY"

blob = BlobInteraction(BLOB_ACCOUNT_NAME, BLOB_ACCOUNT_KEY)
blob.put(BLOB_CONTAINER_NAME,
         'small_blob.csv',
         'path/to/small.csv')
blob.put(BLOB_CONTAINER_NAME,
         'large_blob.csv',
         'path/to/large.csv')
Intended behavior: small_blob.csv and large_blob.csv appear in my blob storage.
What happens: small_blob.csv appears in blob storage; the code hangs and can't be terminated after the second call to create_blob_from_path.
I tried setting the max sizes to something smaller to see whether the "small_blob.csv" upload would also fail and hang the process, and it does:
I added this to def __init__(self, ACCOUNT_NAME, ACCOUNT_KEY):

        self.blob_service.MAX_SINGLE_PUT_SIZE = 32 * 1024
        self.blob_service.MAX_BLOCK_SIZE = 4 * 1024