-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce batch upload and download for blob #1428
Conversation
@troydai, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tjprescott, @derekbekoe and @yugangw-msft to be potential reviewers. |
Can you show the --help output for these updated commands? |
|
||
result.append(client.get_blob_to_path(source_container_name, blob, dst)) | ||
if progress: | ||
print('Copied %s' % blob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use logger.warning here instead of printing to stdout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the --progress
option for now and draft a better solution later.
upload_action = _upload_blob if blob_type == 'block' or blob_type == 'page' else _append_blob | ||
|
||
if dryrun: | ||
print('upload action: from {} to {}'.format(source, destination)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use logger.warning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed with @derekbekoe offline, print to standard output is fine here since the command itself doesn't return anything.
cli_main(command.split()) | ||
|
||
blobs = [b.name for b in self._blob_service.list_blobs(self._test_container_name)] | ||
assert len(blobs) == 31 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this 31 come from?
Is it reproducible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar with the other tests.
If they are going to be skipped by Travis CI, when/how do we run them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- They may be convert eventually to vcr test once all the features are completed.
- I'll set up internal CI for long run integration test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the result is stable and reproducible, because the sample is generated during test and controlled by the tests.
@@ -14,3 +14,4 @@ requests==2.9.1 | |||
six==1.10.0 | |||
tabulate==0.7.5 | |||
vcrpy==1.7.4 | |||
nose==1.3.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is nose used in this PR?
If not, does it need to be added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main thing to change is using logger.warning instead of print.
@derekbekoe @tjprescott PR is updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly questions.
register_cli_argument('storage blob upload-batch', 'content_type', arg_group='Content Control') | ||
register_cli_argument('storage blob upload-batch', 'content_cache_control', arg_group='Content Control') | ||
register_cli_argument('storage blob upload-batch', 'content_language', arg_group='Content Control') | ||
register_cli_argument('storage blob upload-batch', 'max_connections', type=int, arg_group='Content Control') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_connections
pertains to concurrency. I don't think it would go in the Content Control group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we need logic in here to address #1105 for batch upload. (It should be in regular upload too, but not necessarily as part of this PR)
@@ -255,7 +291,6 @@ def register_source_uri_arguments(scope): | |||
register_cli_argument('storage container delete', 'fail_not_exist', help='Throw an exception if the container does not exist.') | |||
|
|||
register_cli_argument('storage container exists', 'blob_name', ignore_type) | |||
register_cli_argument('storage container exists', 'blob_name', ignore_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to make sure we include this in the list of breaking changes for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not breaking, I just remove duplicate code.
from azure.cli.command_modules.storage.storage_url_helpers import parse_url | ||
|
||
# 1. quick check | ||
if namespace.destination is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If destination/source are required parameters and not supplied, this will never be reached. Argparse will throw an error first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, i'll remove this then.
raise ValueError('Source parameter is missing.') | ||
|
||
if not os.path.exists(namespace.destination) or not os.path.isdir(namespace.destination): | ||
raise ValueError('Destination folder {} does not exist'.format(namespace.source)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not create the directory? I would be more wary about a directory that already existed in which case files may be overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the choice i made to reduce the disk operation this command will execute so as to simply the logic.
raise ValueError('Destination folder {} does not exist'.format(namespace.source)) | ||
|
||
# 2. try to extract account name and container name from source string | ||
storage_desc = parse_url(namespace.source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would give this a more descriptive name. At first I thought this was the generic urlparse library method. "parse_storage_url" or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update.
The pattern is used for files globing. The supported patterns are '*', '?', '[seq]', | ||
and '[!seq]'. | ||
|
||
:param bool dryrun: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does download not have dryrun?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because download dryrun is essentially list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can you list with patterns? Presumably I'd use dryrun to test my pattern before I actually move data.
from collections import namedtuple | ||
from azure.cli.core._profile import CLOUD | ||
|
||
StorageUrlDescription = namedtuple('StorageUrlDescription', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is SAS token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used here. I indent to do it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking: #1434
snapshot = None | ||
|
||
if sys.version_info >= (3,): | ||
from urllib.parse import urlparse # pylint: disable=no-name-in-module, import-error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, try:
from six.moves.urllib.parse import urlparse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
|
||
@skipIf(os.environ.get('Travis', 'false') == 'true', 'Integration tests are skipped in Travis CI') | ||
class StorageIntegrationTestBase(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we do this instead of a VCR test recording? Is this essentially a local "run live" test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. i wrote these tests in a hurry, will be convert in the future.
@@ -14,3 +14,4 @@ requests==2.9.1 | |||
six==1.10.0 | |||
tabulate==0.7.5 | |||
vcrpy==1.7.4 | |||
nose==1.3.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this package for?
@tjprescott take another look? |
@tjprescott dry run for download was just added. |
Did we add logic to address #1105 for batch upload? |
no, not specifically. |
@tjprescott sign off? |
* Azure/master: (39 commits) User should use no-cache so we build a fresh image (Azure#1455) Bump all modules to version 0.1.0b10 (Azure#1454) [Docs] Move around the order of install instructions. (Azure#1439) acs: *_install_cli mark cli as executable (Azure#1449) Fix resource list table. (Azure#1452) [Compute] VM NIC updates (Azure#1421) Introduce batch upload and download for blob (Azure#1428) Add auto-registration for resource providers. (Azure#1330) interpret the '@' syntax if something higher hasn't already done that. (Azure#1423) Aliasing plan argument with shorthand (Azure#1433) ad:fix one more place which still uses localtime for secret starttime (Azure#1429) Add table formatting for deployments and sort by timestmap. (Azure#1427) Add table formatting for resource group list (and add 'status') (Azure#1425) [Docs] Add shields specifying latest version and supported python versions (Azure#1420) Add new az storage blob copy start-batch command (Azure#1250) Component Discovery (Azure#1409) Add poison logic to prevent re-recording tests that need updating. (Azure#1412) [Storage] Fix storage table outputs and help text. (Azure#1402) [mention-bot] Attempt to fix config (Azure#1410) ad:use utc time on setting app's creds (Azure#1408) ...
Resolve issue #1432