
Introduce batch upload and download for blob #1428

Merged: 5 commits from the azcopy4 branch into Azure:master, Nov 23, 2016

Conversation

@troydai (Contributor) commented Nov 23, 2016

Resolves issue #1432.

10:21 $ az storage blob upload-batch -h

Command
    az storage blob upload-batch: Upload files to storage container as blobs.

Arguments
    --destination -d [Required]: The destination of this upload operation, either a container
                                 URL or a container name. When a container URL is given, the
                                 storage account name will be parsed from the URL.
    --source -s      [Required]: The directory containing the files to be uploaded.
    --dryrun                   : Show a summary of the operations to be taken instead of
                                 actually uploading the file(s).
    --lease-id                 : Required if the blob has an active lease.
    --metadata                 : Metadata in space-separated key=value pairs. This overwrites any
                                 existing metadata.
    --pattern                  : The pattern used for file globbing. The supported patterns
                                 are '*', '?', '[seq]', and '[!seq]'.
    --timeout                  : Request timeout in seconds. Applies to each call to the service.
    --type -t                  : Defaults to 'page' for *.vhd files, or 'block' otherwise.  Allowed
                                 values: append, block, page.

Content Control Arguments
    --content-cache-control    : The cache control string.
    --content-disposition      : Conveys additional information about how to process the response
                                 payload, and can also be used to attach additional metadata.
    --content-encoding         : The content encoding type.
    --content-language         : The content language.
    --content-md5              : The content's MD5 hash.
    --content-type             : The content MIME type.
    --max-connections          : Default: 2.
    --maxsize-condition        : The max length in bytes permitted for an append blob.
    --validate-content         : Specifies that an MD5 hash shall be calculated for each chunk of
                                 the blob and verified by the service when the chunk has arrived.

Pre-condition Arguments
    --if-match                 : An ETag value, or the wildcard character (*). Specify this header
                                 to perform the operation only if the resource's ETag matches the
                                 value specified.
    --if-modified-since        : Alter only if modified since supplied UTC datetime
                                 (Y-m-d'T'H:M'Z').
    --if-none-match            : An ETag value, or the wildcard character (*). Specify this header
                                 to perform the operation only if the resource's ETag does not match
                                 the value specified. Specify the wildcard character (*) to perform
                                 the operation only if the resource does not exist, and fail the
                                 operation if it does exist.
    --if-unmodified-since      : Alter only if unmodified since supplied UTC datetime
                                 (Y-m-d'T'H:M'Z').

Storage Account Arguments
    --account-key              : Storage account key. Must be used in conjunction with storage
                                 account name. Environment variable: AZURE_STORAGE_KEY.
    --account-name             : Storage account name. Must be used in conjunction with either
                                 storage account key or a SAS token. Environment variable:
                                 AZURE_STORAGE_ACCOUNT.
    --connection-string        : Storage account connection string. Environment variable:
                                 AZURE_STORAGE_CONNECTION_STRING.
    --sas-token                : A Shared Access Signature (SAS). Must be used in conjunction with
                                 storage account name. Environment variable:
                                 AZURE_STORAGE_SAS_TOKEN.

Global Arguments
    --debug                    : Increase logging verbosity to show all debug logs.
    --help -h                  : Show this help message and exit.
    --output -o                : Output format.  Allowed values: json, jsonc, list, table, tsv.
                                 Default: json.
    --query                    : JMESPath query string. See http://jmespath.org/ for more
                                 information and examples.
    --verbose                  : Increase logging verbosity. Use --debug for full debug logs.
10:15 $ az storage blob download-batch -h

Command
    az storage blob download-batch: Download blobs in a container recursively.

Arguments
    --destination -d [Required]: The destination folder of this download operation. The folder
                                 must exist.
    --source -s      [Required]: The source of this download operation, either a container URL
                                 or a container name. When a container URL is given, the
                                 storage account name will be parsed from the URL.
    --pattern                  : The pattern used for file globbing. The supported patterns
                                 are '*', '?', '[seq]', and '[!seq]'.

Storage Account Arguments
    --account-key              : Storage account key. Must be used in conjunction with storage
                                 account name. Environment variable: AZURE_STORAGE_KEY.
    --account-name             : Storage account name. Must be used in conjunction with either
                                 storage account key or a SAS token. Environment variable:
                                 AZURE_STORAGE_ACCOUNT.
    --connection-string        : Storage account connection string. Environment variable:
                                 AZURE_STORAGE_CONNECTION_STRING.
    --sas-token                : A Shared Access Signature (SAS). Must be used in conjunction with
                                 storage account name. Environment variable:
                                 AZURE_STORAGE_SAS_TOKEN.

Global Arguments
    --debug                    : Increase logging verbosity to show all debug logs.
    --help -h                  : Show this help message and exit.
    --output -o                : Output format.  Allowed values: json, jsonc, list, table, tsv.
                                 Default: json.
    --query                    : JMESPath query string. See http://jmespath.org/ for more
                                 information and examples.
    --verbose                  : Increase logging verbosity. Use --debug for full debug logs.
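
For illustration, hypothetical invocations of the two new commands (the account, container, and
local paths below are made up; the flags are taken from the help text above):

    # Dry run first: show what would be uploaded without moving any data
    az storage blob upload-batch -d mycontainer -s ./data --pattern '*.csv' --dryrun
    az storage blob upload-batch -d mycontainer -s ./data --pattern '*.csv'

    # Download all blobs matching 'logs/*' into an existing local folder
    az storage blob download-batch -s mycontainer -d ./backup --pattern 'logs/*'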

@mention-bot commented:

@troydai, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tjprescott, @derekbekoe and @yugangw-msft to be potential reviewers.

@tjprescott (Member): Can you show the --help output for these updated commands?


result.append(client.get_blob_to_path(source_container_name, blob, dst))
if progress:
    print('Copied %s' % blob)

Member: Should use logger.warning here instead of printing to stdout.

Contributor Author: I'll remove the --progress option for now and draft a better solution later.
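
For reference, a minimal sketch of the logger-based approach the reviewer suggests, using the
standard logging module (how azure-cli wires up its logger may differ; report_copied is a
hypothetical helper):

    import logging

    logger = logging.getLogger(__name__)

    def report_copied(blob_name):
        # Route progress messages through the logging configuration instead of
        # raw print(), so verbosity flags and redirection behave consistently.
        logger.warning('Copied %s', blob_name)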

upload_action = _upload_blob if blob_type == 'block' or blob_type == 'page' else _append_blob

if dryrun:
    print('upload action: from {} to {}'.format(source, destination))

Member: Should use logger.warning here.

Contributor Author: Discussed with @derekbekoe offline; printing to standard output is fine here since the command itself doesn't return anything.

cli_main(command.split())

blobs = [b.name for b in self._blob_service.list_blobs(self._test_container_name)]
assert len(blobs) == 31

Member: Where does this 31 come from? Is it reproducible?

Member: Same question for the other tests: if they are going to be skipped by Travis CI, when/how do we run them?

Contributor Author:
  1. They may eventually be converted to VCR tests once all the features are completed.
  2. I'll set up internal CI for the long-running integration tests.

Contributor Author: The result is stable and reproducible, because the sample is generated during the test and controlled by the tests.
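
A hedged sketch of how such a test can pin the blob count: the test creates a known set of local
files itself, so the expected count (31 here) is fixed by construction (the file names and layout
below are made up):

    import os
    import tempfile

    def make_test_tree(file_count=31):
        # Generate a deterministic set of sample files in a temp directory.
        # Because the test controls exactly what it creates, asserting on the
        # blob count after upload-batch is stable across runs.
        root = tempfile.mkdtemp()
        for i in range(file_count):
            with open(os.path.join(root, 'sample_{}.txt'.format(i)), 'w') as f:
                f.write('content {}'.format(i))
        return root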

@@ -14,3 +14,4 @@ requests==2.9.1
six==1.10.0
tabulate==0.7.5
vcrpy==1.7.4
nose==1.3.7

Member: Is nose used in this PR? If not, does it need to be added here?

Contributor Author: I'll remove it.

@derekbekoe (Member) left a comment: Main thing to change is using logger.warning instead of print.

@troydai (Contributor Author) commented Nov 23, 2016: @derekbekoe @tjprescott PR is updated.

@tjprescott (Member) left a comment: Mostly questions.

register_cli_argument('storage blob upload-batch', 'content_type', arg_group='Content Control')
register_cli_argument('storage blob upload-batch', 'content_cache_control', arg_group='Content Control')
register_cli_argument('storage blob upload-batch', 'content_language', arg_group='Content Control')
register_cli_argument('storage blob upload-batch', 'max_connections', type=int, arg_group='Content Control')

Member: max_connections pertains to concurrency. I don't think it would go in the Content Control group.

Contributor Author: Will move.
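
The fix is presumably a one-line regrouping, reusing the register_cli_argument call shown in the
snippet above (whether the option ends up ungrouped or in a new group is an assumption):

    # Register max_connections without the 'Content Control' arg_group so it
    # lands outside that section of the help output.
    register_cli_argument('storage blob upload-batch', 'max_connections', type=int)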

Member: Also, we need logic in here to address #1105 for batch upload. (It should be in regular upload too, but not necessarily as part of this PR.)

@@ -255,7 +291,6 @@ def register_source_uri_arguments(scope):
register_cli_argument('storage container delete', 'fail_not_exist', help='Throw an exception if the container does not exist.')

register_cli_argument('storage container exists', 'blob_name', ignore_type)
register_cli_argument('storage container exists', 'blob_name', ignore_type)

Member: Need to make sure we include this in the list of breaking changes for this PR.

Contributor Author: This is not breaking; I just removed duplicate code.

from azure.cli.command_modules.storage.storage_url_helpers import parse_url

# 1. quick check
if namespace.destination is None:

Member: If destination/source are required parameters and not supplied, this will never be reached. Argparse will throw an error first.

Contributor Author: Cool, I'll remove this then.
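
A minimal standalone demonstration of the reviewer's point: argparse rejects a missing required
argument before any namespace validator runs, so the None check is dead code:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--destination', '-d', required=True)
    parser.add_argument('--source', '-s', required=True)

    # Omitting --destination or --source makes parse_args() exit with a usage
    # error before any validator could ever see namespace.destination is None.
    args = parser.parse_args(['-d', 'mycontainer', '-s', './data'])
    print(args.destination, args.source)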

raise ValueError('Source parameter is missing.')

if not os.path.exists(namespace.destination) or not os.path.isdir(namespace.destination):
    raise ValueError('Destination folder {} does not exist'.format(namespace.destination))

Member: Why not create the directory? I would be more wary about a directory that already existed, in which case files may be overwritten.

Contributor Author: This is the choice I made to reduce the disk operations this command executes, so as to simplify the logic.

raise ValueError('Destination folder {} does not exist'.format(namespace.destination))

# 2. try to extract account name and container name from source string
storage_desc = parse_url(namespace.source)

Member: I would give this a more descriptive name. At first I thought this was the generic urlparse library method. "parse_storage_url" or something like that.

Contributor Author: Will update.

The pattern used for file globbing. The supported patterns are '*', '?', '[seq]',
and '[!seq]'.

:param bool dryrun:

Member: Why does download not have dryrun?

Contributor Author: Because a download dryrun is essentially a list operation.

Member: But can you list with patterns? Presumably I'd use dryrun to test my pattern before I actually move data.
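
The patterns in the help text ('*', '?', '[seq]', '[!seq]') match Python's fnmatch semantics, so
a download dry run could filter a container listing client-side; a sketch (the blob names are
made up):

    import fnmatch

    blob_names = ['logs/app.log', 'logs/sys.log', 'data/a.csv']

    # Filter the listing with the same glob syntax the --pattern option
    # documents; a --dryrun could print these lines without moving any data.
    for name in blob_names:
        if fnmatch.fnmatch(name, 'logs/*'):
            print('download action: {}'.format(name))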

from collections import namedtuple
from azure.cli.core._profile import CLOUD

StorageUrlDescription = namedtuple('StorageUrlDescription',

Member: Where is SAS token?

Contributor Author: Not used here. I intend to do it in the future.

Contributor Author: Tracking: #1434

snapshot = None

if sys.version_info >= (3,):
    from urllib.parse import urlparse  # pylint: disable=no-name-in-module, import-error

Member: Instead of this, try: from six.moves.urllib.parse import urlparse

Contributor Author: OK.
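
For context, the before/after of the suggested change: six.moves gives one import that works on
both Python 2 and 3, replacing the version check (the example URL is made up):

    # Before: branch on interpreter version
    #   if sys.version_info >= (3,):
    #       from urllib.parse import urlparse
    #   else:
    #       from urlparse import urlparse

    # After: a single import for both Python 2 and 3
    from six.moves.urllib.parse import urlparse

    parts = urlparse('https://myaccount.blob.core.windows.net/container/blob.txt')
    print(parts.netloc, parts.path)  # myaccount.blob.core.windows.net /container/blob.txt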

@skipIf(os.environ.get('Travis', 'false') == 'true', 'Integration tests are skipped in Travis CI')
class StorageIntegrationTestBase(TestCase):

Member: Why do we do this instead of a VCR test recording? Is this essentially a local "run live" test?

Contributor Author: Yes. I wrote these tests in a hurry; they will be converted in the future.

@@ -14,3 +14,4 @@ requests==2.9.1
six==1.10.0
tabulate==0.7.5
vcrpy==1.7.4
nose==1.3.7

Member: What is this package for?

@troydai (Contributor Author) commented Nov 23, 2016: @tjprescott take another look?

@troydai (Contributor Author) commented Nov 23, 2016: @tjprescott dry run for download was just added.

@tjprescott (Member): Did we add logic to address #1105 for batch upload?

@troydai (Contributor Author) commented Nov 23, 2016: No, not specifically.

@troydai (Contributor Author) commented Nov 23, 2016: @tjprescott sign off?

@troydai troydai merged commit 83788ee into Azure:master Nov 23, 2016
@troydai troydai deleted the azcopy4 branch November 23, 2016 23:22
thegalah pushed a commit to thegalah/azure-cli that referenced this pull request Nov 28, 2016
* Azure/master: (39 commits)
  User should use no-cache so we build a fresh image (Azure#1455)
  Bump all modules to version 0.1.0b10 (Azure#1454)
  [Docs] Move around the order of install instructions. (Azure#1439)
  acs: *_install_cli mark cli as executable (Azure#1449)
  Fix resource list table. (Azure#1452)
  [Compute] VM NIC updates (Azure#1421)
  Introduce batch upload and download for blob (Azure#1428)
  Add auto-registration for resource providers. (Azure#1330)
  interpret the '@' syntax if something higher hasn't already done that. (Azure#1423)
  Aliasing plan argument with shorthand (Azure#1433)
  ad:fix one more place which still uses localtime for secret starttime (Azure#1429)
  Add table formatting for deployments and sort by timestmap. (Azure#1427)
  Add table formatting for resource group list (and add 'status') (Azure#1425)
  [Docs] Add shields specifying latest version and supported python versions (Azure#1420)
  Add new az storage blob copy start-batch command (Azure#1250)
  Component Discovery (Azure#1409)
  Add poison logic to prevent re-recording tests that need updating. (Azure#1412)
  [Storage] Fix storage table outputs and help text. (Azure#1402)
  [mention-bot] Attempt to fix config (Azure#1410)
  ad:use utc time on setting app's creds (Azure#1408)
  ...
xscript pushed a commit to xscript/azure-cli that referenced this pull request Nov 30, 2016