Add get-page-ranges command for page blobs #6754

Closed
OddBloke opened this issue Jul 9, 2018 · 7 comments · Fixed by #6776

@OddBloke

OddBloke commented Jul 9, 2018

Describe the bug

Given a sparse image.vhd:

$ ls -lah image.vhd 
-rw-r--r-- 1 ubuntu ubuntu 31G Jul  9 15:35 image.vhd
$ du -h image.vhd 
1015M	image.vhd

Running

az storage blob upload <snip> --type page --file image.vhd --name image.vhd

takes substantially longer than uploading a file that contains only the non-sparse parts of the image. This strongly suggests to me that (a) the upload isn't taking advantage of the file's sparseness, and (b) the file as stored in Azure won't be sparse (making operations on it more expensive in the future).
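
The gap between the apparent size (ls) and the allocated size (du) shown above can also be checked programmatically. The following is a minimal sketch, assuming a POSIX system where st_blocks is reported in 512-byte units; the helper name sparse_stats and the temporary test file are illustrative only and not part of the CLI.

```python
import os
import tempfile


def sparse_stats(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Assumption: POSIX semantics, where st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Build a small sparse file: extend to 64 MiB, then write 4 KiB at offset 0.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(64 * 1024 * 1024)
        f.write(b"x" * 4096)
        path = f.name
    try:
        apparent, allocated = sparse_stats(path)
        print("apparent bytes: ", apparent)
        print("allocated bytes:", allocated)
    finally:
        os.remove(path)
```

On a filesystem that supports sparse files, the allocated size should be far smaller than the apparent size, which is the same relationship the ls and du output shows for image.vhd.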

To Reproduce
Create a sparse file and upload it.

Expected behavior
The upload should take about as long as uploading only the non-sparse part of the file, and the blob should be stored sparsely in Azure.

Environment summary

pip installed on Ubuntu 18.10

@tjprescott added the Service Attention and Storage labels Jul 9, 2018
@tjprescott
Member

Seems related to #5872.

@OddBloke
Author

OddBloke commented Jul 9, 2018

Potentially, yes; in #5872 (comment) @zezha-msft said "We have the sparse file optimization for upload but not for download" which is why I filed this as a bug rather than a feature request.

(Grepping through the git history turns up no mention of "sparse" either here or in the storage repo, so I can't tell what that comment is based on, which makes it hard to work out what problem I'm seeing.)

@tjprescott
Member

@zezha-msft for comment. cc/ @williexu

@zezha-msft

Hi @OddBloke, thanks for reaching out!

The sparse file upload optimization for page blobs is contained in the following 2 commits: commit1 and commit2. In short, we read each chunk before uploading it to check whether it is empty. The file is stored sparsely on the server side, but the whole source file still has to be read during the upload. Uploading the sparse file is therefore unlikely to take the same amount of time as uploading only the non-sparse parts, because reading the source file can be a bottleneck. A fairer comparison would be uploading the sparse file with and without this optimization (the current CLI vs. a CLI version from before the optimization).
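
The chunk check described above can be sketched roughly as follows. This is not the code from the referenced commits; it is a minimal illustration that assumes "empty" means "all zero bytes", and the names non_empty_chunks and CHUNK_SIZE, as well as the chunk size itself, are hypothetical.

```python
import io

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: illustrative chunk size only


def non_empty_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (offset, data) for every chunk that has at least one non-zero byte.

    Every chunk is still read from the source; only all-zero chunks are
    skipped, so they would never be sent to the service.
    """
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        # Compare against an all-zero buffer of the same length.
        if data != bytes(len(data)):
            yield offset, data
        offset += len(data)


if __name__ == "__main__":
    # 4 zero bytes, 4 data bytes, 4 zero bytes, 1 data byte; chunk size 4.
    source = io.BytesIO(b"\0" * 4 + b"ab\0\0" + b"\0" * 4 + b"z")
    for offset, data in non_empty_chunks(source, chunk_size=4):
        print(offset, data)
```

Note that the loop reads the entire stream even when most of it is zeros, which is why the upload time does not drop to that of the non-sparse data alone.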

You can verify that the file is stored sparsely by performing a get page range.
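
Once the page ranges have been retrieved, checking sparseness is a matter of adding up the ranges and comparing the total with the blob's size. The sketch below assumes each range is a dict with inclusive start and end byte offsets, as in the Get Page Ranges REST response; the function name allocated_bytes and the sample ranges are made up for illustration, and fetching the ranges from the service is left out.

```python
def allocated_bytes(page_ranges):
    """Sum the bytes covered by page ranges with inclusive start/end offsets."""
    return sum(r["end"] - r["start"] + 1 for r in page_ranges)


if __name__ == "__main__":
    # Hypothetical ranges for a blob whose total size is 1 MiB.
    ranges = [
        {"start": 0, "end": 511},
        {"start": 1024, "end": 2047},
    ]
    blob_size = 1024 * 1024
    used = allocated_bytes(ranges)
    print(f"{used} of {blob_size} bytes are backed by pages")
```

If the total is much smaller than the blob size, the blob is stored sparsely.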

@OddBloke
Author

OddBloke commented Jul 9, 2018

Thanks, @zezha-msft, that's really helpful!

Is there a way to do the "get page range" from the CLI?

@williexu
Contributor

williexu commented Jul 9, 2018

@OddBloke Not at the moment.
Would this be a useful utility for you?
If so, I can add an az storage blob get-page-ranges command as a new feature.

@OddBloke
Author

OddBloke commented Jul 9, 2018 via email

@williexu closed this as completed Jul 9, 2018
@williexu reopened this Jul 9, 2018
@williexu changed the title from "az storage blob upload doesn't perform sparse upload" to "Add get-page-ranges command for page blobs" Jul 9, 2018
@williexu self-assigned this Jul 10, 2018
@mozehgir added the Storage label Jul 26, 2019