Update zarr checksum file format #937
Comments
@dchiquito and @yarikoptic - while we are updating this format, can we consider a digest of the form
@satra are you proposing that a file's digest is simply I personally don't see what value that gives us. The multipart upload checksum form is
a directory digest is the md5 that we are computing, which is simply an md5 of the json, but the
it gives us a quick representation of the set of components that make up the eventual checksum. just like multipart count simply tells us the set of parts that went into computing the etag.
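To make the "md5 of the json" idea concrete, here is a minimal sketch of a directory digest computed over a JSON listing of its entries. The entry fields (`name`, `md5`) and the serialization choices (sorted entries, sorted keys, compact separators) are assumptions for illustration, not the archive's actual format.

```python
import hashlib
import json


def directory_digest(entries):
    """Return the MD5 hex digest of a JSON listing of a directory's entries.

    `entries` is a list of dicts. The field names used here ("name", "md5")
    are hypothetical and only serve the example.
    """
    # Sort entries and keys so the same set of components always
    # serializes to the same bytes, regardless of input order.
    ordered = sorted(entries, key=lambda e: e["name"])
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Because the listing is serialized deterministically, two clients that see the same set of entries should arrive at the same digest, whichever order they enumerated them in.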
btw, the checksumming algorithm used is whatever we define the schema property to be and fix the algorithm to be in dandischema. the checksum itself doesn't tell us anything about the algorithm at this point. (we did not go to multihash for this). and we do anticipate that future versions of the archive may have other algorithms that are used (e.g., based on blake or others).
So a zarr containing 1,000,000 files would have a checksum
greedy me, echoing the "size" in the original description, maybe it should be
The way the API would be set up once #925 is addressed, the API would include the checksum and size whenever a zarr file/directory is queried. I don't see a lot of value in also embedding that data into the checksum, but I have no moral objection. @satra any objection to
no moral objection
@dchiquito Should this issue have been closed? The
The zarr `.checksum` file format has some deficiencies which should be addressed ASAP before too many zarrs are uploaded.
- A `name` field instead of a `path`. This should resolve Re-design zarr checksum to not rely on "full path"s (and become a proper tree checksum) #931.
- The `.checksum` file is used to populate the `/zarr/{zarr_id}.zarr/` view, so the best way to include file size in that view would be to include size in the `.checksum` files. We can also include directory size for free by summing the sizes of their constituent files/directories. This should resolve zarr_content_read: include sizes for the "files" (and possibly "directories") #925.
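As a rough illustration of getting directory sizes "for free", the sketch below sums sizes bottom-up over a tree whose entries are keyed by `name` rather than by full path. The `Entry` class and `total_size` helper are hypothetical names introduced for this example; they are not part of the actual `.checksum` format or the API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """Hypothetical listing entry for a file or directory."""

    name: str  # name relative to the parent directory, not a full path
    size: int = 0  # bytes; for directories this is filled in by total_size
    children: List["Entry"] = field(default_factory=list)


def total_size(entry: Entry) -> int:
    """Return the entry's size, computing directory sizes recursively."""
    if entry.children:
        # A directory's size is the sum of its files and subdirectories.
        entry.size = sum(total_size(child) for child in entry.children)
    return entry.size
```

For example, a root holding a 5-byte `.zarray` file and a directory `0` with two chunks of 10 and 20 bytes gets a size of 35, and the directory `0` gets 30, without any entry needing to know its full path.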