Fix reading partial shards #2397

Merged
normanrz merged 2 commits into main from fix-sharding-ranges
Oct 18, 2024

Conversation

normanrz
Member

This PR fixes a pretty major bug in the read code path of the sharding codec. The store expects byte ranges in the form (byte_start_offset, byte_length), but the codec was issuing (byte_start_offset, byte_end_offset).

Fixes #2302
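
To illustrate the mismatch described above, here is a minimal sketch. It does not use the actual zarr-python store API: `read_range`, the `shard` buffer, and the chunk offsets are hypothetical stand-ins for a store that interprets its second argument as a length.

```python
def read_range(buf: bytes, start: int, length: int) -> bytes:
    """Hypothetical store read: returns `length` bytes beginning at `start`."""
    return buf[start:start + length]


# A pretend shard holding two 8-byte chunks back to back.
shard = bytes(range(32))
a_start, a_end = 0, 8    # chunk A occupies bytes [0, 8)
b_start, b_end = 8, 16   # chunk B occupies bytes [8, 16)

# Buggy call pattern: an end offset is passed where a length is expected.
buggy_a = read_range(shard, a_start, a_end)  # correct only because a_start == 0
buggy_b = read_range(shard, b_start, b_end)  # reads 16 bytes instead of 8

# Fixed call pattern: convert (start, end) to (start, length) first.
fixed_b = read_range(shard, b_start, b_end - b_start)
```

In this sketch the first chunk reads correctly by coincidence, while any chunk at a non-zero offset pulls in bytes that belong to its neighbours.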

@normanrz normanrz self-assigned this Oct 18, 2024
@normanrz normanrz requested review from d-v-b and jhamman October 18, 2024 12:58
@normanrz normanrz added the V3 label Oct 18, 2024
@d-v-b
Contributor

d-v-b commented Oct 18, 2024

> This PR fixes a pretty major bug in the read code path of the sharding codec. The store expects byte ranges in the form (byte_start_offset, byte_length), but the codec was issuing (byte_start_offset, byte_end_offset).
>
> Fixes #2302

Thanks for this fix. I'm guessing the bug slipped by because when the start offset is 0, (start, length) and (start, end) are identical, since end = start + length. I don't remember offhand which convention for linear indexing we are using in the rest of the codebase, but at some point we should ensure that it's all consistent. But that's out of scope for this PR.

@TomAugspurger since you are interested in NewType usage: using NewType to brand ints as IntervalStart, IntervalEnd, IntervalLength could be a fun use of NewType.
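
A rough sketch of what that NewType idea could look like is below. The three type names come from the comment above; `interval_length` and `read_branded` are hypothetical helpers and are not part of zarr-python. Note that `NewType` has no runtime effect, so the protection only comes from a static type checker such as mypy.

```python
from typing import NewType

# Branded ints: identical to int at runtime, distinct to a type checker.
IntervalStart = NewType("IntervalStart", int)
IntervalEnd = NewType("IntervalEnd", int)
IntervalLength = NewType("IntervalLength", int)


def interval_length(start: IntervalStart, end: IntervalEnd) -> IntervalLength:
    """Convert an exclusive end offset into a length."""
    return IntervalLength(end - start)


def read_branded(buf: bytes, start: IntervalStart, length: IntervalLength) -> bytes:
    """Hypothetical reader that only accepts a (start, length) pair."""
    return buf[start:start + length]


data = bytes(range(32))
start = IntervalStart(8)
end = IntervalEnd(16)

# A type checker would reject read_branded(data, start, end) because
# IntervalEnd is not IntervalLength; the explicit conversion is accepted.
chunk = read_branded(data, start, interval_length(start, end))
```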

@normanrz
Member Author

> I don't remember offhand which convention for linear indexing we are using in the rest of the codebase, but at some point we should ensure that it's all consistent. But that's out of scope for this PR.

The store uses (start, length). Python slices use (start, end). HTTP uses (start, end-1).

I have now refactored the sharding codec to use (start, length) because that is what the store expects.
Would be good to get this fix out quickly!
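
For reference, the three conventions listed above can be related as in the following sketch. The helper names are hypothetical and chosen only for illustration; the HTTP form follows the `Range: bytes=first-last` header syntax, where the last byte position is inclusive.

```python
def to_slice(start: int, length: int) -> slice:
    """(start, length) -> Python slice with an exclusive end."""
    return slice(start, start + length)


def to_http_range(start: int, length: int) -> str:
    """(start, length) -> HTTP Range header value with an inclusive end."""
    if length <= 0:
        raise ValueError("an HTTP byte range needs at least one byte")
    return f"bytes={start}-{start + length - 1}"


def from_start_end(start: int, end: int) -> tuple[int, int]:
    """(start, exclusive end) -> (start, length)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return start, end - start


# The same 8-byte chunk at offset 8, expressed in each convention.
start, length = from_start_end(8, 16)
py_slice = to_slice(start, length)
http_value = to_http_range(start, length)
```

Keeping a single convention at the store boundary and converting at the edges, as this PR does with (start, length), avoids having to track which form a given pair of ints is in.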

@jhamman
Member

jhamman commented Oct 18, 2024

We have a small merge conflict to solve but if that is good, I can include this in the beta.1 release later today.

@normanrz normanrz merged commit ee112b9 into main Oct 18, 2024
31 checks passed
@normanrz normanrz deleted the fix-sharding-ranges branch October 18, 2024 15:36

Successfully merging this pull request may close these issues.

[v3] zarr-python fails to decode sharded array written by other implementations.