implement `store.list_prefix` and `store._set_many` #2064
Conversation
@martindurant I'm on flaky train internet and the GitHub web API is giving me inconsistent signals about whether you have been requested for review; to be clear, I would appreciate your review :)
What matters is how we intend to call it! fsspec likes to provide full paths (but has explicit prefix interfaces) - of course it's fine to tailor the situation to what we need, but I don't really know what that is.
I may have got a notification or a few.
@@ -193,14 +193,10 @@ async def list_prefix(self, prefix: str) -> AsyncGenerator[str, None]:
        -------
        AsyncGenerator[str, None]
        """
        for p in (self.root / prefix).rglob("*"):
            if p.is_file():
                yield str(p)
Were we getting duplicates?
I think we were. This code path was not tested until this PR.
src/zarr/store/local.py
Outdated
  to_strip = str(self.root) + "/"
  for p in (self.root / prefix).rglob("*"):
      if p.is_file():
-         yield str(p).replace(to_strip, "")
+         yield str(p).removeprefix(to_strip)
+1
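An editorial aside on why `removeprefix` is the safer call here: `str.replace` strips every occurrence of the root string, not just the leading one, which matters if the root path happens to recur inside a key. A minimal illustration with a made-up root (`removeprefix` requires Python 3.9+):

```python
to_strip = "/data/zarr/"
key = "/data/zarr/group/data/zarr/array"  # the root string recurs inside the key

print(key.replace(to_strip, ""))   # 'grouparray' -- strips both occurrences
print(key.removeprefix(to_strip))  # 'group/data/zarr/array' -- strips only the leading one
```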
src/zarr/store/remote.py
Outdated
- for onefile in await self._fs._ls(prefix, detail=False):
-     yield onefile
+ find_str = "/".join([self.path, prefix])
+ for onefile in await self._fs._find(find_str):
The defaults for find are: maxdepth=None, withdirs=False, detail=False; maybe good to be specific.
Why is find() better than ls()? The former will return all child files, not just one level deep - is that the intent? If not, ls() ought to be generally more efficient.
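To make the difference concrete, here is a small runnable sketch using fsspec's in-memory filesystem and the sync mirrors of `_ls`/`_find` (file names are made up):

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe({"/root/a": b"1", "/root/sub/b": b"2", "/root/sub/c": b"3"})

# one directory level, directories included
print(fs.ls("/root", detail=False))
# ['/root/a', '/root/sub']

# every file below the prefix, recursively
print(fs.find("/root", maxdepth=None, withdirs=False, detail=False))
# ['/root/a', '/root/sub/b', '/root/sub/c']
```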
Using `find` here is merely due to my ignorance of fsspec. I will implement `ls` as you suggest.
It depends on whether you want one directory level or everything below it. When I wrote the original, I didn't know the intent.
I believe the intent here is to list everything below `prefix` (at least, that's how I'm using it).
I misunderstood your first comment. Since the intent is to use the behavior of `_find`, I'm keeping it, but adding explicit kwargs as you suggested.
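For reference, the resolved call would then look roughly like this (a sketch of the loop under discussion, assuming the surrounding `list_prefix` method on the remote store; not the exact merged code):

```python
find_str = "/".join([self.path, prefix])
for onefile in await self._fs._find(
    find_str, detail=False, maxdepth=None, withdirs=False
):
    yield onefile
```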
src/zarr/sync.py
Outdated
""" | ||
result = [] | ||
async for x in data: | ||
result.append(x) |
asyncio.gather? Like above, not much point in having coroutines if we serially wait for them.
I don't think we can use `asyncio.gather` here, because `AsyncGenerator` is not iterable. Happy to be corrected, since I don't really know asyncio very well.
And for clarification, `_collect_aiterator` largely exists for convenience in testing, because I need some way to collect async generators when debugging with `pdb`. This function is not intended for use in anything performance sensitive.
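For reference, a helper like this is all that's being discussed (a sketch matching the diff above; the exact name and return type follow the comments here):

```python
from collections.abc import AsyncIterator
from typing import TypeVar

T = TypeVar("T")

async def _collect_aiterator(data: AsyncIterator[T]) -> tuple[T, ...]:
    """Drain an async iterator one item at a time; handy under pdb,
    not intended for performance-sensitive code."""
    result = []
    async for x in data:
        result.append(x)
    return tuple(result)
```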
We should probably re-examine the use of async-iterators, though. If we can't gather() on them (seems to be true?), then they are the wrong abstraction since gather() is probably always what we actually want.
On second thoughts, maybe I'm wrong - does `async for` schedule all the coroutines at once?? Should be easy to test.
I don't think it schedules them all at once. In `[x async for x in async_generator]`, `x` is not an awaitable; it's already awaited. Since the basic model of the generator is that it's a resumable, stateful iterator, I don't think we can schedule all the tasks at once.
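A quick experiment bears this out (a self-contained sketch; `fetch` is a hypothetical stand-in for an I/O-bound store request): items of an async generator are awaited one at a time, while `asyncio.gather` schedules everything up front.

```python
import asyncio
import time

async def fetch(i: int) -> int:
    await asyncio.sleep(0.1)  # simulate an I/O-bound request
    return i

async def gen():
    for i in range(5):
        yield await fetch(i)  # each item is awaited before the next starts

async def main() -> None:
    t0 = time.perf_counter()
    serial = [x async for x in gen()]
    t1 = time.perf_counter()
    concurrent = await asyncio.gather(*(fetch(i) for i in range(5)))
    t2 = time.perf_counter()
    assert serial == concurrent
    print(f"async for: {t1 - t0:.2f}s, gather: {t2 - t1:.2f}s")  # ~0.5s vs ~0.1s

asyncio.run(main())
```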
The idea with the generators is to a) support seamless pagination and b) support pipelining (`del_prefix` will be able to take advantage of this at some point).
    store_dict = dict(zip(keys, data_buf, strict=True))
    await store._set_dict(store_dict)
    for k, v in store_dict.items():
        assert self.get(store, k).to_bytes() == v.to_bytes()
if x.to_bytes() == y.to_bytes(), does x == y?
Isn't there a multiple get? Maybe not important here.
> if x.to_bytes() == y.to_bytes(), does x == y?

No, and I suspect this might be deliberate, since in principle `Buffer` instances can have identical bytes but different devices (e.g., GPU memory vs host memory); thus `x == y` might only be true if two buffers are bytes-equal and device-equal, but I'm speculating here. @madsbk would have a better answer I think.

> Isn't there a multiple get? Maybe not important here.

There is no multiple get (nor a multiple set, nor a multiple delete).
xref: src/zarr/buffer.py in #2006
@@ -113,8 +107,8 @@ def store_kwargs(self, request) -> dict[str, str | bool]:
        raise AssertionError

    @pytest.fixture(scope="function")
-   def store(self, store_kwargs: dict[str, str | bool]) -> RemoteStore:
-       url = store_kwargs["url"]
+   async def store(self, store_kwargs: dict[str, str | bool | UPath]) -> RemoteStore:
This isn't actually async
Correct, but the class we are inheriting from defines this as an async method.
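Schematically, the override keeps the `async` signature even though its body never awaits (a sketch, not the actual test code; `store_cls` is assumed to come from the base test class):

```python
@pytest.fixture(scope="function")
async def store(self, store_kwargs: dict[str, str | bool | UPath]) -> RemoteStore:
    # nothing here needs awaiting, but the base StoreTests class
    # declares this fixture async, so the override matches it
    return self.store_cls(**store_kwargs)
```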
src/zarr/abc/store.py
Outdated
@@ -221,6 +224,13 @@ def close(self) -> None:
        self._is_open = False
        pass

+   async def _set_dict(self, dict: Mapping[str, Buffer]) -> None:
`set_many()` (analogous to `insert_many`)?
I moved away from `set_dict` and switched to `_set_many`.
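The default on the base class can then be a thin concurrent wrapper over `set`; roughly (a sketch, assuming the abstract store's `set(key, value)` coroutine; `Any` stands in for zarr's `Buffer` type):

```python
import asyncio
from collections.abc import Iterable
from typing import Any

async def _set_many(self, values: Iterable[tuple[str, Any]]) -> None:
    # fire off all writes concurrently instead of awaiting each in turn;
    # stores with real batch APIs can override this with something smarter
    await asyncio.gather(*(self.set(key, value) for key, value in values))
```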
we should try to get this in, because it fixes problematic …
This PR:

- fixes `list_prefix` for stores. `list_prefix(prefix=foo)` now consistently returns keys with the shared prefix `foo` stripped. I'd be fine altering this to return absolute keys instead.
- adds tests for `list_prefix` on the `StoreTests` base class.
- adds a `_set_dict(Mapping[str, Buffer])` method to stores that allows declaring a collection of key: value pairs to write to storage. The primary usage is to make store tests simpler via a declarative API. I also suspect that making tests simpler anticipates making other code simpler. The default implementation of `_set_dict` simply wraps `store.set`, but it's easy to imagine fancier batching / transaction implementations.
- adds tests for `_set_dict`.

TODO: