[FEAT] Create obstore store in fsspec on demand #198

Open
wants to merge 45 commits into
base: main
909b5b0
feat: split bucket from path + construct store
machichima Feb 3, 2025
29464a7
feat: remove store + add protocol + apply to all methods
machichima Feb 4, 2025
a0d9e1d
feat: inherit from AsyncFsspecStore to specify protocol
machichima Feb 4, 2025
6614906
fix: correctly split protocol if exists in path
machichima Feb 6, 2025
75c738e
feat: use urlparse to extract protocol
machichima Feb 7, 2025
2209839
Merge branch 'main' into obstore-instance-in-fsspec
kylebarron Feb 7, 2025
46c6b59
update typing
kylebarron Feb 7, 2025
9ab35e1
fix: unbounded error
machichima Feb 8, 2025
cb80495
fix: remove redundant import
machichima Feb 8, 2025
b6a3d3a
feat: add register() to register AsyncFsspecStore for provided protocol
machichima Feb 8, 2025
68cdff9
feat: add validation for protocol in register()
machichima Feb 8, 2025
fa5b539
test: for register()
machichima Feb 8, 2025
b704779
feat: add async parameter for register()
machichima Feb 8, 2025
61deac4
test: test async store created by register()
machichima Feb 8, 2025
4bc1599
feat: add http(s) into protocol_with_bucket list
machichima Feb 8, 2025
4dc9143
feat: ls return path with bucket name
machichima Feb 9, 2025
fb607d0
feat: enable re-register same protocol
machichima Feb 9, 2025
b74948a
test: update pytest fixture to use register()
machichima Feb 9, 2025
f6ba27c
test: update test with new path format
machichima Feb 9, 2025
30250cf
fix: mkdocs build error
machichima Feb 9, 2025
4a8e6fc
Merge branch 'main' into obstore-instance-in-fsspec
machichima Feb 13, 2025
27a0ac7
fix: error when merging
machichima Feb 13, 2025
d2d0235
build: add some ruff ignore
machichima Feb 13, 2025
1f97703
fix: ruff error
machichima Feb 13, 2025
b002afb
build: add cachetools dependencies
machichima Feb 13, 2025
9088104
Merge branch 'main' into obstore-instance-in-fsspec
machichima Feb 13, 2025
897beb0
better scoping of lints
kylebarron Feb 13, 2025
0726999
lint
kylebarron Feb 13, 2025
5b87c46
fix: update lru_cache + clean class attribute
machichima Feb 14, 2025
79a03f7
Merge branch 'main' into obstore-instance-in-fsspec
machichima Feb 14, 2025
dc4215d
fix some bugs when using get/put/cp/info/ls
machichima Feb 15, 2025
f59152b
Merge branch 'main' into obstore-instance-in-fsspec
machichima Feb 15, 2025
4896ba3
fix: declare lru_cache in __init__
machichima Feb 16, 2025
c9378b8
fix: make AsyncFsspecStore cachable
machichima Feb 16, 2025
549a4ac
test: for cache constructed store and filesystem obj
machichima Feb 16, 2025
a93fe2e
build: remove dependencies
machichima Feb 16, 2025
a54a3fe
fix: prevent send folder path to cat_file
machichima Feb 19, 2025
c804a18
fix: enable cp folders
machichima Feb 19, 2025
6c2c513
lint
machichima Feb 19, 2025
c6392f2
fix: clobber=False to prevent re-register and cause memory leak
machichima Feb 23, 2025
347e63e
test: clean up after each test to prevent memory leak
machichima Feb 23, 2025
69bbed6
Merge branch 'main' into obstore-instance-in-fsspec
machichima Feb 23, 2025
9e423f5
Simplify protocol registration
kylebarron Feb 24, 2025
096845c
fix+test: register check types
machichima Feb 25, 2025
5ea2ba8
small edits
kylebarron Feb 26, 2025
6 changes: 6 additions & 0 deletions obstore/python/obstore/fsspec.py
@@ -104,6 +104,12 @@ def _split_path(self, path: str) -> Tuple[str, str]:
            # no bucket name in path
            return "", path

        if path.startswith(self.protocol + "://"):
Member

Assuming this function always receives a URL like s3://mybucket/path/to/file, I'm inclined to have it use urlparse instead of manually handling the parts of the URL.

Author

It will not always be s3://mybucket/path/to/file; the path may also arrive without a protocol, like mybucket/path/to/file.
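
For reference, here is a small sketch of what the standard library's `urlparse` returns for the two path shapes mentioned above. The variable names are illustrative only and are not taken from the PR.

```python
from urllib.parse import urlparse

# A path that carries a protocol: scheme and netloc are both populated.
with_scheme = urlparse("s3://mybucket/path/to/file")

# A path without a protocol: scheme and netloc are empty strings,
# and the whole input ends up in `.path`.
without_scheme = urlparse("mybucket/path/to/file")

print(with_scheme.scheme, with_scheme.netloc, with_scheme.path)
print(repr(without_scheme.scheme), repr(without_scheme.netloc), without_scheme.path)
```

So a check on `scheme` being non-empty is enough to tell the two forms apart.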

Author

@machichima machichima Feb 7, 2025

I now use urlparse like this, which works for both s3://mybucket/path/to/file and mybucket/path/to/file:

res = urlparse(path)
if res.scheme:
    if res.scheme != self.protocol:
        raise ValueError(f"Expect protocol to be {self.protocol}. Got {res.scheme}")
    path = res.netloc + res.path
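
To make the snippet above easier to try in isolation, here is a minimal standalone sketch. It assumes a hypothetical free function `split_path(path, protocol)` rather than the `_split_path` method in the PR, and it assumes the final step splits on the first `/` into a bucket and a key, since the diff is truncated at the `else:` branch. It is not the PR's actual implementation.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_path(path: str, protocol: str) -> Tuple[str, str]:
    """Hypothetical helper: split a path into (bucket, key).

    Accepts both "s3://mybucket/path/to/file" and "mybucket/path/to/file".
    """
    res = urlparse(path)
    if res.scheme:
        if res.scheme != protocol:
            raise ValueError(
                f"Expected protocol {protocol!r}, got {res.scheme!r}"
            )
        # netloc holds the bucket and path holds "/key"; rejoin them so both
        # input forms go through the same splitting logic below.
        path = res.netloc + res.path

    path = path.rstrip("/")
    if "/" not in path:
        # Only a bucket name was given.
        return path, ""

    # Assumption: everything after the first "/" is the key within the bucket.
    bucket, key = path.split("/", 1)
    return bucket, key
```

For example, `split_path("s3://mybucket/path/to/file", "s3")` and `split_path("mybucket/path/to/file", "s3")` both go through the same bucket/key split, while a mismatched scheme such as `gs://` raises `ValueError`.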

            path = path[len(self.protocol) + 3 :]
        elif path.startswith(self.protocol + "::"):
            path = path[len(self.protocol) + 2 :]
        path = path.rstrip("/")

        if "/" not in path:
            return path, ""
        else: