-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fsspec support #190
Add fsspec support #190
Conversation
Hey @ap-- thanks for the thoughtful contribution! I do have a few small questions listed in the review above. This does seem like a straightforward implementation and I appreciate your consideration of the existing code structure. @manzt can you comment on whether this PR addresses your requirements? |
Hi @mdshw5, I currently can't see your review comments. Did you submit it? Cheers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ap-- I had a few comments here. Looks really good, and I’ll be glad to merge after I study the changes a bit more with a clear mind.
"Cannot read FASTA file %s" % filename) | ||
|
||
self.indexname = filename + '.fai' | ||
self.indexname = self.filename + '.fai' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this work when the filename is derived from a remote URI? What does the filename for the local index file become?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
Currently this stores the index file on the remote.
So: providing fsspec.open("s3://mybucket/somefile.fasta")
will make self.filename = "mybucket/somefile.fasta"
and self._fs
will be an s3fs filesystem instance. self.indexname = mybucket/somefile.fasta.fai
.
But this requires the remote to be writable. (I basically stuck to the current behavior that stores the index next to the file.) In case the remote is readonly, it would be nice if the user could specify a local path for the index file.
I could make this configurable through a kwarg, with the default being current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current behavior is fine, and as I haven't used fsspec I did not understand that the remote file-like object would be created. As long as there is as reasonable exception generated when the remote is read-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a test for this on read-only filesystem with a fasta file with no index file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the remote is readonly, it would be nice if the user could specify a local path for the index file.
I ran into this when experimenting with an implementation a little while ago. It's hard to know what the "correct" behavior should be. I guess one option would be to have an index_filename
kwarg in the constructor, which could similarly be a fsspec.OpenFile
or str
and would specify explicitly where to look for (or create) an index.
So the default would be to read and write index adjacent to filename
in whatever filesystem is provided. If filename
is readonly, then we could raise an exception, and the user could fix either by 1.) configuring the filesystem with write permissions or 2.) specifying a local path (e.g., ./local.fasta.fai
). My only concern is that it could get challenging to handle multiple file objects.
Ah yeah - I forgot to submit 😅 |
@ap-- I've merged these changes and plan to release a new v0.6.5 package asap. One thing I forgot to request: can you please provide an example in the Readme.rst demonstrating usage of fsspec with this package? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks great and like it would cover 95% of my use cases.
I think it would be useful to have the ability to write an index locally if the fsspec.filesystem
is not writeable and one would need to (re)build an index. E.g,.
import fsspec
from pyfaidx import Fasta
fileobj = fsspec.open('http://example.com/data.fasta')
fasta = Fasta(fileobj, index='./data.fasta.fai')
However, there are likely some edge cases that further need to be considered and I don't think its worth blocking this PR.
"Cannot read FASTA file %s" % filename) | ||
|
||
self.indexname = filename + '.fai' | ||
self.indexname = self.filename + '.fai' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the remote is readonly, it would be nice if the user could specify a local path for the index file.
I ran into this when experimenting with an implementation a little while ago. It's hard to know what the "correct" behavior should be. I guess one option would be to have an index_filename
kwarg in the constructor, which could similarly be a fsspec.OpenFile
or str
and would specify explicitly where to look for (or create) an index.
So the default would be to read and write index adjacent to filename
in whatever filesystem is provided. If filename
is readonly, then we could raise an exception, and the user could fix either by 1.) configuring the filesystem with write permissions or 2.) specifying a local path (e.g., ./local.fasta.fai
). My only concern is that it could get challenging to handle multiple file objects.
Oh oops, didn't see this got merged while I looked it over. Nice work, and thanks! |
@manzt I do like the idea of adding an |
Thanks for merging ❤️ Cheers, |
Don't hold your breath on #192 😄. It's more complicated than I thought. If you want to take a stab at it be my guest. Otherwise, some documentation for this PR in the README will suffice, and then I'll tag a new release. |
Hi @mdshw5,
I saw that there already was an attempt at implementing
fsspec
support and that the PR is basically stale.So here's my attempt at implementing fsspec support:
This closes #165 and replaces #168
I explicitly tried to keep the required changes to a minimum, to simplify review.
This follows @manzt recommendations and accepts
fsspec.core.OpenFile
objects.This can still be done cleaner by refactoring more of
Faidx.__init__
Tests all pass and I added a single
fsspec
test. Let me know what the best tests would be for covering the base functionality of this library to testfsspec
support.Cheers,
Andreas 😃