-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify data access using pyfilesystem #520
Conversation
|
||
def remove(self, target, recursive=False, confirm=True): | ||
"""Wrapper around rm command | ||
def copy(self, src, dest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename src
to path
? It is more intuitive and this is what is used in the _check_file_exists
function
""" | ||
raise NotImplementedError | ||
if self.fs.exists(dest) and self.fs.isdir(dest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand, self.fs
is only define in the LocalDataAccess
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also refers to the property defined in SSHDataAccess
. So it will be defined on any subclass instance, although not enforced by the base class (we could use the abc module to do this at some point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. What would happen if someone instantiate the DataClass
and call the copy
method? Would the the abc
module prevent that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it would raise an error in that case. No one should be instantiating DataAccess
directly though. I believe using abc
would prevent that.
a4d7ca9
to
2d9b0ed
Compare
157862a
to
9e61fb4
Compare
@jon-hagg, the code looks good. Did you test it in the containerized framework? Should we test it in the server framework (@danielolsen, @BainanXia)? |
I will test it again today with the latest changes. |
"original": posixpath.join(self.root, new_name), | ||
"updated": posixpath.join(self.root, backup), | ||
"lockfile": posixpath.join(self.root, "scenario.lockfile"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be something that needs to be changed via os
so that our client and server setup will support non-linux servers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there are several things we'd need to change in that case. As far as I know, non linux servers aren't anywhere on our roadmap.
errors = stderr.readlines() | ||
if len(errors) > 0: | ||
raise IOError(f"Failed to create {full_path} on server") | ||
class MemoryDataAccess(SSHDataAccess): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we put down some doc strings here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through the code and learned a lot. This is definitely much cleaner. Thanks for the effort.
Purpose
Consolidate data access code by using the api provided by pyfilesystem which provides consistency across local, ssh, in memory, and future (blob storage) backed storage.
What the code is doing
The first group of commits is using fsspec to achieve the same goal, but then is replaced by pyfilesystem which I think matches our usage better.
We define a
self.fs
property on eachDataAccess
subclass which hold the filesystem object that does most of the work. Additionally,SSHDataAccess
also has aself.local_fs
, and all file based operations are done through a pyfilesystem object, using pyfilesystem paths which are internally converted to the correct platform. VariousDataAccess
methods are either replaced or simplified using pyfilesystem methods. To preserve progress bars and the current checksum functionality, we customize the defaultSSHFS
instance by wrapping it in aSubFS
(pyfilesystem builtin meant for this purpose).Testing
I've done testing throughout the process, but need to run through them again with the latest changes. I will update this accordingly. It will include creating and launching a scenario over ssh, delete/move states, etc.
Where to look
There is a lot, but mostly originating with changes to
data_access.py
. The tests also now use an in memory filesystem which simplifies the setup, and might enable increasing coverage further.Time estimate
1+ hour(s). We should review in person