Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify data access using pyfilesystem #520

Merged
merged 33 commits into from
Aug 24, 2021
Merged

Simplify data access using pyfilesystem #520

merged 33 commits into from
Aug 24, 2021

Conversation

jenhagg
Copy link
Collaborator

@jenhagg jenhagg commented Jul 8, 2021

Purpose

Consolidate data access code by using the api provided by pyfilesystem which provides consistency across local, ssh, in memory, and future (blob storage) backed storage.

What the code is doing

The first group of commits is using fsspec to achieve the same goal, but then is replaced by pyfilesystem which I think matches our usage better.

We define a self.fs property on each DataAccess subclass which hold the filesystem object that does most of the work. Additionally, SSHDataAccess also has a self.local_fs, and all file based operations are done through a pyfilesystem object, using pyfilesystem paths which are internally converted to the correct platform. Various DataAccess methods are either replaced or simplified using pyfilesystem methods. To preserve progress bars and the current checksum functionality, we customize the default SSHFS instance by wrapping it in a SubFS (pyfilesystem builtin meant for this purpose).

Testing

I've done testing throughout the process, but need to run through them again with the latest changes. I will update this accordingly. It will include creating and launching a scenario over ssh, delete/move states, etc.

Where to look

There is a lot, but mostly originating with changes to data_access.py. The tests also now use an in memory filesystem which simplifies the setup, and might enable increasing coverage further.

Time estimate

1+ hour(s). We should review in person

@jenhagg jenhagg self-assigned this Jul 8, 2021
@jenhagg jenhagg added dependencies Pull requests that update a dependency file refactor Code that is being refactored labels Jul 8, 2021

def remove(self, target, recursive=False, confirm=True):
"""Wrapper around rm command
def copy(self, src, dest):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename src to path? It is more intuitive and this is what is used in the _check_file_exists function

"""
raise NotImplementedError
if self.fs.exists(dest) and self.fs.isdir(dest):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand, self.fs is only define in the LocalDataAccess class

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also refers to the property defined in SSHDataAccess. So it will be defined on any subclass instance, although not enforced by the base class (we could use the abc module to do this at some point).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. What would happen if someone instantiate the DataClass and call the copy method? Would the the abc module prevent that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would raise an error in that case. No one should be instantiating DataAccess directly though. I believe using abc would prevent that.

@jenhagg jenhagg force-pushed the jon/fs2 branch 2 times, most recently from a4d7ca9 to 2d9b0ed Compare July 27, 2021 03:56
@jenhagg jenhagg force-pushed the jon/fs2 branch 2 times, most recently from 157862a to 9e61fb4 Compare July 28, 2021 21:12
@rouille
Copy link
Collaborator

rouille commented Jul 29, 2021

@jon-hagg, the code looks good. Did you test it in the containerized framework? Should we test it in the server framework (@danielolsen, @BainanXia)?

@jenhagg
Copy link
Collaborator Author

jenhagg commented Jul 29, 2021

I will test it again today with the latest changes.

Comment on lines +297 to +299
"original": posixpath.join(self.root, new_name),
"updated": posixpath.join(self.root, backup),
"lockfile": posixpath.join(self.root, "scenario.lockfile"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be something that needs to be changed via os so that our client and server setup will support non-linux servers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there are several things we'd need to change in that case. As far as I know, non linux servers aren't anywhere on our roadmap.

errors = stderr.readlines()
if len(errors) > 0:
raise IOError(f"Failed to create {full_path} on server")
class MemoryDataAccess(SSHDataAccess):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we put down some doc strings here?

Copy link
Collaborator

@BainanXia BainanXia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through the code and learned a lot. This is definitely much cleaner. Thanks for the effort.

@jenhagg jenhagg merged commit 176f8b8 into develop Aug 24, 2021
@jenhagg jenhagg deleted the jon/fs2 branch August 24, 2021 20:55
@ahurli ahurli mentioned this pull request Oct 4, 2021
1 task
kasparm pushed a commit that referenced this pull request Oct 23, 2021
kasparm pushed a commit that referenced this pull request Oct 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file refactor Code that is being refactored
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants