-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the H5StoreManager class. #129
Conversation
@glotzerlab/signac-maintainers This is ready for review. |
Manages multiple instances of H5Store within a specific directory. Exposed as job.stores and project.stores.
Register returned dicts in internal registry to ensure persistence of returned dicts over the lifetime of the manager instance.
In addition to testing 'job.data', we also test 'job.stores.test'.
This patch fixes an issue that required users to explicitly hold a reference to each store returned from `job.stores`, otherwise the returned container would be immediatley deleted prior to use.
d9f8cb5
to
ed4ddde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I don't work with the H5Store enough to approve the PR but looking over it I don't see anything obviously wrong. We should make sure to update glotzerlab/signac-docs#6 in light of the changes made here.
Thank you, that's a good point and I will go through the docs before merging. |
I will review this tomorrow, trying to finish up another task before I start going back through signac PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty minimal set of changes to enable the new use-case, I like it. Just a couple small things and we're good to go.
pass | ||
raise error | ||
else: | ||
del self._dict_registry[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow why this is necessary. The file corresponding to key
was successfully created if we reach this point, why does it need to be removed from the tracked list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really necessary, I figured it might be a good idea to replace any references of "old" H5Store under the same key, but there is virtually no difference. I'll remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this is actually necessary. I tried to delete this line and the test_assign
and test_assign_data
unit tests started to fail.
for fn in os.listdir(self.prefix): | ||
m = re.match('(.*){}'.format(self.suffix), fn) | ||
if m: | ||
yield m.groups()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using re
seems like overkill here, I'd switch this to using glob.glob
: yield from glob.glob("*.{}".format(self.suffix))
(without using yield from
for py2 compatibility).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't directly extract the key name with glob
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you'd still have to do something like ...replace(self.suffix, '')
to get just the key without the extension, that's all that you're referring to right? I personally think that's still clearer than using a regex since you're really just parsing out an extension from a set of filenames, but I'm OK with leaving it as is if you prefer since it doesn't simplify the code that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried it. I would also need to strip off the directory name, so I'm sticking to the original implementation.
However, should we explicitly skip hidden h5-files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's necessary. I view creating and editing hidden hd5 files as a feature, not a bug. If users explicitly access a hidden file I think it's reasonable to expect that they know what they're doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vyasr Thank you for your review. I'll address any required code changes tomorrow.
pass | ||
raise error | ||
else: | ||
del self._dict_registry[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really necessary, I figured it might be a good idea to replace any references of "old" H5Store under the same key, but there is virtually no difference. I'll remove that.
for fn in os.listdir(self.prefix): | ||
m = re.match('(.*){}'.format(self.suffix), fn) | ||
if m: | ||
yield m.groups()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't directly extract the key name with glob
.
self.assertIn(key, job.stores.test) | ||
self.assertEqual(job.stores.test[key], d) | ||
self.assertEqual(job.stores.test.get(key), d) | ||
self.assertEqual(job.stores.test.get('bs', d), d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest another word like 'nonexistent_key'
so it's more clear. Do a find/replace because this re-occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ummm, yes. Oops. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to fix that in a separate commit, because there are a bunch of other occurrences of the same pattern in other test modules.
@vyasr I've addressed all of your comments. |
Approving pending the change that Bradley requested. |
Description
Introduces the
DictManager
and theH5StoreManager
classes that are exposed asjob.stores
andproject.stores
.Motivation and Context
It would be highly advantageous to enable users to easily manage multiple HDF5-files instead of just one. There are plenty of reasons why that is a good idea:
h5repack
).rsync
or GLOBUS will have to transfer the complete file. Furthermore, all data stored within one large file share one modification time stamp, making fine-grained synchronization even more difficult. Finally, synchronization of parts of a file are impossible.Types of Changes
This pull request is based on
1The change breaks (or has the potential to break) existing functionality.
Checklist:
If necessary: