Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

only write a specific namespace version if it does not exist #346

Merged
merged 4 commits into from
May 15, 2020

Conversation

ajtritt
Copy link
Contributor

@ajtritt ajtritt commented May 12, 2020

Motivation

Specs were being cached to a file even if they were already cached

help fix NeurodataWithoutBorders/pynwb#1228
fix #350

How to test the behavior?

import hdmf

fname = 'ANY_HDMF_FILE.h5'

with hdmf.HDF5IO(fname, 'a') as io:
    f = io.read()
    io.write(f)

@ajtritt ajtritt requested a review from rly May 12, 2020 23:59
oruebel
oruebel previously approved these changes May 13, 2020
@rly
Copy link
Contributor

rly commented May 13, 2020

If the new group has the same name as the old group, but the dataset contents are different, then I think a warning should be raised, since that could happen if someone is using a schema that is improperly versioned. i.e., '1.0.0' refers to two different schema.

@ajtritt ajtritt marked this pull request as ready for review May 13, 2020 00:11
@ajtritt
Copy link
Contributor Author

ajtritt commented May 13, 2020

@rly so, check the contents of every dataset, and warn if they are different? This would happen every time a file is appended to...

@rly
Copy link
Contributor

rly commented May 13, 2020

@ajtritt sorry, maybe I wasn't clear. I mean if the group specifications/[namespace name]/[namespace version] exists, then read the spec datasets and the namespace datasets inside of it and compare them to the values that would have been written if the group didn't exist. If they differ, raise a warning. On second thought, though, this might be better to do on read, rather than on write.

@rly
Copy link
Contributor

rly commented May 13, 2020

Please add tests but this is otherwise good to go

@ajtritt
Copy link
Contributor Author

ajtritt commented May 13, 2020

@rly I'm not sure we want to do that. This workflow--1) Users loads file with cached spec. 2) User then appends to file 3) Users writes file, caching spec--would be doing a lot of unnecessary checking. We would be checking cached data to see if it matches in-memory data that was read in from that very same cached data. To avoid that, we would need to track the source of namespace that exists in memory. If an in-memory namespace has the same version of a cached namespace, AND the in-memory namespace was read in, then we don't check. Otherwise, we check all the data. If we go that route, then I don't see the point of having a version namespace. If we assume versions can be flexible (i.e. the relationship between namespace version and namespace specification contents is not one-to-one), then there's not really a point in version. We have to revert to calling the contents of the namespace the version.

Please add tests but this is otherwise good to go

This code is already hit by existing tests.

@rly
Copy link
Contributor

rly commented May 13, 2020

I agree - this would be a lot of checking, and yeah we would want to add code to check whether the in-memory namespace was read in or loaded from elsewhere, but it would help catch cases where a developer does not update their version number before release.

I have already seen this to some degree in the NDX Catalog, where I request a change in an extension and the developer makes the change, but does not update the version. I can easily see it happening for a user who creates their own extension, writes a file with cached spec, updates their extension locally without changing the version number (let's say to add a new type), appends to the file, and wonders why their new type isn't written. It's not critical to have the check, and it's the extension developer's fault, but it could help protect users. I think the computational cost of the check would be relatively low. In any case, I think this can be solved in a subsequent PR.

@ajtritt
Copy link
Contributor Author

ajtritt commented May 13, 2020

@rly

I have already seen this to some degree in the NDX Catalog, where I request a change in an extension and the developer makes the change, but does not update the version.

maybe we should check for that and prevent it somehow. My concern is that if the version-spec relationship is not 1-1, then how do we know which spec is the right one?

Also, do you know why flake8 is now complaining about using l as a function argument name? It's flagging code that was written years ago...

@oruebel
Copy link
Contributor

oruebel commented May 13, 2020

I have already seen this to some degree in the NDX Catalog

For the catalog we could make this part of the CI since in the majority of cases and update there is going to go along with an update of the version number of the extension.

@oruebel
Copy link
Contributor

oruebel commented May 13, 2020

Also, do you know why flake8 is now complaining

Did maybe the version of flake8 change?

@rly
Copy link
Contributor

rly commented May 13, 2020

My concern is that if the version-spec relationship is not 1-1, then how do we know which spec is the right one?

We don't -- I suggest just raising a warning that says the cached spec does not match the loaded spec with the same name and version.

For the catalog, yes, we can check for this. I worry about the homebrew extension case that never makes it to the catalog.

re: flake8, probably the version changed and was updated to catch this: https://www.flake8rules.com/rules/E741.html

@ajtritt ajtritt mentioned this pull request May 15, 2020
@codecov
Copy link

codecov bot commented May 15, 2020

Codecov Report

Merging #346 into dev will increase coverage by 0.00%.
The diff coverage is 68.18%.

Impacted file tree graph

@@           Coverage Diff           @@
##              dev     #346   +/-   ##
=======================================
  Coverage   73.51%   73.51%           
=======================================
  Files          33       33           
  Lines        6535     6537    +2     
  Branches     1424     1425    +1     
=======================================
+ Hits         4804     4806    +2     
  Misses       1306     1306           
  Partials      425      425           
Impacted Files Coverage Δ
src/hdmf/common/table.py 69.50% <0.00%> (ø)
src/hdmf/container.py 72.75% <60.00%> (ø)
src/hdmf/build/builders.py 86.68% <66.66%> (ø)
src/hdmf/backends/hdf5/h5tools.py 66.57% <80.00%> (+0.08%) ⬆️
src/hdmf/backends/hdf5/h5_utils.py 66.42% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd2b005...ee78c9d. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flake8 errors Error updating specification of older files during pynwb.NWBHDF5IO.write()
3 participants