only write a specific namespace version if it does not exist #346

ajtritt · 2020-05-12T23:59:43Z

Motivation

Specs were being cached to a file even if they were already cached

help fix NeurodataWithoutBorders/pynwb#1228
fix #350

How to test the behavior?

import hdmf

fname = 'ANY_HDMF_FILE.h5'

with hdmf.HDF5IO(fname, 'a') as io:
    f = io.read()
    io.write(f)

rly · 2020-05-13T00:03:38Z

If the new group has the same name as the old group, but the dataset contents are different, then I think a warning should be raised, since that could happen if someone is using a schema that is improperly versioned. i.e., '1.0.0' refers to two different schema.

ajtritt · 2020-05-13T00:16:35Z

@rly so, check the contents of every dataset, and warn if they are different? This would happen every time a file is appended to...

src/hdmf/common/table.py

rly · 2020-05-13T06:04:16Z

@ajtritt sorry, maybe I wasn't clear. I mean if the group specifications/[namespace name]/[namespace version] exists, then read the spec datasets and the namespace datasets inside of it and compare them to the values that would have been written if the group didn't exist. If they differ, raise a warning. On second thought, though, this might be better to do on read, rather than on write.

rly · 2020-05-13T06:05:09Z

Please add tests but this is otherwise good to go

ajtritt · 2020-05-13T15:39:30Z

@rly I'm not sure we want to do that. This workflow--1) Users loads file with cached spec. 2) User then appends to file 3) Users writes file, caching spec--would be doing a lot of unnecessary checking. We would be checking cached data to see if it matches in-memory data that was read in from that very same cached data. To avoid that, we would need to track the source of namespace that exists in memory. If an in-memory namespace has the same version of a cached namespace, AND the in-memory namespace was read in, then we don't check. Otherwise, we check all the data. If we go that route, then I don't see the point of having a version namespace. If we assume versions can be flexible (i.e. the relationship between namespace version and namespace specification contents is not one-to-one), then there's not really a point in version. We have to revert to calling the contents of the namespace the version.

Please add tests but this is otherwise good to go

This code is already hit by existing tests.

rly · 2020-05-13T20:55:41Z

I agree - this would be a lot of checking, and yeah we would want to add code to check whether the in-memory namespace was read in or loaded from elsewhere, but it would help catch cases where a developer does not update their version number before release.

I have already seen this to some degree in the NDX Catalog, where I request a change in an extension and the developer makes the change, but does not update the version. I can easily see it happening for a user who creates their own extension, writes a file with cached spec, updates their extension locally without changing the version number (let's say to add a new type), appends to the file, and wonders why their new type isn't written. It's not critical to have the check, and it's the extension developer's fault, but it could help protect users. I think the computational cost of the check would be relatively low. In any case, I think this can be solved in a subsequent PR.

ajtritt · 2020-05-13T21:10:45Z

@rly

I have already seen this to some degree in the NDX Catalog, where I request a change in an extension and the developer makes the change, but does not update the version.

maybe we should check for that and prevent it somehow. My concern is that if the version-spec relationship is not 1-1, then how do we know which spec is the right one?

Also, do you know why flake8 is now complaining about using l as a function argument name? It's flagging code that was written years ago...

oruebel · 2020-05-13T21:12:29Z

I have already seen this to some degree in the NDX Catalog

For the catalog we could make this part of the CI since in the majority of cases and update there is going to go along with an update of the version number of the extension.

oruebel · 2020-05-13T21:13:19Z

Also, do you know why flake8 is now complaining

Did maybe the version of flake8 change?

rly · 2020-05-13T21:48:45Z

My concern is that if the version-spec relationship is not 1-1, then how do we know which spec is the right one?

We don't -- I suggest just raising a warning that says the cached spec does not match the loaded spec with the same name and version.

For the catalog, yes, we can check for this. I worry about the homebrew extension case that never makes it to the catalog.

re: flake8, probably the version changed and was updated to catch this: https://www.flake8rules.com/rules/E741.html

codecov · 2020-05-15T01:57:39Z

Codecov Report

Merging #346 into dev will increase coverage by 0.00%.
The diff coverage is 68.18%.

@@           Coverage Diff           @@
##              dev     #346   +/-   ##
=======================================
  Coverage   73.51%   73.51%           
=======================================
  Files          33       33           
  Lines        6535     6537    +2     
  Branches     1424     1425    +1     
=======================================
+ Hits         4804     4806    +2     
  Misses       1306     1306           
  Partials      425      425

Impacted Files	Coverage Δ
src/hdmf/common/table.py	`69.50% <0.00%> (ø)`
src/hdmf/container.py	`72.75% <60.00%> (ø)`
src/hdmf/build/builders.py	`86.68% <66.66%> (ø)`
src/hdmf/backends/hdf5/h5tools.py	`66.57% <80.00%> (+0.08%)`	⬆️
src/hdmf/backends/hdf5/h5_utils.py	`66.42% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd2b005...ee78c9d. Read the comment docs.

only write a specific namespace version if it does not exist

e7b195c

ajtritt requested a review from rly May 12, 2020 23:59

oruebel previously approved these changes May 13, 2020

View reviewed changes

ajtritt marked this pull request as ready for review May 13, 2020 00:11

fix flake8 errors

34473f7

ajtritt dismissed oruebel’s stale review via 34473f7 May 13, 2020 00:21

rly reviewed May 13, 2020

View reviewed changes

src/hdmf/common/table.py Show resolved Hide resolved

fix flake8 error

e84a0b8

ajtritt mentioned this pull request May 15, 2020

Fix new flake8 errors #351

Closed

fix flake8 errors

ee78c9d

oruebel approved these changes May 15, 2020

View reviewed changes

rly approved these changes May 15, 2020

View reviewed changes

ajtritt merged commit 7f7fa45 into dev May 15, 2020

rly deleted the help/mies_append branch May 15, 2020 17:15

wbwakeman mentioned this pull request May 18, 2020

passes cache_spec=False to nwb_io.write() to avoid spec writer shape … AllenInstitute/ipfx#406

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

only write a specific namespace version if it does not exist #346

only write a specific namespace version if it does not exist #346

ajtritt commented May 12, 2020 •

edited

Loading

rly commented May 13, 2020

ajtritt commented May 13, 2020

rly commented May 13, 2020

rly commented May 13, 2020

ajtritt commented May 13, 2020

rly commented May 13, 2020

ajtritt commented May 13, 2020

oruebel commented May 13, 2020

oruebel commented May 13, 2020

rly commented May 13, 2020

codecov bot commented May 15, 2020 •

edited

Loading

only write a specific namespace version if it does not exist #346

only write a specific namespace version if it does not exist #346

Conversation

ajtritt commented May 12, 2020 • edited Loading

Motivation

How to test the behavior?

rly commented May 13, 2020

ajtritt commented May 13, 2020

rly commented May 13, 2020

rly commented May 13, 2020

ajtritt commented May 13, 2020

rly commented May 13, 2020

ajtritt commented May 13, 2020

oruebel commented May 13, 2020

oruebel commented May 13, 2020

rly commented May 13, 2020

codecov bot commented May 15, 2020 • edited Loading

Codecov Report

ajtritt commented May 12, 2020 •

edited

Loading

codecov bot commented May 15, 2020 •

edited

Loading