
[WIP] Fix append to dynamic table #920

Closed
rly wants to merge 5 commits into dev from fix_dyntable_append
Conversation

@rly (Contributor) commented May 8, 2019

Fixes #918, which was actually a deeper issue: it was not possible to append to any DynamicTable that was read from a file.

Let me know if there is a more elegant way to handle:

def append(self, arg):
    if isinstance(self.data, HDMFDataset) or isinstance(self.data, Dataset):
        self.__data = self.data[()]

I also added tests to act on containers after checking that what was written equals what was read.
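
For context, roughly the kind of round trip this fixes (a hedged sketch; the file name and trial times are made up, and the file is assumed to already contain a trials table):

from pynwb import NWBHDF5IO

# open writable, read the file, append a row, and write the change back
io = NWBHDF5IO('example.nwb', mode='a')
nwbfile = io.read()

# add_trial() appends a row to the trials DynamicTable, which exercises
# VectorData.append() on column data that was read from disk
nwbfile.add_trial(start_time=10.0, stop_time=12.0)

io.write(nwbfile)
io.close()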

@rly requested a review from @ajtritt on May 8, 2019
@codecov (bot) commented May 8, 2019

Codecov Report

Merging #920 into dev will decrease coverage by 0.05%.
The diff coverage is 44.44%.


@@            Coverage Diff             @@
##              dev     #920      +/-   ##
==========================================
- Coverage   71.16%   71.11%   -0.06%     
==========================================
  Files          37       37              
  Lines        2792     2797       +5     
  Branches      554      556       +2     
==========================================
+ Hits         1987     1989       +2     
  Misses        679      679              
- Partials      126      129       +3
Impacted Files       Coverage Δ
src/pynwb/file.py    71.03% <100%> (ø) ⬆️
src/pynwb/core.py    73.3% <37.5%> (-0.2%) ⬇️

Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 962b91e...d0a6088.

@bendichter (Contributor)

Is there any way to do this without reading the entire dataset into memory?

except Exception as e:
    self.reader.close()
    self.reader = None
    raise e
@ajtritt (Member) commented May 8, 2019

will this cause test_roundtrip to get skipped if actOnContainer is not implemented?

@@ -194,6 +202,10 @@ def getContainer(self, nwbfile):
        ''' Should take an NWBFile object and return the Container'''
        raise unittest.SkipTest('Cannot run test unless getContainer is implemented')

    def actOnContainer(self, nwbfile):
@ajtritt (Member)

Is the point of this to provide the ability to do something to a container after roundtripping?
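
If that is the intent, a hypothetical subclass might use the hook roughly like this (the base-class name TestMapRoundTrip and the trial values are assumptions for illustration; only getContainer and actOnContainer come from this diff):

class TestTrialsRoundTrip(TestMapRoundTrip):  # hypothetical test case

    def getContainer(self, nwbfile):
        return nwbfile.trials

    def actOnContainer(self, nwbfile):
        # after the written-equals-read check, exercise the read container,
        # e.g. append a row to a table whose columns came from disk
        nwbfile.add_trial(start_time=0.0, stop_time=1.0)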

@@ -318,6 +319,8 @@ def __getitem__(self, args):
        return self.data[args]

    def append(self, arg):
        if isinstance(self.data, HDMFDataset) or isinstance(self.data, Dataset):
            self.__data = self.data[()]
@ajtritt (Member)

You could avoid reading the dataset into memory by reshaping the dataset, and then adding the new data.
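
At the h5py level that suggestion could look roughly like this (a sketch assuming the dataset was created as resizable, i.e. with a maxshape that allows growth; the file and dataset paths are illustrative):

import h5py
import numpy as np

new_values = np.array([20.0, 25.0])

with h5py.File('example.nwb', 'a') as f:
    dset = f['intervals/trials/start_time']      # illustrative dataset path
    n = dset.shape[0]
    dset.resize((n + len(new_values),))          # grow along the first axis
    dset[n:] = new_values                        # write only the new rows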

@ajtritt (Member) left a comment

@rly Just a few questions

@luiztauffer

Hi @rly, thanks for the fix!
I'm working on a project where it would be desirable not only to append elements but also to remove them. What's the best way to give the user more freedom to manipulate the data?

@oruebel (Contributor) commented May 9, 2019

@luiztauffer could you elaborate a bit more on the specific use-case you are working with that requires removal of rows?

@luiztauffer

> @luiztauffer could you elaborate a bit more on the specific use-case you are working with that requires removal of rows?

@oruebel
I'm working on a GUI for manually marking invalid intervals (or any other classification of intervals of interest) for multiple time series signals, all within an NWB file. Often the identification of events in these time series is not a clear-cut decision and, even when it is, the user might mark and save a wrongly assigned time interval and wish to remove it later.

The current version of pynwb does not allow directly setting already defined fields. For example, I tried to directly assign a new field:
nwb2.invalid_times = TimeIntervals('invalid_times', 'time intervals to be removed')
which raises the error:
AttributeError: can't set attribute 'invalid_times' - already set
from core.py, line 134.

I just commented out this line of code and can now update any field I want. Is there any particular reason why directly updating fields is forbidden in pynwb?

@rly (Contributor, Author) commented May 9, 2019

I agree that modification of data already written to file is an important and useful feature, especially for the invalid trial times. This is also relevant when intermediate data is stored. However, this is a complicated issue that we need to discuss as a team and with users. Currently dataset modification is not really supported except for probably a few edge cases. So even though you can update fields by commenting out that line of code, I think with the current code, when you go to write the NWBFile, your change would not be written to disk.

One way to implement this: any time a user wants to alter an existing dataset (add, remove, or modify), PyNWB would read the entire original dataset and alter it in memory. Writing the changes to disk would then involve writing the entire modified dataset.

You could currently do this yourself by reading the entire dataset into memory, making changes, reading all other data, and then writing a brand new NWBFile, but it would be nice to have this functionality built-in. We would just have to be very explicit about it because of the potentially high computing cost.
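
A very rough sketch of that manual workaround, using the invalid_times case from above (the file names, the row to drop, and the column access spelling are illustrative, and copying over everything else in the file is omitted):

from pynwb import NWBFile, NWBHDF5IO

# read the original file and load what will be edited into memory
with NWBHDF5IO('original.nwb', 'r') as read_io:
    old = read_io.read()
    start = old.invalid_times['start_time'][:]
    stop = old.invalid_times['stop_time'][:]
    session_description = old.session_description
    identifier = old.identifier
    session_start_time = old.session_start_time

# drop the unwanted interval (here the row at index 2, purely illustrative)
keep = [i for i in range(len(start)) if i != 2]

# build a brand-new NWBFile, re-add the edited intervals, and write it out
new = NWBFile(session_description, identifier, session_start_time)
for i in keep:
    new.add_invalid_time_interval(start_time=start[i], stop_time=stop[i])

with NWBHDF5IO('edited.nwb', 'w') as write_io:
    write_io.write(new)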

@luiztauffer

@rly
For small files that might do, but for data over GBs it's certainly not a good option.
I understand where the concerns might come from, and possibly you have discussed all of this already. But how about a safety check, e.g. requiring an argument such as overwrite=True?
Is any data versioning functionality in the future plans?

@oruebel (Contributor) commented May 10, 2019

@luiztauffer For data arrays that are being read lazily, you can already do updates right now, but that is limited to large data arrays (and updates are immediate). Enabling updates of files directly (without making full copies) should be doable, but will require tracking which fields have been updated. Currently this is done on a per-container basis, but not yet on a per-field basis.
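
For example, something like the following already changes values on disk (a sketch; the file and TimeSeries names are illustrative, and the new values must match the shape of the slice being replaced):

from pynwb import NWBHDF5IO

# open writable so in-place edits of the underlying HDF5 datasets are allowed
io = NWBHDF5IO('example.nwb', mode='a')
nwbfile = io.read()

# when read lazily, ts.data is an h5py.Dataset; slice assignment writes
# straight to the file, with no io.write() call needed
ts = nwbfile.get_acquisition('test_timeseries')
ts.data[0:3] = [1.0, 2.0, 3.0]

io.close()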

> Any data versioning functionality on the future plans?

If you mean allowing users to assign a version number to a file, that is certainly doable. However, more general version control and journaling of data are not on the current development plan. Versioning files as a whole (i.e., storing full copies of a file) is problematic due to the large size of the data (and is something one could easily do themselves). Journaling on a per-field basis, where each attribute, dataset, etc. is journaled and versioned independently, is very involved and would require us to roll our own solution, because none of the existing file standards support this. In general, this sort of functionality is more the domain of the storage backend than of NWB:N or a specific API for NWB:N. E.g., one could imagine creating a database-based FORMIO backend for NWB:N.

@bendichter (Contributor)

Yes, built-in versioning would be pretty cool, but as @oruebel points out it would be very involved. Actually, Gigantum does something like this and might be of interest for users wanting version control of big data files.

But let's get back to the issue of altering an existing written dataset. I think our best option is altering the data by accessing it directly on disk via h5py Datasets, e.g. nwb.acquisition.t_series.data[:] = [1., 2., 3.]. This deviates from the normal pynwb workflow, because it changes the values immediately and does not require a write command. It also currently only works if the new data has the same size as the old data, so it would not work for append or remove. In order to get append to work, we'll need a maxshape parameter that accommodates the growing array.
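
For the maxshape part, a sketch of how the dataset could be created as growable at write time using hdmf's H5DataIO wrapper (the names and values here are illustrative):

from datetime import datetime

import numpy as np
from hdmf.backends.hdf5 import H5DataIO
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

# wrap the data so the HDF5 dataset is created with an unlimited first
# dimension, which is what later allows the array on disk to grow
data = H5DataIO(data=np.arange(10.0), maxshape=(None,))
ts = TimeSeries(name='t_series', data=data, unit='m',
                starting_time=0.0, rate=1.0)

nwbfile = NWBFile('demo session', 'demo-id', datetime.now().astimezone())
nwbfile.add_acquisition(ts)

with NWBHDF5IO('example.nwb', 'w') as io:
    io.write(nwbfile)

# afterwards the dataset can be grown in place (e.g. with h5py's
# Dataset.resize) and new values appended, instead of rewriting it all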

@rly changed the title from "Fix append to dynamic table" to "[WIP] Fix append to dynamic table" on Jul 30, 2019
@rly (Contributor, Author) commented Oct 7, 2019

This PR has been superseded by hdmf-dev/hdmf#161. Discussions are still relevant for #1067, however.

@rly closed this on Oct 7, 2019
@rly deleted the fix_dyntable_append branch on October 7, 2019
Successfully merging this pull request may close these issues:
cannot change shape of dataset once written