Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move written flag from builder to HDF5IO, add tests #381

Merged
merged 8 commits into from
Jun 16, 2020
Merged

Conversation

rly
Copy link
Contributor

@rly rly commented Jun 16, 2020

Motivation

Required for the export PR: #326.

Currently, Builder objects have a field written that is used only by HDF5IO to mark whether the Builder object has been written to disk so that it does not get written multiple times in case of links and references. Builder objects should contain minimal state information such as whether it has been written, and this should really be information stored in the IO object. Refactoring to make HDF5IO track whether a builder has been written by that object will allow us to use a different HDF5IO instance to write the same builder to a different file.

Note:

  • HDF5IO.__set_written is private because no other class should be able to change whether a builder has been written by this IO object. HDF5IO.get_written is public because other backends and code may want to be able to know whether a builder has been written to disk, but this could be made private.
  • I replaced the test test_double_cache_spec with a new set of tests for writing/appending to check that the written flag is used correctly.

Close #380.

How to test the behavior?

See tests.

Checklist

  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@rly rly requested review from ajtritt and oruebel June 16, 2020 07:10
@@ -884,10 +913,10 @@ def _filler():
# NOTE: we can ignore options['io_settings'] for scalar data
elif self.__is_ref(options['dtype']):
_dtype = self.__dtypes.get(options['dtype'])
self.__set_written(builder)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all branches of the if statement below had builder.written = True so I moved it to before the if statement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The written flag should be set after the if/elif statements since that is when the state is modified. E.g., if an error occurs in require_dataset then the builder is falsely recorded as written.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

@codecov
Copy link

codecov bot commented Jun 16, 2020

Codecov Report

Merging #381 into hdmf_2.0 will not change coverage.
The diff coverage is 72.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##           hdmf_2.0     #381   +/-   ##
=========================================
  Coverage     74.55%   74.55%           
=========================================
  Files            33       33           
  Lines          6633     6633           
  Branches       1447     1446    -1     
=========================================
  Hits           4945     4945           
  Misses         1279     1279           
  Partials        409      409           
Impacted Files Coverage Δ
src/hdmf/build/builders.py 86.30% <ø> (-0.39%) ⬇️
src/hdmf/backends/hdf5/h5tools.py 67.56% <72.00%> (+0.37%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fa0587...b586b41. Read the comment docs.

@rly
Copy link
Contributor Author

rly commented Jun 16, 2020

Also consider the use case where a user writes an in-memory container to a file, then adds to the in-memory container, then writes the updated in-memory container to the same file using the following code:

    def test_write_append_unwritten(self):
        """Test writing a container, adding to the in-memory container, then writing it again in append mode."""
        with HDF5IO(self.path, manager=self.manager, mode='w') as io:
            io.write(self.foofile)

        # append new container
        foo3 = Foo('foo3', [10, 20], "I am foo3", 2, 0.1)
        new_bucket1 = FooBucket('new_bucket1', [foo3])
        self.foofile.buckets.append(new_bucket1)
        new_bucket1.parent = self.foofile

        # write to same file with same manager in APPEND mode
        with HDF5IO(self.path, manager=self.manager, mode='a') as io:
            io.write(self.foofile)

Previously this would succeed because the builder for self.foofile was already marked as written. Now it will fail with the h5py error ValueError: Unable to create group (name already exists) because the second IO object does not know that a group for self.foofile already exists in the file.

Although possibly confusing, I think this is OK behavior. Users should NOT use the above code to write an in-memory container and then append to it. Instead, they should do one of the following:

  1. write the in-memory container only once after everything has been added,
  2. on the second write, open the file in 'w' mode to overwrite the existing contents, or
  3. open the file in 'a' mode after the first write, append to the read container, then call io.write

@rly rly changed the base branch from dev to hdmf_2.0 June 16, 2020 07:47
@rly rly added this to the HDMF 2.0 milestone Jun 16, 2020
builder_id = self.__builderhash(builder)
return self._written_builders.get(builder_id, False)

def __builderhash(self, obj):
Copy link
Contributor

@ajtritt ajtritt Jun 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same function is implemented here:

def __bldrhash__(self, obj):
return id(obj)

Not sure if it should be generalized, by just adding __hash__ to Builder. It seems simple enough that its not a big deal to leave as is, but thought I would mention it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's where I got it from. I figured there might have been a reason that it was on the BuildManager rather than on Builder in the first place, e.g., maybe builders should by default have a hash that is based on its contents, so I left it and re-implemented it here.

@rly
Copy link
Contributor Author

rly commented Jun 16, 2020

Note that since this technically includes a breaking change, this PR is being merged to a branch for HDMF 2.0, not dev.

@rly rly merged commit 4739b5a into hdmf_2.0 Jun 16, 2020
@rly rly deleted the enh/change_written branch June 16, 2020 17:39
@rly rly mentioned this pull request Jul 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants