Empty unlimited chunked variables cause crash #67

bnlawrence · 2025-01-07T08:06:12Z

If one has created a variable that is intended to be chunked but is currently empty, reading the file produces a stack dump that ends with this:

    def _read_node_header(self, offset, node_level):
        """ Return a single node header in the b-tree located at a give offset. """
>       self.fh.seek(offset)
E       ValueError: cannot fit 'int' into an offset-sized integer

This pull request includes code to create a file which manifests the problem, a test to expose it, and a fix.
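The error message is itself a clue. The following is a minimal sketch, assuming the offset handed to `seek` is HDF5's undefined address (all bits set, `0xFFFFFFFFFFFFFFFF` for 8-byte offsets), which is what the review discussion in this thread suggests for chunked datasets that have never been written. The constant name mirrors the `UNDEFINED_ADDRESS` used later in the conversation. The temporary file and the `try_seek` helper are illustrative only and are not part of pyfive.

```python
import tempfile

# HDF5 marks "no address" with all bits set; value assumed for 8-byte offsets.
UNDEFINED_ADDRESS = 0xFFFFFFFFFFFFFFFF


def try_seek(offset):
    """Seek an ordinary binary file to `offset`; return the exception name or None."""
    with tempfile.TemporaryFile() as fh:
        fh.write(b"\x00" * 16)
        try:
            fh.seek(offset)
        except (ValueError, OverflowError) as err:
            return type(err).__name__
    return None


print(try_seek(8))                  # a normal offset seeks without error
print(try_seek(UNDEFINED_ADDRESS))  # too large for the platform's file offset type
```

If this reading is right, the crash is not a corrupt file but an attempt to follow a pointer that deliberately points nowhere.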


bmaranville · 2025-01-07T15:24:11Z

pyfive/dataobjects.py

@@ -480,6 +480,10 @@ def _get_contiguous_data(self, property_offset):

    def _get_chunked_data(self, offset):
        """ Return data which is chunked. """
+


I think this will work. It is also possible to test whether the chunk address is UNDEFINED_ADDRESS, which will happen when no data has been written to the Dataset yet (see the change in usnistgov/jsfive@f228420, which I should have backported to pyfive).

EDIT - I think maybe the test for UNDEFINED_ADDRESS is important here because sometimes you encounter datasets with non-zero shapes but which have not been written yet (initializing a dataset and writing data to it are two separate steps).
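A short sketch of why both tests matter, under stated assumptions: `has_chunk_btree` is a hypothetical helper (not a pyfive function), `UNDEFINED_ADDRESS` is assumed to be the 8-byte all-ones value, the address `4096` is made up, and `math.prod` stands in for `np.prod` to keep the example dependency-free.

```python
import math

UNDEFINED_ADDRESS = 0xFFFFFFFFFFFFFFFF  # assumed 8-byte HDF5 "undefined" marker


def has_chunk_btree(shape, chunk_address):
    """Return True only when there is a chunk b-tree worth reading."""
    if math.prod(shape) == 0:
        return False  # e.g. an unlimited dimension with current length 0
    if chunk_address == UNDEFINED_ADDRESS:
        return False  # dataset initialised, but nothing written yet
    return True


# Empty unlimited variable: the shape test catches it.
print(has_chunk_btree((0, 5), UNDEFINED_ADDRESS))
# Non-zero shape, never written: only the address test catches it.
print(has_chunk_btree((10, 5), UNDEFINED_ADDRESS))
# Written data at some valid (made-up) address.
print(has_chunk_btree((10, 5), 4096))
```

The second case is the one a shape-only check would miss, since the product of the shape is non-zero even though no chunk has been allocated.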

bnlawrence · 2025-01-15

That seems sensible. At this point I'm not minded to fix this here, because where it gets done in the new H5D.py will be slightly different. What I have there is:

    # look out for an empty dataset, which will have no btree
    if np.prod(self.shape) == 0 or dataobject._chunk_address == UNDEFINED_ADDRESS:
        self._index = {}
        return

(This is in the context of caching the b-tree when we instantiate a DatasetID, which we do when we create a variable instance with e.g. `x=myfile['variable']`. We do it at this point so that all threads in a thread pool have their b-tree before they get going on their bit of work.)
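The caching idea can be sketched roughly as follows. This is not the H5D.py code: `DatasetIDSketch`, `fake_load_index`, the `load_calls` counter, and the addresses are all hypothetical, and `math.prod` again stands in for `np.prod`. The sketch only shows the shape of the approach, which is to build the index once at instantiation (or set it to an empty dict for an empty dataset) so that worker threads only read it.

```python
from concurrent.futures import ThreadPoolExecutor
import math

UNDEFINED_ADDRESS = 0xFFFFFFFFFFFFFFFF  # assumed 8-byte HDF5 "undefined" marker


class DatasetIDSketch:
    """Hypothetical stand-in for a DatasetID that caches its chunk index."""

    def __init__(self, shape, chunk_address, load_index):
        self.shape = shape
        self.load_calls = 0
        if math.prod(shape) == 0 or chunk_address == UNDEFINED_ADDRESS:
            self._index = {}  # empty dataset: there is no b-tree to walk
            return
        self.load_calls += 1
        self._index = load_index(chunk_address)  # walk the b-tree once, up front

    def chunk_location(self, chunk_coords):
        # Workers only read the cached dict; no b-tree access happens here.
        return self._index.get(chunk_coords)


def fake_load_index(address):
    """Made-up loader mapping chunk coordinates to byte offsets."""
    return {(0,): address + 100, (1,): address + 200}


dsid = DatasetIDSketch((20,), 4096, fake_load_index)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(dsid.chunk_location, [(0,), (1,), (0,), (1,)]))
print(results, dsid.load_calls)

empty = DatasetIDSketch((0,), UNDEFINED_ADDRESS, fake_load_index)
print(empty.chunk_location((0,)))
```

Because the index is complete before the pool starts, the workers never need to seek into the b-tree themselves, and the empty case never reaches the loader at all.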


Bryan Lawrence added 5 commits January 7, 2025 08:01

test that exposes issue with empty variable with unlimited dimensions…

669bf41

… in chunked files

Fix for the empty unlimited netcdf issue

eb23817

upstream testing needs the actual data file

c920554

Still missing test files

9459e8a

Still missing files

ee9fc50

bmaranville reviewed Jan 7, 2025


bnlawrence pushed a commit to NCAS-CMS/pyfive that referenced this pull request Jan 15, 2025

Minor changes which come from upstream advice on my two pull requests j…

c4a38b9

…jhelmus#67 and jjhelmus#66
