Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameter for changing index order #87

Merged
merged 21 commits into from
Apr 8, 2019
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 66 additions & 2 deletions docs/source/user-guide.rst
Original file line number Diff line number Diff line change
Expand Up @@ -182,6 +182,8 @@ Reading NRRD files
------------------
There are three functions that are used to read NRRD files: :meth:`read`, :meth:`read_header` and :meth:`read_data`. :meth:`read` is a convenience function that opens the file located at :obj:`filename` and calls :meth:`read_header` and :meth:`read_data` in succession and returns the data and header.

Reading NRRD files using :meth:`read` or :meth:`read_data` will by default return arrays being indexed in Fortran-order, i.e array elements are accessed by `data[x, y, z]`. This differs from the C-order where array elements are accessed by `data[z, y, x]`, which is the more common order in Python and libraries (e.g. NumPy, scikit-image, PIL, OpenCV). The :obj:`index_order` parameter can be used to specify which index ordering should be used on the returned array ('C' or 'F'). The :obj:`index_order` parameter needs to be consistent with the parameter of same name in :meth:`write`.

The :meth:`read` and :meth:`read_header` methods accept an optional parameter :obj:`custom_field_map` for parsing custom field types not listed in `Supported Fields`_ of the header. It is a :class:`dict` where the key is the custom field name and the value is a string identifying datatype for the custom field. See `Header datatypes`_ for a list of supported datatypes.

The :obj:`file` parameter of :meth:`read_header` accepts a filename or a string iterator object to read the header from. If a filename is specified, then the file will be opened and closed after the header is read from it. If not specifying a filename, the :obj:`file` parameter can be any sort of iterator that returns a string each time :meth:`next` is called. The two common objects that meet these requirements are file objects and a list of strings. The :meth:`read_header` function returns a :class:`dict` with the field and parsed values as keys and values to the dictionary.
Expand All @@ -194,12 +196,74 @@ Writing NRRD files
------------------
Writing to NRRD files can be done with the single function :meth:`write`. The :obj:`filename` parameter to the function specifies the absolute or relative filename to write the NRRD file. If the :obj:`filename` extension is .nhdr, then the :obj:`detached_header` parameter is set to true automatically. If the :obj:`detached_header` parameter is set to :obj:`True` and the :obj:`filename` ends in .nrrd, then the header file will have the same path and base name as the :obj:`filename` but with an extension of .nhdr. In all other cases, the header and data are saved in the same file.

The :obj:`data` parameter is a :class:`numpy.ndarray` of data to be saved. :obj:`header` is an optional parameter of type :class:`dict` containing the field/values to be saved to the NRRD file.
The :obj:`data` parameter is a :class:`numpy.ndarray` of data to be saved. :obj:`header` is an optional parameter of type :class:`dict` containing the field/values to be saved to the NRRD file.

Writing NRRD files will by default index the :obj:`data` array in Fortran-order where array elements are accessed by `data[x, y, z]`. This differs from C-order where array elements are accessed by `data[z, y, x]`, which is the more common order in Python and libraries (e.g. NumPy, scikit-image, PIL, OpenCV). The :obj:`index_order` parameter can be used to specify which index ordering should be used for the given :obj:`data` array ('C' or 'F').

.. note::

The following fields are automatically generated based on the :obj:`data` parameter ignoring these values in the :obj:`header`: 'type', 'endian', 'dimension', 'sizes'.

.. note::

The default encoding field used if not specified in :obj:`header` is 'gzip'.
simeks marked this conversation as resolved.
Show resolved Hide resolved
The default encoding field used if not specified in :obj:`header` is 'gzip'.

.. note::

addisonElliott marked this conversation as resolved.
Show resolved Hide resolved
The :obj:`index_order` parameter must be consistent with the index order specified in :meth:`read`. Reading an NRRD file in C-order and then writing as Fortran-order or vice versa will result in the data being transposed in the NRRD file.

Index Ordering
--------------
NRRD stores the image elements in [row-major ordering](https://en.wikipedia.org/wiki/Row-_and_column-major_order) where the row can be seen as the fastest-varying axis. The header fields of NRRD that describes the axes are always specified in the order from fastest-varying to slowest-varying, i.e., `sizes` will be equal to `(#rows, #columns)`. This is also applicable to images of higher dimensions.
simeks marked this conversation as resolved.
Show resolved Hide resolved

Both the reading and writing functions in pynrrd include an :obj:`index_order` parameter that is used to specify whether the returned data array should be in C-order ('C') or Fortran-order ('F').

Fortran-order is where the dimensions of the multi-dimensional data is ordered from fastest-varying to slowest-varying, i.e. in the same order as the header fields. So for a 3D set of data, the dimensions would be ordered `(x, y, z)`.

On the other hand, C-order is where the dimensions of the multi-dimensional data is ordered from slowest-varying to fastest-varying. So for a 3D set of data, the dimensions would be ordered `(z, y, x)`.

C-order is the index order used in Python and many Python libraries (e.g. NumPy, scikit-image, PIL, OpenCV). pynrrd recommends using C-order indexing to be consistent with the Python community. However, as of this time, the default indexing is Fortran-order to maintain backwards compatibility. In the future, the default index order will be switched to C-order.

.. note::

Converting from one index order to the other is done via transposing the data.

.. note::

All header fields are specified in Fortran order, per the NRRD specification, regardless of the index order. For example, a C-ordered array with shape (60, 800, 600) would have a sizes field of (600, 800, 60).

Example reading and writing NRRD files with C-order indexing
------------------------------------------------------------
.. code-block:: python

import numpy as np
import nrrd

# Treat this data array as a 3D volume using C-order indexing
# This means we have a volume with a shape of 600x800x70 (x by y by z)
data = np.zeros((70, 800, 600))
# Save the NRRD object with the correct index order
nrrd.write('output.nrrd', data, index_order='C')

# Read the NRRD file with C-order indexing
# Note: We can specify either index order here, this is just a preference unlike in nrrd.write where
# it MUST be set based on the data setup
data, header = nrrd.read('output.nrrd', index_order='C')

# The data shape is exactly the same shape as what we wrote
print(data.shape)
>>> (70, 800, 600)

# But the shape saved in the header is in Fortran-order
print(header['sizes'])
>>> [600 800 70]

# Read the NRRD file with Fortran-order indexing now
data, header = nrrd.read('output.nrrd', index_order='F')

# The data shape is exactly the same shape as what we wrote
# The data shape is now in Fortran order, or transposed from what we wrote
print(data.shape)
>>> (600, 800, 70)
print(header['sizes'])
>>> [600 800 70]
33 changes: 27 additions & 6 deletions nrrd/reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -298,7 +298,7 @@ def read_header(file, custom_field_map=None):
return header


def read_data(header, fh=None, filename=None):
def read_data(header, fh=None, filename=None, index_order='F'):
"""Read data from file into :class:`numpy.ndarray`

The two parameters :obj:`fh` and :obj:`filename` are optional depending on the parameters but it never hurts to
Expand All @@ -317,6 +317,10 @@ def read_data(header, fh=None, filename=None):
filename : :class:`str`, optional
Filename of the header file. Only necessary if data is detached from the header. This is used to get the
absolute data path.
index_order : {'C', 'F'}, optional
Specifies the index order of the resulting data array. Either 'C' (C-order) where the dimensions are ordered from
slowest-varying to fastest-varying (e.g. (z, y, x)), or 'F' (Fortran-order) where the dimensions are ordered
from fastest-varying to slowest-varying (e.g. (x, y, z)).

Returns
-------
Expand All @@ -328,6 +332,9 @@ def read_data(header, fh=None, filename=None):
:meth:`read`, :meth:`read_header`
"""

if index_order not in ['F', 'C']:
raise NRRDError('Invalid index order')

# Check that the required fields are in the header
for field in _NRRD_REQUIRED_FIELDS:
if field not in header:
Expand Down Expand Up @@ -420,25 +427,39 @@ def read_data(header, fh=None, filename=None):
raise NRRDError('Size of the data does not equal the product of all the dimensions: {0}-{1}={2}'
.format(total_data_points, data.size, total_data_points - data.size))

# Eliminate need to reverse order of dimensions. NRRD's data layout is the same as what Numpy calls 'Fortran'
# order, where the first index is the one that changes fastest and last index changes slowest
data = np.reshape(data, tuple(header['sizes']), order='F')
# In the NRRD header, the fields are specified in Fortran order, i.e, the first index is the one that changes
# fastest and last index changes slowest. This needs to be taken into consideration since numpy uses C-order
# indexing.

# The array shape from NRRD (x,y,z) needs to be reversed as numpy expects (z,y,x).
data = np.reshape(data, tuple(header['sizes'][::-1]))

# Transpose data to enable Fortran indexing if requested.
if index_order == 'F':
data = data.T

return data


def read(filename, custom_field_map=None):
def read(filename, custom_field_map=None, index_order='F'):
"""Read a NRRD file and return the header and data

See :ref:`user-guide:Reading NRRD files` for more information on reading NRRD files.

.. note::
Users should be aware that the `index_order` argument needs to be consistent between `nrrd.read` and `nrrd.write`. I.e., reading an array with `index_order='F'` will result in a transposed version of the original data and hence the writer needs to be aware of this.
addisonElliott marked this conversation as resolved.
Show resolved Hide resolved

Parameters
----------
filename : :class:`str`
Filename of the NRRD file
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional
Dictionary used for parsing custom field types where the key is the custom field name and the value is a
string identifying datatype for the custom field.
index_order : {'C', 'F'}, optional
Specifies the index order of the resulting data array. Either 'C' (C-order) where the dimensions are ordered from
slowest-varying to fastest-varying (e.g. (z, y, x)), or 'F' (Fortran-order) where the dimensions are ordered
from fastest-varying to slowest-varying (e.g. (x, y, z)).

Returns
-------
Expand All @@ -455,6 +476,6 @@ def read(filename, custom_field_map=None):
"""Read a NRRD file and return a tuple (data, header)."""
with open(filename, 'rb') as fh:
header = read_header(fh, custom_field_map)
data = read_data(header, fh, filename)
data = read_data(header, fh, filename, index_order)

return data, header
49 changes: 33 additions & 16 deletions nrrd/tests/test_reading.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
import nrrd


class TestReadingFunctions(unittest.TestCase):
class TestReadingFunctions(object):
def setUp(self):
self.expected_header = {u'dimension': 3,
u'encoding': 'raw',
Expand All @@ -20,7 +20,9 @@ def setUp(self):
u'space origin': np.array([0, 0, 0]),
u'type': 'short'}

self.expected_data = np.fromfile(RAW_DATA_FILE_PATH, np.int16).reshape((30, 30, 30), order='F')
self.expected_data = np.fromfile(RAW_DATA_FILE_PATH, np.int16).reshape((30, 30, 30))
if self.index_order == 'F':
self.expected_data = self.expected_data.T

def test_read_header_only(self):
with open(RAW_NRRD_FILE_PATH, 'rb') as fh:
Expand Down Expand Up @@ -49,7 +51,7 @@ def test_read_detached_header_only(self):
np.testing.assert_equal(self.expected_header, header)

def test_read_header_and_data_filename(self):
data, header = nrrd.read(RAW_NRRD_FILE_PATH)
data, header = nrrd.read(RAW_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -61,7 +63,7 @@ def test_read_detached_header_and_data(self):
expected_header = self.expected_header
expected_header[u'data file'] = os.path.basename(RAW_DATA_FILE_PATH)

data, header = nrrd.read(RAW_NHDR_FILE_PATH)
data, header = nrrd.read(RAW_NHDR_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -74,7 +76,7 @@ def test_read_detached_header_and_data_with_byteskip_minus1(self):
expected_header[u'data file'] = os.path.basename(RAW_DATA_FILE_PATH)
expected_header[u'byte skip'] = -1

data, header = nrrd.read(RAW_BYTESKIP_NHDR_FILE_PATH)
data, header = nrrd.read(RAW_BYTESKIP_NHDR_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -89,7 +91,7 @@ def test_read_detached_header_and_nifti_data_with_byteskip_minus1(self):
expected_header[u'encoding'] = 'gzip'
expected_header[u'data file'] = 'BallBinary30x30x30.nii.gz'

data, header = nrrd.read(GZ_BYTESKIP_NIFTI_NHDR_FILE_PATH)
data, header = nrrd.read(GZ_BYTESKIP_NIFTI_NHDR_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -100,18 +102,18 @@ def test_read_detached_header_and_nifti_data_with_byteskip_minus1(self):
def test_read_detached_header_and_nifti_data(self):
with self.assertRaisesRegex(nrrd.NRRDError, 'Size of the data does not equal '
+ 'the product of all the dimensions: 27000-27176=-176'):
nrrd.read(GZ_NIFTI_NHDR_FILE_PATH)
nrrd.read(GZ_NIFTI_NHDR_FILE_PATH, index_order=self.index_order)

def test_read_detached_header_and_data_with_byteskip_minus5(self):
with self.assertRaisesRegex(nrrd.NRRDError, 'Invalid byteskip, allowed values '
+ 'are greater than or equal to -1'):
nrrd.read(RAW_INVALID_BYTESKIP_NHDR_FILE_PATH)
nrrd.read(RAW_INVALID_BYTESKIP_NHDR_FILE_PATH, index_order=self.index_order)

def test_read_header_and_gz_compressed_data(self):
expected_header = self.expected_header
expected_header[u'encoding'] = 'gzip'

data, header = nrrd.read(GZ_NRRD_FILE_PATH)
data, header = nrrd.read(GZ_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -125,7 +127,7 @@ def test_read_header_and_gz_compressed_data_with_byteskip_minus1(self):
expected_header[u'type'] = 'int16'
expected_header[u'byte skip'] = -1

data, header = nrrd.read(GZ_BYTESKIP_NRRD_FILE_PATH)
data, header = nrrd.read(GZ_BYTESKIP_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -137,7 +139,7 @@ def test_read_header_and_bz2_compressed_data(self):
expected_header = self.expected_header
expected_header[u'encoding'] = 'bzip2'

data, header = nrrd.read(BZ2_NRRD_FILE_PATH)
data, header = nrrd.read(BZ2_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -150,7 +152,7 @@ def test_read_header_and_gz_compressed_data_with_lineskip3(self):
expected_header[u'encoding'] = 'gzip'
expected_header[u'line skip'] = 3

data, header = nrrd.read(GZ_LINESKIP_NRRD_FILE_PATH)
data, header = nrrd.read(GZ_LINESKIP_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Expand All @@ -176,6 +178,9 @@ def test_read_dup_field_error_and_warn(self):

import warnings
with warnings.catch_warnings(record=True) as w:
# Enable all warnings
warnings.simplefilter("always")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and remove this line. When running tests on your branch, I noticed I received some warnings that I didn't get before. I finally figured out it's because of this line.

I have an upcoming PR where I enable filters for the entire tests (I put this code at the top of each file). I also will be fixing some of these warnings too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 . This makes the test fail though, but since you're enabling it in another PR I guess it's fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, that's the reason why the one test was failing. I'm curious as to why this started happening only in your PR. I'm sure there's some reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is mostly likely caused by the fact that we want to trigger the same warning twice (for both 'C' and 'F'). I don't know if it is intended but it seems that Python 2 only triggers warnings once, omitting the warning in subsequent calls.


nrrd.reader.ALLOW_DUPLICATE_FIELD = True
header = nrrd.read_header(header_txt_tuple)

Expand All @@ -192,7 +197,7 @@ def test_read_header_and_ascii_1d_data(self):
u'spacings': [1.0458000000000001],
u'type': 'unsigned char'}

data, header = nrrd.read(ASCII_1D_NRRD_FILE_PATH)
data, header = nrrd.read(ASCII_1D_NRRD_FILE_PATH, index_order=self.index_order)

self.assertEqual(header, expected_header)
np.testing.assert_equal(data.dtype, np.uint8)
Expand All @@ -209,11 +214,13 @@ def test_read_header_and_ascii_2d_data(self):
u'spacings': [1.0458000000000001, 2],
u'type': 'unsigned short'}

data, header = nrrd.read(ASCII_2D_NRRD_FILE_PATH)
data, header = nrrd.read(ASCII_2D_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(header, expected_header)
np.testing.assert_equal(data.dtype, np.uint16)
np.testing.assert_equal(data, np.arange(1, 28).reshape(3, 9, order='F'))

expected_shape = (3, 9) if self.index_order == 'F' else (9, 3)
np.testing.assert_equal(data, np.arange(1, 28).reshape(expected_shape, order=self.index_order))

# Test that the data read is able to be edited
self.assertTrue(data.flags['WRITEABLE'])
Expand All @@ -233,7 +240,7 @@ def test_read_simple_4d_nrrd(self):
[0., 1.0000000006, 0.],
[0., 0., 1.000000000000009]])}

data, header = nrrd.read(RAW_4D_NRRD_FILE_PATH)
data, header = nrrd.read(RAW_4D_NRRD_FILE_PATH, index_order=self.index_order)

np.testing.assert_equal(header, expected_header)
np.testing.assert_equal(data.dtype, np.float64)
Expand Down Expand Up @@ -409,5 +416,15 @@ def test_invalid_endian(self):
nrrd.read_data(header, fh, RAW_NRRD_FILE_PATH)


def test_invalid_index_order(self):
with self.assertRaisesRegex(nrrd.NRRDError, 'Invalid index order'):
nrrd.read(RAW_NRRD_FILE_PATH, index_order=None)

class TestReadingFunctionsFortran(TestReadingFunctions, unittest.TestCase):
index_order = 'F'

class TestReadingFunctionsC(TestReadingFunctions, unittest.TestCase):
index_order = 'C'

if __name__ == '__main__':
unittest.main()
Loading