Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datamodels doesn't validate arrays in some cases #307

Closed
jdavies-st opened this issue Jun 4, 2019 · 8 comments · Fixed by #395 or #403
Closed

datamodels doesn't validate arrays in some cases #307

jdavies-st opened this issue Jun 4, 2019 · 8 comments · Fixed by #395 or #403
Assignees

Comments

@jdavies-st
Copy link
Contributor

jwst.datamodels validates array sizes on reading from a FITS file or on assignment. So ImageModel requires the data array to be 2D, and so if I try to read or assign a 3D array, it fails validation:

In [22]: m = datamodels.ImageModel()

In [24]: m.data = np.arange(10*10).reshape((1, 10, 10))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-d91d60f9cc1c> in <module>
----> 1 m.data = np.arange(10*10).reshape((1, 10, 10))

~/dev/jwst/jwst/datamodels/model_base.py in __setattr__(self, attr, value)
    623             object.__setattr__(self, attr, value)
    624         elif ndmodel.NDModel.my_attribute(self, attr):
--> 625             ndmodel.NDModel.__setattr__(self, attr, value)
    626         else:
    627             properties.ObjectNode.__setattr__(self, attr, value)

~/dev/jwst/jwst/datamodels/ndmodel.py in data(self, value)
    130         if not primary_array_name:
    131             primary_array_name = 'data'
--> 132         properties.ObjectNode.__setattr__(self, primary_array_name, value)
    133 
    134     @property

~/dev/jwst/jwst/datamodels/properties.py in __setattr__(self, attr, val)
    258             if val is None:
    259                 val = _make_default(attr, schema, self._ctx)
--> 260             val = _cast(val, schema)
    261 
    262             node = ObjectNode(attr, val, schema, self._ctx)

~/dev/jwst/jwst/datamodels/properties.py in _cast(val, schema)
     37             raise ValueError(
     38                 "Array has wrong number of dimensions.  Expected {0}, got {1}".format(
---> 39                     schema['ndim'], len(val.shape)))
     40 
     41         if 'max_ndim' in schema and len(val.shape) > schema['max_ndim']:

ValueError: Array has wrong number of dimensions.  Expected 2, got 3

That's good.

But if I instantiate an ImageModel with a 3D array or from an already existing datamodel with a 3D array, it doesn't do any validation and allows it.

In [25]: m = datamodels.ImageModel((1, 10, 10))

In [26]: cube = datamodels.CubeModel((1, 10, 10))

In [27]: m = datamodels.ImageModel(cube)

This is incorrect. It should be giving the same validation error as above.

@philhodge
Copy link

I understand (well, maybe) why one would want different models for 2-D and 3-D arrays, but your example brought to mind a possible snag. We often reshape an array to add an axis of length 1, for the purpose of broadcasting with another array. Explicitly specifying where the axis of length 1 falls within the shape may be necessary if it isn't the last axis; the irs2 code in refpix (translated from IDL) does this a lot. If an array was reshaped without first making a copy, and a validation check was done while a (10, 10) array temporarily had shape (1, 10, 10), that would not be helpful.

@jdavies-st
Copy link
Contributor Author

I think as long as you're not assigning it, it should be fine:

In [3]: m = datamodels.ImageModel()

In [4]: m.data = np.arange(10*10).reshape((10, 10))

In [5]: m.data
Out[5]: 
array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
       [30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [40., 41., 42., 43., 44., 45., 46., 47., 48., 49.],
       [50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [60., 61., 62., 63., 64., 65., 66., 67., 68., 69.],
       [70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [80., 81., 82., 83., 84., 85., 86., 87., 88., 89.],
       [90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]], dtype=float32)

In [6]: m.data.reshape((1, 10, 10))
Out[6]: 
array([[[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
        [20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
        [30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
        [40., 41., 42., 43., 44., 45., 46., 47., 48., 49.],
        [50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
        [60., 61., 62., 63., 64., 65., 66., 67., 68., 69.],
        [70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
        [80., 81., 82., 83., 84., 85., 86., 87., 88., 89.],
        [90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]]],
      dtype=float32)

In [7]: m.data.reshape((1, 10, 10)).shape
Out[7]: (1, 10, 10)

@braingram braingram transferred this issue from spacetelescope/jwst Jul 21, 2024
@emolter
Copy link
Contributor

emolter commented Feb 10, 2025

The following fails as expected, too:

>>> import stdatamodels.jwst.datamodels as dm
>>> import numpy as np
>>> arr = np.zeros((1,10,10))
>>> m2 = dm.ImageModel(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/emolter/repositories/stdatamodels/src/stdatamodels/model_base.py", line 266, in __init__
    setattr(self, primary_array_name, init)
  File "/Users/emolter/repositories/stdatamodels/src/stdatamodels/model_base.py", line 689, in __setattr__
    properties.ObjectNode.__setattr__(self, attr, value)
  File "/Users/emolter/repositories/stdatamodels/src/stdatamodels/properties.py", line 347, in __setattr__
    val = _cast(val, schema)
          ^^^^^^^^^^^^^^^^^^
  File "/Users/emolter/repositories/stdatamodels/src/stdatamodels/properties.py", line 109, in _cast
    raise ValueError(
ValueError: Array has wrong number of dimensions.  Expected 2, got 3

That is, the problem does not occur when using an array to instantiate the model, only when the model attempts to create its own default array from an input shape. In the latter case, there is no check to ensure ctx.shape is compatible with the ndim field of the data array schema.

@jdavies-st
Copy link
Contributor Author

jdavies-st commented Feb 11, 2025

Nice catch!

The other case where it doesn't validate is when one instantiates a 2D model from a 3D model. One sometimes does this in pipelines to have a shortcut for pulling all the metadata from one model to the other, if supported. And the data too, if supported by the schemas.

In [1]: from jwst import datamodels

In [2]: import numpy as np

In [6]: cube = datamodels.CubeModel((2, 3, 4))

In [7]: cube.data
Out[7]: 
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)

In [8]: cube.data.shape
Out[8]: (2, 3, 4)

In [9]: image = datamodels.ImageModel(cube)

In [10]: image.data.shape
Out[10]: (2, 3, 4)

@emolter
Copy link
Contributor

emolter commented Feb 11, 2025

Gotcha, ok I will see if that is easily fixable as well. But if it's actively used in the pipelines, it's potentially more challenging to fix non-disruptively

@emolter
Copy link
Contributor

emolter commented Feb 11, 2025

#395 attempts to resolve the initial issue, where the dimensions are mismatched when instantiating from a shape. I think it makes sense for that to be its own PR. I'll investigate the other case and leave this issue open until we decide what to do about it

@jdavies-st
Copy link
Contributor Author

Sounds good. I think both are actively used in the pipelines, so it would be good to check that these fixes to main eventually don't cause issues in the pipelines, and if they do, fix the pipelines too.

@emolter
Copy link
Contributor

emolter commented Feb 12, 2025

This is only half-complete. The unexpected behavior where initializing an ImageModel with a CubeModel still needs to be investigated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants