Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-1713: fix gentle_asarray structured array handling and allow extra columns in datamodel schema #189

Merged
merged 16 commits into from
Aug 21, 2023

Conversation

braingram
Copy link
Collaborator

@braingram braingram commented Aug 4, 2023

This fixes issues with gentle_asarray handling of structured arrays (see #188) and adds support for allowing extra columns in tables defined in datamodel schema.

gentle_asarray is used when setting a data model property when the value is _cast to a dtype defined by the schema. (Here is one example from properties.py which includes several uses of _cast for __setattr__, __setitem__ etc...).

Prior to this PR gentle_asarray mangled some dtypes when used with structured arrays (see #188 for more examples and details). For example,

a = np.zeros(2, dtype=[('f', 'f8'), ('i', 'i4')])
ndt = np.dtype([('i', 'i4'), ('f', 'f8')])  # swap 'i' and 'f'
b = gentle_asarray(a, ndt)
assert b.dtype == ndt
assert ndt == np.dtype([('f', '<i4'), ('i', '<f8')]) # note that 'f' now has a 'i4' subdtype

This PR:

  • reorganizes gentle_asarray by inverting some conditionals to allow removing nested if statements (to improve readability)
  • reorganizes array columns using merge_arrays when provided with input that has a dtype with columns in an order that does not match the schema
  • calls _safe_asanyarray if sub-dtypes (dtypes for columns) differ (even if the difference is byte order, more on that below).
  • adds an option allow_extra_columns to gentle_asarray to allow extra columns (which will be put at the end of the required columns) and exposes this option in the schema (see the oifits schema changes in this PR).

As mentioned above _safe_asanyarray is now used for byte order differences. Although fitsrec appears to ignore byte order differences for views (unlike normal structured arrays) this breaks down if a binary mask is applied to the fitsrec and can result in unexpected byte swaps and erroneous values.

a = np.ones(3, dtype=[('a', '>f4'), ('b', '>f4')])
ndt = np.dtype([('a', '<f4'), ('b', '<f4')])
assert np.allclose(a.view(ndt)['a'], 4.6006e-41)  # bytes were correctly swapped
h = fits.HDUList()
h.append(fits.BinTableHDU())
h[-1].data = a
v = h[-1].data.view(ndt)  # this should swap the bytes
assert np.allclose(v['a'], 1.0)  # bytes were not correctly swapped
assert np.allclose(v[[True, False, False]]['a'], 4.6006e-41)  # applying a binary mask results in swapped bytes

@perrygreenfield I think this is another reason to remove all uses of fitsrec.

While working on this PR I discovered that the jwst miri cubepars reference files contain tables with columns in an order that do not match the schema (see #188). Prior to this PR, columns names but not dtypes were being swapped. This was not an issue in practice as the swapped columns have the same dtype. With this PR the names and dtypes are swapped (and tests were added for this and several other cases, see the new parametrized test in test_util.py).

This PR moves the single test test_gentle_asarray from test_storage.py into test_util.py and removes test_storage.py.

The changes in this PR will make JP-1713 easier to address as we won't need to make different oifits schemas for each file type (which have different, optional columns in some tables).

Checklist

  • added entry in CHANGES.rst (either in Bug Fixes or Changes to API)
  • updated relevant tests
  • updated relevant documentation
  • updated relevant milestone(s)
  • added relevant label(s)

@codecov
Copy link

codecov bot commented Aug 4, 2023

Codecov Report

Patch coverage: 97.53% and project coverage change: +0.27% 🎉

Comparison is base (e74ded1) 64.01% compared to head (f653693) 64.28%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #189      +/-   ##
==========================================
+ Coverage   64.01%   64.28%   +0.27%     
==========================================
  Files         101      101              
  Lines        5558     5606      +48     
==========================================
+ Hits         3558     3604      +46     
- Misses       2000     2002       +2     
Files Changed Coverage Δ
src/stdatamodels/util.py 96.68% <97.40%> (-0.49%) ⬇️
src/stdatamodels/properties.py 85.46% <100.00%> (+0.10%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@braingram braingram changed the title Gentle asarray fix gentle_asarray oddities and allow extra columns in datamodel schema Aug 8, 2023
@braingram braingram changed the title fix gentle_asarray oddities and allow extra columns in datamodel schema fix gentle_asarray structured array handling and allow extra columns in datamodel schema Aug 8, 2023
@braingram

This comment was marked as outdated.

@braingram
Copy link
Collaborator Author

I ran another regression test run this morning and it had one failure:
https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/851/pipeline/202/
The failure test_nircam_image_stages12[rate] looks caused by a file not yet updated to comply with #190

E         Primary HDU:
E         
E            Headers contain differences:
E              Keyword OSS_VER  has different comments:
E                 a> Observatory Scheduling Software (OSS) Version
E                  ?                                       ^
E                 b> Observatory Scheduling Software (OSS) version n
E                  ?                                       ^      ++

@braingram braingram marked this pull request as ready for review August 10, 2023 13:57
@braingram braingram requested a review from a team as a code owner August 10, 2023 13:57
@braingram
Copy link
Collaborator Author

@mcara @nden mentioned adding you as a reviewer for this package. Thanks in advance if you can take a look at this PR.

@braingram braingram requested a review from hbushouse August 10, 2023 14:03
@nden nden requested a review from mcara August 10, 2023 14:36
@braingram braingram changed the title fix gentle_asarray structured array handling and allow extra columns in datamodel schema JP-1713: fix gentle_asarray structured array handling and allow extra columns in datamodel schema Aug 15, 2023
Copy link
Member

@mcara mcara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great fix. Thanks!

@braingram
Copy link
Collaborator Author

I think this is a great fix. Thanks!

Thanks for reviewing the changes!

@braingram braingram merged commit bf55ea1 into spacetelescope:master Aug 21, 2023
@braingram braingram deleted the gentle_asarray branch August 21, 2023 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants