Store AMI observables in OIFITS format products #5351

Closed
stscijgbot-jp opened this issue Sep 18, 2020 · 43 comments

Comments

@stscijgbot-jp
Collaborator

Issue JP-1713 was created on JIRA by Howard Bushouse:

In addition to the overall algorithm upgrades to AMI processing requested by the AMI WG in JP-1449, they have also requested that the results ("observables") of each step be stored in products that adhere to the OIFITS convention (see https://fits.gsfc.nasa.gov/registry/oifits.html).

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Nadia Dencheva requested the following: 1. the code from the standalone algorithm that we currently use to read/write OIFITS files, and 2. an example file that can be used for testing.

  1. The code can be found here: https://github.com/anand0xff/ImPlaneIA/blob/master/nrm_analysis/misctools/oifits.py
    It is called by functions contained here: https://github.com/anand0xff/ImPlaneIA/blob/master/nrm_analysis/misctools/implane2oifits.py 
    I've pulled most of the relevant pieces out of these codes (reshaping data, creating various keyword values) in my updates to AMI3, which I can push to my fork soon.
  2. I have attached example averaged and per-integration OIFITS files for testing when that time comes.
    I'm happy to answer any questions about what this code does; as I mentioned, it's a bit of a patchwork.

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Rachel Cooper Thanks for sharing the example files. I've been taking a look at them and at the OIFITS standard, and doing some testing with stdatamodels and jwst to see what will be required to produce OIFITS-compliant output.

Is there software that can be used to verify that the output is OIFITS compliant?

I've found 2 possible options, each with issues:

  1. http://oival.jmmc.fr/index.html is an online OIFITS validator that validates one of the example files ('jw01093012001_03102_00001_nis.oifits') but appears to fail to load the second one (without producing a validation report)
  2. a python library with OIFITS validation: https://github.com/pboley/oifits. This fails to open either example file, in part because of the extra column in OI_ARRAY (which the standard appears to support but the software does not: OI_ARRAY does not allow for extra columns or alternative orderings pboley/oifits#3).

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett, Thanks for digging into this. We haven't been using any kind of verification software, unfortunately. I have used a version of the same JMMC OIFITS validator (the OIFITSExplorer GUI runs it when you open files, I think), but I didn't find an obvious Python code that would work out of the box, so we haven't used anything like that as part of creating the files. At some point in the past we were using pboley's oifits package in ImPlaneIA to write the files; I'm not entirely sure why we stopped, but possibly because we wanted to include additional info that should be acceptable under the OIFITS2 standard but wasn't supported by that code.
If we specify everything correctly in the data model schema, could that act as a replacement for a file validation code?

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

Thanks for the follow up and extra information.

The data model schema can provide some checks for OIFITS compliance (making sure certain FITS keywords and tables are defined, etc.). Developing this schema will benefit from having an independent means of verifying that the produced files are compliant (so we can find bugs in the schema). I'm not sure at this point if the datamodel schema will be sufficient for testing complete OIFITS compliance; there are expectations that the schema cannot easily enforce, such as the cross-referencing described in the OIFITS paper: for example, each OI_VIS table must have a keyword INSNAME that corresponds to the INSNAME keyword of an OI_WAVELENGTH table.
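That INSNAME cross-referencing rule can be illustrated with a small standalone check (a hypothetical sketch: the dict-based table layout here is only for illustration and is not the stdatamodels API):

```python
# Hypothetical sketch of the cross-referencing check a schema cannot easily
# express: every data table's INSNAME must match the INSNAME of some
# OI_WAVELENGTH table. Table/keyword names follow the OIFITS standard.

def check_insname_refs(oi_wavelength_tables, data_tables):
    """Return the INSNAME values in data tables that have no
    corresponding OI_WAVELENGTH table."""
    known = {t["INSNAME"] for t in oi_wavelength_tables}
    return [t["INSNAME"] for t in data_tables if t["INSNAME"] not in known]

wavelength = [{"INSNAME": "NIRISS"}]
vis_tables = [{"INSNAME": "NIRISS"}, {"INSNAME": "MIRI"}]
print(check_insname_refs(wavelength, vis_tables))  # ['MIRI']
```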

I was able to find a third option for checking a fits file for OIFITS compliance: https://github.com/jsy1001/oifitslib
This library contains an 'oifits-check' tool that validates jw01093012001_03102_00001_nis.oifits (with two warnings about columns TARGET and SPECTYP having incorrect datatypes [16A instead of the expected 32A]).

However, multi_jw01093012001_03102_00001_nis.oifits fails with errors related to the data tables (see errors below). Would you help me to understand the data in this file? Looking at the OIFITS standard, I would expect the OI_WAVELENGTH table to have rows equal to the number of spectral channels (referred to in the document as NWAVE). This table in the multi_* file contains 1 row. This should mean that the data tables, such as OI_VIS2 should contain data columns, such as VIS2DATA, with 1 element per row. However for this file, OI_VIS2 VIS2DATA contains 69 elements. I believe this is causing the failures in compliance tests for the OIFITS standard. Are the data shapes for these columns an error?

WARNING! Expecting format 32A but found 16A for column 'TARGET'
WARNING! Expecting format 32A but found 16A for column 'SPECTYP'
CFITSIO error in read_next_oi_vis:

FITSIO status = 307: bad first row number
Attempt to read past end of table:
  Table has 21 rows with 1 elements per row;
  Tried to read 69 elements starting at row 1, element 1.

Skipping bad OI_VIS (bad first row number)
FITSIO error in read_next_oi_vis2:

FITSIO status = 307: bad first row number
Attempt to read past end of table:
  Table has 21 rows with 1 elements per row;
  Tried to read 69 elements starting at row 1, element 1.

Skipping bad OI_VIS2 (bad first row number)
CFITSIO error in read_next_oi_t3:

FITSIO status = 307: bad first row number
Attempt to read past end of table:
  Table has 35 rows with 1 elements per row;
  Tried to read 69 elements starting at row 1, element 1.

Skipping bad OI_T3 (bad first row number)
OIFITS version 2 data:
  ORIGIN  = 'STScI'
  DATE    = '2022-06-05'
  DATE-OBS= '2022-06-05T00:00:00.000'
  TELESCOP= 'JWST'
  INSTRUME= 'NIRISS'
  OBSERVER= 'UNKNOWN'
  OBJECT  = 'AB DOR'
  INSMODE = 'NRM'
  OBSTECH = ''

  1 OI_ARRAY tables:
    #1  ARRNAME='g7s6'  7 elements
  1 OI_WAVELENGTH tables:
    #1  INSNAME='NIRISS'  1 channels   4817.0- 4817.0nm
  0 OI_CORR tables:
  0 OI_INSPOL tables:
  0 OI_VIS tables:
  0 OI_VIS2 tables:
  0 OI_T3 tables:
  0 OI_FLUX tables:
*** Does not conform to the OIFITS standard:
Mandatory table missing, 1 occurrences:-
    No data table - at least one OI_VIS/VIS2/T3/FLUX required
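The shape mismatch behind the errors above can be sketched as a quick standalone check (array shapes taken from the log; the values themselves are made up):

```python
import numpy as np

# Illustrative check of the shape rule described above: data columns such
# as VIS2DATA must have NWAVE elements per row, where NWAVE is the number
# of rows in the OI_WAVELENGTH table.

nwave = 1                       # rows in OI_WAVELENGTH (as in the multi_* file)
vis2data = np.zeros((21, 69))   # 21 rows x 69 elements, as reported by the log

elements_per_row = vis2data.shape[1] if vis2data.ndim > 1 else 1
print(elements_per_row == nwave)  # False: 69 elements per row but NWAVE == 1
```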

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

I was able to construct a schema for files like the example "jwst_jw01093012001_03102_00001_nis.fits" and opened a PR for stdatamodels adding the schema and an 'AmiOIModel' which uses it (and adds some additional logic to copy existing file metadata to FITS keywords compatible with OIFITS).
spacetelescope/stdatamodels#174

I included portions of the data from this example file in a test that generates a file that passes schema validation and produces a file compatible with 'oifits-check' from oifitslib.

Please feel free to take a look at the PR. I'm happy to answer any questions and to discuss next steps.

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,

The multi_* file data arrays contain interferometric observables for each integration. Generally these would be averaged together, as in the regular OIFITS files, but we have found it helpful, especially in these early days, to also provide all the non-averaged integrations for a more detailed look into the stability of the data and to allow for more careful data cleaning. I think the OIFITS standard expects that if there are multiple columns in the observable arrays, they should be for separate spectral channels, but in our case it is just a single exposure in a single filter. Maybe we could get around this by setting NWAVE = NINTS, or repeating the central wavelength where necessary, or something?
To be honest the multi_* files are more like an intermediate data product, so I don't think it's worth expending too much energy making sure it conforms perfectly to the standard. It's more important that the averaged OIFITS files adhere to the standard enough that they are usable by further analysis software.
Thanks so much for your work on this – I'll take a look at the schema ASAP!

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

Great idea to try duplicating the central wavelength. If I load the multi_* file and:

  • duplicate the central wavelength (so the OI_WAVELENGTH table contains 69 matching rows)
  • expand the FLAG columns for the OI_VIS, OI_VIS2, and OI_T3 tables (these are currently single booleans in the example file and need to be NWAVE booleans per row)

the file is closer to conforming. However oifits-check reports errors with the OI_T3 table containing unnormalized triple product amplitudes:

WARNING! Expecting format 32A but found 16A for column 'TARGET'
WARNING! Expecting format 32A but found 16A for column 'SPECTYP'
OIFITS version 2 data:
  ORIGIN  = 'STScI'
  DATE    = '2022-06-05'
  DATE-OBS= '2022-06-05T00:00:00.000'
  TELESCOP= 'JWST'
  INSTRUME= 'NIRISS'
  OBSERVER= 'UNKNOWN'
  OBJECT  = 'AB DOR'
  INSMODE = 'NRM'
  OBSTECH = ''

  1 OI_ARRAY tables:
    #1  ARRNAME='g7s6'  7 elements
  1 OI_WAVELENGTH tables:
    #1  INSNAME='NIRISS'  69 channels   4817.0- 4817.0nm
  0 OI_CORR tables:
  0 OI_INSPOL tables:
  1 OI_VIS tables:
    #1  DATE-OBS=2022-06-05T00:00:00.000
    INSNAME='NIRISS'  ARRNAME='g7s6'  CORRNAME=''
        21 records x  69 wavebands
  1 OI_VIS2 tables:
    #1  DATE-OBS=2022-06-05T00:00:00.000
    INSNAME='NIRISS'  ARRNAME='g7s6'  CORRNAME=''
        21 records x  69 wavebands
  1 OI_T3 tables:
    #1  DATE-OBS=2022-06-05T00:00:00.000
    INSNAME='NIRISS'  ARRNAME='g7s6'  CORRNAME=''
        35 records x  69 wavebands
  0 OI_FLUX tables:
*** Does not conform to the OIFITS standard:
OI_T3 table may contain unnormalised triple product amplitude, 1662 occurrences:-
    OI_T3 #1 record 3 channel 1
    OI_T3 #1 record 3 channel 2
    OI_T3 #1 record 3 channel 3
    OI_T3 #1 record 3 channel 4
    OI_T3 #1 record 3 channel 5
    OI_T3 #1 record 3 channel 6
    OI_T3 #1 record 3 channel 7
    OI_T3 #1 record 3 channel 8
    OI_T3 #1 record 3 channel 9
    [List truncated]

I'm not seeing any mention of needing these to be normalized in the paper (although it appears oifitslib wants the T3AMP values to be <=1). Running this same file against the online checker shows no errors, but it does show numerous warnings about 0.0 values that should be NaN (or flagged out). Here is one example:
WARNING Fixed value at index 43 for column 'VIS2ERR' line 4, found '0.0' set to NaN (should be > 0 or NaN or flagged out)
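The two adjustments described above (duplicating the central wavelength and expanding the FLAG columns) might be sketched with numpy roughly as follows (column names per the OIFITS standard; the input values are illustrative, not read from the actual file):

```python
import numpy as np

n_wave = 69   # elements per row in the data columns of the multi_* file

# 1. duplicate the single central wavelength so OI_WAVELENGTH has 69 rows
eff_wave = np.array([4.817e-6])            # metres, one spectral channel
eff_wave = np.repeat(eff_wave, n_wave)     # -> shape (69,)

# 2. expand a per-row scalar FLAG into NWAVE booleans per row
flag = np.zeros(21, dtype=bool)            # one bool per table row
flag = np.repeat(flag[:, None], n_wave, axis=1)   # -> shape (21, 69)

print(eff_wave.shape, flag.shape)  # (69,) (21, 69)
```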

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,

Good, I'm glad that seems to have worked! Theoretically T3_AMP values > 1 are nonphysical, but I don't think we want to normalize them currently. The all-zero error tables are there as a placeholder: there have to be error estimates on the observables, but since we've been calculating errors across integrations, we just populated the multi-int file error tables with zeros. I think they could probably be NaNs without breaking anything too badly, if that would satisfy the standard.
I'm assuming the next step is for me to test that the schemas work with the updates I have made to the ami_analyze step that will be producing the oifits files. The current amilg datamodel output gets populated with this call:

output_model = datamodels.AmiLgModel(
    fit_image=nrm.modelpsf,
    resid_image=nrm.residual,
    closure_amp_table=np.asarray(nrm.redundant_cas),
    closure_phase_table=np.asarray(nrm.redundant_cps),
    fringe_amp_table=np.asarray(nrm.fringeamp),
    fringe_phase_table=np.asarray(nrm.fringephase),
    pupil_phase_table=np.asarray(nrm.fringepistons),
    solns_table=np.asarray(nrm.soln))
I have code where I've gathered all the updated header keys and data tables into a dictionary, but I don't quite see how the datamodel schema defines the way the data given to it is used to populate the model, i.e. in the above example, where are the keyword arguments to datamodels.AmiLgModel() defined?

I also wanted to let you know that I'll be out of communication June 30-July 14, but will return to working on this when I get back. 
Thanks!

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

Thanks for taking a look at the updates.

It is good to know that converting the 0s to NaNs shouldn't be an issue (I don't believe this is something the schema can enforce, and the same goes for the >1 T3_AMP values).
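The 0-to-NaN conversion could be as simple as the following (the VIS2ERR name comes from the validator warning quoted earlier; the values here are made up):

```python
import numpy as np

# Minimal sketch of replacing placeholder 0.0 errors with NaN, as the
# online checker suggests (VIS2ERR should be > 0, NaN, or flagged out).
vis2err = np.array([0.01, 0.0, 0.02, 0.0])
vis2err = np.where(vis2err == 0.0, np.nan, vis2err)
print(vis2err)  # [0.01  nan 0.02  nan]
```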

The relationship between keyword arguments to the model and the schema values is a bit convoluted. I will try to explain things (to the best of my knowledge) here but feel free to let me know if a discussion might help to clear things up.

The keyword arguments get turned into attribute assignments, so using the old AmiLgModel as an example:

m = datamodels.AmiLgModel(fit_image=[[1, 2]])
# is the same as
m = datamodels.AmiLgModel()
m.fit_image = [[1, 2]]

The requirements (type, shape, etc.) for these arguments come from the schema. Sticking with the 'fit_image' example, here is its entry in 'amilg.schema.yaml':
https://github.com/spacetelescope/stdatamodels/blob/master/src/stdatamodels/jwst/datamodels/schemas/amilg.schema.yaml#L10-L15

    fit_image:
      title: Fitted image
      fits_hdu: FIT
      default: 0.0
      ndim: 2
      datatype: float32

The schema also defines where these attributes will be saved in the resulting fits file (for the fit_image example it will be stored in the 'FIT' fits extension).

For the new oifits compatible files, many of these attributes will likely need to change or be mapped to new fits extensions and keywords. For defining the new schema I attempted to base names off of the oifits standard. The test function added in the PR is likely a good starting point:
https://github.com/spacetelescope/stdatamodels/pull/174/files#diff-a2579f1897f1556138aabe3540110b01abef8fd8d515c1e8f918b8a6b9ed41a2R380
In that function an 'AmiOiModel' is defined and many attributes set based off values in the one example file you shared.

    m = datamodels.AmiOIModel()
    m.meta.telescope = 'JWST'
    m.meta.origin = 'STScI'
    m.meta.instrument.name = 'NIRISS'
    m.meta.program.pi_name = 'UNKNOWN'
    m.meta.target.proposer_name = 'AB DOR'
    m.meta.observation.date = '2022-06-05'
    m.meta.oifits.array_name = 'g7s6'
    m.meta.oifits.instrument_mode = 'NRM'
    m.array = ... # truncated for readability, see the example for the assigned values
    m.target = ...
    m.t3 = ...
    m.vis = ...
    m.vis2 = ...
    m.wavelength = ...

Some of the above metadata overlaps with what is already used in the jwst pipeline ("m.meta.telescope", "origin", "instrument.name", etc.). These need to be defined for the file to be OIFITS compliant (note that m.meta.observation.date stores a date, not a datetime; it is unclear whether this violates the OIFITS standard, but none of the checkers complained).
There are additional metadata entries that are required by OIFITS and were not already defined in the standard jwst metadata (m.meta.oifits.array_name and m.meta.oifits.instrument_mode). These also need to be defined for OIFITS compliance.
Finally, the tables m.array (corresponding to OI_ARRAY), m.target (OI_TARGET), and m.wavelength (OI_WAVELENGTH) will all need to be defined, and at least one data table (t3, vis, vis2) is required for OIFITS compliance.
Beyond what is described above, any additional information you'd like to store in the file will need to be added to the schema (so that the FITS extension and/or keyword can be defined).

Does this help to describe how to use the new model? I made quite a few guesses at names (all of this can be changed) and tried to keep the attributes to the minimum required for oifits compliance (with the expectation that more would need to be added informed by the details of the processing step).

One option for proceeding might be to merge this PR (it doesn't interfere with other steps) to be able to start using it in a JWST PR adding the new processing step. Then address any changes with future PRs. How does that sound to you?

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
This is great, that's exactly the kind of example I was hoping to find in documentation somewhere but couldn't. I figured it had to be something like that (mapping keywords into attributes whose names match the items in the schema). I think merging your PR into the stdatamodels package would be a good next step, and then when I'm back I'll update my AMI3 fork to use the models and we can iterate on any changes that need to happen with the naming in the schemas at that point. 

@braingram
Collaborator

Sounds great!

I merged the PR.

There is an open issue about updating the 'Creating a new model' docs that I'll xref here: #4436

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
I was just trying to test the code I wrote to use the new datamodel but encountered an import error that ended with: 

cannot import name 's3_utils' from 'stdatamodels' (/Users/rcooper/miniconda3/envs/ami3test/lib/python3.9/site-packages/stdatamodels/__init__.py)
I'm not sure if this is due to how it's being imported somewhere in the jwst.ami code? I'm using stdatamodels version 1.7.1. I've attached the full traceback in case that's helpful.
[^stdatamodels_import_traceback.txt]

 

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

Thanks for sharing the traceback. It appears the stpipe version you have is out-of-date. Can you try installing the newest (version 0.5.0)? If that doesn't fix the issue would you share the output of pip freeze? Thanks!

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Yep, that did it! Thanks :D

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
Another question: can I populate the arrays with their column names from the OIFITS file in some way (like m.array.tel_name = [...]), or do I need to make the array in the correct order and shape ahead of time and assign it directly to the extension, the way it is done in the test function you shared? I ran into some trouble with inhomogeneous arrays the way I tried to do it, and was wondering if there is another way before I spend a lot of time debugging. Thanks!

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

Hopefully I can address your question here but feel free to let me know if you think discussing this work would be helpful.

The 'm.array' in this case is a numpy structured array and should work with any of the tools for that structure. If you're constructing an array column by column (knowing ahead of time how many rows will be in the table), you could try initializing an empty array (note that np.empty will contain seemingly random data; it could instead be filled with 0s, NaN, or some other value that is easier to distinguish from real data).

import numpy as np
from stdatamodels.jwst.datamodels import AmiOIModel
model = AmiOIModel()
n_rows = 100
model.array = np.empty(n_rows, dtype=model.array.dtype)

With the initialized array you can then assign values to columns using the column name:

model.array['TEL_NAME'] = 'JWST'

If you don't know the size of the array ahead of time it's possible to append rows. However, this can result in reallocation and copying of memory (if the array grows beyond the bounds of the initially allocated memory), so it might be more efficient to pre-construct the data before assigning it to the array.
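Appending rows as described above might look like this (the two-column dtype is a made-up example, not the AmiOIModel dtype):

```python
import numpy as np

# Sketch of appending rows to a structured array when the final size is
# not known up front. Each append may reallocate and copy memory, which
# is why pre-constructing the data is usually more efficient.
dtype = np.dtype([("TEL_NAME", "S16"), ("DIAMETER", "<f4")])

table = np.zeros(0, dtype=dtype)
for name, diam in [("JWST", 6.5), ("JWST", 6.5)]:
    row = np.array([(name, diam)], dtype=dtype)
    table = np.concatenate([table, row])   # reallocates on each call

print(len(table))  # 2
```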

Does this help to answer your question?

--
Thanks,
Brett

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
Thanks, I think this way is a bit easier to debug, though numpy still doesn't like that the 'STAXYZ' column has shape (7, 3) while the other columns have shape (7,). There is another column that has shape (7, 2), but it doesn't seem to object to that one for some reason. I tried np.expand_dims so all the (7,) columns had shape (7, 1), but it still complained that it "could not broadcast input array from shape (7,3) into shape (7,)." Here is how I was trying to do it (necessary lines commented out so it would run):
 

m = datamodels.AmiOIModel()
nrows = 7
m.array = np.zeros(nrows, dtype=m.array.dtype)
m.array['TEL_NAME'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['TEL_NAME']),axis=0)
m.array['STA_NAME'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['STA_NAME']),axis=0)
m.array['STA_INDEX'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['STA_INDEX']),axis=0)
m.array['DIAMETER'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['DIAMETER']),axis=0)
#m.array['STAXYZ'] = np.array(self.oifits_dct['OI_ARRAY']['STAXYZ'])
m.array['FOV'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['FOV']),axis=0)
m.array['FOVTYPE'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['FOVTYPE']),axis=0)
m.array['CTRS_EQT'] = np.array(self.oifits_dct['OI_ARRAY']['CTRS_EQT'])
#m.array['PISTONS'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['PISTONS']),axis=0)
#m.array['PIST_ERR'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['PIST_ERR']),axis=0)
print('got past here')

Then, when I comment out the STAXYZ column definition, it complains that the model.array['PISTONS'] and ['PIST_ERR'] keys don't exist; I think that is coming from the model schema, since those columns were added after the sample file I sent you was produced. That is part of what will differ between the raw and calibrated OIFITS files: raw OIFITS files will have PISTONS and PIST_ERR columns in the OI_ARRAY extension, while calibrated files will have 'PIST_T', 'PIST_C', and 'PIST_ERR' columns. I'm not sure if those piston columns can be defined as optional in the schema, or if it is preferable to create a separate schema for the calibrated version of these files that will be produced by the ami_normalize step?
Thanks!

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

I think there might be 2 (maybe 3) issues here.

  1. STAXYZ: Looking at the oifits paper it looks like this column should have a shape == 3. However the schema doesn't define a shape for this column (an error on my part copying the standard):
    https://github.com/spacetelescope/stdatamodels/blob/078bd1c504482f6d09118d2d240c449e4c71f99f/src/stdatamodels/jwst/datamodels/schemas/oifits.schema.yaml#L204-L205
    However, stdatamodels does not appear to complain if a shape 3 array is assigned to this column. You will however have to modify the dtype in the example code you shared (thanks for sharing that!). Specifically, m.array.dtype reports as:
dtype((numpy.record, [('TEL_NAME', 'S16'), ('STA_NAME', 'S16'), ('STA_INDEX', '<i2'), ('DIAMETER', '<f4'), ('STAXYZ', '<f8', (3,)), ('FOV', '<f8'), ('FOVTYPE', 'S6'), ('CTRS_EQT', '<f8', (2,))]))

The easiest way that I know of to define a dtype for a structured array is to define it as a list of tuples. So instead of using m.array.dtype, it should work to define a compatible dtype that sets the 'STAXYZ' column to shape 3 as follows:

modified_dtype = np.dtype([('TEL_NAME', 'S16'), ('STA_NAME', 'S16'), ('STA_INDEX', '<i2'), ('DIAMETER', '<f4'), ('STAXYZ', '<f8', (3,)), ('FOV', '<f8'), ('FOVTYPE', 'S6'), ('CTRS_EQT', '<f8', (2,))])

and use it by modifying a few lines from your example (I won't include them all, to keep it readable):

nrows = 7
m.array = np.zeros(nrows, dtype=modified_dtype)
m.array['TEL_NAME'] = np.expand_dims(np.array(self.oifits_dct['OI_ARRAY']['TEL_NAME']),axis=0)
  2. Extra 'PISTONS' and 'PIST_ERR' columns. As you pointed out, these aren't in the schema, which appears to be the cause of the error while assigning to m.array. The error I see when testing this includes the following: "ValueError: Column names don't match schema.". I am not sure if there is a way to add optional columns (I'll ask around), but if not, this seems like a potentially useful feature. It shouldn't be an issue to update the schema to include the new columns. As these don't exist in uncalibrated files, these columns could either be filled with some 'missing data' value (I don't recall if OIFITS specifies one) or two schemas could be defined. Is that the only difference between raw and calibrated files?
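As a runnable illustration of the STAXYZ point: giving the column a (3,) subarray shape in the dtype lets a (7, 3) array assign cleanly (only a few columns from the real dtype are reproduced here; the data values are made up):

```python
import numpy as np

# Sketch of the STAXYZ fix described above: a subarray shape in the
# dtype makes each row hold a 3-vector, so a (7, 3) array broadcasts
# row-by-row without the "could not broadcast" error.
dtype = np.dtype([("STA_NAME", "S16"),
                  ("STAXYZ", "<f8", (3,)),
                  ("CTRS_EQT", "<f8", (2,))])

arr = np.zeros(7, dtype=dtype)
arr["STAXYZ"] = np.arange(21.0).reshape(7, 3)   # assigns row-by-row
print(arr["STAXYZ"].shape)  # (7, 3)
```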

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
Thanks, by modifying the dtype and assigning each column by its name, I was able to populate the model for the averaged version of the OIFITS file. Is updating the dtype also the best way to handle the multi-integration version, where the shapes of some columns in the different extensions are determined by the number of integrations in the exposure?
I think the only other differences between the raw and calibrated OIFITS files are a few keywords. I'm not aware of a "missing data" value for the OIFITS standard, but I think that could be confusing in the case of the pistons. In the multi-integration files we currently have arrays of zeros for various error quantities, because for the averaged OIFITS file the errors are calculated from the statistics of those multi-integration arrays, though that may change in the future.
I can provide you with an example file that includes those piston columns, and a calibrated version as well if that would be helpful?

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

Thanks for the follow up.

I asked around and there is not currently a way to define optional columns. I have a PR in the works that should allow for this, but it will require some more testing before it's ready for review: spacetelescope/stdatamodels#189. I am hoping to start the tests later today and, if all goes well, open it for review. The changes in the PR should allow you to add extra columns to the OIFITS data tables (array, vis, vis2, etc.), which the OIFITS paper says is allowable by the standard.

For the multi-integration version, are there errors you are encountering when constructing the model? If I recall correctly, the example file you shared had some columns with shapes that violated the OIFITS standard (and I'm not sure if this is checked by the schema). It might make sense to run the files generated by the model against one of the OIFITS checkers to verify that they comply with the standard.

Another example file (or more if there were modifications to the multi-integration files to try and make them compliant) would be great if it's not too much work. If the above PR doesn't make significant progress in the next few days (if that timeline works for you) we can go with a fall-back PR that includes a new schema for the calibrated files (which should have no impact on other jwst code).

Please let me know what I can do to help with this work and feel free to reach out.

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
That would be awesome! We may want to add more columns in the future, so the flexibility allowed by your proposed changes would be great. When trying to populate the datamodel for the multi-int case, I encountered errors like "ValueError: could not broadcast input array from shape (21,3) into shape (21,)"; 3 is the number of integrations in this case. Those are probably the array shapes that may technically violate the standard, but we (and others in the field) have started using these types of files more. I think you got around it by duplicating the number of rows in the wavelength extension to match the number of ints, and expanding the FLAG columns to match. I could do that for the multi-int case, but we would still have to update the dtype on the spot, I think?
I uploaded two files:
jw01242001001_03102_00001_nis.oifits includes the PISTONS and PIST_ERR columns in the OI_ARRAY extension. I haven't yet produced a version that changes the shape of the OI_WAVELENGTH array and FLAG columns to comply with the standard. I'm also not sure whether I should replace the all-zero error arrays with NaNs, as you mentioned one of the OIFITS compliance checkers required, or if that might break things further down the line.
jw01093001001_01101_00005_nis_cal_jw01093002001_01101_00005_nis.oifits is a calibrated file that has the PISTON_T, PISTON_C, and PIST_ERR columns, and the 'CALIB' keyword in the primary header.
Your plan sounds good; I'll wait a bit to see how the PR for the additional columns progresses and otherwise we can modify the schema(s). In the meantime I'll be working on modifying the next step to take the AMIOIModels as input.
Thanks!

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Good morning! As an update to this, the PR adding support for allow_extra_columns, along with the fix to STAXYZ, was just merged. Let me know if there are any issues and/or additional changes are needed: spacetelescope/stdatamodels#189

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Fantastic, thanks so much for doing that so quickly! I'll update my code to use those changes.

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Brett,
I still need to define an updated dtype with the new columns when using the "allow_extra_columns" option in the schema, right? Otherwise I still get the same "KeyError: "Key 'PISTONS' does not exist."" I wanted to double-check that's the intended behavior and not some misunderstanding on my part. Thanks!

@stscijgbot-jp
Collaborator Author

Comment by Brett Graham on JIRA:

Hi Rachel,

That is correct. The allow_extra_columns option allows assignment of a table containing columns that aren't defined in the schema. If the dtype of the 'default' table is inspected (as will occur if you check the dtype of the attribute corresponding to the table before that table has been defined), the dtype will only contain the required columns. Put another way:

m = datamodels.AmiOIModel()
nrows = 7
m.array = np.zeros(nrows, dtype=m.array.dtype)

will generate m.array with a dtype containing only the required columns (no 'PISTONS' in this case), which may be what's leading to the error you're seeing. If you're using extra columns you'll need to define a different dtype that contains all of the required columns, ideally with the required columns first followed by any optional columns (internally this should be slightly more performant, although the code will attempt to reorganize columns if the optional columns are interspersed).
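One way to build such an extended dtype, with the optional columns appended after the required ones, is the following (the required columns here are a made-up subset; PISTONS and PIST_ERR are the extra columns discussed above):

```python
import numpy as np

# Sketch of extending a table dtype with optional columns appended after
# the required ones, as suggested above. Not the actual AmiOIModel dtype.
required = [("TEL_NAME", "S16"), ("DIAMETER", "<f4")]
extra = [("PISTONS", "<f8"), ("PIST_ERR", "<f8")]
extended_dtype = np.dtype(required + extra)

arr = np.zeros(7, dtype=extended_dtype)
arr["PISTONS"] = 0.1        # the extra column now exists and is assignable
print(extended_dtype.names)  # ('TEL_NAME', 'DIAMETER', 'PISTONS', 'PIST_ERR')
```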

Let me know if you're running into any more issues and/or if more details would be helpful.

@stscijgbot-jp
Collaborator Author

Comment by Rachel Cooper on JIRA:

Hi Howard Bushouse, I've been able to populate the new AMI datamodels and run the updated AmiAverage and AmiNormalize steps in my environment (thanks to Brett Graham for all the datamodel help!). I can save the outputs of the steps with the save_results=True option, but now that there are three outputs from the AmiAnalyze step they get saved with '_0_amianalyzestep.fits', '_1_amianalyzestep.fits', etc. by default, at least when I run the step independently. I was looking through the code library for what I need to update to create the appropriate suffixes for the files, but I was wondering if there is a precedent for a step having multiple file outputs that I could look at?
I assume the associations generator for AMI3 will also need to be updated. For the default AMI3Pipeline flow, the averaged AmiOIModel from ami_analyze for the target will be passed along with an averaged AmiOIModel of the ref star to ami_normalize, which will produce a single calibrated AmiOIModel product. The other two products from the AmiAnalyze step, the multi-integration AmiOIModel and the AmiLGFitModel, do not go any further. For suffixes, we'd like to use '_ami.oifits', '_amimulti.oifits', and '_amilg.fits' for the averaged oifits, multi-int oifits, and image-based FITS file products of AmiAverage respectively, and then '_aminorm.oifits' for the calibrated output of AmiNormalize when the steps are run individually. I recall that there were different suffixes depending on how the pipeline was run before – if it's possible, it would be preferable to only have one file extension to work with whether it's run from the full pipeline or step-by-step. If you could give me some guidance on this when you are able that would be great. Other than this and probably some minor style fixes, I think my updates to the AMI3 pipeline are just about ready for testing.

Comment by Richard A. Shaw on JIRA:

There seems to be a misconception running through this ticket and the lengthy discussion. The products being discussed here are FITS format, and should have the extension ".fits" as for all files in MAST that adhere to the FITS standard. The work here suggests you are following a FITS convention for optical (now, also IR) interferometry, which is fine. But the convention specifies particular content, and organization of that content within a FITS file. Such products should pass fitsverify without difficulty if the implementation is correct.

Comment by Howard Bushouse on JIRA:

Richard A. Shaw Agreed! From my bit of research into the OIFITS format, it is simply a convention for how to organize interferometry data within FITS-compliant files. The only thing I wasn't completely sure of is whether the file names still use the standard ".fits" extension (as opposed to ".oifits"). Perhaps they must use the standard ".fits" file name extension in order to qualify as FITS-compliant files.

Comment by Brett Graham on JIRA:

It is also my understanding that OIFITS is a stricter version of FITS: every OIFITS file must be a valid FITS file, but it additionally has to have certain tables, keywords, etc.

I just checked the example files linked above with the online OIFITS validator and offline with the oifitslib checker, after renaming them from a .oifits to a .fits extension. Both checkers validated the files without issue, so (from the perspective of the OIFITS standard) these files shouldn't be required to have the .oifits extension.
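For a rough sense of what those validators check structurally, here is a sketch of an OIFITS v2-style table check (illustrative only; real validators such as oifitslib also verify keywords, column definitions, and cross-references between tables):

```python
def missing_oifits_tables(hdu_names):
    """Return OIFITS (v2-style) tables missing from a list of HDU names.

    A rough structural check only, not a full validator.
    """
    required = {"OI_TARGET", "OI_WAVELENGTH", "OI_ARRAY"}
    data_tables = {"OI_VIS", "OI_VIS2", "OI_T3", "OI_FLUX"}
    names = set(hdu_names)
    missing = sorted(required - names)
    # At least one data table must be present
    if not names & data_tables:
        missing.append("one of " + "/".join(sorted(data_tables)))
    return missing

# HDU names could come from e.g. [h.name for h in astropy.io.fits.open(path)]
print(missing_oifits_tables(
    ["PRIMARY", "OI_ARRAY", "OI_TARGET", "OI_WAVELENGTH", "OI_VIS2", "OI_T3"]
))  # -> []
```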

Comment by Rachel Cooper on JIRA:

Correct, OIFITS files are just FITS files with particular structure and contents, but people familiar with them would expect them to have the .oifits extension rather than just .fits. Would that be a problem for MAST?

Comment by Richard A. Shaw on JIRA:

Yes, it would be a problem for MAST. The ".fits" extension is used pretty much everywhere in MAST to convey the file format. It may be better for the "OI" case to use a new file suffix (i.e., the part of the filename that appears between the last underscore and the ".fits" extension) to indicate a new semantic type.

Comment by Rachel Cooper on JIRA:

Would that be in addition to the new suffixes I've proposed for these files? So instead of e.g. _ami.oifits you're suggesting _ami_oi.fits or something similar? Is it possible to have MAST interpret the .oifits extensions the same as .fits?

Comment by Richard A. Shaw on JIRA:

I have not reviewed this ticket closely enough to notice what new file suffixes are being suggested. But the suffix itself must not contain an underscore. Hyphens (e.g., "_ami-oi.fits") are ok though, I believe.
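The naming rule described here (the suffix sits between the last underscore and ".fits", so it cannot itself contain an underscore, while hyphens are fine) can be sketched as a small check. This is an illustration of the convention, not an actual pipeline utility:

```python
import re

def validate_suffix(suffix):
    """Check a proposed product suffix against the MAST-style rule:
    no underscores (the suffix is delimited by the last underscore
    before ".fits"); hyphens are allowed."""
    if not re.fullmatch(r"[A-Za-z0-9-]+", suffix):
        raise ValueError(f"invalid suffix: {suffix!r}")
    return f"_{suffix}.fits"

print(validate_suffix("ami-oi"))  # -> _ami-oi.fits
```

Under this rule "_ami-oi.fits" is acceptable, while "_ami_oi.fits" would split the suffix at the wrong underscore.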

Comment by Rachel Cooper on JIRA:

My comment on this ticket posted on 9/1 at 3:12 PM (just before your first comment today) gives the suffixes I suggested for the updated outputs of the AMI3 steps. I think looping Howard Bushouse into a more active discussion now would be helpful for efficiency's sake, because it sounds like simultaneous updates to the code, documentation, and MAST are all required here.

Comment by Brett Graham on JIRA:

stdatamodels now supports OIFITS files. The pipeline updates to support these files are in progress.

Comment by Rachel Cooper on JIRA:

Hi Brett Graham, Sorry to bring this ticket back to life again, but we're trying to make the final tweaks to get the code that produces the new OIFITS files up to snuff for MAST, which apparently means that the header keywords from the input have to be copied to the outputs. I currently get this error:

```
2023-11-02 13:58:50,581 - stpipe.AmiAnalyzeStep - WARNING - /System/Volumes/Data/user/rcooper/Projects/NIRISS/AMI/ami3_update/stdatamodels/src/stdatamodels/validate.py:38: ValidationWarning: While validating <AmiOIModel(35,) from jw01093012001_03102_00001_nis_calints.fits> the following error occurred:

'derived' is a required property

Failed validating 'required' in schema['properties']['meta']['properties']['oifits']:
    OrderedDict([('type', 'object'),
                 ('required', ['array_name', 'instrument_mode', 'derived']),
                 ('properties',
                  OrderedDict([('array_name',
                                OrderedDict([('title', 'Array name'),
                                             ('type', 'string'),
                                             ('fits_keyword', 'ARRNAME'),
                                             ('fits_hdu', 'OI_ARRAY')])),
                               ('instrument_mode',
                                OrderedDict([('title', 'Instrument mode'),
                                             ('type', 'string'),
                                             ('fits_keyword', 'INSMODE')])),
                               ('derived',
                                OrderedDict([('type', 'object'),
                                             ('required',
                                              ['array', 'content', 'object',
                                               'observer', 't3', 'target',
                                               'vis', 'vis2', 'wavelength']),
                                             ...

  warnings.warn(errmsg, ValidationWarning)
```
The line I used to do this was:

```python
oifitsmodel.update(input_model, only="PRIMARY")
```
where oifitsmodel is an AmiOIModel datamodel instance returned at the end of the ami_analyze step. The same line worked to copy the input header to the AmiLGFitModel. I guess I don't understand the schema validation well enough to know what needs to be changed here; could this be due to some of the same keywords already existing in the AmiOIModel with different values than the ones in the input datamodel? Thanks!

Comment by Brett Graham on JIRA:

Hi Rachel,

Happy to help! I wasn't aware of update and I was able to reproduce the issue you described (thanks for the detailed description).
I think the issue might be that the mapping of existing JWST to OIFITS keywords is being done 'on_save' in the AmiOIModel (see: https://github.com/spacetelescope/stdatamodels/blob/41ed29f4f591bb3085d10e9db0c34f3b1867311e/src/stdatamodels/jwst/datamodels/amioi.py#L20). It appears that DataModel.update calls validate before returning and does not call on_save:
https://github.com/spacetelescope/stdatamodels/blob/41ed29f4f591bb3085d10e9db0c34f3b1867311e/src/stdatamodels/model_base.py#L973
which is leading to the error you're seeing (the data model is not yet valid because not all of the OIFITS keywords have been mapped yet). I think this should be straightforward to fix in stdatamodels by also overriding update. Would you test with the changes in this PR to see if that addresses the issue?
spacetelescope/stdatamodels#234
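The shape of the fix (overriding update so the keyword mapping runs before the base class validates) can be sketched with toy classes. None of these names or values are the actual stdatamodels API; this is only an illustration of the override pattern:

```python
class BaseModel:
    """Toy stand-in for a data model base class; not the real API."""

    def __init__(self):
        self.meta = {}

    def validate(self):
        # Stand-in for schema validation: the OIFITS block is required
        if "oifits" not in self.meta:
            raise ValueError("'oifits' is a required property")

    def update(self, other):
        # Like the base-class update, this validates before returning
        self.meta.update(other.meta)
        self.validate()


class OIModel(BaseModel):
    """Toy OIFITS model whose update runs the keyword mapping first."""

    def _map_oifits_keywords(self):
        # Stand-in for the on_save JWST -> OIFITS keyword mapping;
        # the value here is just a placeholder
        self.meta.setdefault("oifits", {"array_name": "MASK"})

    def update(self, other):
        # Override: map keywords first so the model is already valid
        # when the base-class update() triggers validation
        self._map_oifits_keywords()
        super().update(other)


src = BaseModel()
src.meta["telescope"] = "JWST"

m = OIModel()
m.update(src)  # succeeds; without the override, validate() would raise
```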

Comment by Rachel Cooper on JIRA:

Hi Brett, thanks so much for your quick action on this. I just tried the changes and that seems to have solved it! Should I leave an approving review on the PR?

Comment by Brett Graham on JIRA:

Thanks for testing. An approving review would be great. I will open it for review and queue up a regression test run.

Comment by Rachel Cooper on JIRA:

I think I'm not currently able to review the PR since I'm not one of the spacetelescope/stdatamodels-maintainers? I may be missing something though.

Comment by Brett Graham on JIRA:

Thanks for letting me know. Feel free to leave a comment on the PR if it looks good and I'll ping some other maintainers for approval (to allow the branch protection to pass).

Comment by Brett Graham on JIRA:

Howard Bushouse I am closing this ticket per the instructions https://github.com/spacetelescope/jwst/wiki/How-to-resolve-JIRA-issues under the assumption that you created this and it's finished "testing". Let me know if I made any mistakes here in closing it.

Comment by Rachel Cooper on JIRA:

Testing done; AMI observables are now saved in "_ami-oi.fits" (also "_amimulti-oi.fits" and "_aminorm-oi.fits") products which adhere to the OIFITS standard. Commissioning data (PID 1093) was used for testing.
