
Flag out NaN weights in coadd & allow for weights to have different WCS than data #474

Open · keflavich wants to merge 14 commits into main from flag_out_nan_weights

Conversation

@keflavich (Contributor)

I encountered a severe error case in which the weights array became NaN after reprojection. Flagging out locations where the weights are NaN solved the issue.

I cannot come up with an MWE; this occurs deep in the guts of a complicated mosaicking attempt, and it happened for only one field out of 270. I think it might be the consequence of a numerical error that occurs only under unique conditions, but logically it also looks like this fix should work.
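
For reference, a minimal sketch of the flagging idea (the array names are illustrative, not the PR's actual code):

import numpy as np

# Stand-ins for a reprojected tile and its reprojected weights
array = np.random.random((4, 4))
weights = np.ones((4, 4))
weights[1, 2] = np.nan  # e.g. produced by reprojecting the weight map

# Flag out NaN weights so they cannot poison the weighted accumulation
bad = np.isnan(weights)
weights[bad] = 0
array[bad] = 0

print(np.sum(array * weights))  # finite: the bad pixel contributes nothing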


codecov bot commented Oct 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.46%. Comparing base (9f4ad1c) to head (c83e4f1).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #474      +/-   ##
==========================================
+ Coverage   91.00%   91.46%   +0.45%     
==========================================
  Files          25       25              
  Lines        1078     1089      +11     
==========================================
+ Hits          981      996      +15     
+ Misses         97       93       -4     


@keflavich (Contributor, Author)

Currently broken:

FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[interp-False-arrays] - TypeError: input_data should either be an HDU object or a tuple of (array, WCS) or (array, Header)
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[interp-False-filenames] - ValueError: Output WCS has celestial components but input WCS does not
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[interp-False-hdus] - ValueError: Output WCS has celestial components but input WCS does not
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[interp-True-arrays] - TypeError: input_data should either be an HDU object or a tuple of (array, WCS) or (array, Header)
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[interp-True-filenames] - ValueError: Output WCS has celestial components but input WCS does not
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[interp-True-hdus] - ValueError: Output WCS has celestial components but input WCS does not
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[exact-False-arrays] - TypeError: input_data should either be an HDU object or a tuple of (array, WCS) or (array, Header)
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[exact-False-filenames] - NotImplementedError: Currently only data with a 2-d celestial WCS can be reprojected using flux-conserving algorithm
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[exact-False-hdus] - NotImplementedError: Currently only data with a 2-d celestial WCS can be reprojected using flux-conserving algorithm
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[exact-True-arrays] - TypeError: input_data should either be an HDU object or a tuple of (array, WCS) or (array, Header)
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[exact-True-filenames] - NotImplementedError: Currently only data with a 2-d celestial WCS can be reprojected using flux-conserving algorithm
FAILED reproject/mosaicking/tests/test_coadd.py::TestReprojectAndCoAdd::test_coadd_with_weights[exact-True-hdus] - NotImplementedError: Currently only data with a 2-d celestial WCS can be reprojected using flux-conserving algorithm
FAILED reproject/mosaicking/tests/test_coadd.py::test_coadd_solar_map - TypeError: input_data should either be an HDU object or a tuple of (array, WCS) or (array, Header)

@keflavich (Contributor, Author)

Tests now pass locally (w/o sunpy installed...)

@keflavich (Contributor, Author)

@astrofrog could you review this? There are several critical fixes for spectral-cube's cube mosaicking.

Major changes are:

  • Weights are no longer assumed to be on the same grid as the data; they can bring their own WCS (see the sketch after this list)
  • Input and output array dtypes are not assumed to be the same; the input is byte-swapped to match the output, on the assumption that that is cheaper (but perhaps this needs to be made optional / documented)
  • Bad (NaN) weights are flagged out, which was the original point of this PR, though it may have been a side effect of the first change
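
For the first point, a minimal sketch of the idea using reproject's public reproject_interp API (the grids and names below are made up for illustration; this is not the PR's actual code):

import numpy as np
from astropy.wcs import WCS
from reproject import reproject_interp

# An illustrative weight map on its own (made-up) celestial grid
weights_in = np.ones((10, 10))
weights_wcs = WCS(naxis=2)
weights_wcs.wcs.ctype = ["RA---TAN", "DEC--TAN"]
weights_wcs.wcs.crpix = [5, 5]
weights_wcs.wcs.crval = [0, 0]
weights_wcs.wcs.cdelt = [-0.01, 0.01]

# A slightly offset output grid standing in for the mosaic WCS
wcs_out = WCS(naxis=2)
wcs_out.wcs.ctype = ["RA---TAN", "DEC--TAN"]
wcs_out.wcs.crpix = [8, 8]
wcs_out.wcs.crval = [0, 0]
wcs_out.wcs.cdelt = [-0.01, 0.01]

# Reproject the weights onto the output grid instead of assuming they
# share the data's grid, then zero out the uncovered (NaN) region
weights_out, _ = reproject_interp(
    (weights_in, weights_wcs), wcs_out, shape_out=(10, 10)
)
weights_out[np.isnan(weights_out)] = 0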

@keflavich (Contributor, Author)

For coverage checks: do we have a dask FITS reader, or could we add one? The byte swapping does get tested in the spectral-cube tests, since those sometimes load different-endian arrays, but I don't see a simple way to do that here (see the sketch below).
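
One hedged way to exercise that path with dask, relying on the fact that FITS stores floats big-endian on disk (the file name here is illustrative):

import dask.array as da
from astropy.io import fits

hdu = fits.open("weights.fits", memmap=True)[0]  # illustrative path
arr = da.from_array(hdu.data)  # the dtype stays big-endian, e.g. '>f8'
print(arr.dtype.byteorder)     # '>' on little-endian machines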

@astrofrog (Member) left a comment:

This looks mostly good to me, but one thing I'm not sure about is whether the byteswapping is happening too early here. Could you explain what issues you have been seeing related to this? Shouldn't the byteswapping ideally happen inside each individual reprojection chunk, in case the input isn't a dask array and to avoid doubling memory usage?
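
For reference, a minimal sketch of the kind of native-byte-order guard being discussed (the helper name is hypothetical, not the PR's actual _byteordermatch):

import numpy as np

def ensure_native(array):
    # Convert to native byte order only when needed; native input is
    # returned unchanged, so memory is not doubled in the common case
    if not array.dtype.isnative:
        return array.astype(array.dtype.newbyteorder("="))
    return array

chunk = np.arange(4, dtype=">f8")
print(ensure_native(chunk).dtype.isnative)  # True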

@keflavich (Contributor, Author)

> in case the input isn't a dask array

Technically I think this might be needed for numpy arrays too, but the FITS reader for numpy arrays already handles that?

In any case, I'll report the issues and see if we can move the byteswapping. I haven't worked up a minimal example for the dask byteswapped data; IIRC, it was any dask array loaded with a FITS reader.

@astrofrog (Member)

@keflavich - just to check though, why is the byteswapping needed at all? Where does it crash if we don't byteswap?

@keflavich (Contributor, Author)

It doesn't crash. It interprets the data as having the wrong endianness, and all values end up as 2e-205 and similar.
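
For illustration, reinterpreting big-endian bytes as little-endian reproduces that symptom in isolation:

import numpy as np

a = np.array([1.0, 2.0, 3.0], dtype=">f8")  # big-endian float64
b = a.view("<f8")  # reinterpret the same bytes as little-endian
print(b)  # tiny denormal-scale garbage instead of 1, 2, 3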

@astrofrog (Member)

OK, interesting. Is that making use of reproject_interp?

@keflavich (Contributor, Author)

yes

@keflavich (Contributor, Author)

So, I didn't figure out how to build an MWE within reproject, but if you comment out the _byteordermatch call and run the spectral-cube tests, you'll get errors like this:

spectral_cube/tests/test_regrid.py ...................................................F.FFF............F.F.F.F.......                                                                                                                   [100%]

================================================================================================================== FAILURES ===================================================================================================================
_____________________________________________________________________________________________________ test_mosaic_cubes[True-100-False0] ______________________________________________________________________________________________________

use_memmap = False, data_adv = PosixPath('/private/var/folders/k_/7qh4l0nn72b7qgq15pkd4hw40000gt/T/pytest-of-adam/pytest-171/test_mosaic_cubes_True_100_Fal0/adv.fits'), use_dask = True, spectral_block_size = 100

    @pytest.mark.parametrize('spectral_block_size,use_memmap', ((None, False),
                                                                (100, False),
                                                                (None, True),
                                                                (100, False),
                                                                (1, True),
                                                                (1, False),
                                                                ))
    def test_mosaic_cubes(use_memmap, data_adv, use_dask, spectral_block_size):

        pytest.importorskip('reproject')

        # Read in data to use
        cube, data = cube_and_raw(data_adv, use_dask=use_dask)

        # cube is doppler-optical by default, which uses the rest wavelength,
        # which isn't auto-computed, resulting in nan pixels in the WCS transform
        cube._wcs.wcs.restwav = constants.c.to(u.m/u.s).value / cube.wcs.wcs.restfrq

        expected_header = combine_headers(cube.header, cube.header)
        expected_wcs = WCS(expected_header).celestial

        # Make two overlapping cubes of the data
        part1 = cube[:, :round(cube.shape[1]*2./3.), :]
        part2 = cube[:, round(cube.shape[1]/3.):, :]

        assert part1.wcs.wcs.restwav != 0
        assert part2.wcs.wcs.restwav != 0

        # Mosaic give the expected header.
        result = mosaic_cubes([part1, part2],
                              # order is not a mosaic_cubes argument order='nearest-neighbor',
                              target_header=cube.header,
                              # not used roundtrip_coords=False,
                              spectral_block_size=spectral_block_size,
                              save_to_tmp_dir=False,
                              verbose=True,
                              use_memmap=use_memmap,
                              method='cube')

        # Check that the shapes are the same
        assert result.shape == cube.shape

        assert repr(cube.wcs.celestial) == repr(result.wcs.celestial)

        # Check WCS in reprojected matches wcs_out
        # (comparing WCS failed for no reason we could discern)
        assert repr(expected_wcs) == repr(result.wcs.celestial)

        # Check that values of original and result are comparable
>       np.testing.assert_almost_equal(result.filled_data[:].value,
                                       cube.filled_data[:].value, decimal=9)

spectral_cube/tests/test_regrid.py:651:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../mambaforge/envs/py312/lib/python3.12/contextlib.py:81: in inner
    return func(*args, **kwds)
../../mambaforge/envs/py312/lib/python3.12/contextlib.py:81: in inner
    return func(*args, **kwds)
../../mambaforge/envs/py312/lib/python3.12/site-packages/numpy/_utils/__init__.py:85: in wrapper
    return fun(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<function assert_array_almost_equal.<locals>.compare at 0x284f77920>, array([[[ 2.68308185e+185, -2.51563734e+252],
 ...0.39923534]],

       [[0.74684203, 0.85496775],
        [0.65468046, 0.96629004],
        [0.64365533, 0.74426717]]]))
kwds = {'err_msg': '', 'header': 'Arrays are not almost equal to 9 decimals', 'precision': 9, 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError:
E           Arrays are not almost equal to 9 decimals
E
E           Mismatched elements: 24 / 24 (100%)
E           Max absolute difference among violations: 3.89622506e+287
E           Max relative difference among violations: 4.55716028e+287
E            ACTUAL: array([[[ 2.683081851e+185, -2.515637336e+252],
E                   [-9.039787586e+039,  3.043705986e+222],
E                   [-1.223304808e+051,  2.728961237e+167]],...
E            DESIRED: array([[[0.214686889, 0.909502371],
E                   [0.817235373, 0.903880475],
E                   [0.70687927 , 0.118076201]],...

../../mambaforge/envs/py312/lib/python3.12/contextlib.py:81: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------------------------
INFO: Using memory [spectral_cube.cube_utils]
INFO: Using Cube method [spectral_cube.cube_utils]
-------------------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------------------
INFO     astropy:cube_utils.py:1004 Using memory
INFO     astropy:cube_utils.py:1004 Using Cube method

@astrofrog (Member)

OK, perfect, thanks!

@astrofrog (Member) commented Feb 1, 2025

OK, I have an MWE with just reproject_interp, so I'll see how we can fix it:

import numpy as np
from astropy.io import fits
from astropy.wcs import WCS
from astropy.utils.data import get_pkg_data_filename
from reproject import reproject_interp
from dask import array as da

hdu1 = fits.open(get_pkg_data_filename("galactic_center/gc_2mass_k.fits"))[0]
hdu2 = fits.open(get_pkg_data_filename("galactic_center/gc_msx_e.fits"))[0]

# Big-endian output array, as might come from a memory-mapped FITS file
output_array = np.zeros(
    shape=(hdu1.header["NAXIS2"], hdu1.header["NAXIS1"]), dtype=">f8"
)

# Wrap the input in a dask array to force the chunked code path
data = da.from_array(hdu2.data)
wcs = WCS(hdu2.header)

reproject_interp(
    (data, wcs), hdu1.header, output_array=output_array, block_size=(50, 50)
)

# With the bug present, these extrema are wildly wrong byte-swapped values
print(np.nanmin(output_array), np.nanmax(output_array))

and it happens for reproject_exact too, so this is something that needs fixing in the generic dispatcher.

@astrofrog (Member)

OK, I've spotted the bug; I will open a separate PR shortly.

@astrofrog (Member)

@keflavich - see #487

keflavich force-pushed the flag_out_nan_weights branch from a28250a to 0201830 on February 1, 2025 at 18:16
keflavich changed the title from "Flag out NaN weights in coadd" to "Flag out NaN weights in coadd & allow for weights to have different WCS than data" on February 1, 2025
@astrofrog (Member)

You should be able to rebase against main now as I have merged the other PR

@astrofrog (Member)

(Once you rebase this I will review and merge ASAP)

keflavich force-pushed the flag_out_nan_weights branch from 0201830 to 8e42fba on February 2, 2025 at 02:19
@keflavich (Contributor, Author)

rebase done

@astrofrog (Member) left a comment:

This looks mostly good, but I have a couple of comments. In particular, we shouldn't break APE 14 support. I don't think you need to check has_celestial; if the weights' WCS isn't compatible with the mosaic WCS, then an error will happen during reprojection anyway. We shouldn't ignore the WCS just because it's not celestial, should we?

weights_in, weights_wcs = parse_input_weights(
    input_weights[idata], hdu_weights=hdu_weights, return_wcs=True
)
if weights_wcs is None or not weights_wcs.has_celestial:
@astrofrog (Member):

weights_wcs.has_celestial breaks support for APE 14 WCS

@@ -211,7 +211,12 @@ def reproject_and_coadd(
         if input_weights is None:
             weights_in = None
         else:
-            weights_in = parse_input_weights(input_weights[idata], hdu_weights=hdu_weights)
+            weights_in, weights_wcs = parse_input_weights(
+                input_weights[idata], hdu_weights=hdu_weights, return_wcs=True
@astrofrog (Member):

I think this is the only place parse_input_weights is used, so I don't think we need the kwarg; we can just always return the WCS.

@keflavich (Contributor, Author):

Fine by me, I can remove the kwarg.

    return input_weights.data
if return_wcs:
    ww = WCS(input_weights.header)
    ww = ww if ww.has_celestial else None
@astrofrog (Member):

Here we again break compatibility with APE 14 WCS. Why is it needed?

@keflavich (Contributor, Author):

This check is serving a critical role and cannot be removed, so we have to think of a replacement.

This check is asking whether the FITS header contains a WCS that should be interpreted as the WCS for the weights. Since WCS() (i.e., WCS with an empty dict as input) is still a valid WCS object but contains no transforms, we need some check here for whether the WCS exists. Annoyingly, you can't just check that naxis matches ndim, since naxis defaults to 2.

Maybe...

ww = ww if ww.array_shape == input_weights.data.shape else None

...I'll check this...

@keflavich (Contributor, Author):

No, I don't think that works, because the shape comes from the data when reading.

@keflavich (Contributor, Author):

What is the APE 14 way to check whether there is a transformation associated with an axis?

@keflavich (Contributor, Author):

I don't think there's any general way to get at this. All WCSes contain valid transforms, so the only solution is to use the full WCS validation. I've added that now. It's less clear for specific cases, but at least it solves the general case.

@keflavich (Contributor, Author)

has_celestial isn't necessary, but we need something to check that there is a valid WCS. The default is just WCS(), and it is possible (likely) for a user to pass a FITS HDU with a weight array and no WCS information in the header. What's the appropriate way to do this check under APE 14?
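
For illustration, the default WCS in question is a perfectly valid object that just carries no usable transform:

from astropy.wcs import WCS

ww = WCS()  # built from an empty header, yet still a valid WCS object
print(ww.has_celestial)  # False: no celestial transform is defined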

@keflavich (Contributor, Author)

Your suggestion seems to be to just let it break downstream if there is an invalid WCS? That's fine, I guess. I'm not a fan of that strategy in general; I like to catch obvious errors where I know they'll occur.

@astrofrog (Member) commented Feb 3, 2025

@keflavich - I think reproject_and_coadd can be used for any arbitrary WCS that might not even be celestial? Or is that incorrect? In that sense, even ignoring APE 14, it does not make sense to restrict this?

@keflavich (Contributor, Author)

Right... the cube modifications made that possible. OK, I'll get rid of has_celestial, but I think the error messages may be opaque.

@keflavich (Contributor, Author)

This fails now with:

E           NotImplementedError: Currently only data with a 2-d celestial WCS can be reprojected using flux-conserving algorithm

Should we just skip these tests?

@keflavich (Contributor, Author)

Also, ValueError: Output WCS has celestial components but input WCS does not. That's the expected error; I haven't looked more closely yet but at least the error is clear.

@astrofrog (Member)

pre-commit.ci autofix
