Fix MDI and preserve lazy data in PP saves. #2744

pp-mo · 2017-08-18T18:00:09Z

This fixes one of the test failures listed in #2717 :

expects lazy data but doesn't get it
FAIL: test_lazy_data (iris.tests.unit.fileformats.pp.test_as_pairs.TestAsFields)

Following discussion in #2738, this now abandons any use of masked-array fill_value for setting a PPField BMDI value.

Content:

the save rules no longer fetch PPField.data to get a fill-value, nor set BMDI at all.
- (before, data fetch was realising + so causing the test-fail)
the cube-to-field conversion code in save_pairs_from_cube applies the default BMDI to everything
the PPField.save() code now also gives a warning when filling masked data, if any unmasked points are equal to the fill-value.

pelson · 2017-08-21T13:08:14Z

lib/iris/fileformats/pp.py

-            data = data.filled(fill_value=self.bmdi)
+        if isinstance(data, ma.MaskedArray):
+            mdi = self.bmdi
+            if np.any(data == mdi):


How come we don't need to do precisely the same check for non-masked data? Given we are setting a fill value, there is always the risk of collisions, no?

In my head it is the "fill" operation itself which is dangerous, not setting the fill-value.
I'm arguing elsewhere that we should take mdi values appearing in un-masked data as actual missing datapoints, rather than real points with a coincidental value.

For PP, at least, that seems workable as the standard BMDI values are quite extreme.
Likewise, on PP load we automatically interpret any points that match MDI as masked : here
I think we have heard rumours that a separate missing-data value may be used in some integer type fields, which might imply a greater risk of "collision", but no standard usage has been agreed + we don't code for it.

It's rather out of scope here, but I think the netCDF4 requirements make a good case for treating masked + unmasked data differently : Otherwise, we will have to complain about any byte value of 255.

pelson · 2017-08-21T13:09:45Z

lib/iris/fileformats/pp.py

@@ -2414,6 +2422,11 @@ def save_pairs_from_cube(cube, field_coords=None, target=None):
        pp_field.lbcode = 1     # Grid code.
        pp_field.bmks = 1.0     # Some scaley thing.
        pp_field.lbproc = 0
+        # Set default MDI value.
+        # Note: *we no longer respect any fill_value of masked data arrays*.


This "documentation" is going to age really poorly. Is it necessary? Does it not apply to all of these attributes?

I will remove the historical reference !

I am concerned though that the effect of the new approach still needs be made a bit clearer :
The user should be able to see what fill-value is used + how you might control it
The developer should also be able to follow the implementation.

pelson · 2017-08-21T13:13:24Z

lib/iris/tests/unit/fileformats/pp/test_PPField.py

+        mock_warn.assert_not_called()
+
+    def test_plainarray_mdi_value_nowarning(self):
+        # Check that MDI in *unmasked* data does not raise a warning.


Seems like questionable behaviour. Why should we not raise a warning for this?

If we have a point==mdi in un-masked data, I'd assume that means the data was loaded by non-standard means, and the magic value means it really is a missing point.

If it really is missing, then let's use masks and/or NaN, not run the risk of genuine collisions.

"let's use"

The point is, what interpretation we should put on this case if it does arise.

As I said, if we apply a "no mdi values allowed" rule, then in the case of netcdf data we will generally have no way to save byte data without warnings. Unless we make that a special case, which I really don't much like.
Does this not sway you ?

pelson · 2017-08-21T13:13:34Z

lib/iris/tests/test_cdm.py

@@ -1199,7 +1199,8 @@ def test_save_and_merge(self):
            self.assertEqual(len(merged_cubes), 1, "expected a single merged cube")
            merged_cube = merged_cubes[0]
            self.assertEqual(merged_cube.dtype, dtype)
-            self.assertEqual(merged_cube.data.fill_value, fill_value)
+            # Note: original data fill-value is *ignored*.


Again, comment seems superfluous.

Sorry, it should probably say "check that original data fill-value is ignored".
I do think that is worth pointing out, though,

djkirkham · 2017-08-23T09:42:54Z

lib/iris/tests/unit/fileformats/pp/test_PPField.py

-            field.lbuser = 0
-            field.brsvd = 0
-            field.bdatum = 0
+            field = TestPPField()._ready_for_save()


I know it's nit-picky, but why make _ready_for_save a private method if you are calling it externally?

pelson · 2017-08-25T13:01:03Z

lib/iris/fileformats/pp.py

-            data = data.filled(fill_value=self.bmdi)
+        if isinstance(data, ma.MaskedArray):
+            mdi = self.bmdi
+            if np.any(data == mdi):


This is still inconsistent with the proposal at #2748 (comment). Specifically:

warn if any fill-values are present (unmasked points only, in case of masked array) explain in terms of "this won't read back the same"

pp-mo · 2017-08-25T18:31:18Z

@pelson This is still inconsistent with the proposal at #2748

It is.
I decided it will now be easier to fix this when that has gone in.

pp-mo · 2017-10-10T13:50:34Z

Current-status reminder:

I decided it will now be easier to fix this when that has gone in.

Refers #2748, but that is not a PR.

I think #2747 is our testcase for agreeing the new ideas.
I will review + re-submit this when #2747 is in the bag,

pp-mo · 2017-10-11T15:06:29Z

Fully rebased to get tests re-checking.

I will review all the comments above + see what may be outstanding ...

djkirkham · 2017-10-12T10:39:29Z

lib/iris/fileformats/pp.py

+                   "data points.")
+            warnings.warn(msg.format(mdi))
+        if isinstance(data, ma.MaskedArray):
+            if ma.is_masked(data):


ma.is_masked can be used on ndarrays too, so the instance check isn't necessary.

Except that it also qualifies the need for "data = data.data".
We could have..

if ma.is_masked(data): data = data.filled(fill_value=mdi) elif isinstance(data, ma.MaskedArray): data = data.data

But frankly I think the existing is clearer ?

Oh sorry, I missed the data = data.data. I would question the need for that actually, but I won't quibble about it.

djkirkham · 2017-10-12T10:50:26Z

lib/iris/tests/unit/fileformats/netcdf/test_Saver.py

+        elif warning_texts:
+            msg = "Unexpected warnings raised : {!r}."
+            msg = msg.format(warning_texts)
+            self.assertEqual(warning_texts, [], msg)


You've changed the way this works (perhaps intentionally); previously in the expect_warning=False case, it would only fail if a warning matching the text was raised. The reason I wrote it that way was so that if the code happened to raise some other warning it wouldn't cause the test to fail incorrectly.

You've changed the way this works (perhaps intentionally)

No, it's a mistake really.
I'm going to investigate (a) putting this as an assert routine in iris.tests.IrisTest, and (b) replacing the 'contains' with a Regexp

djkirkham · 2017-10-12T14:58:02Z

lib/iris/fileformats/pp.py

+            msg = ('PPField data contains unmasked points equal to the fill '
+                   "value, {}. As saved, these points will read back as "
+                   "missing data. To save these as normal values, please "
+                   "set the field BMDI not equal to any valid data points.")


Are you not allowing the BMDI to be specified via a keyword to iris.save?

On consideration, I didn't think it was necessary.
It's not the same as the netdcf case, because we don't have to worry about smaller integer types. So, especially if we don't provide an option to easily change BMDI, then chance collisions are extremely unlikely. Likewise, more recent version of the UM spec ("F3 document") don't actually admit the use of non-standard BMDI values.

I'm still not completely sold - I think there's an argument for providing a uniform interface for both PP and NetCDF. But I think it could easily be changed later (post Iris 2 release) without breaking anything so I won't worry about it now.

djkirkham · 2017-10-12T15:00:09Z

lib/iris/tests/test_cdm.py

@@ -1199,7 +1199,8 @@ def test_save_and_merge(self):
            self.assertEqual(len(merged_cubes), 1, "expected a single merged cube")
            merged_cube = merged_cubes[0]
            self.assertEqual(merged_cube.dtype, dtype)
-            self.assertEqual(merged_cube.data.fill_value, fill_value)
+            # Check that the original masked-array fill-value is *ignored*.
+            self.assertArrayAllClose(merged_cube.data.fill_value, -1e30)


Why not self.assertAlmostEqual?

The "assert_array_almost_equal" applies an absolute precision, 6 decimal places by default, i.e. value differences up to +/-0.000001.
That doesn't work here, you get :

Arrays are not almost equal to 6 decimals (mismatch 100.0%) x: array(-1.0000000150474662e+30, dtype=float32) y: array(-1e+30)

The problem is a mismatch between a float64 and float32 versions of "-1e30".
To get it to work, you have to apply the frankly rather ridiculous "decimal=-23", because of the enormous size of the numbers.

That's why I prefer the "assert_allclose" approach -- the use of relative as well as absolute tolerance is more flexible, and also makes for a better default behaviour.

djkirkham · 2017-10-13T08:40:18Z

lib/iris/tests/__init__.py

@@ -527,6 +528,28 @@ def assertRaisesRegexp(self, *args, **kwargs):
        return six.assertRaisesRegex(super(IrisTest_nometa, self),
                                     *args, **kwargs)

+    @contextlib.contextmanager
+    def assertGivesWarning(self, expected_regexp='', expect_warning=True):


This name could be confusing since this function can be used to check that code doesn't give a warning.

pelson · 2017-10-13T12:43:31Z

Thanks for this implementation @pp-mo.

Just as a reminder: mdi round-trip ability loss will need to be documented in the release notes.

pp-mo · 2017-10-16T10:55:41Z

mdi round-trip ability loss will need to be documented in the release notes

Whoops. Made a new project issue for that #2798

pp-mo added the dask-mask label Aug 18, 2017

pp-mo requested review from djkirkham and pelson August 18, 2017 18:00

This was referenced Aug 18, 2017

Fix other mask related failures #2717

Closed

Fix fill_value when saving to pp #2704

Closed

pelson reviewed Aug 21, 2017

View reviewed changes

djkirkham reviewed Aug 23, 2017

View reviewed changes

djkirkham approved these changes Aug 23, 2017

View reviewed changes

pelson reviewed Aug 25, 2017

View reviewed changes

djkirkham mentioned this pull request Aug 30, 2017

Fix test failures where it was expecting a masked array but it's not #2714

Closed

12 tasks

This was referenced Oct 10, 2017

Use conda-forge for Dask in Travis (not special branch download) #2762

Merged

Handle fill value on netCDF save #2747

Merged

pp-mo force-pushed the dask_pp_savepairs branch from b8de183 to 3efa90a Compare October 11, 2017 15:02

djkirkham reviewed Oct 12, 2017

View reviewed changes

pp-mo added 3 commits October 12, 2017 14:54

Add context to netcdf save warnings; rework warnings tester function.

fc05363

Make warnings checker a generic IrisTest function.

94ddc0c

Implement new approach in PP saving.

949db81

pp-mo force-pushed the dask_pp_savepairs branch from b0580c1 to 949db81 Compare October 12, 2017 14:28

djkirkham reviewed Oct 12, 2017

View reviewed changes

pp-mo assigned djkirkham Oct 12, 2017

pp-mo mentioned this pull request Oct 12, 2017

Design a common strategy for handling masked data in file i/o #2748

Closed

djkirkham reviewed Oct 13, 2017

View reviewed changes

djkirkham merged commit f6d2da7 into SciTools:dask_mask_array Oct 13, 2017

QuLogic added this to the dask-mask milestone Oct 13, 2017

pp-mo deleted the dask_pp_savepairs branch October 17, 2019 10:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix MDI and preserve lazy data in PP saves. #2744

Fix MDI and preserve lazy data in PP saves. #2744

pp-mo commented Aug 18, 2017 •

edited

Loading

pelson Aug 21, 2017

pp-mo Aug 21, 2017 •

edited

Loading

pelson Aug 21, 2017

pp-mo Aug 21, 2017

pelson Aug 21, 2017

pp-mo Aug 21, 2017

pelson Aug 21, 2017

pp-mo Aug 21, 2017 •

edited

Loading

pelson Aug 21, 2017

pp-mo Aug 21, 2017

djkirkham Aug 23, 2017

pelson Aug 25, 2017

pp-mo commented Aug 25, 2017

pp-mo commented Oct 10, 2017 •

edited

Loading

pp-mo commented Oct 11, 2017

djkirkham Oct 12, 2017

pp-mo Oct 12, 2017 •

edited

Loading

djkirkham Oct 12, 2017

djkirkham Oct 12, 2017 •

edited

Loading

pp-mo Oct 12, 2017

djkirkham Oct 12, 2017

pp-mo Oct 12, 2017

djkirkham Oct 13, 2017 •

edited

Loading

djkirkham Oct 12, 2017 •

edited

Loading

pp-mo Oct 12, 2017 •

edited

Loading

djkirkham Oct 13, 2017

pelson commented Oct 13, 2017

pp-mo commented Oct 16, 2017

Fix MDI and preserve lazy data in PP saves. #2744

Fix MDI and preserve lazy data in PP saves. #2744

Conversation

pp-mo commented Aug 18, 2017 • edited Loading

Choose a reason for hiding this comment

pp-mo Aug 21, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pp-mo Aug 21, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pp-mo commented Aug 25, 2017

pp-mo commented Oct 10, 2017 • edited Loading

pp-mo commented Oct 11, 2017

Choose a reason for hiding this comment

pp-mo Oct 12, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

djkirkham Oct 12, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

djkirkham Oct 13, 2017 • edited Loading

Choose a reason for hiding this comment

djkirkham Oct 12, 2017 • edited Loading

Choose a reason for hiding this comment

pp-mo Oct 12, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pelson commented Oct 13, 2017

pp-mo commented Oct 16, 2017

pp-mo commented Aug 18, 2017 •

edited

Loading

pp-mo Aug 21, 2017 •

edited

Loading

pp-mo Aug 21, 2017 •

edited

Loading

pp-mo commented Oct 10, 2017 •

edited

Loading

pp-mo Oct 12, 2017 •

edited

Loading

djkirkham Oct 12, 2017 •

edited

Loading

djkirkham Oct 13, 2017 •

edited

Loading

djkirkham Oct 12, 2017 •

edited

Loading

pp-mo Oct 12, 2017 •

edited

Loading