dask: `Data.hardmask`, `Data._set_dask`, `Data.to_dask_array` #399

davidhassell · 2022-05-04T14:02:15Z

Work going on in the active storage world has made it undesirable to stack up dask operations during Data.__init__ unless absolutely necessary.

Currently, after the data definition, __init__ always adds on a "harden/soften mask" operation, and in addition might add on an "astype" and/or a "where" operation.

The current thinking is that, to implement active storage reductions, the dask graph can only contain the data definition. If the "astype" and "where" operations are needed then we don't want to use active storage anyway. The hardness of the mask, however, does not interfere with a storage-side reduction, so we don't want to add this operation by default: doing so would make it very hard to tell whether or not an active storage operation is possible.

A solution is to "somewhat lazify" the setting of the mask hardness. Instead of the status quo of adding a dask graph operation every time d.hardmask = X is run, the hardmask attribute now just records the desired state of the mask hardness. It is then up to a future method to apply a mask hardening/softening operation if and only if it is necessary. At present, only two methods need this (__setitem__ and where).

Not only does this pave the way for active storage reductions to be implemented, it also (in my opinion!) makes the code cleaner, too.
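
The lazy approach described above can be illustrated with a small, self-contained sketch. This is not the cf-python implementation: the class `LazyMaskData`, its `layers` list (a stand-in for the dask graph) and the helper `_apply_mask_hardness` are all hypothetical names chosen for illustration. It only shows the idea that setting `hardmask` records the desired state, and that a harden/soften step is added solely by operations that depend on it.

```python
class LazyMaskData:
    """Toy stand-in for a dask-backed array (hypothetical, not cf.Data)."""

    def __init__(self, source, hardmask=True):
        # After construction the "graph" holds only the data definition.
        self.layers = [f"read:{source}"]
        self._hardmask = bool(hardmask)

    @property
    def hardmask(self):
        return self._hardmask

    @hardmask.setter
    def hardmask(self, value):
        # Record the desired state only; no graph layer is added here.
        self._hardmask = bool(value)

    def _apply_mask_hardness(self):
        # Hypothetical helper: called only by operations whose result
        # depends on whether the mask is hard or soft.
        self.layers.append("harden_mask" if self._hardmask else "soften_mask")

    def __setitem__(self, index, value):
        # Assignment depends on mask hardness, so apply it first.
        self._apply_mask_hardness()
        self.layers.append(f"setitem:{index!r}")

    def has_only_data_definition(self):
        # True while nothing has been stacked on the data definition,
        # i.e. while a storage-side reduction would still be possible.
        return len(self.layers) == 1


d = LazyMaskData("file.nc")
d.hardmask = False
d.hardmask = True
still_pristine = d.has_only_data_definition()  # no layers added so far

d[0] = 1  # only now is a mask-hardening layer added
```

In this sketch, toggling `hardmask` any number of times leaves `layers` untouched, so checking whether the graph still consists of the data definition alone stays trivial until an operation such as `__setitem__` genuinely needs the mask hardness applied.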


davidhassell · 2022-05-04T14:31:59Z

It also will reduce clutter in the dask graph by removing most of the "cf_harden_mask" and "cf_soften_mask" layers, which are nearly always null-ops.

sadielbartholomew

First of all, I support the new approach underlying this PR, i.e.:

A solution is to "somewhat lazify" the setting of the mask hardness. Instead of the status quo of adding a dask graph operation every time d.hardmask = X is run, the hardmask attribute now just records the desired state of the mask hardness. It is then up to a future method to apply a mask hardening/softening operation if and only if it is necessary. At present, only two methods need this (__setitem__ and where).

and agree with your motivating conclusions:

Not only does this pave the way for active storage reductions to be implemented, it also (in my opinion!) makes the code cleaner, too.

The code here is all good (no comments to make at all, in fact 💯) though your recent and final commit to synchronise with lama-to-dask here seems to have introduced (from your other recently-merged PRs) a few cases of reset_mask_hardness which need appropriate conversion, namely I am currently seeing:

======================================================================
ERROR: test_Data_masked_all (__main__.DataTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Data.py", line 3912, in test_Data_masked_all
    d = cf.Data.masked_all(shape)
  File "/home/sadie/cf-python/cf/data/data.py", line 9348, in masked_all
    d._set_dask(dx, reset_mask_hardness=False)
TypeError: _set_dask() got an unexpected keyword argument 'reset_mask_hardness'

======================================================================
ERROR: test_Data_outerproduct (__main__.DataTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Data.py", line 1431, in test_Data_outerproduct
    f = d.outerproduct(b)
  File "/home/sadie/cf-python/cf/data/data.py", line 96, in wrapper
    return method(*args, **kwargs)
  File "/home/sadie/cfdm/cfdm/decorators.py", line 44, in inplace_wrapper
    processed_copy = operation_method(self, *args, **kwargs)
  File "/home/sadie/cf-python/cf/decorators.py", line 62, in precede_with_kwarg_deprecation_check
    operation_method_result = operation_method(self, *args, **kwargs)
  File "/home/sadie/cf-python/cf/data/data.py", line 8793, in outerproduct
    d._set_dask(dx, reset_mask_hardness=False)
TypeError: _set_dask() got an unexpected keyword argument 'reset_mask_hardness'

----------------------------------------------------------------------

Once those are rectified (and merge conflicts managed) this is ready to merge as far as I am concerned. Thanks.

davidhassell · 2022-05-13T07:32:00Z

Thanks Sadie - conflict fixed, and merging ....

davidhassell added 8 commits May 4, 2022 14:18

hardmask refactor

055e1f9

Merge branch 'lama-to-dask' of ssh://github.com/NCAS-CMS/cf-python in…

a8ec0dd

…to dask-hardmask

Merge branch 'lama-to-dask' of ssh://github.com/NCAS-CMS/cf-python in…

b080119

…to dask-hardmask

hardmask refactor

a7e9844

hardmask refactor

d442780

Merge branch 'lama-to-dask' of ssh://github.com/NCAS-CMS/cf-python in…

8e88a38

…to dask-hardmask

upstream merge

f39755b

hardmask refactor

92cf9f5

davidhassell added the dask label (Relating to the use of Dask) May 4, 2022

davidhassell requested a review from sadielbartholomew May 4, 2022 14:02

davidhassell mentioned this pull request May 4, 2022

Replace LAMA with Dask: grouping methods to migrate #295

Closed

dev

79c57b9

Merge branch 'lama-to-dask' into dask-hardmask

f9ddec7

sadielbartholomew reviewed May 13, 2022


davidhassell and others added 2 commits May 13, 2022 08:29

Merge branch 'lama-to-dask' into dask-hardmask

615bcda

upstream sync

2ce2d7f

davidhassell merged commit 46b09a0 into NCAS-CMS:lama-to-dask May 13, 2022

davidhassell mentioned this pull request Jun 15, 2022

LAMA to Dask: arithmetic, logical & comparison operators #409

Merged

davidhassell deleted the dask-hardmask branch November 15, 2022 09:26

davidhassell added this to the 3.14.0 milestone Nov 15, 2022
