User story
As an: Iris user loading NetCDF files written according to the CF Conventions.
I want: Iris to capture malformed CF information during loading, instead of crashing or disposing of it.
So that: I can fix CF problems within my Iris script - avoiding the complexity of multiple scripts/tools.
Why this is hard
There are many ways Iris crashes when it encounters bad CF. It is tempting to think of these crashes as deliberate - with easy-to-modify code blocks for each rule - and there are a few of these, but Iris is not a CF-checker. Instead we have used CF to make assumptions so that the code can be simpler/smaller; barely any of the crashes are raised from dedicated lines, and they are often hard to predict.
Any 'fix' must therefore be a form of generic error handling, which cannot have knowledge of what precisely might go wrong.
Proposal
iris.LOAD_PROBLEMS - a global object where Iris can capture objects that could not be loaded, and the error that was raised. Is it possible to capture a full stack trace object?
raw_cube_from_cf_var() - a routine that will represent any CFVariable as a very basic Cube, with as little interpretation as possible.
Separate building objects versus adding them to the Cube being loaded.
Ensure all objects - including names, units, etcetera - are contained within their own building routine.
Use try-except in the loading processes in helpers.py and actions.py:
Can't build the object? Use raw_cube_from_cf_var() and store in iris.LOAD_PROBLEMS.
Object built but can't add to the Cube? Store in iris.LOAD_PROBLEMS.
Issue a warning at the end of loading if anything is found within iris.LOAD_PROBLEMS.
Make it easier to convert Cubes - output by raw_cube_from_cf_var() - into other objects e.g. DimCoords? Documentation at the very least.
Note that I have checked cf.py and believe it can remain unchanged. It already takes a defensive approach: it checks whether each variable can be interpreted as one of the more specific types, and represents everything left over as a CFDataVariable, so a fallback already exists. Anything here that is not formatted correctly simply shows up as extra Cube(s) in the loaded CubeList.
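On the question of capturing a full stack trace: in Python, a caught exception carries its traceback, and the standard library's `traceback.TracebackException` can snapshot it into an object that can be stored and formatted later. The following is a minimal sketch, independent of Iris; `risky_build` and `capture` are hypothetical names used only for illustration.

```python
import traceback


def risky_build(value):
    """Hypothetical stand-in for a build step that fails on bad input."""
    return float(value)


def capture(func, *args):
    """Return (result, None) on success, or (None, stack_trace) on failure."""
    try:
        return func(*args), None
    except Exception as exc:
        # Lightweight snapshot of the stack trace, suitable for storing
        # and formatting later.
        return None, traceback.TracebackException.from_exception(exc)


result, trace = capture(risky_build, "not-a-number")
if trace is not None:
    # Prints the same text an uncaught exception would have produced.
    print("".join(trace.format()))
```

Storing the snapshot rather than the live exception is one possible design choice; keeping the exception object itself would also preserve the traceback via its `__traceback__` attribute.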
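More specifics on implementation
For reference when writing #6318 and #6319
__init__.py:
LOAD_PROBLEMS object. Example structure: {"file/path/1": [(problem_object_1, error_or_stacktrace), (problem_object_2, error_or_stacktrace)]}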
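The example structure above, together with the end-of-load warning from the proposal, could be sketched roughly as follows. This is an illustration only, not the Iris implementation: the module-level `LOAD_PROBLEMS` here is a stand-in for the proposed `iris.LOAD_PROBLEMS`, and `record_problem` and `warn_if_problems` are hypothetical helper names.

```python
import warnings
from collections import defaultdict

# Hypothetical stand-in for the proposed iris.LOAD_PROBLEMS:
# maps a file path to a list of (problem_object, error) pairs.
LOAD_PROBLEMS = defaultdict(list)


def record_problem(file_path, problem_object, error):
    """Store an object that could not be loaded, with the error raised."""
    LOAD_PROBLEMS[file_path].append((problem_object, error))


def warn_if_problems():
    """Warn once, at the end of loading, if anything was captured."""
    count = sum(len(entries) for entries in LOAD_PROBLEMS.values())
    if count:
        warnings.warn(f"{count} object(s) could not be loaded; see LOAD_PROBLEMS.")
    return count


record_problem("file/path/1", "problem_object_1", ValueError("bad units"))
record_problem("file/path/1", "problem_object_2", KeyError("missing bounds"))
```

Keying by file path keeps problems from different files apart, which matches the example structure; whether a plain dictionary is the right long-term container is left open here.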
helpers.py:
Separate as much as possible into build_ routines. We already have many, but even getting hold of standard names etcetera should be separated in this way.
Refactor build_ routines to only perform the building - returning the built object rather than adding it to the Cube.
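The separation described for helpers.py might look something like the sketch below, where building and adding become two distinct steps that can each fail on their own. `ToyCoord`, `ToyCube` and `build_coord` are hypothetical stand-ins, not Iris classes or existing helpers, and the dictionary used for the CF variable is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoord:
    """Hypothetical stand-in for a coordinate object such as a DimCoord."""
    name: str
    points: list


@dataclass
class ToyCube:
    """Hypothetical stand-in for the Cube being loaded."""
    coords: list = field(default_factory=list)

    def add_coord(self, coord):
        # Adding can fail independently of building.
        if any(existing.name == coord.name for existing in self.coords):
            raise ValueError(f"duplicate coordinate: {coord.name}")
        self.coords.append(coord)


def build_coord(cf_var):
    """Only build and return the object; do not touch any Cube."""
    return ToyCoord(name=cf_var["name"], points=list(cf_var["points"]))


cube = ToyCube()
coord = build_coord({"name": "time", "points": [0, 1, 2]})  # step 1: build
cube.add_coord(coord)  # step 2: add
```

Because `build_coord` returns its result instead of mutating the Cube, a caller can wrap each step in its own try-except and still hold the built object if the add step fails.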
actions.py:
Create the raw_cube_from_cf_var() function.
Three action_ routines have success criteria and failure information. These should be refactored so that a failure falls back to raw_cube_from_cf_var(). The failure reason (already being recorded) should be captured in iris.LOAD_PROBLEMS.
Introduce a new function - capture_load_problems() maybe - that:
Attempts to call a build_ routine (passed as an argument) in a try-except
On failure: falls back on raw_cube_from_cf_var().
Attempts to add successfully built objects (e.g. DimCoord) to a Cube (passed as an argument) in a try-except
On failure: adds the built objects to iris.LOAD_PROBLEMS instead.
ALL action_ routines should be refactored to call the proposed capture_load_problems() instead of calling build_ routines directly.
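As a rough sketch of how the proposed capture_load_problems() could behave, under the assumption that it receives the build routine, a callable that adds the result to the Cube, the CF variable and the file path. The signature, the dictionary standing in for iris.LOAD_PROBLEMS, the stub raw_cube_from_cf_var() and the `_record` helper are all hypothetical.

```python
LOAD_PROBLEMS = {}  # hypothetical stand-in for iris.LOAD_PROBLEMS


def raw_cube_from_cf_var(cf_var):
    """Hypothetical fallback: wrap the variable with minimal interpretation."""
    return {"raw": cf_var}


def _record(file_path, problem_object, error):
    LOAD_PROBLEMS.setdefault(file_path, []).append((problem_object, error))


def capture_load_problems(build_func, add_func, cf_var, file_path):
    """Build an object and add it to the Cube, recording any failure.

    Returns True if the object was both built and added successfully.
    """
    try:
        built = build_func(cf_var)
    except Exception as error:
        # Can't build: store the raw representation plus the error.
        _record(file_path, raw_cube_from_cf_var(cf_var), error)
        return False
    try:
        add_func(built)
    except Exception as error:
        # Built but can't be added: store the built object plus the error.
        _record(file_path, built, error)
        return False
    return True


def build_points(cf_var):
    """Hypothetical build_ routine: fails on non-numeric values."""
    return [float(value) for value in cf_var]


cube_coords = []  # stands in for the Cube being loaded
ok = capture_load_problems(build_points, cube_coords.append, ["1", "2"], "a.nc")
bad = capture_load_problems(build_points, cube_coords.append, ["1", "x"], "a.nc")
```

Catching `Exception` broadly is deliberate in this sketch, since the handling cannot know in advance what might go wrong; a real implementation might choose to narrow it.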
It's worth noting that the existing "actions" code, in particular, is quite obscure and over-complicated.
There is definitely an opportunity to reduce complexity there by refactoring: see #6316.
That could potentially make the code changes here easier, but if so it probably wants doing separately, and beforehand.
Whether this is the right time, and worth it, is a big question.
If this all seems to be getting too complicated, then maybe we could retrospectively "go back" and consider it.
Closes #5165