[PI] Improve load/create to default to "unknown" #3708

Closed
14 tasks done
abooton opened this issue May 15, 2020 · 4 comments

@abooton
Contributor

abooton commented May 15, 2020

Overview

As in #3585, units should default to "unknown" when loading or creating objects.

Acceptance Criteria

  • All cubes and DimensionalMetadata whose units are not otherwise specified are loaded with units of "unknown". Specifically, defaults are set for:

    • Cubes
    • _DimensionalMetadata
      • Coord
      • DimCoord
      • AuxCoord
      • CellMeasure
      • AncillaryVariable
  • All cubes and DimensionalMetadata created in iris without specified units are created with units of "unknown". Specifically, defaults are set for:

    • Cubes (this is already the default)
    • _DimensionalMetadata
      • Coord
      • DimCoord
      • AuxCoord
      • CellMeasure
      • AncillaryVariable
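To make the defaulting rule concrete, here is a minimal Python sketch. `default_units`, `Coord` and the `_RECOGNISED` set are hypothetical stand-ins, not the iris API. A real implementation would ask a units library such as cf_units whether a string is valid rather than consult a fixed set.

```python
from typing import Optional

UNKNOWN = "unknown"

# Hypothetical set of unit strings this sketch treats as valid. A real
# implementation would ask a units library (e.g. cf_units) instead.
_RECOGNISED = {"m", "mm", "K", "s", "1", "percent", "no-unit", "no_unit", "unknown"}


def default_units(raw: Optional[str]) -> str:
    """Return the units to attach to a newly loaded or created object.

    Missing (None or blank) and unrecognised unit strings both fall back
    to "unknown", following the acceptance criteria above.
    """
    if raw is None or not raw.strip():
        return UNKNOWN
    raw = raw.strip()
    return raw if raw in _RECOGNISED else UNKNOWN


class Coord:
    """Hypothetical stand-in for a coordinate; not the iris class."""

    def __init__(self, points, units: Optional[str] = None):
        self.points = list(points)
        self.units = default_units(units)


print(Coord([0, 1, 2]).units)        # unknown
print(Coord([0, 1, 2], "mm").units)  # mm
print(default_units("furlongs"))     # unknown
```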

Other useful "unit associated" items:

  • Correcting saving behaviour is addressed in #3394
  • Ancillary variables will be addressed in #3473
  • Improvements to flags will be addressed in #3474


Context

Units can be classified into three types: known units, "unknown" and "no-unit", with dimensionless units treated as a subtype of known units. We interpret these types as follows:

Known dimensional units: e.g. mm
If the data's units are known and recognized by CF, the units are loaded accordingly, used, and saved to file. Known units may combine a prefix with a unit, e.g. mm. (See section 3.1 of http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#units for supported prefixes.)

Known dimensionless units: e.g. 1 (number of "parts")
Applies to data where dimensional analysis gives a "pure ratio". The value represents the number of "parts", so it is typically "1", but other units such as "degrees" or "percent" are also considered dimensionless. A dimensionless unit can also be an arbitrary scale factor; for example, "1e-6" indicates that the data are parts per million (similar to the concept of a prefix mentioned above). The unit is saved to file.
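A dimensionless scale factor acts as a multiplier on the underlying pure ratio. The sketch below assumes a hypothetical lookup table of scale factors (it does not parse unit strings) and converts a value between two dimensionless units:

```python
# Hypothetical table mapping each dimensionless unit to the pure ratio
# that one unit represents.
DIMENSIONLESS_SCALES = {
    "1": 1.0,
    "percent": 1e-2,
    "1e-6": 1e-6,  # parts per million
}


def convert_dimensionless(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a value between two dimensionless units via the pure ratio."""
    ratio = value * DIMENSIONLESS_SCALES[from_unit]
    return ratio / DIMENSIONLESS_SCALES[to_unit]


# 250 parts per million expressed as a percentage (approximately 0.025).
print(convert_dimensionless(250.0, "1e-6", "percent"))
```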

"no-unit": no-unit or no_unit
This indicates that units are not appropriate for the data, e.g. if the data are strings. Arithmetic on data with "no-unit" is disallowed, since such operations are considered inappropriate.
This concept is not described by the CF conventions (it is borrowed from cf_units) and therefore a unit of "no-unit" will not be saved to file. Unitless variables are acceptable in CF conventions.

"unknown":
The data's units have not been defined, or they are invalid (and hence are not known). This could also conceivably describe data which ought to be described by "no-unit" but for which that fact could not be determined. Generally though, this is not considered to be the case. Data with "unknown" units are allowed to be used in arithmetic but will always yield data with "unknown" units.
This concept is not described by the CF conventions (it is borrowed from cf_units) and therefore a unit of "unknown" will not be saved to file.
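The arithmetic rules for these categories can be summarised in a short Python sketch. `SketchUnit` is hypothetical and only models the behaviour described above: "no-unit" refuses arithmetic and "unknown" propagates. It is not the cf_units `Unit` class and performs no real dimensional analysis; known units are simply concatenated.

```python
from dataclasses import dataclass

NO_UNIT_NAMES = {"no-unit", "no_unit"}
UNKNOWN = "unknown"


@dataclass(frozen=True)
class SketchUnit:
    """Hypothetical unit wrapper illustrating the unit categories."""

    name: str

    def is_no_unit(self) -> bool:
        return self.name in NO_UNIT_NAMES

    def is_unknown(self) -> bool:
        return self.name == UNKNOWN

    def __mul__(self, other: "SketchUnit") -> "SketchUnit":
        # Arithmetic on "no-unit" data is considered inappropriate: refuse it.
        if self.is_no_unit() or other.is_no_unit():
            raise ValueError("cannot multiply a 'no-unit'")
        # Any operation involving "unknown" yields "unknown".
        if self.is_unknown() or other.is_unknown():
            return SketchUnit(UNKNOWN)
        # Known units: naively join the names (no simplification).
        return SketchUnit(f"{self.name} {other.name}")


print(SketchUnit("m") * SketchUnit("s"))        # SketchUnit(name='m s')
print(SketchUnit("unknown") * SketchUnit("m"))  # SketchUnit(name='unknown')
```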


Reasoning behind changes

We consider "unknown" to be the safest and most appropriate unit to give in iris when there is insufficient information to determine a unit. It allows arithmetic while preventing the creation of incorrect units. This change has the additional benefit of allowing NetCDF files to round-trip when they contain variables whose units have been intentionally left missing (as seems to be the case for quality flags).
For quality flags specifically, we have decided that "no-unit" is the more appropriate unit, since the data should not be considered a numerical quantity (similar to how we treat string data as having "no-unit"). However, even in such cases "unknown" is still preferable to the previous default of "1", so "unknown" is a safer default for any other unanticipated cases.
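Below is a minimal sketch of why an "unknown" default allows round-tripping. It assumes a NetCDF variable's attributes can be modelled as a plain dict. `load_units` and `save_attrs` are hypothetical helpers, not iris's NetCDF loader or saver.

```python
from typing import Dict

UNKNOWN = "unknown"
# Units that are not CF concepts and so are never written to file.
NON_CF_UNITS = {"unknown", "no-unit", "no_unit"}


def load_units(attrs: Dict[str, str], default: str = UNKNOWN) -> str:
    """Units assigned on load: a missing "units" attribute gets the default."""
    return attrs.get("units", default)


def save_attrs(units: str) -> Dict[str, str]:
    """Attributes written on save: non-CF units are omitted entirely."""
    return {} if units in NON_CF_UNITS else {"units": units}


original = {}  # a variable whose units were intentionally left missing
print(save_attrs(load_units(original)))               # {}
print(save_attrs(load_units(original, default="1")))  # {'units': '1'}
```

With the old default of "1", a variable with no `units` attribute would come back from a load/save cycle with `units = "1"` attached. The "unknown" default avoids that mismatch.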

@abooton added this to the v3.0.0 milestone May 15, 2020
@pp-mo changed the title from 'Improve load/create to default to "unknown"' to 'PI-3585: Improve load/create to default to "unknown"' Jun 3, 2020
@abooton changed the title from 'PI-3585: Improve load/create to default to "unknown"' to 'Improve load/create to default to "unknown"' Jun 5, 2020
@abooton changed the title from 'Improve load/create to default to "unknown"' to '[PI] Improve load/create to default to "unknown"' Jun 10, 2020
@abooton
Contributor Author

abooton commented Aug 20, 2020

Merge of feature branch to master, see #3795

@abooton
Contributor Author

abooton commented Aug 21, 2020

Additional "unit associated" features were addressed in:
Ancillary variables (including cell measures) and quality_flags were addressed in #3473 and #3474
The launch_ancils feature branch was merged in #3800

Units in cube arithmetic are addressed in #3483

@abooton
Contributor Author

abooton commented Aug 21, 2020

All of the code changes are now in place. However, our main documentation no longer explains how we define and use the different unit types. We need this information (i.e. the information in the Context above) in our docs so that users can look it up! I think the best place for it is section 4.2 of the user guide, where we explain to users how to use the units attribute.

This issue will be complete once we add the final touches to the documentation.

@stephenworsley
Contributor

Closed by #3803.
