-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allowed special case for unit conversion of precipitation (kg m-2 s-1
<--> mm day-1
)
#1574
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1574 +/- ##
==========================================
+ Coverage 91.47% 91.48% +0.01%
==========================================
Files 204 204
Lines 11143 11163 +20
==========================================
+ Hits 10193 10213 +20
Misses 950 950
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs proper handling of standard names. Also, standard names are probably the better way to identify possible conversions.
Can you elaborate on that?
I disagree. This preprocessor needs to be explicitly called by the user, so it will not transform data unexpectedly. If the user explicitly asks for the conversion to Probably most of ESMValTool's input data uses the correct standard names, but we also need to think about possible derived variables without a standard name or the usage of this preprocessor outside of ESMValTool (e.g., in a Jupyter Notebook) on data that is not fully CMOR-compliant. |
Sure. If there is a standard name, then it has associated canonical units. Changing the units without changing the standard name produces inconsistent results.
It's not about allowing it or not, but rather about having the necessary information. Suppose you want to convert (
I don't see the impact of use outside of ESMValTool; this is also not about CMOR but just the basic CF. If there is no standard name, things may be different, though in that case, I think it is easier to treat this as a data problem and apply a fix in the preprocessor. While we can consider all kinds of things outside of ESMValTool and with highly irregular data (see the Australian SILO data for a dataset that never mentions precipitation but uses "rain"), let's start by making sure that the normal use inside ESMValTool doesn't turn correct data into incorrect data. |
The CF conventions say: that "Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units [...]". One could definitely argue that the two units are physically equivalent here.
If we really want to go down this road with have to modify a large part of our preprocessors. Here are some examples:
|
While the language of the conventions in some places is regrettably unclear, the standard names table has a more explicit note about this at the top. In the CF community it is generally understood that units must be convertible to the canonical units by udunits.
Go ahead and open the issues. Good to track them. Of course, previous mistakes are no reason to introduce more mistakes, much less when a clear solution is available. |
The only thing I can offer is to remove the |
The code I posted in the issue contains a decent start. I imagine you have a concrete problem you are trying to solve, which might give you another entry. Why not start with those and throw an exception otherwise. Then anyone who needs a new case can add it quickly. |
I like this and I support @schlunma here - we need to allow the user to change whatever they want as long as it's within some norms and regulations - I don't see anything wrong with this, and as long as it's a user option, we should become less and less restrictive wrt what the user needs. @zklaus what you mean by
🍺 |
The use case I originally had in mind was for Also, the standard names |
@valeriupredoi's points:
I too agree that this would be very nice functionality.
Users can do whatever they want in diagnostics. In preprocessors, we have generally higher standards wrt compliance.
That's exactly the point: It's not. If there is a standard name, the units must agree with it; hence changing the units without changing the standard name produces inconsistent results.
The solution is to change the standard name along with the units. The simplest thing for us to do is to require the user to provide the standard name. Other approaches, such as a lookup table are possible. @schlunma's points:
The counterpart for
As model resolution is increased, these become important. The first we'll likely have to deal with it is in CORDEX. |
right! OK, I agree with this and I too think we should require that. About the flexibility point - higher standards for norms are good but we must find the balance between them and usability - remember the age-old argument about data being strictly CMOR, ok maybe less, fine we can just allow the user to run with only CF-compliant data if they wish - these sort of things push our tool away from users. But yeah, wasn't sure what you meant and cheers for clarifying, K-man! |
All right, I changed it now so that only |
@zklaus any further comments on this? |
Good progress! I agree with you that it would be nice to include more standard names. We could start with the inverse conversion. Suppose we set things up to have a collection of standard names with any necessary additional information that allows interconversion between any two of them. In that case, we have a flexible system that allows the easy addition of more standard names as required. I would also not frame this conversion as "allowing an exception" that cf-units is forbidding (for some nefarious reason?). This really is just something completely different, namely the conversion of one quantity (e.g. mass flux per time per area) to a very different quantity (e.g. rate in column height per time); of course a unit conversion cannot do that. On the technical side, we now have two unit conversions on the cube. That is a potentially costly operation that we can avoid by calculating a single conversion factor. |
What do you have in mind here? The system is already flexible, further standard_names can be supported by extending the
I 100% agree that cf_units should not allow that. Since this is an addition to the
Good point, will change that! |
Right now there is a clear asymmetry: You have to specify a source and a target for every conversion. To add the inverse, you need to duplicate the information. Worse, for n quantities that are all mutually convertible, you need n*(n-1) = O(n^2) entries in the table, which is more difficult to maintain than the n entries you need in a simple set of quantities. Hence my preference. Regarding your other comments, perhaps this fits better into its own preprocessor? |
I don't think so. Converting precipitation data from |
It certainly is a standard operation and I have no problem colocating it in this preprocessor. |
For units that differ by a constant factor that's not a problem, but for others (like |
I am not sure I follow. This only applies to the special quantity conversion, no? To be clear, I was referring only to the two conversions in lines 74 and 76, not to the one triggering the special treatment in line 57. |
Yes, that's what I understood. As far as can tell, in order to avoid one of them, you would need to calculate a conversion factor between the two (e.g., from |
As for efficiency, I think the impact in terms of FLOPS is likely small in any case. I am a bit more concerned with the impact on laziness and the number of additional tasks that the dask scheduler will have to deal with. Since this is an ad-hoc solution that only covers the conversions listed explicitly, and since those are only of the form mentioned, I don't think there is a problem with a conversion factor. |
I implemented all you comments, please give it another look 👍 |
kg m-2 s-1
to mm day-1
)kg m-2 s-1
<--> mm day-1
)
kg m-2 s-1
<--> mm day-1
)kg m-2 s-1
<--> mm day-1
)
@zklaus are these changes fine for you? Is there anything else I can do? |
Already looks good. I still have some comments, but need a bit more time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look and the changes looks reasonable to me. At the end of the day, it's work that adds a new functionality to the core with respect to the past release and that is needed by people using the tool.
@zklaus all your points are very valid but considering that the code freeze is coming up, and that @schlunma kindly adressed your requests in a timely manner would it be possible to be able to merge this?
Just merging with main. If by 4PM there are no news I would suggest @schlunma creates a new PR and we merge that instead since this already has two approved reviews. |
I will merge this on Monday if there are no further comments. |
Description
This PR expands the preprocessor
convert_units
so that it supports the "special unit conversions" fromkg m-2 s-1
tomm day-1
for precipitation fluxes.Closes #1573
Link to documentation: https://esmvaltool--1574.org.readthedocs.build/projects/ESMValCore/en/1574/recipe/preprocessor.html#unit-conversion
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: