-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deprecate non-relax coord bounds checking, also fix parallel builds for --enable-pnetcdf #1238
deprecate non-relax coord bounds checking, also fix parallel builds for --enable-pnetcdf #1238
Conversation
Before I accept this, @wkliao, do you have plans to make relaxed coord bounds the default in upcoming pnetcdf releases? |
Without knowing how much of our community is using this option, it feels like poor form to simply remove it and potentially break workflows. I will speak with @DennisHeimbigner about marking this option deprecated for the next release and marked for future removal. No big hurry, we are focused elsewhere for the moment. |
@WardF actually this option should be removed before anyone uses it. It's presence is a bug. It creates incompatible netCDF installations on HPC systems when used. |
Regardless, we will want to make sure we aren’t breaking any workflows by unceremoniously removing an option that was introduced when this feature was introduced. We may well remove it but not without getting an idea of the scope of impact. There is no rush, particularly since the next NetCDF release is a bit down the road. |
The original bounds checking code (written by me) was incorrect; it was written without a good understanding of advanced collective I/O programming. It returns an error for parameters that should be allowed, and are necessary in some parallel I/O cases (and harmless in all other cases). Hence the need for relaxed bounds checking. Allowing users to disable relaxed bounds checking perpetuates this original bug. There is no good reason anyone should want incorrect bounds checking. Disabling relaxed bounds checking will cause working (and correctly written) netCDF parallel I/O code to fail. This is clearly a bug. Why do we want users to be able to do this? There's no reasonable way that their existing code can depend on this bug. For full (and lengthy) discussion see #1010. Rather than continue this battle, I will change my PR to allow the user to activate disable relaxed bounds checking, but with a warning that it should not be used and that it may cause valid parallel I/O code to fail. I will also mention that it is deprecated. Then it can be fully-removed after the next release. @WardF will this be acceptable? |
Getting caught up here on the tail end of the holidays, Happy NYE everybody. @edhartnett this works for me, any further objections/thoughts on your end @wkliao ? |
@WardF I will modify this PR as discussed above. I'll wait for your current aggregated PR to get merged... Thanks~! |
OK this PR has been modified as discussed and is ready to merge to master. |
Part of #1010 (see discussion there).
Fixes #1237.
Fixes #1206.
In this PR I remove the option to disable relaxed coord bound checking. As discussed in #1010, once coord bound checking is relaxed (as it should be), it must be relaxed everywhere, or else there may be incompatible APIs between different HPC installations. Programs written where coord bounds checking is relaxed will fail on machines where it is not.
As discussed (at length) in #1010 there is no harm or impact in always relaxing coord bound checking in all cases. In that case, there can be no incompatibilities, and all existing code continues to work exactly as it has.
This PR also now enforces that pnetcdf is also built with relaxed coord bound checking. Presumably this will become default in the next pnetcdf release, until then, users must use --enable-relax-coord-bound when building netCDF. @WardF if you have pnetcdf in your CI system I'm afraid you will have to rebuild it with --enable-relax-coord-bound.
Also fixed a pnetcdf test memory leak, and some pnetcdf build issues.