Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate non-relax coord bounds checking, also fix parallel builds for --enable-pnetcdf #1238

Merged
merged 7 commits into from
Jan 16, 2019

Conversation

edhartnett
Copy link
Contributor

@edhartnett edhartnett commented Dec 3, 2018

Part of #1010 (see discussion there).

Fixes #1237.
Fixes #1206.

In this PR I remove the option to disable relaxed coord bound checking. As discussed in #1010, once coord bound checking is relaxed (as it should be), it must be relaxed everywhere, or else there may be incompatible APIs between different HPC installations. Programs written where coord bounds checking is relaxed will fail on machines where it is not.

As discussed (at length) in #1010 there is no harm or impact in always relaxing coord bound checking in all cases. In that case, there can be no incompatibilities, and all existing code continues to work exactly as it has.

This PR also now enforces that pnetcdf is also built with relaxed coord bound checking. Presumably this will become default in the next pnetcdf release, until then, users must use --enable-relax-coord-bound when building netCDF. @WardF if you have pnetcdf in your CI system I'm afraid you will have to rebuild it with --enable-relax-coord-bound.

Also fixed a pnetcdf test memory leak, and some pnetcdf build issues.

@WardF
Copy link
Member

WardF commented Dec 3, 2018

Before I accept this, @wkliao, do you have plans to make relaxed coord bounds the default in upcoming pnetcdf releases?

@WardF
Copy link
Member

WardF commented Dec 3, 2018

Without knowing how much of our community is using this option, it feels like poor form to simply remove it and potentially break workflows. I will speak with @DennisHeimbigner about marking this option deprecated for the next release and marked for future removal. No big hurry, we are focused elsewhere for the moment.

@edhartnett
Copy link
Contributor Author

@WardF actually this option should be removed before anyone uses it. It's presence is a bug. It creates incompatible netCDF installations on HPC systems when used.

@WardF
Copy link
Member

WardF commented Dec 3, 2018

Regardless, we will want to make sure we aren’t breaking any workflows by unceremoniously removing an option that was introduced when this feature was introduced. We may well remove it but not without getting an idea of the scope of impact. There is no rush, particularly since the next NetCDF release is a bit down the road.

@edhartnett
Copy link
Contributor Author

The original bounds checking code (written by me) was incorrect; it was written without a good understanding of advanced collective I/O programming. It returns an error for parameters that should be allowed, and are necessary in some parallel I/O cases (and harmless in all other cases). Hence the need for relaxed bounds checking.

Allowing users to disable relaxed bounds checking perpetuates this original bug. There is no good reason anyone should want incorrect bounds checking.

Disabling relaxed bounds checking will cause working (and correctly written) netCDF parallel I/O code to fail. This is clearly a bug. Why do we want users to be able to do this? There's no reasonable way that their existing code can depend on this bug.

For full (and lengthy) discussion see #1010.

Rather than continue this battle, I will change my PR to allow the user to activate disable relaxed bounds checking, but with a warning that it should not be used and that it may cause valid parallel I/O code to fail. I will also mention that it is deprecated. Then it can be fully-removed after the next release.

@WardF will this be acceptable?

@WardF
Copy link
Member

WardF commented Dec 31, 2018

Getting caught up here on the tail end of the holidays, Happy NYE everybody. @edhartnett this works for me, any further objections/thoughts on your end @wkliao ?

@edhartnett
Copy link
Contributor Author

@WardF I will modify this PR as discussed above. I'll wait for your current aggregated PR to get merged...

Thanks~!

@edhartnett edhartnett changed the title now always relax coord bounds checking, also fix parallel builds for --enable-pnetcdf deprecate non-relax coord bounds checking, also fix parallel builds for --enable-pnetcdf Jan 2, 2019
@edhartnett
Copy link
Contributor Author

OK this PR has been modified as discussed and is ready to merge to master.

@WardF WardF mentioned this pull request Jan 15, 2019
@WardF WardF merged commit 8c896f6 into Unidata:master Jan 16, 2019
@edhartnett edhartnett deleted the ejh_relax branch February 16, 2019 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make -j check fails for pnetcdf builds due to test dependencies pnetcdf build has memory leaks
2 participants