Some changes to HDF5 code needed to support H5Pset_all_coll_metadata_ops() call for parallel I/O #781
@gsjaardema did you see this issue?
I don't have any good thoughts on this. We haven't seen the corruption, but we only use the NetCDF C API (although we do use Fortran, our Fortran API to Exodus calls the NetCDF C API). I do know that at large processor counts, the
OK, so does that imply that we should give the user control over whether H5Pset_all_coll_metadata_ops() is set? Perhaps a mode flag setting? Because right now it is used for all parallel file access.
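To make the idea concrete, here is a minimal sketch of what a mode-flag approach might look like. NC_COLL_METADATA is a name and bit value invented here purely for illustration; it is not part of the netCDF API.

```c
/* Sketch only: NC_COLL_METADATA is a hypothetical flag name and placeholder
 * bit value invented for illustration; it does not exist in netcdf.h. The
 * idea is that collective metadata reads are requested only when the user
 * asks for them, instead of unconditionally for all parallel file access. */
#include <hdf5.h>
#include <netcdf.h>

#define NC_COLL_METADATA 0x80000 /* placeholder value for illustration only */

/* Hypothetical helper inside the library: set collective metadata reads on
 * the file access property list only if the user passed the mode flag. */
static int set_metadata_access(hid_t fapl_id, int mode)
{
    if (mode & NC_COLL_METADATA)
        if (H5Pset_all_coll_metadata_ops(fapl_id, 1) < 0)
            return NC_EHDFERR;
    return NC_NOERR;
}
```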
@gsjaardema another thought is that perhaps it's time to switch to more advanced I/O features. The new PIO/NetCDF integration (now available as HPC NetCDF, V4.6.0.3 at https://github.com/HPC-NetCDF/netcdf-c) contains the PIO library. This allows you to specify that N processors will do all I/O. Other processors will (transparently to you) send the data to those N processors, which will do all the I/O under the covers. This will allow you to use parallel I/O on a reasonable number of processors.
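For anyone unfamiliar with PIO, here is a minimal sketch of declaring the I/O-task subset, based on my recollection of the PIO C API (treat the exact signature as an assumption to check against the PIO docs):

```c
/* Minimal sketch: designate a subset of ranks as I/O tasks so that only
 * num_iotasks processors touch the file system, while the remaining ranks
 * route their data through them. */
#include <mpi.h>
#include <pio.h>

int init_io_subset(MPI_Comm comm, int ntasks, int *iosysidp)
{
    /* Example policy: roughly one I/O task per 64 compute ranks. */
    int num_iotasks = (ntasks / 64 > 0) ? ntasks / 64 : 1;
    int stride = ntasks / num_iotasks; /* spread I/O tasks across the job */

    /* PIO_REARR_BOX selects the box rearranger to move data to I/O tasks. */
    return PIOc_Init_Intracomm(comm, num_iotasks, stride, 0 /* base */,
                               PIO_REARR_BOX, iosysidp);
}
```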
We are definitely planning to look into PIO/NetCDF.
I am fine with having the
I will continue to investigate before introducing a flag. All my netCDF/PIO C tests work; the problem is in one Fortran test of PIO that is doing something weird (but nonetheless should still work). The PIO/NetCDF merger is available as HPC NetCDF: v4.6.0.3 contains all of your and wkliao's outstanding PRs, plus all of mine, plus the PIO merged code. Better documentation is coming in future releases.
If we can figure out the conditions under which collective is failing, then it
A complete list is provided in the HDF5 docs. Protecting against them will require some effort, but I will probably undertake it. Essentially, instead of turning on collective metadata operations in the file-level property list, you set them on many of the other property lists. This more fine-grained control allows us to avoid the case they are warning about. I am actively investigating and may have reproduced this in a PIO test. (Or maybe I just have a test bug. Still checking.) I will update as I know more.
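For reference, a minimal sketch of the finer-grained approach, using the per-object access property lists rather than the file-level one (the dataset name "t" is a placeholder):

```c
/* Sketch: request collective metadata ops on a dataset access property list
 * (and similarly on link/group access lists) instead of setting it once on
 * the file access property list. */
#include <hdf5.h>

hid_t open_var_collective(hid_t file_id)
{
    hid_t dapl_id = H5Pcreate(H5P_DATASET_ACCESS);
    if (dapl_id < 0)
        return -1;

    /* Collective metadata reads apply only to operations that go through
     * this dataset access property list. */
    if (H5Pset_all_coll_metadata_ops(dapl_id, 1) < 0) {
        H5Pclose(dapl_id);
        return -1;
    }

    hid_t dset_id = H5Dopen2(file_id, "t", dapl_id);
    H5Pclose(dapl_id);
    return dset_id;
}
```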
After thinking about it some, I think that protecting the offending HDF5 calls
I don't believe the list of affected functions will change - it consists of all functions that don't accept a property list. That includes some historic functions, but the HDF5 programmers later started including property lists everywhere in the HDF5 API. So I don't think that will be a problem, and I believe that converting to what they recommend as safe is going to be safer than doing what they say we should not do and hoping that it's not causing a problem. ;-)

Doing independent access from large numbers of cores is going to be slow, because you are generating a disk access for each core independently instead of grouping your disk accesses, which would then access whole pages. Or, if they are all reading the same bytes, then independent access would cause the read to happen repeatedly instead of once. @gsjaardema please chime in here if I've got it wrong. NetCDF/PIO will provide a good solution here by allowing the user to use parallel I/O from 1000 cores, but not from all 500K.

I also want to stress that I'm still not sure that what I am working on is actually a netCDF bug. I just know it is activated because of this collective metadata call. This may turn out to be a bug in PIO or a bug in the PIO test. I will get it fully figured out before I make any changes to netCDF code.
OK, that makes sense.
I have not forgotten this but would like to do it after #856, which will separate the HDF5 code from the libsrc4 code. Then I will make the changes needed for H5Pset_all_coll_metadata_ops() to work within the HDF5 team's guidelines.
An update: the original bug that triggered this issue seems to be in HDF5. There is a fix underway. However, we must still make the changes described in this ticket so that we are using HDF5 in accordance with the HDF5 team's instructions when parallel I/O is involved. Just because we have gotten away with it until now is no reason to close this issue. ;-)
Tagged this 'future' since we need to push out a release in short order (so that we can also get the C++ and Fortran interfaces out) and there's no need to get this into that release. It's not meant to mean "sometime in the future", which is how the 'future' milestone is often used. Thanks!
No worries, I am always in favor of a release!
I have examined the code again and I think we are already acting in accordance with the HDF5 documentation WRT collective metadata access. And, as we have learned, the bug that triggered this ticket was an HDF5 bug, fixed in the 1.10.4 release. So I will close this ticket.
In the 4.5.0 release, @gsjaardema contributed a performance-enhancement PR that included a call to H5Pset_all_coll_metadata_ops() to make HDF5 metadata read operations collective.
I see in the documentation for this function (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetAllCollMetadataOps):
My concern is that the default access mode is independent. (An unfortunate choice on my part. Collective would have been a better default.) And, as also noted in the documentation:
So don't we have to ensure we are never trying any independent metadata read operations or risk file corruption?
That's not very "groovy", as the kids would say these days.
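For context, this is roughly the file-level pattern in question (a minimal sketch, not the actual netCDF-4 internals; it assumes a parallel HDF5 build):

```c
/* Minimal sketch of the file-level pattern under discussion: collective
 * metadata reads requested once on the file access property list used for
 * a parallel open. */
#include <mpi.h>
#include <hdf5.h>

hid_t open_parallel_coll_meta(const char *path, MPI_Comm comm)
{
    hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    if (fapl_id < 0)
        return -1;

    /* Use MPI-IO for the file, then request collective metadata reads. */
    if (H5Pset_fapl_mpio(fapl_id, comm, MPI_INFO_NULL) < 0 ||
        H5Pset_all_coll_metadata_ops(fapl_id, 1) < 0) {
        H5Pclose(fapl_id);
        return -1;
    }

    hid_t file_id = H5Fopen(path, H5F_ACC_RDWR, fapl_id);
    H5Pclose(fapl_id);
    return file_id;
}
```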
All this is under investigation because turning on the collective metadata write makes the PIO Fortran tests break. I have not yet been able to reproduce the problem in C. The PIO issue is here:
NCAR/ParallelIO#1070.
@gsjaardema do you have any thoughts on this?
It should also be noted that I can't seem to get any failures or problems from this with a bunch of new C tests. But when the HDF5 team talks about possible corruption, that does not mean it happens 100% of the time. Maybe my test cases are not activating the bug yet.