-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate internal netCDF-4 data model from HDF5 code #856
Comments
The approach I was planning to take was to take each netcdf-4 structure
|
The NC_VAR_INFO_T can already represent all data in the classic and enhanced models. So I would suggest that be the universal understanding (internally) of what a var is, and functions like nc_inq_var*() are already written to work off it. So I don't think anything should be taken out. Then each user-defined format must jam its data into that model. To the extent they can do so, they take advantage of all the existing inq code that already works well. Some parts of the model will not be used (for example, NC_CLASSIC_MODEL files cannot have user-defined types), but the code is already written to handle that. If no user-defined types are present, the list can be NULL and that's fine. The current internal data model can handle all variants of the classic and enhanced netCDF models. I would like user-defined formats to have access to this existing model. I don't think there are going to be many user-defined formats that have a more complex model than HDF5, so I'm pretty sure they will all be able to easily fit within the existing netCDF-4 model. In many cases, read-only access is fine, so they can then implement a dispatch layer with just a few functions, and take advantage of all the functions that already work off the existing data model. I don't object to giving them an additional void pointer that they can use within their dispatch layer code for implementation dependent data, but I would suggest that the current netCDF enhanced data model be fully supported in the internal data structures, so that functions written to it can continue to work for many different formats, just like the inq functions currently work for both HDF4 and HDF5. Does that fit what you thinking? |
OK, I think I have a better idea of what you mean. Just so we have an example:
You mean there would be a void pointer (format_infop or something), In the case of HDF5 it would contian such things as the datasetid, dimscale_hdf5_objids, and any other values that have to with HDF5 specifics. Things like name, ndims, dim, etc., would remain as part of the netCDF internal data that is independent of format. |
Remember that the NC_VAR_INFO structure has some hdf5 |
Yes, I get it, and I agree. A void pointer to the format-dependent data elements would work well. It would clean up the data structures very nicely. |
A couple of other thoughts.
|
@DennisHeimbigner and @edhartnett here's my 2 cents worth. IMO , chunking or filters are part of the storage layer not of the netcdf4 data model layer. The netcdf3, DAP2 or DAP4 are another examples of different dispatchers (storage layers). Yes, a special API for storage layer (implementations) features would be desirable. |
We can certainly separate the chunking/filter code so that it is in the HDF5 (libsrc4) directory, instead of the new internals directory (libint). We can put chunking/filter values into the proposed HDF5 var data info (let's call it NC_HDF5_VAR_INFO_T). Recall that chunking and filters are also part of the API. So they will always be part of the libsrc4 directory for that reason. However, we can store the results in NC_HDF5_VAR_INFO_T instead. But chunking is also used by HDF4, an addition since my departure from the project, I believe. So chunking at least is general to more than one binary format. So maybe chunksizes should stay in NC_VAR_INFO_T, and filter info, and other HDF5-only values, would move into NC_HDF5_VAR_INFO_T, and NC_VAR_INFO_T will contain a void pointer to the format specific struct. |
|
Before this goes much further, we need to have |
What, you just reuse the same comment on three threads? ;-) First I will get a working implementation, then I will document it for you. It's easier that way. ;-) |
I have a pretty good prototype of these changes, which I still need to work on to get memory-clear - some leaks have crept in. But the general shape of the solution I propose is this:
In terms of documentation, I will add a (internal) documentation section about each directory (libsrc4 and libhdf5), describing what is in each, and how they work with the dispatch code. Of course I have also modified existing documentation with the code, so all the internal documentation reflects the changes. My next steps will be:
|
Here's some new documentation describing the libsrc4 directory, after HDF5 is removed: \internal \defgroup metadata_code The Internal NetCDF-4 Metadata Code The internal netCDF-4 metadata code resides in subdirectory \section Dispatch Layer use of Metadata Code Each dispatch layer may make use of the metadata code to manage the This allows a format reading layer to be easily implemented. On \section Isolation of Metadata Code from Binary Formats The libsrc4 code does not and should not include header files or types This allows the netCDF library to be built with whatever third-party \section Functions The functions which comprise the internal netadata API are defined in
The handling of internal metadata is necessaryly complex, and there \section Enhanced Model The netCDF enhanced model was specifically designed to offer (almost) However, the enhanded model is fully supported by the metadata layer, */ |
I have this working now in HPC NetCDF. It was harder than I had anticipated, with many code changes. However, it does work well and separates out the HDF5 code nicely from the metadata-handling code in libsrc4. I will put up a PR today or tomorrow with the changes. |
An update: much of this work is complete and merged. For example, the HDF5 and HDF4 specific fields have all been removed from the structs in nc4internal.h. The remaining issues have to do with the following functions:
So @DennisHeimbigner I wonder whether provenance is supposed to be just for netCDF-4/HDF5 files? That's what it seems like in the code. Also it seems that the special atts dealt with by nc4_get_att_special() are for netCDF-4/HDF5 files only. But I saw @WardF made a comment that implied you were going to make these special atts available for classic files as well. The problem is that nc4_get_att_special() is in libsrc4, but calls NC4_buildpropinfo(), which is in libhdf5/nc4info.c. If these functions are completely free of HDF5, they can be moved to libsrc4. If not, they all can be moved to libhdf5. I would like to leave most of the get_att functionality in libsrc4 (it is used by HDF4 as well), but it looks like I need to move the handling of the special atts to libhdf5. |
This work is complete and merged to master. I will close this ticket. Thanks Ward and Dennis for your patience. ;-) |
As discussed in PR #849, it will soon be possible to separate the internal netCDF-4 data model code (i.e. lists of atts, vars, and dims, and the functions that manipulate them) from the HDF5 code. I add this issue so that there is a place to discuss the worthiness and possible implementation of this goal.
This would enable user-defined formats (#177) and the HDF4 layer to work without HDF5. It would also be a useful simplification of the code which would make the libsrc4 code smaller and easier to maintain.
After this ticket has been accomplished several other opportunities present themselves:
DAP has it's own internal data model (I believe), and at some point it could be simplified by using the common data model.
The classic code in libsrc uses a separate but related implementation of the data model. This too could eventually be removed, so that all parts of the library were using the same internal data structs.
The text was updated successfully, but these errors were encountered: