enabling szlib compression #51

Closed
edwardhartnett opened this issue Jan 28, 2020 · 15 comments

Comments

@edwardhartnett
Collaborator

This issue is an offshoot of #23

@junwang-noaa here are instructions for trying szlib compression.

1 - Build HDF5 (1.10.6 for best performance) with szip. Use the --with-szlib= option to configure. For example:
./configure --with-szlib=/usr/local/szip-2.1.1

At the end of the configure, information will be printed about the build. You should see:
I/O filters (external): deflate(zlib),szip(encoder)

2 - Rebuild netcdf-c with that HDF5 build. NetCDF will detect that szip has been included in HDF5, and you will see this in the information at the end of the configure step:

SZIP Support:           yes
SZIP Write Support:     yes
Parallel Filters:       yes

3 - In your Fortran code, do not set the deflate settings; instead, call code like this (a complete sketch follows step 4):

      integer, parameter :: H5_SZIP_NN_OPTION_MASK = 32
      integer, parameter :: H5_SZIP_MAX_PIXELS_PER_BLOCK_IN = 32
      integer, parameter :: HDF5_FILTER_SZIP = 4
      integer :: params(2), retval

!     Set the szip filter on the variable: params(1) is the options
!     mask, params(2) is the pixels-per-block.
      params(1) = H5_SZIP_NN_OPTION_MASK
      params(2) = H5_SZIP_MAX_PIXELS_PER_BLOCK_IN
      retval = nf90_def_var_filter(ncid, varid, HDF5_FILTER_SZIP, 2, params)
      if (retval .ne. nf90_noerr) stop 1

Just as with nf90_def_var_deflate(), this must be called for each variable you want compressed.

4 - When done, you can detect filtered data with ncdump -h -s; the variable will have a special attribute like this:
datasetF32:_Filter = "4,169,32,32,2500" ;
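
Putting steps 1-4 together, here is a minimal, self-contained sketch of what the Fortran side might look like (the filename, dimension names and sizes, and the dummy data are placeholders for illustration; the szip constants and the nf90_def_var_filter() call are the same as in step 3):

    program szip_example
      use netcdf
      implicit none

      ! szip constants from step 3 above
      integer, parameter :: H5_SZIP_NN_OPTION_MASK = 32
      integer, parameter :: H5_SZIP_MAX_PIXELS_PER_BLOCK_IN = 32
      integer, parameter :: HDF5_FILTER_SZIP = 4

      integer, parameter :: NX = 100, NY = 100
      integer :: ncid, x_dimid, y_dimid, varid
      integer :: params(2)
      real :: vals(NX, NY)

      vals = 1.0

      ! filters require a netCDF-4 (HDF5-based) file
      call check(nf90_create('szip_test.nc', nf90_netcdf4, ncid))
      call check(nf90_def_dim(ncid, 'x', NX, x_dimid))
      call check(nf90_def_dim(ncid, 'y', NY, y_dimid))
      call check(nf90_def_var(ncid, 'var', nf90_float, (/ x_dimid, y_dimid /), varid))

      ! set the szip filter instead of calling nf90_def_var_deflate()
      params(1) = H5_SZIP_NN_OPTION_MASK
      params(2) = H5_SZIP_MAX_PIXELS_PER_BLOCK_IN
      call check(nf90_def_var_filter(ncid, varid, HDF5_FILTER_SZIP, 2, params))

      call check(nf90_enddef(ncid))
      call check(nf90_put_var(ncid, varid, vals))
      call check(nf90_close(ncid))

    contains

      subroutine check(status)
        integer, intent(in) :: status
        if (status /= nf90_noerr) then
          print *, trim(nf90_strerror(status))
          stop 1
        end if
      end subroutine check

    end program szip_example

Running ncdump -h -s on the resulting file should then show a _Filter attribute like the one in step 4 (the leading 4 is the szip filter ID).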

@jswhit
Contributor

jswhit commented Jan 29, 2020

Thanks Ed. Once #23 is merged, I can create a new fork to test this.

@junwang-noaa
Collaborator

junwang-noaa commented Jan 29, 2020 via email

@jswhit
Contributor

jswhit commented Jan 29, 2020

is this now in netcdf-c master, or do we need your fork?

@edwardhartnett
Collaborator Author

@jswhit everything has been merged to netcdf-c master. No need for any of my branches any longer.

@junwang-noaa
Collaborator

junwang-noaa commented Jan 29, 2020 via email

@jswhit
Contributor

jswhit commented Jan 29, 2020

Ed - I went ahead and tried this, and it works! However, with the default parameter settings the file sizes are very large compared to zlib:

-rw-r--r-- 1 Jeffrey.S.Whitaker gsienkf  3483360063 Jan 29 19:53 dynf006.nc.zlib
-rw-r--r-- 1 Jeffrey.S.Whitaker gsienkf 17024837145 Jan 29 20:38 dynf006.nc.szip

@edwardhartnett
Collaborator Author

But was it faster?

@jswhit
Contributor

jswhit commented Jan 29, 2020

A little bit (30 secs vs 35 secs).

@edwardhartnett
Collaborator Author

OK, good! So it seems to be working and giving faster times, and that's what it's supposed to do. Yes, it does not compress as much. But it's also faster to read. So it's a tradeoff.

However, we can also try some other szip settings. Try using:

 integer, parameter :: H5_SZIP_EC_OPTION_MASK = 4

Use this as the szip options mask and see if it is faster or compresses more.

Also try setting H5_SZIP_MAX_PIXELS_PER_BLOCK_IN to 4 and see if that helps.
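
For concreteness, a small sketch of those two experiments, assuming the same params array and filter call as in the original example (it is worth trying the options-mask change and the pixels-per-block change separately to see which one matters):

      ! entropy-coding szip mode instead of nearest-neighbor
      integer, parameter :: H5_SZIP_EC_OPTION_MASK = 4
      ! smaller pixels-per-block value to experiment with
      integer, parameter :: H5_SZIP_MAX_PIXELS_PER_BLOCK_IN = 4

      params(1) = H5_SZIP_EC_OPTION_MASK
      params(2) = H5_SZIP_MAX_PIXELS_PER_BLOCK_IN
      retval = nf90_def_var_filter(ncid, varid, HDF5_FILTER_SZIP, 2, params)
      if (retval .ne. nf90_noerr) stop 1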

@jswhit
Contributor

jswhit commented Jan 29, 2020

Ed - I think the szip filter may not be enabled after all. I don't see the datasetF32:_Filter attribute in the ncdump output, even though I am calling nf90_def_var_filter via

              szip_params(1) = H5_SZIP_NN_OPTION_MASK
              szip_params(2) = H5_SZIP_MAX_PIXELS_PER_BLOCK_IN
              ncerr = nf90_def_var_filter(ncid, varids(i), HDF5_FILTER_SZIP, 2, szip_params)

@jswhit
Contributor

jswhit commented Jan 29, 2020

OK - I wasn't checking the return code. nf90_def_var_filter is actually failing with

line          224 NetCDF: Invalid argument

I'm using netcdf-fortran 4.5.2 - do I need to update from master?
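
For anyone following along, a small sketch of the return-code check that surfaces this kind of message, using the names from the snippet above and nf90_strerror() to turn the status into readable text:

              ncerr = nf90_def_var_filter(ncid, varids(i), HDF5_FILTER_SZIP, 2, szip_params)
              if (ncerr /= nf90_noerr) then
                 print *, 'nf90_def_var_filter failed: ', trim(nf90_strerror(ncerr))
                 stop 1
              end if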

@edwardhartnett
Collaborator Author

Stand by and I will add nf90_def_var_szip to a fortran branch, and you can try that. Probably won't be ready until tomorrow...

@jswhit
Contributor

jswhit commented Jan 29, 2020

Why doesn't nf90_def_var_filter work as in your example above?

@edwardhartnett
Collaborator Author

I don't know. ;-) But I have to add nf90_def_var_szip to fortran, and I will add a test, and then I will know that it works. ;-)
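
For reference, a sketch of what such a convenience call might look like from the caller's side; the name comes from the comments above, but the argument order (options mask, then pixels per block) is an assumption here, not a released API:

      ! assumed wrapper: takes the options mask and pixels-per-block directly
      retval = nf90_def_var_szip(ncid, varid, H5_SZIP_NN_OPTION_MASK, &
                                 H5_SZIP_MAX_PIXELS_PER_BLOCK_IN)
      if (retval .ne. nf90_noerr) stop 1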

@edwardhartnett
Collaborator Author

OK, all this has been added to netcdf-c, so I will close this issue.

Unfortunately, szip with parallel writes is not working. ;-( I have raised an issue with the HDF5 team but it's unlikely this will get much attention in the short term.

The good news is that Charlie Zender and I are soon coming out with a new release of the CCR package, which will add more compression options to netCDF. ;-)

The netcdf-c filter behavior of 4.7.2, 4.7.3, and 4.7.4 has been a bit rocky, which explains some of the confusion about this. There have been changes in approach at Unidata and we have settled on an API and behavior which is guaranteed to be stable moving forward, starting in release 4.8.0.
