Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaks due to NC4_provenance_init #1168

Closed
3 of 12 tasks
gsjaardema opened this issue Oct 23, 2018 · 23 comments
Closed
3 of 12 tasks

Memory leaks due to NC4_provenance_init #1168

gsjaardema opened this issue Oct 23, 2018 · 23 comments

Comments

@gsjaardema
Copy link
Contributor

Please provide as much of the following information as you can, as applicable to the issue being reported. Naturally, not all information is relevant to every issue, but the more information we have to start, the better!

Environment Information

Feel free to skip this if the issue is related to documentation, a feature request, or general discussion.

  • What platform are you using? (please provide specific distribution/version in summary)
    • Linux
    • Windows
    • OSX
    • Other
    • NA
  • 32 and/or 64 bit?
    • 32-bit
    • 64-bit
  • What build system are you using?
    • autotools (configure)
    • cmake
  • Can you provide a sample netCDF file or C code to recreate the issue?
    • Yes (please attach to this issue, thank you!)
    • No
    • Not at this time

Summary of Issue

When running the current HEAD version of NetCDF using valgrind, I get memory leaks. All leaks have NC4_provenance_init called from nc_initialize() in the valgrind call tree.

I have added a call to nc_finalize() to my code, but the leaks are still there. Should there be a NC4_provenance_finalize() or something similar call, or is there an existing function that I am not calling correctly.

Here is the list of leaks found by valgrind:

==7962==
==7962== HEAP SUMMARY:
==7962==     in use at exit: 189 bytes in 6 blocks
==7962==   total heap usage: 20,807 allocs, 20,801 frees, 9,045,994 bytes allocated
==7962==
==7962== 5 bytes in 1 blocks are still reachable in loss record 1 of 6
==7962==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==7962==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==7962==    by 0x5AF34E7: NC4_provenance_init (nc4info.c:78)
==7962==    by 0x5A9558C: nc_initialize (nc_initialize.c:92)
==7962==    by 0x5A9814F: NC_open (dfile.c:2209)
==7962==    by 0x5A97511: nc_open (dfile.c:822)
==7962==    by 0x52EF8D6: ex_open_int (ex_open.c:186)
==7962==    by 0x47A378: main (elb_main.C:156)
==7962==
==7962== 7 bytes in 1 blocks are still reachable in loss record 2 of 6
==7962==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==7962==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==7962==    by 0x5AF345D: NC4_provenance_init (nc4info.c:67)
==7962==    by 0x5A9558C: nc_initialize (nc_initialize.c:92)
==7962==    by 0x5A9814F: NC_open (dfile.c:2209)
==7962==    by 0x5A97511: nc_open (dfile.c:822)
==7962==    by 0x52EF8D6: ex_open_int (ex_open.c:186)
==7962==    by 0x47A378: main (elb_main.C:156)
==7962==
==7962== 7 bytes in 1 blocks are still reachable in loss record 3 of 6
==7962==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==7962==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==7962==    by 0x5AF357E: NC4_provenance_init (nc4info.c:88)
==7962==    by 0x5A9558C: nc_initialize (nc_initialize.c:92)
==7962==    by 0x5A9814F: NC_open (dfile.c:2209)
==7962==    by 0x5A97511: nc_open (dfile.c:822)
==7962==    by 0x52EF8D6: ex_open_int (ex_open.c:186)
==7962==    by 0x47A378: main (elb_main.C:156)
==7962==
==7962== 18 bytes in 1 blocks are still reachable in loss record 4 of 6
==7962==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==7962==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==7962==    by 0x5AF34A2: NC4_provenance_init (nc4info.c:72)
==7962==    by 0x5A9558C: nc_initialize (nc_initialize.c:92)
==7962==    by 0x5A9814F: NC_open (dfile.c:2209)
==7962==    by 0x5A97511: nc_open (dfile.c:822)
==7962==    by 0x52EF8D6: ex_open_int (ex_open.c:186)
==7962==    by 0x47A378: main (elb_main.C:156)
==7962==
==7962== 24 bytes in 1 blocks are still reachable in loss record 5 of 6
==7962==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==7962==    by 0x5AA3856: nclistnew (nclist.c:30)
==7962==    by 0x5AF3427: NC4_provenance_init (nc4info.c:62)
==7962==    by 0x5A9558C: nc_initialize (nc_initialize.c:92)
==7962==    by 0x5A9814F: NC_open (dfile.c:2209)
==7962==    by 0x5A97511: nc_open (dfile.c:822)
==7962==    by 0x52EF8D6: ex_open_int (ex_open.c:186)
==7962==    by 0x47A378: main (elb_main.C:156)
==7962==
==7962== 128 bytes in 1 blocks are still reachable in loss record 6 of 6
==7962==    at 0x4A07897: calloc (vg_replace_malloc.c:711)
==7962==    by 0x5AA3A07: nclistsetalloc (nclist.c:76)
==7962==    by 0x5AA3D42: nclistpush (nclist.c:140)
==7962==    by 0x5AF348E: NC4_provenance_init (nc4info.c:69)
==7962==    by 0x5A9558C: nc_initialize (nc_initialize.c:92)
==7962==    by 0x5A9814F: NC_open (dfile.c:2209)
==7962==    by 0x5A97511: nc_open (dfile.c:822)
==7962==    by 0x52EF8D6: ex_open_int (ex_open.c:186)
==7962==    by 0x47A378: main (elb_main.C:156)
==7962==
==7962== LEAK SUMMARY:
==7962==    definitely lost: 0 bytes in 0 blocks
==7962==    indirectly lost: 0 bytes in 0 blocks
==7962==      possibly lost: 0 bytes in 0 blocks
==7962==    still reachable: 189 bytes in 6 blocks
==7962==         suppressed: 0 bytes in 0 blocks
==7962==
==7962== For counts of detected and suppressed errors, rerun with: -v
==7962== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1691 from 12)## Steps to reproduce the behavior

I see that @DennisHeimbigner is working on memory leaks in the infoleak.dmh branch and the above results are with those changes already applied.

Thanks,

@edhartnett
Copy link
Contributor

@gsjaardema are you getting these errors from a netcdf test? Or some of your code?

When I run the netcdf tests with changes from @DennisHeimbigner 's branch, I get no memory leaks in netCDF tests...

@gsjaardema
Copy link
Contributor Author

@edhartnett These are from my code.

@gsjaardema
Copy link
Contributor Author

I can get the same/similar results from the following:

#include <netcdf.h>
int main()
{
  nc_initialize();
  nc_finalize();
}

Valgrind shows:

ceerws2703a:build(master)> LD_LIBRARY_PATH=../lib valgrind --leak-check=full --show-leak-kinds=all ./a.out
==17187== Memcheck, a memory error detector
==17187== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17187== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==17187== Command: ./a.out
==17187==
==17187==
==17187== HEAP SUMMARY:
==17187==     in use at exit: 189 bytes in 6 blocks
==17187==   total heap usage: 2,763 allocs, 2,757 frees, 227,653 bytes allocated
==17187==
==17187== 5 bytes in 1 blocks are still reachable in loss record 1 of 6
==17187==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==17187==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==17187==    by 0x4F384E7: NC4_provenance_init (nc4info.c:78)
==17187==    by 0x4EDA58C: nc_initialize (nc_initialize.c:92)
==17187==    by 0x40062C: main (test.c:5)
==17187==
==17187== 7 bytes in 1 blocks are still reachable in loss record 2 of 6
==17187==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==17187==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==17187==    by 0x4F3845D: NC4_provenance_init (nc4info.c:67)
==17187==    by 0x4EDA58C: nc_initialize (nc_initialize.c:92)
==17187==    by 0x40062C: main (test.c:5)
==17187==
==17187== 7 bytes in 1 blocks are still reachable in loss record 3 of 6
==17187==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==17187==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==17187==    by 0x4F3857E: NC4_provenance_init (nc4info.c:88)
==17187==    by 0x4EDA58C: nc_initialize (nc_initialize.c:92)
==17187==    by 0x40062C: main (test.c:5)
==17187==
==17187== 18 bytes in 1 blocks are still reachable in loss record 4 of 6
==17187==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==17187==    by 0x3049081021: strdup (in /lib64/libc-2.12.so)
==17187==    by 0x4F384A2: NC4_provenance_init (nc4info.c:72)
==17187==    by 0x4EDA58C: nc_initialize (nc_initialize.c:92)
==17187==    by 0x40062C: main (test.c:5)
==17187==
==17187== 24 bytes in 1 blocks are still reachable in loss record 5 of 6
==17187==    at 0x4A05B05: malloc (vg_replace_malloc.c:299)
==17187==    by 0x4EE8856: nclistnew (nclist.c:30)
==17187==    by 0x4F38427: NC4_provenance_init (nc4info.c:62)
==17187==    by 0x4EDA58C: nc_initialize (nc_initialize.c:92)
==17187==    by 0x40062C: main (test.c:5)
==17187==
==17187== 128 bytes in 1 blocks are still reachable in loss record 6 of 6
==17187==    at 0x4A07897: calloc (vg_replace_malloc.c:711)
==17187==    by 0x4EE8A07: nclistsetalloc (nclist.c:76)
==17187==    by 0x4EE8D42: nclistpush (nclist.c:140)
==17187==    by 0x4F3848E: NC4_provenance_init (nc4info.c:69)
==17187==    by 0x4EDA58C: nc_initialize (nc_initialize.c:92)
==17187==    by 0x40062C: main (test.c:5)
==17187==
==17187== LEAK SUMMARY:
==17187==    definitely lost: 0 bytes in 0 blocks
==17187==    indirectly lost: 0 bytes in 0 blocks
==17187==      possibly lost: 0 bytes in 0 blocks
==17187==    still reachable: 189 bytes in 6 blocks
==17187==         suppressed: 0 bytes in 0 blocks
==17187==
==17187== For counts of detected and suppressed errors, rerun with: -v
==17187== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

@DennisHeimbigner
Copy link
Collaborator

I moved the leak fixing to the leaks.tmp branch.
At this point I believe that there are two main leak
sources outstanding.

  1. dap code
  2. ncgen
    The ncgen leaks affect a number of tests across directories
    because they use ncgen. The problem occurs only for complex
    netcdf-4 files.

@gsjaardema
Copy link
Contributor Author

I still get the same leaks on the leaks.tmp branch. They are in the "still reachable" category, but would be nice if there was a way to get rid of them.

@DennisHeimbigner
Copy link
Collaborator

Unfortunately, the -fsanitize=address in gcc does not appear to show
allocated reachable memory.
In any case, are you using nc_finalize()?

@DennisHeimbigner
Copy link
Collaborator

Spoke too soon; I see your example.
The issue is tracking down all the global reachables
and freeing them in nc_finalize.
I will try to tackle that after I get all the true memory leaks fixed.

@edhartnett
Copy link
Contributor

@DennisHeimbigner it would be great if the dap code did not leak.

We have been leak free with asan for some time with --disable-dap --disable-utilities. This is what I am testing in my CI with the address santizer.

@gsjaardema
Copy link
Contributor Author

I used it to see if it helps. Here is the boiled down example:

#include <netcdf.h>
int main()
{
  nc_initialize();
  nc_finalize();
}

@gsjaardema
Copy link
Contributor Author

@DennisHeimbigner I think that tracking down the globals shouldn't be too difficult as they all arise from memory allocated in NC4_provenance_init. If I comment out the call to NC4_provenance_init and then rerun my example, I get a clean run with no leaks of any kind.

Looks like all we need is a NC4_provenance_finalize which undoes what is done in NC4_provenance_init

Maybe something like:

void
NC4_provenance_finalize(void)
{
  if(globalpropinfo.properties != NULL)
    nclistfreeall(globalpropinfo.properties);
}

@DennisHeimbigner
Copy link
Collaborator

Houston, we have a problem. The memory returned by e.g. nc_get_vara
can be a complex tree of objects when vlens are involved. Unfortunately,
ncdump does not properly reclaim all the memory for these kinds of complex
structures. So this leads to leaks.
There is no simple fix for this other than some significant changes to ncdump.
Any suggestions about what to do about this?

BTW, the current leaks.tmp passes -fsanitize except for:

  1. the ncdump situation above
  2. any tests that use ncgen

@WardF
Copy link
Member

WardF commented Oct 24, 2018

Can you outline the significant changes it would require? I'm inclined to not hold up the release on what is a bit of an edge case; vlens are used, of course, but don't appear to be a common feature.

Are the tests that use ncgen failing for the same reason? Or are they simply yet to be fixed?

@WardF
Copy link
Member

WardF commented Oct 24, 2018

At Ed's request, I've held off on the rc2 release until we get the memory errors fixed, but if we can get the bulk of them fixed (and good job on that!) for the release candidate, I'd be happy enough to put out RC2 and see if we can get them completely ironed out for the full release.

@edhartnett
Copy link
Contributor

The leaks due to nested complex types are known, at least in the attributes.

I would suggest that we simply need to keep moving forward. So the next release should have at least no leaks in nctest/nc_test/nc_test4 for --disable-dap --disable-utilities. Utilities and DAP have leaked and those leaks should not hold up the release, IMO.

However I would be very happy to see progress on leaks in these areas.

After the major issues I'm now working on (separating HDF5 from libsrc4, and lazy var reads), I can take a spin at freeing complex structs better within the library.

@DennisHeimbigner
Copy link
Collaborator

I just added globalpropinfo cleanup to leaks.tmp

@DennisHeimbigner
Copy link
Collaborator

The ncgen leaks should be fixable. I will finish that off.

Basically, I think what we need is a generalization of nc_free_vlen(s) in which
one gives a pointer to a memory chunk, the number of top-level objects in it
and the typeid of the type of the objects. This function would then
use that information to free up each of the top-level objects, including any
subobjects they contain (such as vlens).
This would be handy not only for ncdump (and ncgen) but also for user code as well.

@gsjaardema
Copy link
Contributor Author

Thanks.

@edhartnett
Copy link
Contributor

@DennisHeimbigner I believe you are seeing a manifestation of #541.

@DennisHeimbigner
Copy link
Collaborator

DennisHeimbigner commented Oct 25, 2018

The current situation is this:

  1. with utilities turned off, leaks.tmp branch runs -fsanitize clean (includinng dap2
    and dap4)
  2. I have not tested pnetcdf or hdf4 or examples
  3. I have not tried valgrind

@edhartnett
Copy link
Contributor

Examples and HDF4 are part of my CI. They're good now, on my branch that includes your recent fixes for the nc4info leak.

I will check asan parallel builds and pnetcdf builds...

@edhartnett
Copy link
Contributor

pnetcdf builds fail, but parallel builds without pnetcdf pass.

I am only using a subset of memory fixes. I suggest you merge what you have and I'll retry.

DennisHeimbigner added a commit that referenced this issue Oct 30, 2018
    #1168
    #1163
    #1162

This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.

The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.

Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
   Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.

Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CmakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
   called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am
DennisHeimbigner added a commit that referenced this issue Oct 31, 2018
    #1168
    #1163
    #1162

This PR partially fixes memory leaks in the netcdf-c library,
in the ncdump utility, and in some test cases.

The netcdf-c library now runs memory clean with the assumption
that the --disable-utilities option is used. The primary remaining
problem is ncgen. Once that is fixed, I believe the netcdf-c library
will run memory clean with no limitations.

Notes
-----------
1. Memory checking was performed using gcc -fsanitize=address.
   Valgrind-based testing has yet to be performed.
2. The pnetcdf, hdf4, and examples code has not been tested.

Misc. Non-leak changes
1. Make tst_diskless2 only run when netcdf4 is enabled (issue 1162)
2. Fix CmakeLists.txt to turn off logging if ENABLE_NETCDF_4 is OFF
3. Isolated all my debug scripts into a single top-level directory
   called debug
4. Fix some USE_NETCDF4 dependencies in nc_test and nc_test4 Makefile.am
@edhartnett
Copy link
Contributor

I believe this issue is now fixed and should be closed.

@WardF
Copy link
Member

WardF commented Nov 2, 2018

I agree, will revisit if somebody comments to the contrary.

@WardF WardF closed this as completed Nov 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants