Memory leak when opening Dataset #986
It seems like there is a memory leak when we open and close a dataset. A minimal example script triggering the issue on my computer can be seen below.
The memory increase for this example is quite slow, but we have seen much faster increases for realistic datasets.
I am using netCDF4 1.5.3 on an Ubuntu 18.04 machine.
Comments
Confirmed. Easier to see using the psutil module:

```python
import netCDF4, psutil, os

filename = 'test.nc'
dataset = netCDF4.Dataset(filename, "w")
dataset.createDimension("time", 10)
# The memory issue is only present if we create a variable
v = dataset.createVariable("time", "f8", ("time",))
dataset.close()

proc = psutil.Process(os.getpid())
i = 0
while True:
    dataset = netCDF4.Dataset(filename, 'r')
    dataset.close()
    mem = proc.memory_info().rss
    print("\t Loop: {}\t mem: {}".format(i, mem))
    i += 1
```
Memory seems to grow linearly with the loop index, at a rate of 1024 bytes per iteration (regardless of how large the variable is).
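For reference, here is one way to estimate that per-iteration rate (a sketch building on the script above; it assumes the test.nc file created there exists, and the warm-up and loop counts are arbitrary):

```python
import netCDF4, psutil, os

proc = psutil.Process(os.getpid())
n = 10000

# Warm up so one-time allocations don't skew the estimate.
for _ in range(100):
    netCDF4.Dataset('test.nc', 'r').close()

before = proc.memory_info().rss
for _ in range(n):
    netCDF4.Dataset('test.nc', 'r').close()
after = proc.memory_info().rss

print("approx. {:.0f} bytes per open/close".format((after - before) / n))
```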
I suspect this may be coming from the C library; I'll have to write a C program that opens and closes the dataset to be sure.
By the way, the memory increase also happens with the MFDataset method, but Dataset and MFDataset probably share a lot of code.
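For what it's worth, a sketch of the MFDataset variant of the loop (file name and sizes are illustrative; MFDataset only accepts classic-model files with an unlimited aggregation dimension, so the test file is created accordingly):

```python
import netCDF4, psutil, os
import numpy as np

filename = 'test_mf.nc'
dataset = netCDF4.Dataset(filename, "w", format="NETCDF4_CLASSIC")
dataset.createDimension("time", None)  # unlimited, used as aggregation dim
v = dataset.createVariable("time", "f8", ("time",))
v[0:10] = np.arange(10.0)
dataset.close()

proc = psutil.Process(os.getpid())
for i in range(1000):
    mf = netCDF4.MFDataset([filename])
    mf.close()
    print("Loop: {}\t mem: {}".format(i, proc.memory_info().rss))
```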
Here's a C program that reproduces the memory leak for me. Running it in a terminal and monitoring the RSS in top, I see a linear increase pretty similar to the Python program. Note that the memory usage does not increase with time if the format is changed from NC_NETCDF4 to classic (i.e., if the NC_NETCDF4 flag is dropped from the nc_create call).

```c
#include <netcdf.h>
#include <stdio.h>

int main() {
    int dataset_id, time_id, dummyvar_id, ret, idx;
    size_t start[1] = {0};
    size_t count[1] = {100};
    double data[100];

    for (idx = 0; idx < 100; idx++) {
        data[idx] = -99;
    }

    /* Create a netCDF-4 file with one unlimited dimension and one variable. */
    ret = nc_create("test.nc", NC_CLOBBER | NC_NETCDF4, &dataset_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_def_dim(dataset_id, "time", NC_UNLIMITED, &time_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_def_var(dataset_id, "dummy", NC_DOUBLE, 1, &time_id, &dummyvar_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_put_vara(dataset_id, dummyvar_id, start, count, data);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_close(dataset_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }

    /* Repeatedly open and close the file; RSS grows linearly. */
    for (idx = 0; idx < 100000; idx++) {
        ret = nc_open("test.nc", NC_NOWRITE, &dataset_id);
        if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
        ret = nc_close(dataset_id);
        if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    }
    return 0;
}
```
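For completeness, the same check can be made from Python by creating the test file in a classic format (a sketch; NETCDF3_CLASSIC here plays the role of dropping NC_NETCDF4, and the file name is illustrative):

```python
import netCDF4

# Same setup as the first script, but classic format (no HDF5 backend).
dataset = netCDF4.Dataset('test_classic.nc', 'w', format='NETCDF3_CLASSIC')
dataset.createDimension("time", 10)
v = dataset.createVariable("time", "f8", ("time",))
dataset.close()

# With this file, the open/close loop reportedly shows no RSS growth.
for _ in range(100000):
    netCDF4.Dataset('test_classic.nc', 'r').close()
```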
Possibly related to Unidata/netcdf-c#1571.
Let me try the C program using e.g. valgrind.
@DennisHeimbigner did you have any luck running the C program with valgrind? :-)
I think this is being addressed in Unidata/netcdf-c#1575.
Thanks for the update, Dennis :-)
What version of HDF5 was being used? |
Fixed by Unidata/netcdf-c#1634 |