memory leak in influxdb cluster #705
Interesting, what kind of OS are you using?
Ubuntu 12.04
This host has other things running on it, right? Is there any way you can configure collectd (or whatever you use) to monitor just InfluxDB's resources? I run mine in Docker, so I enabled Diamond's memory collector (not to dismiss @prune998's problem).
The graphs you are seeing are from Diamond. The hosts are only running InfluxDB, plus a local logstash shipper, Diamond, and finally some admin tools.
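For a lighter-weight check than a full collector, the process's resident set can be sampled straight from /proc. A minimal sketch, assuming Linux; the process name `influxdb` used with `pidof` in the commented example is an assumption and may differ on your install:

```shell
#!/bin/sh
# Sketch: sample a process's resident set size (VmRSS, in kB) from /proc
# so growth over time is visible without a full metrics stack.
rss_kb() {
  # /proc/<pid>/status reports a line like "VmRSS:   12345 kB"
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Example: one timestamped sample per minute, appended to a log.
# The process name "influxdb" is an assumption; adjust for your setup.
# while sleep 60; do
#   echo "$(date -u +%FT%TZ) $(rss_kb "$(pidof influxdb)") kB"
# done >> influxdb-rss.log

rss_kb $$   # demo: print this shell's own RSS in kB
```

Graphing that log alongside the Diamond data should make it clear whether the growth is in InfluxDB's RSS or elsewhere on the host.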
I've checked with InfluxDB v0.7.3 and 2 nodes, replication factor 2, the graphite plugin (TCP, UDP) + collectd, but it works fine for me. chobie:
First command:
Second:
On the second server, the "master", where all the clients write and where Grafana, our only reader, queries:
And:
Note that, as I haven't restarted InfluxDB on the second server for a long time, swap is fully used, all taken by InfluxDB.
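A quick way to confirm which processes actually hold the swap is to read the VmSwap field from /proc. A sketch, assuming a Linux kernel recent enough (2.6.34+) to expose VmSwap in the status file:

```shell
#!/bin/sh
# Sketch: list the top swap users by reading VmSwap from /proc.
# Output lines look like "<kB> kB <process-name>", largest first.
swap_by_process() {
  for status in /proc/[0-9]*/status; do
    # Emit a line for every process with nonzero swap usage
    awk '/^Name:/   {name=$2}
         /^VmSwap:/ {if ($2 > 0) print $2 " kB " name}' "$status" 2>/dev/null
  done | sort -rn | head
}

swap_by_process
```

If InfluxDB tops that list, the swap consumption reported by the host graphs really is the influxdb process and not a neighbor.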
FYI, the limits on the process:
Hmm, probably you should ask about this topic on the InfluxDB mailing list: https://groups.google.com/forum/#!forum/influxdb. I've sent a statistics API (#635), but I'm not sure that helps you, as it doesn't collect graphite and protobuf server metrics right now.
The problem is there on the first node too; it's not related to killing one.
What does
On Thu, Jul 3, 2014 at 4:09 PM, Prune [email protected] wrote:
Edward Muller
It finally occurred:
I will come back with the meminfo once all the memory is gone again.
This is all taken on the "master" (the more heavily used server): `cat /proc/meminfo`
`cat /proc/6032/limits`
`cat /proc/6032/status`
`ps aux | sort -nk +4 | tail`
`cat /proc/6032/io`
I don't know why there are so many cancelled_write_bytes... Looking at the smaps, I can see the memory is taken in the heap space: `cat /proc/6032/smaps`
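The heap portion of the resident memory can be pulled out of smaps directly, rather than eyeballing the file. A sketch, assuming the single glibc-style `[heap]` mapping (large allocations made via mmap land in anonymous mappings instead, so this is a lower bound):

```shell
#!/bin/sh
# Sketch: report the resident size (kB) of the [heap] mapping in smaps.
# Assumes one brk-based [heap] mapping, as glibc produces; reading
# another process's smaps may require matching privileges.
heap_rss_kb() {
  awk '/\[heap\]/ {inheap=1; next}
       inheap && /^Rss:/ {print $2; exit}' "/proc/$1/smaps"
}

heap_rss_kb $$   # demo: this shell's own heap RSS in kB
```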
Also, I find it strange that, in the status, FDSize is 4096, while the max limit is far higher than that... In my config I have:
For the past few days I have also been seeing a lot of:
Slave side:
Despite this and the RAM consumption, it seems to be working fine. Both versions are 0.7.3 on Linux (Ubuntu 10.04):
I am currently using it for a big test with 50k-100k metrics every 10 seconds, and InfluxDB uses the full 16 GB of RAM after a short time. Is this a normal situation, or is something leaking? The InfluxDB process sometimes stops without any error or anything else in log.txt. I am using the current version, 0.8.0, for my tests. Linux 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
I'm inclined to close this issue since there's nothing actionable to do here. The issue is full of images and memory dumps that aren't useful in debugging, and it is honestly getting out of hand. Without any further information, I'm inclined to say this is caused by the buffer size being too big. Unfortunately, I mentioned this earlier in the thread and no one cared to test lowering the buffer size.
@Kenterfie, let's track this on the mailing list; can you send an email with an explanation of the setup and the configuration that you use? I would honestly like to get this problem fixed as soon as I can. If someone can provide a script and a setup (i.e. cluster vs. single node, the toml configuration used) that reproduce the issue, I'll drop everything I'm working on and take a look immediately. Let me know on the mailing list if you can help.
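For anyone who wants to try the buffer-size suggestion above: in 0.8-era configs the relevant knob was `write-buffer-size`. The section placement and value below are a hedged sketch from memory and should be checked against the config.sample.toml shipped with your version:

```toml
[cluster]
# Number of pending writes buffered in memory before the server blocks.
# Lowering this caps memory use, at the cost of throughput under bursty
# write load. The default was considerably larger; 1000 is illustrative.
write-buffer-size = 1000
```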
I opened another issue. Same problem on 0.8.2. Please tell me what you need to work on this, as it seems you did not find it in this issue.
I'm seeing what looks like a memory leak on both nodes of my 2-instance cluster.
![image](https://cloud.githubusercontent.com/assets/1110398/3445895/dbf0bcf4-013b-11e4-8b02-1ff73cdaaa8d.png)
In the picture, I stopped/started the second node, which is a replica node with no queries on it. Then free memory starts to drain; once it reaches zero, swap is consumed until it too is exhausted. It doesn't seem to crash when no memory or swap is available, though:
- InfluxDB version is 0.7.3
- 2-node cluster
- most of the data comes in through the InfluxDB port (and a little via the graphite input)
- the config is the same on both nodes (this is the second node, acting as a replica, with no queries on it):