memory leak in influxdb cluster #705

Closed · prune998 opened this issue Jul 1, 2014 · 20 comments

prune998 commented Jul 1, 2014

I'm seeing what looks like a memory leak on both instances of my 2-node cluster.
In the graph below, I stopped/started the second node, which is a replica node that receives no queries. After that, free memory starts to drain... once it reaches zero, swap gets consumed until it also reaches zero. It doesn't seem to crash when no memory or swap is left, though:
[memory usage graph]

InfluxDB version is 0.7.3
2-node cluster
Most of the data comes in through the InfluxDB port (and a little through the graphite input)
The config is the same on both nodes (this is the second node, acting as a replica, with no queries on it):

bind-address = "0.0.0.0"

[logging]
level  = "warn"
file   = "/opt/data/influxdb/logs/influxdb.log"         # stdout to log to standard out

# Configure the admin server
[admin]
port   = 8083              # binding is disabled if the port isn't set
assets = "/opt/data/apps/influxdb/current/admin"

# Configure the http api
[api]
port     = 8086    # binding is disabled if the port isn't set
ssl-port = 8084    # Ssl support is enabled if you set a port and cert
ssl-cert = "/opt/data/influxdb/shared/influxdb-prod.int.axa.xx.key.pem"

read-timeout = "5s"

[input_plugins]
  # Configure the graphite api
  [input_plugins.graphite]
  enabled = true
  port = 2005
  database = "graphite"  # store graphite data in this database

# Raft configuration
[raft]
port = 8090
dir  = "/opt/data/influxdb/raft"
# election-timeout = "1s"

[storage]
dir = "/opt/data/influxdb/db"
write-buffer-size = 10000

[cluster]
seed-servers = ["camphorwood:8090"]
protobuf_port = 8099
protobuf_timeout = "2s" # the write timeout on the protobuf conn; any duration parseable by time.ParseDuration
protobuf_heartbeat = "200ms" # the heartbeat interval between the servers. must be parseable by time.ParseDuration
protobuf_min_backoff = "1s" # the minimum backoff after a failed heartbeat attempt
protobuf_max_backoff = "10s" # the maximum backoff after a failed heartbeat attempt
write-buffer-size = 10000
max-response-buffer-size = 100000
concurrent-shard-query-limit = 10

[leveldb]
max-open-files = 10000
lru-cache-size = "512m"
max-open-shards = 0
point-batch-size = 100

[sharding]
  replication-factor = 2

  [sharding.short-term]
  duration = "7d"
  split = 1

  [sharding.long-term]
  duration = "30d"
  split = 1

[wal]
dir   = "/opt/data/influxdb/wal"
flush-after = 1000 
bookmark-after = 1000
index-after = 1000
requests-per-logfile = 10000
chobie (Contributor) commented Jul 1, 2014

Interesting, what OS are you using?

prune998 (Author) commented Jul 1, 2014

ubuntu 12.04
Linux poplar 3.8.0-33-generic #48~precise1-Ubuntu SMP Thu Oct 24 16:28:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

prune998 (Author) commented Jul 1, 2014

a little bit later...
[memory usage graph]

prune998 (Author) commented Jul 1, 2014

Just to illustrate how swap also gets used (and freed in case of restart):
[swap usage graph]

damm commented Jul 1, 2014

This host has other things on it, right? Any way you can configure collectd or whatever you use to monitor just influxdb's resources?

I run mine in Docker, so I enabled Diamond's memory collector (not to detract from @prune998's problem).

[screenshot: Diamond memory collector graph, 2014-07-01]

prune998 (Author) commented Jul 2, 2014

The graphs you are seeing are from Diamond. The hosts are only running influxdb, plus a local logstash shipper, Diamond... and finally some admin tools.

chobie (Contributor) commented Jul 2, 2014

I've checked with influxdb v0.7.3, 2 nodes, a replication factor of 2, and the graphite plugin (tcp, udp) + collectd, but it works fine for me.
Could you paste the results of the commands below, as far as you can?

$ ps aux | sort -nk +4 | tail
$ cat /proc/[influxdb_pid]/status

prune998 (Author) commented Jul 2, 2014

first command :

root@poplar:~# ps aux | sort -nk +4 | tail
snmp     129852  0.0  0.0  51388  3344 ?        S    May30   7:41 /usr/sbin/snmpd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
5404     596241  0.0  0.1  27776  8024 pts/4    Ss   Jul01   0:00 -bash
5404     949640  0.0  0.1  27780  7144 pts/0    Ss   May21   0:00 -bash
root       1039  0.0  0.1  58448  8072 ?        Ss    2013  86:52 /usr/bin/python /usr/local/bin/supervisord -c /opt/data/apps/supervisord/etc/supervisord.conf --logfile /opt/data/apps/supervisord/log/supervisord.log --loglevel debug --nodaemon
root     596344  0.0  0.1  23756  4512 pts/4    S    Jul01   0:00 /bin/bash
syslog   317730  0.0  0.1 256456  4448 ?        Sl   Apr09   0:27 rsyslogd -c5
root     309249  0.7  0.5 464340 22360 ?        Ssl  Jun11 233:50 /usr/bin/python /usr/bin/diamond --foreground --skip-change-user --skip-fork --skip-pidfile
root     575016  0.2  3.5 1504340 144856 ?      Sl   Apr30 184:41 /opt/data/apps/java/bin/java -Xmx256M -Xms16M -jar /opt/data/apps/logstash/share/logstash.jar agent -f /opt/data/apps/logstash/conf/logstash.conf
5404     596558 28.1 21.4 3517840 867668 pts/0  Sl   Jul01 304:40 /opt/data/apps/influxdb/current/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml

second :

root@poplar:~# cat /proc/596558/status
Name:   influxdb
State:  S (sleeping)
Tgid:   596558
Pid:    596558
PPid:   1
TracerPid:      0
Uid:    5404    5404    5404    5404
Gid:    5404    5404    5404    5404
FDSize: 8192
Groups: 5404 6201
VmPeak:  3598152 kB
VmSize:  3520136 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    882320 kB
VmRSS:    868396 kB
VmData:  1480320 kB
VmStk:       136 kB
VmExe:     10472 kB
VmLib:      4144 kB
VmPTE:      5716 kB
VmSwap:    32184 kB
Threads:        11
SigQ:   0/31482
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000001
SigCgt: ffffffffffc1fefe
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp:        0
Cpus_allowed:   3
Cpus_allowed_list:      0-1
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        6856291
nonvoluntary_ctxt_switches:     1951645

On the second server, the "master", where all clients are writing and where Grafana, our only "reader", is querying:

root@camphorwood:/opt/data/influxdb# ps aux | sort -nk +4 | tail
root          8  0.0  0.0      0     0 ?        S    Apr09   0:00 [migration/0]
root          9  0.0  0.0      0     0 ?        S    Apr09   0:00 [rcu_bh]
snmp     199733  0.0  0.0  51404  2540 ?        S    May30   7:15 /usr/sbin/snmpd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
5404     168948  0.0  0.1  81040  4472 ?        S    May28   2:41 nginx: worker process
root       1241  0.0  0.1  58292  7500 ?        Ss   Apr09  36:17 /usr/bin/python /usr/local/bin/supervisord -c /opt/data/apps/supervisord/etc/supervisord.conf --logfile /opt/data/apps/supervisord/log/supervisord.log --loglevel info --nodaemon
syslog      500  0.0  0.1 256456  5640 ?        Sl   Apr09   0:34 rsyslogd -c5
root     399546  0.2  0.6 464276 25656 ?        Ssl  Jun13  57:43 /usr/bin/python /usr/bin/diamond --foreground --skip-change-user --skip-fork --skip-pidfile
root      56450  0.2  2.1 1504352 85520 ?       Sl   Apr30 186:17 /opt/data/apps/java/bin/java -Xmx256M -Xms16M -jar /opt/data/apps/logstash/share/logstash.jar agent -f /opt/data/apps/logstash/conf/logstash.conf
5404     471172 40.1 61.7 6485020 2498432 ?     Sl   Jun17 8553:14 /opt/data/apps/influxdb/current/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml

and :

root@camphorwood:/opt/data/influxdb# cat /proc/471172/status
Name:   influxdb
State:  S (sleeping)
Tgid:   471172
Pid:    471172
PPid:   1
TracerPid:      0
Uid:    5404    5404    5404    5404
Gid:    5404    5404    5404    5404
FDSize: 32768
Groups: 5404 6201
VmPeak:  6512156 kB
VmSize:  6485020 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:   2616308 kB
VmRSS:   2498316 kB
VmData:  4276756 kB
VmStk:       136 kB
VmExe:     10472 kB
VmLib:      4144 kB
VmPTE:     11308 kB
VmSwap:   871544 kB
Threads:        24
SigQ:   0/31479
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000001
SigCgt: ffffffffffc1fefe
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp:        0
Cpus_allowed:   3
Cpus_allowed_list:      0-1
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        135993796
nonvoluntary_ctxt_switches:     86164564

Note that, since I haven't restarted influxdb on the second server for a long time, the swap is fully used, taken by influxdb.

prune998 (Author) commented Jul 2, 2014

FYI, the process limits:

root@poplar:~# cat /proc/596558/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    10240000             bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31482                31482                processes
Max open files            64000                64000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31482                31482                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

prune998 (Author) commented Jul 3, 2014

Here is the graph after 3 days... it speaks for itself...
[memory usage graph]

I have no clue how to dig any further, so any idea is welcome...

chobie (Contributor) commented Jul 3, 2014

Hmm, you should probably ask about this topic on the influxdb mailing list: https://groups.google.com/forum/#!forum/influxdb
Some people are running InfluxDB clusters, so they might know what the problem is.

I've submitted a statistics API (#635), but I'm not sure it helps you, as it doesn't collect graphite and protobuf server metrics right now.

jvshahid (Contributor) commented Jul 3, 2014

@prune998 @chobie sorry for the late response. This is probably caused by the large write-buffer-size under the cluster section. Since you killed one of the nodes, the other node (that's still up) kept the data in memory that was meant to be sent to the down server. Try to lower the buffer size.
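
For illustration, that would mean lowering the value in the [cluster] section of the config.toml shown above; the number below is an arbitrary example, not a tuned recommendation:

[cluster]
# writes buffered in memory for the other node before they are replicated
write-buffer-size = 1000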

prune998 (Author) commented Jul 3, 2014

The problem is there on the first node too. It's not related to killing one of them.
I'm going to lower the buffer size and restart both nodes to see what happens, but I'm pretty sure the memory will get consumed over time.

prune998 (Author) commented Jul 3, 2014

I changed the LRU cache to 256m and restarted both nodes. The buffer is still 10000, which does not seem huge given the number of expected metrics/s.
The result is still the same: huge memory consumption on both nodes, with almost only data writes through HTTP and no queries.
Camphorwood is the "master" node, where every client connects. Poplar is the "replica".
[memory usage graph]

freeformz commented:

What does cat /proc/meminfo look like on these servers?


prune998 (Author) commented Jul 8, 2014

It finally occurred:

[7774552.260363] Out of memory: Kill process 709610 (influxdb) score 898 or sacrifice child
[7774552.260474] Killed process 709610 (influxdb) total-vm:7519752kB, anon-rss:3665560kB, file-rss:2168kB

I'll come back with the meminfo once all the memory is gone again...

prune998 (Author) commented:

This is all taken on the "master" (the more heavily used server):

cat /proc/meminfo

MemTotal:        4048272 kB
MemFree:         1099452 kB
Buffers:            1784 kB
Cached:            34952 kB
SwapCached:        30236 kB
Active:          2126348 kB
Inactive:         706752 kB
Active(anon):    2108260 kB
Inactive(anon):   688208 kB
Active(file):      18088 kB
Inactive(file):    18544 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1000444 kB
SwapFree:         727260 kB
Dirty:              5336 kB
Writeback:             0 kB
AnonPages:       2782128 kB
Mapped:            11976 kB
Shmem:                12 kB
Slab:              44580 kB
SReclaimable:      27252 kB
SUnreclaim:        17328 kB
KernelStack:        1624 kB
PageTables:        16256 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3024580 kB
Committed_AS:    3502616 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      282088 kB
VmallocChunk:   34359447064 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1132544 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       67520 kB
DirectMap2M:     4126720 kB

cat /proc/6032/limits

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    10240000             bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31450                31450                processes
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31450                31450                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

cat /proc/6032/status

Name:   influxdb
State:  S (sleeping)
Tgid:   6032
Ngid:   0
Pid:    6032
PPid:   1
TracerPid:      0
Uid:    5404    5404    5404    5404
Gid:    5404    5404    5404    5404
FDSize: 4096
Groups: 5404 6201
VmPeak:  6482416 kB
VmSize:  6415364 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:   2855240 kB
VmRSS:   2692168 kB
VmData:  4431540 kB
VmStk:       136 kB
VmExe:     10472 kB
VmLib:      4224 kB
VmPTE:     10652 kB
VmSwap:    54964 kB
Threads:        34
SigQ:   0/31450
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000001
SigCgt: ffffffffffc1fefe
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp:        0
Cpus_allowed:   3
Cpus_allowed_list:      0-1
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        12824517
nonvoluntary_ctxt_switches:     3912236

ps aux | sort -nk +4 | tail

root          8  0.1  0.0      0     0 ?        S    Jul09   2:45 [rcuos/0]
root          9  0.0  0.0      0     0 ?        S    Jul09   0:54 [rcuos/1]
root        977  0.0  0.0  15260   156 ?        S    Jul09   0:00 upstart-socket-bridge --daemon
snmp       1618  0.0  0.0  45600  1580 ?        S    Jul09   0:44 /usr/sbin/snmpd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
syslog      596  0.0  0.0 262172  1036 ?        Ssl  Jul09   0:00 rsyslogd
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       1324  0.0  0.1  58784  6456 ?        Ss   Jul09   0:29 /usr/bin/python /usr/local/bin/supervisord -c /opt/data/apps/supervisord/etc/supervisord.conf --logfile /opt/data/apps/supervisord/log/supervisord.log --loglevel info --nodaemon
root        998 27.2  0.6 454632 26764 ?        Ssl  Jul09 566:34 /usr/bin/python /usr/bin/diamond --foreground --skip-change-user --skip-fork --skip-pidfile
root       1507  0.2  1.3 1506636 55144 ?       Sl   Jul09   5:20 /opt/data/apps/java/bin/java -Xmx256M -Xms16M -jar /opt/data/apps/logstash/share/logstash.jar agent -f /opt/data/apps/logstash/conf/logstash.conf
belisar+   6032 54.0 66.5 6415364 2692148 pts/0 Sl   Jul09 777:02 /opt/data/apps/influxdb/current/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml

cat /proc/6032/io

rchar: 67483263810
wchar: 110460417342
syscr: 19357180
syscw: 99278789
read_bytes: 28777271296
write_bytes: 483360903168
cancelled_write_bytes: 918810624

--> I don't know why there are so many cancelled_write_bytes...

Looking at the smaps, I can see the memory is taken in the heap space:

cat /proc/6032/smaps

00400000-00e3a000 r-xp 00000000 00:20 1212626                            /opt/data/apps/influxdb/versions/0.7.3/influxdb
Size:              10472 kB
Rss:                4444 kB
Pss:                4444 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      4444 kB
Private_Dirty:         0 kB
Referenced:         4444 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd ex mr mw me dw sd
01039000-0105a000 rw-p 00a39000 00:20 1212626                            /opt/data/apps/influxdb/versions/0.7.3/influxdb
Size:                132 kB
Rss:                  88 kB
Pss:                  88 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        52 kB
Private_Dirty:        36 kB
Referenced:           88 kB
Anonymous:            36 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me dw ac sd
0105a000-0107b000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                 112 kB
Pss:                 112 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       112 kB
Referenced:          108 kB
Anonymous:           112 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
01d95000-042e1000 rw-p 00000000 00:00 0                                  [heap]
Size:              38192 kB
Rss:               35100 kB
Pss:               35100 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:     35100 kB
Referenced:        23736 kB
Anonymous:         35100 kB
AnonHugePages:     18432 kB
Swap:               2964 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
c000000000-c0004ed000 rw-p 00000000 00:00 0
Size:               5044 kB
Rss:                3976 kB
Pss:                3976 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      3976 kB
Referenced:         3908 kB
Anonymous:          3976 kB
AnonHugePages:      2048 kB
Swap:               1068 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
c206262000-c2ad9e0000 rw-p 00000000 00:00 0
Size:            2743800 kB   <------------------------------------------------------- there
Rss:             1924180 kB
Pss:             1924180 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:       320 kB
Private_Dirty:   1923860 kB
Referenced:      1907940 kB
Anonymous:       1924180 kB
AnonHugePages:    743424 kB
Swap:              42208 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac sd
7f4a20000000-7f4a201b2000 rw-p 00000000 00:00 0
Size:               1736 kB
Rss:                1736 kB
Pss:                1736 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      1736 kB
Referenced:         1736 kB
Anonymous:          1736 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me nr sd
7f4a201b2000-7f4a24000000 ---p 00000000 00:00 0
Size:              63800 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
...
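
A quick way to total Rss and Swap across all of the mappings in that file (a generic one-liner, shown here with the same PID used above):

awk '/^Rss:/ {rss += $2} /^Swap:/ {swap += $2} END {print "total Rss: " rss " kB, total Swap: " swap " kB"}' /proc/6032/smaps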

Also, I find it strange that, in the status, the FDSize is 4096... while the max limit is far more than that...

In my config I have :

[leveldb]

# Maximum mmap open files, this will affect the virtual memory used by
# the process
max-open-files = 10000

# LRU cache size, LRU is used by leveldb to store contents of the
# uncompressed sstables. You can use `m` or `g` prefix for megabytes
# and gigabytes, respectively.
lru-cache-size = "256m"

For a few days now I've also been getting a lot of these.
Master side:

[2014/07/10 14:20:06 CEST] [EROR] (coordinator.(*ProtobufClient).readResponses:174) Error while reading messsage size: &{%!d(string=EOF)}
[2014/07/10 14:20:06 CEST] [EROR] (coordinator.(*ProtobufClient).readResponses:174) Error while reading messsage size: &{%!d(string=EOF)}
[2014/07/10 14:20:06 CEST] [EROR] (coordinator.(*ProtobufClient).readResponses:174) Error while reading messsage size: &{%!d(string=EOF)}

Slave side:

[2014/07/10 14:19:31 CEST] [EROR] (coordinator.(*ProtobufServer).handleRequestTooLarge:129) request too large, dumping: 10.235.8.12:15801 (2216231)
[2014/07/10 14:19:41 CEST] [EROR] (coordinator.(*ProtobufServer).handleConnection:101) Error, closing connection: proto: required field "Type" not set

Despite this and the RAM consumption, it seems to be working fine...

Both versions are 0.7.3, on Linux Ubuntu 10.04:

Linux poplar 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Kenterfie commented:

I'm currently using it for a big test with 50k-100k metrics every 10 seconds, and influxdb uses the complete 16 GB of RAM after a short time. Is this normal, or is something leaking? The influxdb process sometimes stops without any error or anything else in log.txt.

I use the current version 0.8.0 for my tests.

Linux 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

jvshahid (Contributor) commented:

I'm inclined to close this issue since there's nothing actionable here. The issue is full of images and memory information that aren't useful for debugging, and it's honestly getting out of hand. Without any further information, I'm inclined to say this is caused by the buffer size being too big. Unfortunately, I mentioned this earlier in the thread and no one cared to test lowering the buffer size. @Kenterfie, let's track this on the mailing list; can you send an email with an explanation of the setup and the configuration that you use? I honestly would like to get this problem fixed as soon as I can. If someone can provide a script and a setup (i.e. cluster vs. single node, the toml configuration used) that reproduces the issue, I'll drop everything I'm working on and take a look immediately. Let me know on the mailing list if you can help.
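
For anyone putting such a reproduction together, a minimal write-load sketch against the 0.8-era HTTP API could look like this (the database name, credentials and host are placeholders to adapt to the setup being tested):

#!/bin/sh
# Loop single-point writes against the HTTP write endpoint.
# "graphite", root/root and localhost:8086 are assumptions; adjust as needed.
while true; do
  curl -s -X POST "http://localhost:8086/db/graphite/series?u=root&p=root" \
    -d '[{"name":"test.metric","columns":["value"],"points":[['"$RANDOM"']]}]' \
    > /dev/null
done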

prune998 (Author) commented:

I opened another issue. Same problem on 0.8.2. Please tell me what you need to work on this issue, as it seems you did not find it in this one.

#941
