Marvel eats up Master's heap #9130
Comments
@mosiddi what version of Elasticsearch are you using? There have been some issues with certain stats calls which made them quite slow. These should have been fixed in v1.4.2. Also, do you have swap enabled? I'm wondering if you're seeing slow GCs thanks to swapping.
Hi @clintongormley,
@mosiddi with the default settings you're bound to have problems with slow GC. Swap is the enemy of the JVM. Also, I suggest upgrading Elasticsearch before retrying your tests. A few stats issues have been fixed since then. I'll leave this open for now, until you have had a chance to rerun your tests without swap and with the latest version of Elasticsearch.
Thanks @clintongormley! I'll try the tests out with 1.4.2. Can you point me to instructions that detail how to disable swap for ES on Windows VMs?
@mosiddi I can't, I'm afraid, but I'd just google how to disable the Windows page file.
Thanks @clintongormley, I'll look into this and keep you updated.
I updated ES to the latest version (1.4.2) on my ES master and can still see the same pattern. I'll turn off paging and see if that helps.
I also tried setting the VM's page file size (virtual memory) to 0 MB and didn't see any noticeable difference in heap usage. Please note that when I was doing the experiments, I wasn't doing any admin operations on the cluster and I had ~700 indices. When I set marvel.agent.indices to only one index, the heap usage came down.
A quick answer to:
Marvel uses the master node to issue indices stats calls (and indeed cluster stats). This means the master acts as a coordinating node for this. At the moment the indices stats calls translate to one request per shard (see #7990), which results in some load on the master and the receiving nodes. That said, marvel does the call, waits for it to complete, sleeps for 10s (default) and does it again. Even with 700 indices (7000 shards, assuming defaults) the load should be low.
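For context, here is a minimal sketch of that poll-wait-sleep pattern, assuming the 1.x Java client API. The class and method names below are illustrative, not Marvel's actual exporter code; passing index names to the stats request (which is effectively what restricting marvel.agent.indices does) shrinks the per-shard fan-out accordingly.

```java
// Illustrative sketch only, not Marvel's exporter: issue an indices-stats call,
// wait for it to complete, then sleep before the next round.
import org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse;
import org.elasticsearch.client.Client;

public class StatsPollSketch {
    private static final long INTERVAL_MS = 10_000L; // 10s default export interval

    public static void pollStats(Client client) throws InterruptedException {
        while (true) {
            // With ~700 indices and default shard counts this fans out to
            // thousands of shard-level requests coordinated by the calling node.
            IndicesStatsResponse stats = client.admin().indices()
                    .prepareStats()         // all indices; pass names here to restrict
                    .execute().actionGet(); // block until the call completes
            // ... export `stats` somewhere, then wait for the next interval
            Thread.sleep(INTERVAL_MS);
        }
    }
}
```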
Thanks @bleskes! The pattern I saw was on an interval of ~1 hour: heap grew from 25% to 75% and came back, and the pattern continued. Do you want any more data from my side to look further into this?
@mosiddi do you see a quick spike in memory use, or just a slow growth and then a quick decline?
Slow growth (less than an hour) and a quick decline (in a few minutes).
Sounds like normal Java garbage collection: memory grows slowly until a certain limit is reached (75% in ES) and then it's cleaned up, hence the quick drop. I'm going to close this issue; feel free to reopen if you feel there is anything else going on.
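If you want to confirm that sawtooth yourself, a small snippet like the one below (not from this thread; it samples whichever JVM it runs in, so you would need to embed it in the node process or use the nodes stats API instead) prints heap usage over time:

```java
// Assumed helper, not part of ES: periodically sample heap usage and print the
// percentage; a GC sawtooth shows up as slow growth followed by a sharp drop.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSawtoothWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            long usedPct = 100L * heap.getUsed() / heap.getMax();
            System.out.printf("heap used: %d%%%n", usedPct);
            Thread.sleep(5_000L); // sample every 5 seconds
        }
    }
}
```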
One more interesting observation: when I stop Marvel generating index aliases and do a few sets of create-alias operations, I still see the same memory pattern and timeouts. This time the growth is within a 15-minute range. The number of existing indexes is ~1300 in the test bed where I saw the issue.
@bleskes and @clintongormley: though the issue I mention above is not related to Marvel, it was seen in the same test setup. Do you have any insight into why create-alias would time out when the master CPU is within 25% and alias-creation requests are coming in at a rate of 5-10 per minute?
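For reference, here is a hypothetical reproduction of the kind of create-alias calls described here, assuming the 1.x Java client; the index and alias names are made up. Each call is a cluster-state update processed on the master, and the response is only acknowledged once the change has been accepted, which is typically where such timeouts surface.

```java
// Hypothetical load generator for create-alias requests (names are made up).
import org.elasticsearch.action.admin.indices.alias.IndicesAliasesResponse;
import org.elasticsearch.client.Client;

public class AliasLoadSketch {
    public static void createAliases(Client client) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            IndicesAliasesResponse resp = client.admin().indices()
                    .prepareAliases()
                    .addAlias("test-index-" + i, "test-alias-" + i) // hypothetical names
                    .execute().actionGet();
            if (!resp.isAcknowledged()) {
                System.out.println("create-alias request was not acknowledged in time");
            }
            Thread.sleep(6_000L); // roughly 10 requests per minute
        }
    }
}
```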
In one of my experiments, I was trying to see how much admin-request load a master can handle before it throws timeout exceptions.
My master machine configuration was an A2 Azure VM and I had a 7-node cluster (3 query, 3 data, and 1 master node). I tried a very simple experiment -
I was able to create ~650 indexes. What I noticed after a few hours was that my master's heap had a pattern of growing from 25% to 75% and coming back to 25% every time. There were a few failures as well when it was at 75%. The call stack was in Marvel's exporter, which was doing Index Stats.
2 questions -