Marvel eats up Master's heap #9130

Closed
mosiddi opened this issue Jan 4, 2015 · 15 comments

Comments

@mosiddi

mosiddi commented Jan 4, 2015

In one of my experiments, I was trying to see how much admin request load a master can handle before it starts throwing timeout exceptions.

The master was an A2 Azure VM in a 7-node cluster (3 query nodes, 3 data nodes, and 1 master). I tried a very simple experiment:

"Spawn 1000 create index request to master in different threads and stop all thread on seeing first exception"
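
For readers who want to reproduce this, the experiment described above could be sketched roughly as follows. This is not the original test harness: the function names, the `test-index-N` naming scheme, the worker count, and the `http://localhost:9200` address are all assumptions made for illustration. The create call is passed in as a parameter so the stop-on-first-exception logic can be exercised without a cluster.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def create_index_http(name, base_url="http://localhost:9200", timeout=30):
    """Create one index with PUT /<name> (base_url is an assumed address)."""
    req = urllib.request.Request(
        f"{base_url}/{name}",
        data=b"{}",
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_experiment(create_index, total=1000, workers=50):
    """Issue `total` create-index calls from worker threads.

    After the first exception, remaining tasks return without sending
    a request. Returns (number_created, first_error).
    """
    stop = threading.Event()
    lock = threading.Lock()
    state = {"created": 0, "error": None}

    def task(i):
        if stop.is_set():
            return  # an earlier request already failed
        try:
            create_index(f"test-index-{i}")  # hypothetical index name
        except Exception as exc:
            with lock:
                if state["error"] is None:
                    state["error"] = exc
            stop.set()
        else:
            with lock:
                state["created"] += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(total)))
    return state["created"], state["error"]
```

Calling `run_experiment(create_index_http)` would run it against a live cluster. Requests already in flight when the first failure occurs are allowed to finish, so the created count can slightly exceed the index at which the failure happened.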

I was able to create ~650 indexes. A few hours later I noticed that the master's heap followed a repeating pattern, growing from 25% to 75% and dropping back to 25%. There were also a few failures while it was at 75%. The call stack pointed to Marvel's exporter, which was collecting index stats.

2 questions -

  1. Does Marvel collect additional stats (such as index and cluster stats) from the master node compared to other nodes?
  2. How does Marvel ensure that it doesn't consume too much heap?
@clintongormley
Contributor

@mosiddi what version of Elasticsearch are you using? There have been some issues with certain stats calls which made them quite slow. These should have been fixed in v1.4.2.

Also, do you have swap enabled? I'm wondering if you're seeing slow GCs thanks to swapping.

@mosiddi
Author

mosiddi commented Jan 5, 2015

Hi @clintongormley,
We are using v1.3.4. It is a Windows Azure VM and we haven't explicitly set anything related to swap, so the default settings apply.

@clintongormley
Contributor

@mosiddi with the default settings you're bound to have problems with slow GC. Swap is the enemy of the JVM.

Also, I suggest upgrading Elasticsearch before retrying your tests. A few stats issues have been fixed since then.

I'll leave this open for now, until you have had a chance to rerun your tests without swap and with the latest version of Elasticsearch.

@mosiddi
Author

mosiddi commented Jan 5, 2015

Thanks @clintongormley! I'll try the tests out with 1.4.2. Can you point me to instructions that detail how to disable swap for ES on Windows VMs?

@clintongormley
Contributor

@mosiddi I can't, I'm afraid, but I'd just google "disabling Windows page file".

@mosiddi
Author

mosiddi commented Jan 5, 2015

Thanks @clintongormley, I'll look into this and keep you updated.

@mosiddi
Author

mosiddi commented Jan 6, 2015

I updated ES to the latest version (1.4.2) on my ES master and can still see the same pattern. I'll turn off paging and see if that helps.

@mosiddi
Author

mosiddi commented Jan 6, 2015

I also tried setting the VM's page file size (virtual memory) to 0 MB and didn't see any noticeable difference in heap usage.

Please note that while I was running the experiments, I wasn't doing any admin operations on the cluster, and I had ~700 indices.

When I set the marvel.agent.indices to only one index, the heap usage came down.

@bleskes
Contributor

bleskes commented Jan 7, 2015

A quick answer to:

Does Marvel collect additional stats (such as index and cluster stats) from the master node compared to other nodes?

Marvel uses the master node to issue indices stats calls (and indeed cluster stats). This means the master acts as a coordinating node for this. At the moment the indices stats calls translate to one request per shard (see #7990) which results in some load on the master and the receiving nodes. That said, marvel does the call, waits for it to complete, sleeps for 10s (default) and does it again. Even with 700 indices (7000 shards, assuming defaults) the load should be low.
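
The arithmetic behind the "7000 shards" figure above can be checked with a short sketch. It assumes the Elasticsearch 1.x index defaults of 5 primary shards and 1 replica, plus the 10s sleep mentioned in the comment. The function names are made up for illustration. Because Marvel waits for each call to complete before sleeping, the computed rate is an upper bound.

```python
def shard_requests_per_poll(indices, primaries=5, replicas=1):
    """Shard-level requests for one indices stats call.

    One request per shard copy: each index has `primaries` primary
    shards, and each primary has `replicas` replica copies.
    """
    return indices * primaries * (1 + replicas)


def max_requests_per_second(indices, interval_s=10.0, call_duration_s=0.0):
    """Average shard requests per second.

    Each cycle is the call duration plus the sleep interval, so with
    call_duration_s=0 this is an upper bound on the sustained rate.
    """
    cycle = interval_s + call_duration_s
    return shard_requests_per_poll(indices) / cycle


print(shard_requests_per_poll(700))   # 7000
print(max_requests_per_second(700))   # 700.0
```

With ~700 indices this comes to at most about 700 shard-level stats requests per second on average, spread across the data nodes and coordinated by the master.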

@mosiddi
Author

mosiddi commented Jan 7, 2015

Thanks @bleskes. The pattern I saw repeated at an interval of ~1 hour: the heap grew from 25% to 75% and came back, and then the pattern continued.

Do you want any more data from my side to look further into this?

@bleskes
Contributor

bleskes commented Jan 7, 2015

@mosiddi do you see a quick spike in memory use, or just a slow growth and then a quick decline?

@mosiddi
Author

mosiddi commented Jan 7, 2015

Slow growth (over less than an hour) and a quick decline (within a few minutes).

@bleskes
Contributor

bleskes commented Jan 7, 2015

Sounds like normal Java garbage collection: memory grows slowly until a certain limit is reached (75% in ES), and then it's cleaned up, hence the quick drop. I'm going to close this issue; feel free to reopen if you feel there is anything else going on.
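
One rough way to tell this kind of sawtooth apart from a genuine leak is to look at the level the heap returns to after each collection. The sketch below is only an illustration: the function names and thresholds are assumptions, and the samples are taken to be heap-used percentages collected at regular intervals (for example from the node stats API). If the post-collection troughs stay flat, the pattern is consistent with normal GC. If they keep climbing, something is retaining memory.

```python
def find_gc_drops(samples, min_drop=20.0):
    """Return the indices where heap usage fell by at least `min_drop`
    percentage points relative to the previous sample."""
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] - samples[i] >= min_drop
    ]


def looks_like_normal_sawtooth(samples, min_drop=20.0, tolerance=10.0):
    """True when at least two large drops occur and the heap returns to
    roughly the same level (within `tolerance` points) after each one."""
    drops = find_gc_drops(samples, min_drop)
    if len(drops) < 2:
        return False  # not enough cycles to judge
    troughs = [samples[i] for i in drops]
    return max(troughs) - min(troughs) <= tolerance


# Heap climbing from 25% to 75% and falling back, as reported above.
healthy = [25, 35, 45, 55, 65, 75, 25, 35, 45, 55, 65, 75, 26]
print(looks_like_normal_sawtooth(healthy))  # True
```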

@bleskes bleskes closed this as completed Jan 7, 2015
@mosiddi
Author

mosiddi commented Jan 7, 2015

One more interesting observation: when I stop Marvel generating index aliases and do a few sets of create alias operations, I see the same memory pattern and timeouts. This time the growth happens within about 15 minutes.

The number of existing indexes is ~1300 in the test bed where I saw the issue.

@mosiddi
Author

mosiddi commented Jan 7, 2015

@bleskes and @clintongormley: Though the issue I mentioned above is not related to Marvel, it was seen in the same test setup. Do you have any insight into why create alias requests would time out when the master's CPU usage is within 25% and alias creation requests are arriving at a rate of 5 to 10 per minute?

@mosiddi mosiddi changed the title Marvel eats up Master's heap Index Create requests eats up heap and timesout (was Marvel eats up Master's heap) Jan 7, 2015
@mosiddi mosiddi changed the title Index Create requests eats up heap and timesout (was Marvel eats up Master's heap) Index Alias Create requests eats up heap and timesout (was Marvel eats up Master's heap) Jan 7, 2015
@mosiddi mosiddi changed the title Index Alias Create requests eats up heap and timesout (was Marvel eats up Master's heap) Marvel eats up Master's heap Jan 8, 2015
3 participants