This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

CPU usage increased dramatically 0.8.1-RC1 -> master #1497

Closed · bobrik opened this issue May 9, 2015 · 45 comments


bobrik commented May 9, 2015

I built and deployed 964e430, running against Mesos 0.22.1 masters:

[screenshot: CPU usage graph]

The upgrade started at 16:20, and the last node was updated at 16:31. The revert to 0.8.1-RC1 happened at 16:41.

I mentioned performance in #1472 as well.


kolloch commented May 11, 2015

Hi @bobrik,

thanks for reporting this. Which native Mesos library version are you using with Marathon? Note that we updated the Dockerfile to use the latest Mesos libraries just today, because the corresponding base image was only recently released.

@lloesche have you seen something similar?


bobrik commented May 11, 2015

Turns out I was using 0.22.0 as the base image; I'll try with 0.22.1.


bobrik commented May 11, 2015

Nah, it's still bad:

[screenshot: CPU usage graph]

Elasticsearch has a hot_threads API; do you have something similar, so I can give you more meaningful data?


kolloch commented May 12, 2015

You can use the poor man's profiling tool: could you run jstack a couple of times on the Marathon process and send us the stack traces?

> jstack <MARATHON_PID> > stack1.txt
> jstack <MARATHON_PID> > stack2.txt
> jstack <MARATHON_PID> > stack3.txt
...

You might have to enter the process space with docker exec to do that.
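If Marathon runs inside a container, a small loop from the host does the same job. A minimal sketch, assuming the container is named "marathon" and pgrep/jstack are available in the image (container name, sample count, and interval are my assumptions, not from this thread):

CONTAINER=marathon                                         # hypothetical container name
PID=$(docker exec "$CONTAINER" pgrep -f marathon | head -n1)
for i in 1 2 3 4 5; do
  docker exec "$CONTAINER" jstack "$PID" > "stack$i.txt"   # snapshot all thread stacks
  sleep 5                                                  # space the samples out
done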


bobrik commented May 12, 2015

@drexin drexin self-assigned this May 12, 2015

drexin commented May 12, 2015

@bobrik I couldn't reproduce this. I ran v0.8.2-RC2 in a Docker container and started 100 tasks, without the Marathon process using considerably more than 10% CPU. What does your setup look like?


bobrik commented May 12, 2015

Marathon is running on Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz, 8 cores.

Health checks run every 3-5 seconds.

[screenshot: CPU usage graph]


drexin commented May 12, 2015

@bobrik Would it be possible for you to change the health checks to COMMAND checks that call curl, and see if the CPU usage is still that high?


bobrik commented May 12, 2015

I tried, but it didn't work:

        healthChecks:
          - protocol: COMMAND
            command:
              value: "curl -f -X GET http://$HOST:$PORT0/?n=marathon_healthcheck"
            gracePeriodSeconds: 15
            maxConsecutiveFailures: 300
            intervalSeconds: 2
            timeoutSeconds: 5
{"log":"[2015-05-12 21:26:54,159] INFO Received status update for task topface_lenny-test.67ca9fbb-f8ec-11e4-ab20-56847afe9799: TASK_RUNNING () (mesosphere.marathon.MarathonScheduler:148)\n","stream":"stdout","time":"2015-05-12T21:26:54.159345729Z"}
{"log":"[2015-05-12 21:26:54,160] INFO Received status for [topface_lenny-test.67ca9fbb-f8ec-11e4-ab20-56847afe9799] with version [2015-05-12T21:18:15.625Z] and healthy [false] (mesosphere.marathon.health.MarathonHealthCheckManager:150)\n","stream":"stdout","time":"2015-05-12T21:26:54.160155174Z"}
{"log":"[2015-05-12 21:26:54,160] INFO Forwarding health result [Unhealthy(topface_lenny-test.67ca9fbb-f8ec-11e4-ab20-56847afe9799,2015-05-12T21:18:15.625Z,,2015-05-12T21:26:54.160Z)] to health check actor [Actor[akka://marathon/user/$N#1518643597]] (mesosphere.marathon.health.MarathonHealthCheckManager:171)\n","stream":"stdout","time":"2015-05-12T21:26:54.1603141Z"}

No further information is logged that would help resolve this. The task is healthy and works with an HTTP check. This could be related to #1380.
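One way to narrow this down is to run the same command by hand; a rough sketch, with made-up values standing in for the $HOST and $PORT0 that get injected for a real task:

# Hypothetical host/port for one task instance:
HOST=web488 PORT0=31000 \
  sh -c 'curl -f -X GET "http://$HOST:$PORT0/?n=marathon_healthcheck"'
echo "exit code: $?"   # curl -f exits non-zero on HTTP errors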


bobrik commented May 12, 2015

        healthChecks:
          - protocol: COMMAND
            command:
              value: "true"
            gracePeriodSeconds: 15
            maxConsecutiveFailures: 300
            intervalSeconds: 2
            timeoutSeconds: 5

120 instances, 0.8.1-RC1. Not sure that upgrading to master wouldn't kill Marathon completely.

[screenshot: CPU usage graph]


bobrik commented May 13, 2015

Probably worth mentioning: each of the 3 Marathon instances receives 1 rps on /v2/apps?embed=apps.tasks for service discovery purposes, so in total it's 3 rps to the master.
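For context, each poll is just a plain GET against the REST API; a sketch (hostname is an assumption):

curl -s 'http://web488:8080/v2/apps?embed=apps.tasks' > /dev/null   # one service-discovery poll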


kolloch commented May 13, 2015

I guess these queries are more or less the only thing in the stack traces that sticks out:

"qtp265226115-746" prio=10 tid=0x00007f0964023000 nid=0xd88 waiting on condition [0x00007f091bbf8000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007dafdd118> (a scala.concurrent.impl.Promise$CompletionLatch)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at mesosphere.marathon.api.RestResource$class.result(RestResource.scala:44)
    at mesosphere.marathon.api.v2.AppsResource.result(AppsResource.scala:29)
    at mesosphere.marathon.api.v2.AppsResource.mesosphere$marathon$api$v2$AppsResource$$enrichedTasks(AppsResource.scala:294)
    at mesosphere.marathon.api.v2.AppsResource$$anonfun$5.apply(AppsResource.scala:57)
    at mesosphere.marathon.api.v2.AppsResource$$anonfun$5.apply(AppsResource.scala:55)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
    - locked <0x00000007dafdbf40> (a scala.collection.immutable.Stream$Cons)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
    - locked <0x00000007dafdbf90> (a scala.collection.immutable.Stream$Cons)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
    at scala.collection.immutable.Stream.foreach(Stream.scala:595)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:311)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$2.apply(JsValue.scala:320)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$2.apply(JsValue.scala:318)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:318)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:302)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:114)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1887)
    at play.api.libs.json.JacksonJson$.generateFromJsValue(JsValue.scala:495)
    at play.api.libs.json.Json$.stringify(Json.scala:51)
    at play.api.libs.json.JsValue$class.toString(JsValue.scala:80)
    at play.api.libs.json.JsObject.toString(JsValue.scala:166)
    at mesosphere.marathon.api.v2.AppsResource.index(AppsResource.scala:87)
    at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchProvider$TimedRequestDispatcher.dispatch(InstrumentedResourceMethodDispatchProvider.java:30)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at mesosphere.marathon.api.CacheDisablingFilter.doFilter(CacheDisablingFilter.scala:18)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at mesosphere.marathon.api.CORSFilter.doFilter(CORSFilter.scala:46)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at mesosphere.marathon.api.LeaderProxyFilter.doFilter(LeaderProxyFilter.scala:56)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at com.codahale.metrics.jetty8.InstrumentedHandler.handle(InstrumentedHandler.java:192)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

I don't know yet why this would have changed recently.


bobrik commented May 13, 2015

I shut down the service discovery updater for a few minutes on 0.8.1-RC1, and look at that:

[screenshot: CPU usage graph]

The second decrease in CPU usage on the graph happened when I closed the browser tab with the Marathon UI.


bobrik commented May 14, 2015

Flamegraphs: https://gist.github.com/bobrik/969d322bb28c6a649cf7

Generated with https://github.com/jrudolph/perf-map-agent.

0.8.1-RC1:

[flamegraph: 0.8.1-RC1]

0.8.2-SNAPSHOT:

[flamegraph: 0.8.2-SNAPSHOT]

Blue line (the higher one) is 0.8.2:

[screenshot: CPU usage comparison graph]
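For anyone wanting to reproduce these flamegraphs: roughly, perf-map-agent plus Brendan Gregg's FlameGraph scripts, along these lines (paths are assumptions; on newer JVMs, -XX:+PreserveFramePointer improves stack quality):

PID=$(pgrep -f marathon | head -n1)
perf record -F 99 -g -p "$PID" -- sleep 30            # sample on-CPU stacks for 30s
perf-map-agent/bin/create-java-perf-map.sh "$PID"     # write /tmp/perf-$PID.map for JIT symbols
perf script | FlameGraph/stackcollapse-perf.pl \
            | FlameGraph/flamegraph.pl > flamegraph.svg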


dgromov commented May 17, 2015

@drexin, were you using OpenJDK when you tried to reproduce this? I wonder if that has something to do with it. https://gist.github.com/bobrik/87b8903cc3d502afe888 suggests that these numbers all come from OpenJDK.


apuckey commented May 18, 2015

I'm using Sun Java 1.7, not OpenJDK, if that makes a difference.

@kolloch kolloch added this to the 0.8.2 milestone May 18, 2015
@kolloch kolloch assigned aquamatthias and unassigned drexin May 18, 2015

kolloch commented May 18, 2015

Hi @bobrik, by the way, I hope it is clear that we really appreciate your detailed reporting. Unfortunately, we haven't been able to reproduce it so far.

I could imagine that it has something to do with the Mesos library changes. Did you try 0.8.1-RC1 with the 0.22.1 Mesos libraries by any chance? I am not sure that is a supported configuration, but if it exhibits the same CPU pattern, the cause could lie in the new Mesos library version.

I think it might make sense to implement #1539 soon and check if your problems persist.

What do you think?


bobrik commented May 18, 2015

Should I just try the 0.8.1-RC1 tag on top of the mesosphere/mesos:0.22.1 Docker image?

Removing native code sounds like a good idea; too much is happening there.


kolloch commented May 19, 2015

Hi @bobrik, if it's not a big hassle (at least in comparison to the things you have already done), trying 0.8.1-RC1 on top of mesosphere/mesos:0.22.1 would be grand. 👍


bobrik commented May 19, 2015

Looks like Marathon itself is to blame, and Mesos 0.22.1 is even better with 0.8.2-SNAPSHOT. The snapshot is from today's master, by the way.

~ 12:10 marathon 0.8.1-RC1, mesos 0.22.1
~ 12:25 marathon 0.8.1-RC1, mesos 0.22.0
~ 12:38 marathon 0.8.2-SNAPSHOT, mesos 0.22.1
~ 12:47 marathon 0.8.2-SNAPSHOT, mesos 0.22.0

[screenshot: CPU usage graph across the four configurations]


bobrik commented May 21, 2015

OK, I'll try to collect metrics from the master on 0.8.1-RC1 and 0.8.2-RC3 with --enable_metrics after 30 minutes of regular load.

Meanwhile, can you tell me what ZooKeeper is needed for when I ask for /v2/apps?embed=apps.tasks? I thought Marathon keeps everything in memory, even though state is written to ZooKeeper for recovery.


kolloch commented May 21, 2015

Hi @bobrik, actually, reads currently go to ZooKeeper as well. We want to change that. Basically, it is a trade-off between investigating current user problems (which takes time) and rewriting some of the code (which also takes time), which might actually solve these issues anyway.

So, without wanting to sound smart, analyzing this issue actually keeps me from rewriting that code. But I do not want to release 0.8.2 before we understand the implications.


bobrik commented May 21, 2015

Here are the graphs from the ZooKeeper cluster; they look suspicious if you ask me:

[screenshot: ZooKeeper cluster graphs]

The load from 0.8.1-RC1 is a lot smaller, especially in terms of bytes per second.
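One way to put numbers next to these graphs is ZooKeeper's four-letter-word admin commands; a sketch, assuming a ZooKeeper node at web488:2181:

while true; do
  echo mntr | nc web488 2181 | grep -E 'zk_packets_(received|sent)'   # cumulative packet counters
  sleep 5
done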


kolloch commented May 21, 2015

Hi @bobrik,

I cannot really make sense of your graphs. Which is the old and which is the new version? What makes you suspicious?

Can you tell us the configuration parameters you start Marathon with? I assume they are the same between the old and the new version? Thanks.


bobrik commented May 21, 2015

Sorry for not making it clear. Environment variables for Marathon (from an Ansible playbook):

          MARATHON_MASTER: zk://web488:2181,web489:2181,web490:2181/mesos
          MARATHON_ZK: zk://web488:2181,web489:2181,web490:2181/marathon-new
          MARATHON_ZK_MAX_VERSIONS: 10
          MARATHON_HOSTNAME: "{{ inventory_hostname }}"

They are the same for both versions.

Now to the graphs; new ones this time, which I hope are clearer. Here I ran 0.8.2-RC3 for 40 minutes, then 0.8.1-RC1 for 40 minutes, then 0.8.1-RC1 on top of the 0.22.1 libraries for 10 minutes.

Marathon cluster:

[screenshot: Marathon cluster CPU graph]

ZooKeeper cluster for Marathon and Mesos, same time window:

[screenshot: ZooKeeper cluster graph]

The enormous difference in bandwidth to ZooKeeper is suspicious: 200 KB/s vs 5000 KB/s. CPU load and packet rate are also higher with 0.8.2-RC3.

Metrics for this, for 0.8.2 with --enable-metrics: https://gist.github.com/bobrik/96997d0030338fa1dc15

Does it make sense now? Thank you for your patience.
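For reference, the gist was produced by dumping the Codahale metrics endpoint; something like the following, where the /metrics path is my assumption for this Marathon version:

curl -s http://web488:8080/metrics | python -m json.tool > marathon-metrics.json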

@kolloch kolloch removed the analyze label May 26, 2015
kolloch pushed a commit that referenced this issue May 26, 2015
….statuses

and make MarathonHealthCheckManager data structures more efficient

kolloch commented May 26, 2015

Hi @bobrik,

thanks to your extensive reporting, we have found the offender. The metrics in the gist helped us out.

If you are really adventurous, you can check out the pk/1497_health_statuses_more_efficient branch. Otherwise, you can wait for us to merge it to master and release another RC.


bobrik commented May 26, 2015

Thanks, I'll test it tomorrow. Is there an issue for removing the unnecessary ZooKeeper read requests?


bobrik commented May 27, 2015

Marathon master built from origin/pr/1568, running from 12:09:

[screenshot: CPU usage graph]

ZK traffic didn't change compared to 0.8.1-RC1, though. The workload is slightly different, since I query more groups now.


kolloch commented May 27, 2015

Hi @bobrik, that is surprising. Can you export the metrics for us again?


bobrik commented May 27, 2015

Metrics after 10 minutes: https://gist.github.com/bobrik/bb8b852eb1156624a3b8


kolloch commented May 27, 2015

I'll try to summarize the findings (correct me if I am wrong).

When

  • using the proposed fixed version instead of 0.8.1-RC1
  • with the same load (which is different from the load you tested with before)

you see that

  • The CPU usage goes up from ~600ms to ~740ms.
  • The ZK traffic doesn't change significantly.

So the new version still uses more CPU than 0.8.1-RC1 but is otherwise fine.

The increased CPU could potentially be explained by more requests. At least in the last comparison with full metrics, we saw significantly more requests against the new version for some reason, maybe because of faster response times. The new version has a mean response time of 35ms for AppResource.index, compared to 126ms for the old version.

If that is correct, we would like to release a new RC with the fix.

aquamatthias added a commit that referenced this issue May 27, 2015
…_efficient

Fixes #1497 - Do not query app versions in MarathonHealthCheckManager

bobrik commented May 27, 2015

> The increased CPU could be potentially explained by more requests.

0.8.1-RC1:

Document Path:          /v2/apps?embed=apps.tasks
Document Length:        73417 bytes

Concurrency Level:      10
Time taken for tests:   51.741 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      73581000 bytes
HTML transferred:       73417000 bytes
Requests per second:    19.33 [#/sec] (mean)
Time per request:       517.413 [ms] (mean)
Time per request:       51.741 [ms] (mean, across all concurrent requests)
Transfer rate:          1388.76 [Kbytes/sec] received

PR 1568:

Document Path:          /v2/apps?embed=apps.tasks
Document Length:        76127 bytes

Concurrency Level:      10
Time taken for tests:   84.561 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      76291000 bytes
HTML transferred:       76127000 bytes
Requests per second:    11.83 [#/sec] (mean)
Time per request:       845.606 [ms] (mean)
Time per request:       84.561 [ms] (mean, across all concurrent requests)
Transfer rate:          881.06 [Kbytes/sec] received

Much better than master, but still worse than 0.8.1-RC1.

With reduced background usage (only /v2/apps?embed=apps.tasks at 3 rps), CPU usage is roughly the same at 320ms, but the max RPS differs:

0.8.1-RC1:

Document Path:          /v2/apps?embed=apps.tasks
Document Length:        72913 bytes

Concurrency Level:      10
Time taken for tests:   47.749 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      73077000 bytes
HTML transferred:       72913000 bytes
Requests per second:    20.94 [#/sec] (mean)
Time per request:       477.487 [ms] (mean)
Time per request:       47.749 [ms] (mean, across all concurrent requests)
Transfer rate:          1494.58 [Kbytes/sec] received

PR 1568:

Document Path:          /v2/apps?embed=apps.tasks
Document Length:        75623 bytes

Concurrency Level:      10
Time taken for tests:   70.917 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      75787000 bytes
HTML transferred:       75623000 bytes
Requests per second:    14.10 [#/sec] (mean)
Time per request:       709.175 [ms] (mean)
Time per request:       70.917 [ms] (mean, across all concurrent requests)
Transfer rate:          1043.62 [Kbytes/sec] received

Go ahead with the RC; I'll reduce the load with label selectors, SSE, and probably the Mesos API as the source of truth.
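For reference, output of this shape comes from Apache Bench; an invocation along these lines would produce it (host and port are assumptions):

ab -n 1000 -c 10 'http://web488:8080/v2/apps?embed=apps.tasks'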


kolloch commented May 27, 2015

Hi @bobrik, the strange thing is that the metrics that you gave us earlier told a different story.

If you want, you can still send us the related metrics and I'll have a look.

We will not release master as an RC, but rather the old RC with only this single fix. Maybe it works better, maybe not.

kolloch pushed a commit that referenced this issue May 27, 2015
….statuses

and make MarathonHealthCheckManager data structures more efficient
@d2iq-archive d2iq-archive locked and limited conversation to collaborators Mar 27, 2017