scope: fix stop command for OSX #1801

alepuccetti · 2016-08-16T17:35:06Z

Before this patch, scope stop was not stopping scope-app

alepuccetti · 2016-08-17T09:38:04Z

PTAL

2opremio · 2016-08-17T11:03:29Z

@alepuccetti Thanks a lot. Good catch! Can you please recreate the PR against branch release-0.17 ?

EDIT: typo corrected (I meant release-0.17 not release-0.17.1)

tomwilkie · 2016-08-17T11:11:12Z

No need to recreate it, you can change the target now: https://github.com/blog/2224-change-the-base-branch-of-a-pull-request

alepuccetti · 2016-08-17T11:12:23Z

@tomwilkie too late 😝

Thanks for the tip

Problem: Decoding a corrupt report grows the 'missing' list. Since we are waiting for 'len(keys)-len(missing)' decoder go-routines, this results in waiting for fewer go-routines than we should. The surplus go-routines leak and we ignore their reports. And since the keys of the ignored reports are not included in 'missing', we won't attempt to fetch them from S3 either. Oops. Fix: calculate the number of go-routines once, at the beginning.

A lot of time could pass between recording the request count and hit count pertaining to a particular report fetching batch, which skewed calculations cache hit ratios. Fix that by defering the request count recording to the end, which is when we record the hit count.

We fell victim to variable shadowing here. Each store would be fed the original list of report keys, instead of only the ones that weren't found in the previous store. So if a single report was missing from the in-process cache, we would then fetch all reports from memcache. And if that in turn was missing a single report we would fetch all reports from S3. We chain report stores for a reason - to reduce latency and, in case of the in-process cache, eliminate decoding costs. So this bug has a huge impact on query service performance. To make matters worse, we return *all* the reports - now possibly in triplicate. Fortunately, the SmartMerger filters these out, so at least we were not incurring extra merge costs.

This can be done by calling TESTDIRS="./report ./probe" make tests

So it displays differences behind interface that would otherwise go unnoticed (like string vs []byte).

Changing plugin's ID only complicates control handling in plugins so let's ban it.

The socket has the "http_requests.sock" filename, so ID should be "http_requests", not "http-requests".

We will want to put plugin id in a control id, which is sent to an app and then to GUI. When we get a control request from GUI, we will want to extract the plugin ID from the control name. To do it unambiguously we need some separator made of chars that are not allowed in a plugin name. This is to avoid the situation when there are two plugins: "Plugin" and "PluginFoo". "Plugin" exposes a control named "FooControl" and "PluginFoo" exposes a control named "Control". Faking the control names which will be sent to the app would result in two "PluginFooControl". One possible option for plugin ID and control name separator would be "/", but that won't work, since the request sent from GUI to the app to <probe>/<node>/<control> would actually be <probe>/<node>/<plugin>/<control> and as such wouldn't match the URL template in RegisterControlRoutes().

Plugin ID must be non-empty when the plugin is created and the followup reports cannot change it.

It is not a singleton anymore. Instead it is an object with a registry backend. The default registry backend is provided, which is equivalent to what used to be before. Custom backend can be provided for testing purposes. The registry also supports batch operations to remove and add handlers as an atomic step.

Thanks to that, plugins can react to requests from controls they exposed. To make it work, plugins registry modifies each plugin's report by prepending the plugin ID to the control name the plugin has exposed before sending it to the app. Then the registry installs the control request handler for this faked control name, which forwards the request to the correct plugin. This adds a new API endpoint to plugins next to "/report" - a "/control" entry. The body of the request is the JSON-encoded xfer.Request instance.

The solution is taken from the toplevel Makefile.

It exposes a button that allows switching between showing an iowait statistics and an idle statistics. When the button is pressed it should be replaced with other button. The button is shown in the host node. This is a rather nasty case as it shows several problems: - Button control races - The way the NodeControl currently works creates races between plugins adding buttons to the same node. This is because NodeControls are not really merged, but rather one of the two are chosen based on a NodeControls' timestamps, so the older one is thrown away entirely. In the end GUI can switch randomly between showing controls from one plugin or from another. - Showing outdated statistics - When pressing the button to switch to show the other statistics, the old ones are still shown for several seconds. - Slowness of the updates in GUI - Pressing the button yields no immediate reaction. Changes happen after several seconds. Probably related to the previous point.

This is better than writing JSON strings by hand, which is error-prone.

Plugins are queried for reports two times in a second. That's often enough to get the shortcut reports. The reports are sent together with the response.

Just to make sure that all the node controls are rewritten, even those that don't have a counterpart in topology controls.

This commit makes the LatestMap type a sort of a base class, that should not be used directly. This also adds a generator for LatestMap "concrete" types with a specific data type in the LatestEntry.

This LatestMap will hold a struct that has more information about the state of the node control.

This allows plugins to add controls to nodes that already have some controls set by other plugin. Previously only the last plugin that sets the controls in the node would have its controls visible. That was because of NodeControls' Merge function that actually weren't merging data from two inputs, but rather returning data that was newer and discarding the older one.

Give a bit more information about how to write a plugin.

The new probe will convert all node's LatestControls to Controls, so the old app can consume them. Also, the new app will convert all node's Controls to LatestControl, so it can consume the reports from old probes.

* Squashed 'tools/' changes from e9e7e6b..db5efc0 db5efc0 Merge pull request weaveworks#28 from weaveworks/mike/add-image-tag 5312c40 Import image-tag script into build tools so it can be shared 7e850f8 Fix logs path dda9785 Update deploy api f2f4e5b Fix the wcloud client 3925eb6 Merge pull request weaveworks#27 from weaveworks/wcloud-events 77355b9 Lint d9a1c6c Add wcloud events, update flags and error nicely when there is no config git-subtree-dir: tools git-subtree-split: db5efc0 * Remove ./image-tag and use ./tools/image-tag instead image-tag is now shared code from the build-tools repo

Before this patch, `scope stop` was not stopping scope-app

alepuccetti · 2016-08-17T11:21:14Z

Closed in favor of #1811

rade and others added 25 commits August 17, 2016 13:17

Allow testing only a subset of directories

a9cf8be

This can be done by calling TESTDIRS="./report ./probe" make tests

Make dumper a bit more verbose

da8fa3a

So it displays differences behind interface that would otherwise go unnoticed (like string vs []byte).

Ban the possibility of changing plugin's ID

2b3820c

Changing plugin's ID only complicates control handling in plugins so let's ban it.

Use consistent plugin ID in the http-requests plugin

25b8d8e

The socket has the "http_requests.sock" filename, so ID should be "http_requests", not "http-requests".

Remove impossible case of an empty plugin ID

bc28a9a

Plugin ID must be non-empty when the plugin is created and the followup reports cannot change it.

Test the case when a plugin reports more than one plugin

3a6c150

Run docker with sudo if necessary in iowait makefile

d3d93ab

The solution is taken from the toplevel Makefile.

Marshal structs in the iowait example plugin

1791ab8

This is better than writing JSON strings by hand, which is error-prone.

Marshal structs in the plugins registry tests

3171852

This is better than writing JSON strings by hand, which is error-prone.

Add shortcut reports for plugins.

00995f1

Plugins are queried for reports two times in a second. That's often enough to get the shortcut reports. The reports are sent together with the response.

Extend the node control rewriting test

645d3af

Just to make sure that all the node controls are rewritten, even those that don't have a counterpart in topology controls.

Make LatestMap "generic"

ab8c4e7

This commit makes the LatestMap type a sort of a base class, that should not be used directly. This also adds a generator for LatestMap "concrete" types with a specific data type in the LatestEntry.

Add a concrete version of LatestMap for node controls

f2b3095

This LatestMap will hold a struct that has more information about the state of the node control.

Rewrite plugin readme

d177746

Give a bit more information about how to write a plugin.

Ensure backward compatilibity in report's node controls

b4a67e3

The new probe will convert all node's LatestControls to Controls, so the old app can consume them. Also, the new app will convert all node's Controls to LatestControl, so it can consume the reports from old probes.

scope: fix stop command for OSX

0e551bb

Before this patch, `scope stop` was not stopping scope-app

alepuccetti force-pushed the alessandro/fix-scope-stop-osx branch from df1768f to 0e551bb Compare August 17, 2016 11:17

alepuccetti closed this Aug 17, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scope: fix stop command for OSX #1801

scope: fix stop command for OSX #1801

alepuccetti commented Aug 16, 2016

alepuccetti commented Aug 17, 2016

2opremio commented Aug 17, 2016 •

edited

Loading

tomwilkie commented Aug 17, 2016

alepuccetti commented Aug 17, 2016 •

edited

Loading

alepuccetti commented Aug 17, 2016

scope: fix stop command for OSX #1801

scope: fix stop command for OSX #1801

Conversation

alepuccetti commented Aug 16, 2016

alepuccetti commented Aug 17, 2016

2opremio commented Aug 17, 2016 • edited Loading

tomwilkie commented Aug 17, 2016

alepuccetti commented Aug 17, 2016 • edited Loading

alepuccetti commented Aug 17, 2016

2opremio commented Aug 17, 2016 •

edited

Loading

alepuccetti commented Aug 17, 2016 •

edited

Loading