Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scope: fix stop command for OSX #1801

Conversation

alepuccetti
Copy link

Before this patch, scope stop was not stopping scope-app

@alepuccetti
Copy link
Author

PTAL

@2opremio
Copy link
Contributor

2opremio commented Aug 17, 2016

@alepuccetti Thanks a lot. Good catch! Can you please recreate the PR against branch release-0.17 ?

EDIT: typo corrected (I meant release-0.17 not release-0.17.1)

@tomwilkie
Copy link
Contributor

No need to recreate it, you can change the target now: https://github.com/blog/2224-change-the-base-branch-of-a-pull-request

@alepuccetti
Copy link
Author

alepuccetti commented Aug 17, 2016

@tomwilkie too late 😝

Thanks for the tip

rade and others added 25 commits August 17, 2016 13:17
Problem: Decoding a corrupt report grows the 'missing' list. Since we
are waiting for 'len(keys)-len(missing)' decoder go-routines, this
results in waiting for fewer go-routines than we should. The surplus
go-routines leak and we ignore their reports. And since the keys of the
ignored reports are not included in 'missing', we won't attempt to fetch
them from S3 either. Oops.

Fix: calculate the number of go-routines once, at the beginning.
A lot of time could pass between recording the request count and hit
count pertaining to a particular report fetching batch, which skewed
calculations cache hit ratios.

Fix that by defering the request count recording to the end, which is
when we record the hit count.
We fell victim to variable shadowing here. Each store would be fed the
original list of report keys, instead of only the ones that weren't
found in the previous store. So if a single report was missing from the
in-process cache, we would then fetch all reports from memcache. And if
that in turn was missing a single report we would fetch all reports from
S3.

We chain report stores for a reason - to reduce latency and, in case of
the in-process cache, eliminate decoding costs. So this bug has a huge
impact on query service performance.

To make matters worse, we return *all* the reports - now possibly in
triplicate. Fortunately, the SmartMerger filters these out, so at least
we were not incurring extra merge costs.
This can be done by calling TESTDIRS="./report ./probe" make tests
So it displays differences behind interface that would otherwise go
unnoticed (like string vs []byte).
Changing plugin's ID only complicates control handling in plugins so
let's ban it.
The socket has the "http_requests.sock" filename, so ID should be
"http_requests", not "http-requests".
We will want to put plugin id in a control id, which is sent to an app
and then to GUI. When we get a control request from GUI, we will want
to extract the plugin ID from the control name. To do it unambiguously
we need some separator made of chars that are not allowed in a plugin
name. This is to avoid the situation when there are two plugins:
"Plugin" and "PluginFoo". "Plugin" exposes a control named
"FooControl" and "PluginFoo" exposes a control named "Control". Faking
the control names which will be sent to the app would result in two
"PluginFooControl".

One possible option for plugin ID and control name separator would be
"/", but that won't work, since the request sent from GUI to the app
to <probe>/<node>/<control> would actually be
<probe>/<node>/<plugin>/<control> and as such wouldn't match the URL
template in RegisterControlRoutes().
Plugin ID must be non-empty when the plugin is created and the
followup reports cannot change it.
It is not a singleton anymore. Instead it is an object with a registry
backend. The default registry backend is provided, which is equivalent
to what used to be before. Custom backend can be provided for testing
purposes.

The registry also supports batch operations to remove and add handlers
as an atomic step.
Thanks to that, plugins can react to requests from controls they
exposed.

To make it work, plugins registry modifies each plugin's report by
prepending the plugin ID to the control name the plugin has exposed
before sending it to the app. Then the registry installs the control
request handler for this faked control name, which forwards the
request to the correct plugin.

This adds a new API endpoint to plugins next to "/report" - a
"/control" entry. The body of the request is the JSON-encoded
xfer.Request instance.
The solution is taken from the toplevel Makefile.
It exposes a button that allows switching between showing an iowait
statistics and an idle statistics. When the button is pressed it
should be replaced with other button. The button is shown in the host
node.

This is a rather nasty case as it shows several problems:

- Button control races
  - The way the NodeControl currently works creates races between
    plugins adding buttons to the same node. This is because
    NodeControls are not really merged, but rather one of the two are
    chosen based on a NodeControls' timestamps, so the older one is
    thrown away entirely. In the end GUI can switch randomly between
    showing controls from one plugin or from another.

- Showing outdated statistics
  - When pressing the button to switch to show the other statistics,
    the old ones are still shown for several seconds.

- Slowness of the updates in GUI
  - Pressing the button yields no immediate reaction. Changes happen
    after several seconds. Probably related to the previous point.
This is better than writing JSON strings by hand, which is
error-prone.
This is better than writing JSON strings by hand, which is
error-prone.
Plugins are queried for reports two times in a second. That's often
enough to get the shortcut reports. The reports are sent together with
the response.
Just to make sure that all the node controls are rewritten, even those
that don't have a counterpart in topology controls.
This commit makes the LatestMap type a sort of a base class, that
should not be used directly. This also adds a generator for LatestMap
"concrete" types with a specific data type in the LatestEntry.
This LatestMap will hold a struct that has more information about the
state of the node control.
This allows plugins to add controls to nodes that already have some
controls set by other plugin. Previously only the last plugin that
sets the controls in the node would have its controls visible. That
was because of NodeControls' Merge function that actually weren't
merging data from two inputs, but rather returning data that was newer
and discarding the older one.
Give a bit more information about how to write a plugin.
The new probe will convert all node's LatestControls to Controls, so
the old app can consume them. Also, the new app will convert all
node's Controls to LatestControl, so it can consume the reports from
old probes.
* Squashed 'tools/' changes from e9e7e6b..db5efc0

db5efc0 Merge pull request weaveworks#28 from weaveworks/mike/add-image-tag
5312c40 Import image-tag script into build tools so it can be shared
7e850f8 Fix logs path
dda9785 Update deploy api
f2f4e5b Fix the wcloud client
3925eb6 Merge pull request weaveworks#27 from weaveworks/wcloud-events
77355b9 Lint
d9a1c6c Add wcloud events, update flags and error nicely when there is no config

git-subtree-dir: tools
git-subtree-split: db5efc0

* Remove ./image-tag and use ./tools/image-tag instead

image-tag is now shared code from the build-tools repo
Before this patch, `scope stop` was not stopping scope-app
@alepuccetti alepuccetti force-pushed the alessandro/fix-scope-stop-osx branch from df1768f to 0e551bb Compare August 17, 2016 11:17
@alepuccetti
Copy link
Author

Closed in favor of #1811

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants