Plugins #1126

Merged
paulbellamy merged 16 commits into master from plugins
Apr 12, 2016

Conversation

paulbellamy
Contributor

Fixes #554

Commits are still super-messy, and tests are earmarked, as this is both a work-in-progress and a proof-of-concept.

HTTP Request latency is shown on the weavescope node (we can correlate it due to weavescope being in the host network).

Plugin registry turned out very similar to #809.

TODO:

  • figure out what the sysdig querying looks like and how flexible it is
    • do some plugin data from the raw sysdig stream
      • bandwidth usage per-process, maybe
      • can we parse stuff fast enough? or filter it better?
  • get better data from sysdig plugin
    • Either need to:
      • implement the plugin in lua (possible?)
      • make the lua chisel more flexible and add an interface for the go plugin to control it
      • just add more aggregations for each type of data
  • show plugin list in UI
    • styling
    • maybe adjust timers? or include it with report/node updates?
  • clean up and make plugins code sane/tested
  • add safe zero-values for everything in a report so json deserialization is safer on malformed inputs
  • all templates (not just extra, and including metrics) need to come from the probe
  • add deadline/timeout on plugin calls
  • add api versioning to plugin handshake
  • re-organize code to be consistent with controls
  • clean up todos
  • consider rest api instead of jsonrpc
  • add periodic retry on dead plugin sockets (for if they restart)
  • replace example plugin with something else. Pushed eBPF plugin into a separate PR.
    • add check to ebpf plugin for if kernel is new enough
    • port ebpf plugin to python, and do counting in kernel
    • figure out how to get pids from ebpf, so we can get more accurate counts per/process
  • consider an async "plugin-push" interface instead of probe polling. Considered it. Delaying to a future "api version"
  • see about removing render/detailed/labels.go
  • final test fixup
  • documentation
  • update plugins when they change
    • let's ditch the handshake, and just assume all plugins respond to /report, then parse the pluginspec from that.

@paulbellamy paulbellamy force-pushed the plugins branch 2 times, most recently from ec2f8dc to f1e935b Compare March 4, 2016 17:42
@paulbellamy
Contributor Author

@davkal should we display in the UI which plugins are enabled? what would be a good way of displaying this?

@davkal
Contributor

davkal commented Mar 7, 2016

Yes, let's place it in the bottom left for now, because

  • it's a status
  • it may have multiple entries/rows that can stack nicely

Something like:

10 NODES (408 FILTERED)
Stopped containers hidden SHOW
System containers hidden SHOW
[plugin icon] pluginX, pluginY

The DS could be as simple as [{id, label, description},...]. The description would show on hover. If plugins support a state (on/off), we can add that too.

@paulbellamy
Contributor Author

To get more and better data from the sysdig probe, the best plan seems to be using the "chisel" system, which is a set of lua scripts that sysdig can run. We could just capture the event stream, but that will likely lead to a lot of unneeded serialization/deserialization.

We could either write individual chisels for bits they have, and plumb them through, or we could look into writing one generic meta-chisel which would let us dynamically grab bits and do queries. We could either implement the entire plugin in lua (probably? There seems to be a jsonrpc library for lua), or we could do the barebones filtering/querying in the lua chisel, and drive that from a go plugin.

I've included a http_txns_by_pid.lua chisel (based on the existing httplog one), as an example of a chisel which would format the data for a specific query. It adds inbound/outbound http request rates to process nodes.

Some other things we could potentially get from the sysdig probe:

  • bandwidth usage
  • protocol detection/inspection
  • more http metrics (latency, error rates, error codes, avg size, breakdown by endpoints/hosts/etc)
  • same for other protocols (memcached, redis, etc)
  • might be able to tie in with tracing?
    • could maybe even do "session capture", like chrome's network request logging
  • unix socket connections
  • socket queue lengths
  • slow system calls / system call length

@paulbellamy paulbellamy force-pushed the plugins branch 2 times, most recently from e145ca2 to f485f2e Compare March 8, 2016 09:17
Version: Version,
Hostname: hostname.Get(),
})
func apiHandler(rep Reporter) func(context.Context, http.ResponseWriter, *http.Request) {


@tomwilkie
Contributor

First quick pass says it's looking good.

I'm thinking plugins/ should be moved into probe/, with some of the common datatypes moved into report.

@paulbellamy
Contributor Author

> I'm thinking plugins/ should be moved into probe/, with some of the common datatypes moved into report.

I'm not so sure about that yet. I tried moving it into probe/, but it was a bit odd since it's required by report and render as well. By the time we do control and pipe plugins (and wire all that through the UI) it would probably have hooks used all over. We could split the common datatypes into report, but it is sort of nice to be able to re-use them between the registry and the rest of the code.

I had a vague eye to keeping the plugin registry and lifecycle-management code separate so it could be extracted into a separate library at some point.

Will see about refactoring it as it goes.

@tomwilkie
Contributor

> We could split the common datatypes into report, but it is sort of nice to be able to re-use them between the registry and the rest of the code.

This is what I'd do - see controls; common stuff goes in common/xfer or report, probe-specific stuff in probe, and app-specific stuff in app.

Whilst I don't feel particularly strongly about this, it would be nice if this was consistent across the whole code base, where 'vertical' features are split up like this. It used to make a difference to build times, but now we just build one big binary.

@paulbellamy paulbellamy force-pushed the plugins branch 3 times, most recently from 9c51e0c to 0a49f7c Compare March 14, 2016 11:47
@paulbellamy paulbellamy force-pushed the plugins branch 2 times, most recently from b65c2ef to 87f008e Compare March 29, 2016 15:41
@paulbellamy paulbellamy self-assigned this Mar 30, 2016
@paulbellamy paulbellamy force-pushed the plugins branch 3 times, most recently from 7c22443 to 90fb733 Compare March 31, 2016 13:47

// Registry maintains a list of available plugins by name.
type Registry struct {
root string


@paulbellamy
Contributor Author

Raised separate issue for indicating broken plugins in the UI.

'api_version': '1',
}
self.respond(json.dumps(spec))

def respond(self, body):


@2opremio
Contributor

LGTM (I would like to have #1271 for the release, though)

paulbellamy and others added 14 commits April 12, 2016 17:20
Squash of:
* Include plugins in the report
* show plugin list in the UI
* moving metric and metadata templates into the probe reports
* update js for prime -> priority
* added retry to plugin handshake
* added iowait plugin
* review feedback
* plugin documentation
* It sends unexpected TCP RSTs (causing connection reset by peer errors in the python plugin)

  Exception happened during processing of request from
  Traceback (most recent call last):
    File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
      self.process_request(request, client_address)
    File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
      self.finish_request(request, client_address)
    File "./http-requests.py", line 145, in finish_request
      self.RequestHandlerClass(request, '-', self)
    File "/usr/lib/python2.7/SocketServer.py", line 658, in __init__
      self.handle()
    File "/usr/lib/python2.7/BaseHTTPServer.py", line 349, in handle
      self.handle_one_request()
    File "/usr/lib/python2.7/BaseHTTPServer.py", line 312, in handle_one_request
      self.raw_requestline = self.rfile.readline(65537)
    File "/usr/lib/python2.7/socket.py", line 480, in readline
      data = self._sock.recv(self._rbufsize)
  error: [Errno 104] Connection reset by peer

* It doesn't reuse connections

```json
{
"Processes": { ... }
```


@paulbellamy paulbellamy merged commit f211d48 into master Apr 12, 2016
@paulbellamy paulbellamy deleted the plugins branch April 12, 2016 17:03
@2opremio
Contributor

Awesome job @paulbellamy! \o/

@alepuccetti

Should this PR also fix #1022?

@2opremio
Contributor

@alepuccetti I think you are referring to #1837 , not this PR

Development

Successfully merging this pull request may close these issues.

RFC: Probe plugins
7 participants