Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: add support for PCP #354

Closed
suyash opened this issue Sep 3, 2016 · 9 comments
Closed

metrics: add support for PCP #354

suyash opened this issue Sep 3, 2016 · 9 comments

Comments

@suyash
Copy link
Contributor

suyash commented Sep 3, 2016

I was a part of this year's Google Summer of Code with the Performance Co-Pilot organization. PCP, although principally built for reporting low level application specific metrics, also supports instrumentation through a specially configured agent, mmv. My project was to build an instrumentation client library for that API, and it is currently at performancecopilot/speed.

One of the goals of the project was to demonstrate real world usage of the implemented client library, and I am trying to fulfill that requirement by implementing PCP support for this library. My fork contains a pcp subpackage inside metrics at https://github.com/suyash/kit.

I have also modified 3 examples to report metrics to PCP instead of prometheus and they are implemented in a separate tree. For stringsvc2, I have added some screenshots.

Currently I cannot support With labels for metrics, because while we do have a similar concept (https://github.com/performancecopilot/speed#instancemetric), it requires all labels to be specified upfront, and reinitializing a new metric each time seemed overkill. Instead I am trying to explore supporting mutating instances

Anyways, in line with the contributing guidelines, I am opening this issue for discussion

@peterbourgon
Copy link
Member

Neat! I'll take a look in the next few days.

Currently I cannot support With labels for metrics, because while we do have a similar concept (https://github.com/performancecopilot/speed#instancemetric), it requires all labels to be specified upfront,

This is also true of Prometheus! Just make sure the With invocations are storing the label-values in a way that's retrievable by the observe method, and have the observe method pass the values to the underlying implementation. You can be permissive, and say that required unspecified dimensions get an "unknown" value, and non-required erroneously-specified dimensions are ignored. Or, you can be super strict, and panic on either condition. Either one is fine from my POV.

@peterbourgon
Copy link
Member

Global state is always a bad idea, and in Go kit it's strictly forbidden. So you'll need to find a way to arrange things so that you don't have a package global registry and client. I haven't dug too deeply, but one way might be by encapsulating them in a pcp.Reporter struct, and have the NewMetric constructors be methods on that type.

@suyash
Copy link
Contributor Author

suyash commented Sep 5, 2016

@peterbourgon I have updated my fork. Now, metrics are created using a Reporter instance, which in turn is created using an application name.

Regarding With, my understanding of the method is that it returns a new metric with the same value as the original metric but with some metadata properties attached to it.

My approach to this was that all the labels passed in the parameters would first be serialized using some mechanism to one single string. Then that serialized string will be used as the key, with the value being the actual value for those keys.

I thought of doing this using map and singleton metrics, but then I thought that in speed, if we can make InstanceDomain an internal concept and remove it from the public API, we can support adding instances after creating a metric.

the current usage of an instance metric, say a CounterVector here is this

c, err := speed.NewPCPCounterVector(
    map[string]uint64{
        "instance1": 0,
        "instance2": 1,
    }, "another.simple.counter"
)

i1val := c.Val("instance1") // gets value for instance1

here a CounterVector is created with "instance1" and "instance2" as instances for 2 values.

From what I understand about prometheus, labels are external properties for metrics and not a part of metrics values themselves. PCP metrics have only 2 external properties, the 2 optional description strings.

Also, another thing to note is that the data type for all instances in an InstanceMetric is fixed, and can only be selected from a fixed subset (https://github.com/performancecopilot/speed/blob/master/metrics.go#L17-L26), so we can't add labels as instance values for an instance metric

Another solution I can think of is passing all "label sets" beforehand, so the possible sets for stringsvc2 will be

[]string{"method", "uppercase", "error", "true"}
[]string{"method", "uppercase", "error", "false"}
[]string{"method", "count"}

and using serialization, but that seems infeasible for large applications

@peterbourgon
Copy link
Member

Great! The design of the new package looks reasonable from a high level.

But—I'd like to have a conversation with you about the design of PCP before I get into more details of what Go kit integration would look like. Something lower-latency than GitHub issues would be ideal: are you on the Gophers Slack? We can also have it here if that's easier for you.

@suyash
Copy link
Contributor Author

suyash commented Sep 12, 2016

@peterbourgon I have joined the slack channel

To give a general overview of the design, PCP is a low level system monitoring toolkit, the individual systems are monitored through system specific agents, which are programmed in a manner in which they know what to monitor for useful information.

Metrics can be of 2 types, singleton and instance. A singleton metric is just a single value of a particular type. with particular semantics and unit. An instance metric has specific instances with string names, each having a particular value, but all instances should have the same type, semantics and unit.

The metric names are of form a.b.c.d where the scope increases with each period. Typically the first identifier is the agent name itself, then a particular subsystem, then its particular property and so on.

The most commonly used and installed agent by default is the linux agent, exposing a bunch of filesystem, network and kernel related metrics like network.tcp.maxconn. The vector project uses some of these by default (see the default widgets created on http://vectoross.io/demo/, if you notice the url, you can see the metric names)

To see all agents that ship with PCP, see https://github.com/performancecopilot/pcp/tree/master/src/pmdas

PCP supports instrumentation through an especially configured agent called MMV, which monitors for memory mapped files in PCP_TMP_DIR and if the file is in a particular format, makes the metrics visible to PCP. The metric names begin with mmv, then an app name, and then a custom metric name, so mmv.stringsvc.request.count. The idea is that small client applications can expose metrics with minimum setup to PCP. The C API to implement instrumentation is bundled with PCP itself, while a Java API is implemented by a project called parfait, and now a golang API is implemented by speed.

The PCP daemon binds to port 44321. This is the port that is used by different client tools like pmval, pminfer and so on... There is also a web daemon that binds to port 44323 that can take JSON encoded requests and is the one used by vector.

There are a couple of books, a users and administrators guide to get familiar with the client tools, and a programmers guide to understand the agent and instrumentation API and possibly write your own agents.

Pinging @natoscott and @lberk in case I missed something

@peterbourgon
Copy link
Member

peterbourgon commented Sep 12, 2016

Thanks to @suyash and his patient explanations on Slack, I have a pretty good understanding of how PCP works. Let me summarize, primarily for my own retrospective edification :)

Instrumented applications import the speed client, which typically opens a memory-mapped file that's shared by the PCP daemon. Metric observations are simple writes to that file, which are picked up by the daemon. An alternative write path involves connecting to the daemon over HTTP and sending periodic JSON blobs. [edit: there is only one write path; see below]

The data model has two concepts. A singleton metric, which is a single set of label values and a concomitant value; most analagous to a Graphite dotted.notation.metric and value. And an instance metric, which is the same but parameterized in one dimension; for example, http.request.[get_foo,post_foo].count would be an instance metric with get_foo and post_foo instances.

So I think the closest thing to this would be package metrics/graphite. You'd have two constructors for your Reporter struct: one using the default memory-mapped file approach, currently called NewReporter, I suggest renaming to NewReporterMmap or something like this; and another POSTing JSON to an endpoint, maybe call it NewReporterHTTP or something. [edit: only one constructor; see below] All metrics can treat With as a no-op, same as Graphite. And I think each instance of an instance metric should be treated as its own Go kit metric. That is, http.request.get_foo.count is just a separate counter to http.request.post_foo.count. Hopefully this lines up with your thinking.

There are some problematic aspects to the speed client package re: global state and init functions, but since it's sequestered in metrics/pcp those problems are effectively opt-in and I'm fine with it.

So I'd be happy to accept a PR along those lines. I hope that's enough to go on; please feel free to ping me with any further questions! :)

@suyash
Copy link
Contributor Author

suyash commented Sep 12, 2016

@peterbourgon just a couple of things

  • JSON over web daemon is an alternative for reading, not writing, for writing/reporting metrics there is only one interface, memory mapped files, for reading metric values there are multiple client tools that can send requests to the port the PCP daemon binds to (44321), while the web daemon (44323) is a separate program that can report metric values through JSON requests, but not update them

There are some problematic aspects to the speed client package re: global state and init functions, but since it's sequestered in metrics/pcp those problems are effectively opt-in and I'm fine with it.

Once you are happy with the above points I would be happy to send a PR :)

@peterbourgon
Copy link
Member

peterbourgon commented Sep 13, 2016

JSON over web daemon is an alternative for reading, not writing . . .

Ah, OK! I'll update my comment above. That means you need only the one constructor.

I don't think there is any global state . . .

Here's what I mean:

In effect, any global variable or any use of func init is global state. To pick one of those examples, the global logger is problematic because users may not want or be able to use the logrus package in their code. But, again, since this is only pulled in to a Go kit project if a user would import metrics/pcp, I'm basically OK with it.

@suyash
Copy link
Contributor Author

suyash commented Oct 17, 2016

closing this as #363 was merged

@suyash suyash closed this as completed Oct 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants