-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics on event deliverability #4577
Comments
Simple alternative is to leave this completely up to plugin devs to use events for metrics. This could be done now and implemented into the existing plugins. e.g. a job to run every hour and capture an event with some metrics |
This is a great idea. I'm also pretty sure we will take significant inspiration from Segment. |
Perhaps this fits well with #2965? Users who don't use features like this should not see anything :) |
Just realized that for the new |
Yup, that's exactly the point of As I imagine you'd like to see per plugin how it has been performing, it makes sense to put these graphs on the plugins page itself. We might need a proper "plugin page" (/project/plugins/123) first, as the drawers are already pretty crammed with things. |
Ok, here's the current plan for implementing this: PhasesPhase 1: Metrics for ApproachWhile Phase 1 could be tackled in a very non-scalable way, it's worth approaching it in a way that scales to Phase 2. The method I currently have in mind is:
DiscussionThe idea for adding the We could also limit the number of metrics a plugin can handle (2? 5?). Thoughts, suggestions? |
In the name of poking the idea, could you give an example of Specifically interested in what types of metrics could be measured and presented this way. For instance Segment has: delivered on first try, delivered on retry, not delivered due to failure. But also they aggregate all failures (regardless of whether the event ended up being delivered successfully on retry or not) separately to allow diagnosing the causes of failures. Would we be able to do both in this approach, or otherwise on which one would we focus first? I think it's also worth figuring out how this would be incremented in a way allowing aggregation by time, because a row per increment would be potentially a lot of data, but with other approaches we need to be considerate to not end up with incorrect numbers. |
Definitely agree with the idea behind the approach- make things so we can extend to phase 2 reasonably easily. I'd also like to make it easy to aggregate different kind of metrics without changing our architecture / data collection schemas. Just trying to think this through: Assuming we want decent capability here, it's worth following a statsd-like interface ( https://github.com/statsd/statsd ) - if not using statsd directly for backend aggregation. So we expose an interface like:
which is used inside the plugin to increment whatever metric we want. Hmm, at this point, we actually don't need anything like We could even use a visualisation library, like In short, |
I think they key point here is that I'd want to use PostHog for this. Maybe I didn't make it clear. The instance/status dashboards that Karl built are what I'd want to get to - no building anything from scratch, not adding another lib, 100% leveraging what we already have. The new column would be just for knowing what props to access on events. For Definitely don't want this to be fully-fledged either. Some minor metrics functionality should be more than enough. The idea here is to just have the structure in place to potentially expose a metrics API for everyone, but we might as well just keep it in only for |
And for @neilkakkar's second point, the only reason for declaring them beforehand is because I want to use PostHog events, and I want to easily know what props in them to access and plot in PostHog graphs. |
it sounds like we're thinking of different things. Can you give me an example please of how such a plugin would look like? From the sound of it, we're adding these metrics as properties to the event itself? How does that work with say, retries, and failures? Oh, and then, the graph is basically just a trend for that event property? Ha, nice! In this case, (assuming I'm now understanding correctly), the |
Let me try again. Imagine I have 2 plugins: "S3 Export Plugin" and "Schema Enforcer Plugin". S3 Export Plugin wants to tell its users how many events it has seen, and how many of those it has successfully exported. It specifies: export const metrics = ['events-seen', 'successful-deliveries'] Schema Enforcer wants to tell users how many events it received followed the right schema, and how many did not. It specifies: export const metrics = ['events-seen', 'events-with-right-schema'] When events come in, the plugins increment the metrics appropriately. What happens "under the hood" is that these are kept in memory by the plugin server, under some At a regular interval (e.g. every hour), a scheduled job sends an event per plugin with the current metric values. e.g. : {
event: 's3-export-plugin-metrics',
properties: {
'events-seen': 560,
'successful-deliveries': 420
}
} We then use PostHog graphs to display a Trend graph with the following specs: Graph: Line graph, over 30 days, shown daily For It uses PostHog events, and PostHog graphs, so there's no new stuff here to store or display graphs. We just need a layer to enable metrics capturing. And if you don't have Hence it's a new property of the plugin, as it isn't configurable by the user and doesn't relate to an "instance" of a plugin ( |
Okay, this makes a lot more sense now, thanks! (Was it clear to everyone else from the beginning?) In this case it sounds like we have a default operation: increment, and we're basically aggregating number of events happening in say, an hour, and exposing just the special names of things we can count. This includes, say, Then, there are custom properties that devs might want to expose, which they can increment per event using the (Sorry for so many questions about this lol, I'm just trying to get a better understanding of what we're doing here). So, now, I have an event coming in every hour like:
Displaying the plugin graph is multiple trendlines, one for each property (by default)? And they can pick and choose from these properties to decide what to show and what not to show. Final Question (I hope): Aren't properties exactly what |
No worries at all! Thanks for asking all these questions - forces me to spec this out better and challenge the approach. As to your question: Well, consider this - we don't have a list of properties that exist for a given event name. You can see that when you select a math operation, we will suggest to you every numerical property for your team, not the numerical props for the event you selected. So how could we decide what lines to show on a graph? How do I know what property names to use? If a plugin doesn't specify their metrics in some way, how do we know what to display? We can also derive them from the Even if it was guaranteed (it isn't) that an event would always only have the same properties (which would be just the metrics), we'd still need a bit of a workaround to infer them, so the easiest is to just go ahead and store their names upfront. |
I do think the point at which we infer and store metrics can be discussed though. We wouldn't need people to However, it then becomes messier to limit the number of metrics for example, as we would have to cut a plugin off mid-execution once we encounter the n-th metric. With a deliberate export, we can limit the number when setting up the plugin, and make sure any call with a unspecified metric is dropped. |
What's the idea behind restricting number of metrics? |
Doesn't need to happen. But it's good to know we could, just as we have timeouts, maybe we don't want a loop with a million iterations updating a randomly-generated metric. |
With the goal of running arbitrary code on our servers, the more we have the potential to add guards to things, the better |
Just making this explicit / ensuring I got this right: This approach restricts us quite a bit in what we can do with metrics. (which may be fine, if we're only looking for basic stats). What might be a bit iffy is how these metrics aren't connected to anything else? Like, I can't know what all failed/succeeded, given I have N number of failures? But then, that's what the plugin logs are for, so it's fine as long as we're logging those failures too. We probably can't aggregate over all failed events, but that's again something we probably don't want to do anyway. And there's |
I like the idea of inferring plugin metrics more: can't we infer these at the final step? Say, you load a bunch of events with the specific name, and then read the properties on the frontend to generate the graphs? (not sure if this is possible, but should be, given we have all the data on the frontend loaded?) Basically I don't want to get users to export metric names, then, say, forget some, make typos, etc. and see their metrics not populating. And more broadly, I don't want to store metric names anywhere at all. It seems to me that it constrains things more than it should, and doesn't sit right with the models that we have so far? (Users generate events, filter on props, get their graphs). Maybe what's new / exciting here is something like "smart prefilling", or how we restrict properties to the given event and prepopulate it by default. Regarding issues with cutting these off: as long as we're in control of storing metrics data ourselves, we can always drop them over a certain limit, if we wanted to? We don't have to cut-off plugin execution, just not input the metric if it's over a certain count. |
Again, this might be impossible to do now (I don't think so, but maybe I don't know enough). In which case, perfectly happy to go with the existing approach! |
Doing everything on the frontend is possible, but I don't like how hacky it is. Also don't see what the big deal is looking at metrics like capabilities, since they wouldn't vary much, nor change per installation. Regarding how this would be done in the frontend, we'd need an event name that can be inferred from the plugin name (easy), but then its properties would also need to be inferrable in general, with disregard for the event name (since we don't know all the props an event has). So we would need to prepend prop names with the plugin identifier, filter numerical props by this identifier, and build all of this out in the frontend, which would be slower, hackier, and less readable. I wouldn't want to set up 20 plugin graphs entirely on the frontend. Also because we'd need to check all installed plugins, whereas with If you take a look at how the dashboards on /instance/status are done, you'll see that the heavy lifting is on the backend. The one difference is they have static prop names. |
I'd like to move forward with this unless there's a more neat frontend solution in sight. If you're worried about typos or something, one thing that could be done is build a jobs-like API:
|
Another thing I guess is plugin versions, we'd probably want to segregate metrics by version somehow? |
And to be fair, typos will be picked up during development with an error, so not sure that's a big deal |
Another solution is to have them defined in |
I think I've come to a decision. Will implement and check back. |
The new(ish) direction makes a lot of sense, and sounds promising to me! It makes sense that we'll need to store the default aggregation operation for metrics, and that goes hand-in-hand with the metric name. Good work! :) About versioning, soon(ish?), we'll have something similar for archive/sources with the install step, and I think it's safe to defer this for metrics to that point in time as well. (would affect capabilities, and metrics are closely related to that, data-flow wise) |
Oh versioning is essentially handled too (to some extent) - I'm passing the plugin tag along with the event so that should allow graphs to filter events by the tag the installed plugin is on. Doesn't work for plugins without a tag though. |
I think this can be considered done |
Tasks
Requested by a potential user on a call.
If I'm exporting events somewhere else, I'd like some metrics as to what percentage of my events actually get delivered.
Segment provides this:
They infer success from the lack of an error being thrown.
One step towards "UI plugins" that could help here is allow plugin devs to set metrics that would appear on a dashboard. A native API like this, for example:
This way we don't have to deal with the problem of rendering arbitrary code yet, but give plugin devs the option to show some more stuff on the UI. Just a quick idea though.
The text was updated successfully, but these errors were encountered: