[TraceQL Metrics] New baseline comparison function #3695
Instead of requiring max cardinality, what if we make it optional and default to a sensible value? I think it would be better to return the values Tempo collected up to the max plus an error, instead of returning an error and nil when it reaches max cardinality. That way the attribute would still show some values instead of none. Think of a graph with no data and an error label versus a graph with some data and an error/warning label: which would you prefer? I agree that the timestamps in the function are ugly; it would be more TraceQL-y to have it as |
Yep, can do that. I think 10 is a sensible default.
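To illustrate the behavior proposed in this thread (return the partial values plus an error instead of nil plus an error), here is a minimal Go sketch. It assumes a hypothetical per-attribute counter: once an attribute has seen `maxValues` distinct values, further new values are dropped and an overflow flag is set, while the counts gathered so far are still returned. The names `attrCounter`, `newAttrCounter`, and `defaultMaxValues` are illustrative only and are not Tempo's actual code; the default of 10 is the value suggested in the reply.

```go
package main

import "fmt"

// defaultMaxValues is a hypothetical default for the per-attribute
// cardinality limit (10, as suggested in the discussion).
const defaultMaxValues = 10

// attrCounter counts occurrences of distinct values for one attribute,
// up to a cardinality limit. Illustrative sketch, not Tempo code.
type attrCounter struct {
	maxValues int
	counts    map[string]int
	exceeded  bool
}

// newAttrCounter returns a counter; a non-positive limit falls back to the default.
func newAttrCounter(maxValues int) *attrCounter {
	if maxValues <= 0 {
		maxValues = defaultMaxValues
	}
	return &attrCounter{maxValues: maxValues, counts: map[string]int{}}
}

// Add records one occurrence of value. New values beyond the limit are
// dropped and the exceeded flag is set; already-known values keep counting.
func (c *attrCounter) Add(value string) {
	if _, ok := c.counts[value]; !ok && len(c.counts) >= c.maxValues {
		c.exceeded = true
		return
	}
	c.counts[value]++
}

// Result returns the partial counts together with an error when the limit
// was hit, rather than returning nil and an error.
func (c *attrCounter) Result() (map[string]int, error) {
	if c.exceeded {
		return c.counts, fmt.Errorf("cardinality limit of %d reached", c.maxValues)
	}
	return c.counts, nil
}

func main() {
	c := newAttrCounter(2)
	for _, v := range []string{"a", "b", "a", "c"} {
		c.Add(v)
	}
	counts, err := c.Result()
	fmt.Println(counts, err) // prints: map[a:2 b:1] cardinality limit of 2 reached
}
```

Returning the partial map alongside the error lets a caller still draw the series it has and attach a warning, which is the "graph with some data and an error/warning label" case described above.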
The main rationale was to avoid computing the exact topN values, which requires continuing to count and pass all values up to the query-frontend. There are two alternatives that are lossy but should be workable:
|
My proposal was in line with your |
LGTM
What this PR does:
This adds a new metrics function `compare`, which splits the stream of spans into two groups: a selection and a baseline. It then returns time series for all attributes found on the spans to highlight the differences between the two groups. This is hard to describe in words, so there are some example outputs below.

Function signature:
The function is used like other metrics functions: it is placed after any search query and converts it into a metrics query:
`...any spanset pipeline... | compare({subset filters}, <topN>, <start timestamp>, <end timestamp>)`
Example:
`{ resource.service.name="a" && span.http.path="/myapi" } | compare({status=error})`
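To make the selection/baseline split concrete, here is a minimal Go sketch. It assumes a simplified span model (a flat map of attributes), a predicate standing in for the `{subset filters}` argument, and that every span not matching the filter lands in the baseline. It counts each attribute/value pair per group and tags the result with a `__meta_type`-style group of `selection` or `baseline`. The `span` type, `seriesKey` struct, and `compare` function here are hypothetical; the real function produces a value per time step rather than the single totals shown.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a simplified, hypothetical span model: a flat set of attributes.
type span map[string]string

// seriesKey identifies one output series: an attribute/value pair plus the
// group it was counted in ("selection" or "baseline"), mirroring __meta_type.
type seriesKey struct {
	Attr, Value, MetaType string
}

// compare partitions spans with the selection predicate and counts every
// attribute/value pair per group (like a built-in select(*)).
// It returns single totals; the real function emits a value per step.
func compare(spans []span, isSelection func(span) bool) map[seriesKey]int {
	out := map[seriesKey]int{}
	for _, s := range spans {
		group := "baseline"
		if isSelection(s) {
			group = "selection"
		}
		for attr, val := range s {
			out[seriesKey{attr, val, group}]++
		}
	}
	return out
}

func main() {
	spans := []span{
		{"status": "error", "resource.cluster": "eu"},
		{"status": "ok", "resource.cluster": "us"},
		{"status": "ok", "resource.cluster": "eu"},
	}
	// Stand-in for compare({status=error}).
	res := compare(spans, func(s span) bool { return s["status"] == "error" })

	// Sort keys so the printed series come out in a stable order.
	keys := make([]seriesKey, 0, len(res))
	for k := range res {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		a, b := keys[i], keys[j]
		if a.Attr != b.Attr {
			return a.Attr < b.Attr
		}
		if a.Value != b.Value {
			return a.Value < b.Value
		}
		return a.MetaType < b.MetaType
	})
	for _, k := range keys {
		fmt.Printf("{%s=%q, __meta_type=%q} %d\n", k.Attr, k.Value, k.MetaType, res[k])
	}
}
```

Because every attribute on every span is counted in both groups, the number of series grows with attribute cardinality, which is why a per-attribute limit such as `<topN>` matters.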
Parameters:

- `{subset filters}`: for example `{status=error}` (what is different about errors?) or `{duration>1s}` (what is different about slow spans?)
- … `span:startTime` directly in the language, so it could be part of the filter.

Output:
The outputs are flat time series for each attribute/value pair found in the spans. This function has a built-in `select(*)`, so there can be a lot of them. Each series has a label `__meta_type` that denotes which group it is in: either `selection` or `baseline`.

Example output series:
When an attribute reaches the cardinality limit, an error indicator is also present. This example means the attribute `resource.cluster` had too many values.

Remaining Work
- … `step` equal to `end-start`, so effectively it is a range query that returns a single datapoint.

Which issue(s) this PR fixes:
Fixes #
Checklist
- `CHANGELOG.md` updated: the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`