Autocomplete filtering optimisations #2957

Closed
2 tasks done
mapno opened this issue Sep 22, 2023 · 1 comment · Fixed by #2942

@mapno
Member

mapno commented Sep 22, 2023

Context

Autocomplete filtering is Tempo's ability to suggest tag values to Grafana, and thus to the user, based on conditions already present in the search. For example, when the user types the query { resource.environment = "prod" && resource.service = | }, Tempo returns only the service names that are present in production environments.

Previous work

This was originally discussed in #1868, which resulted in a number of PRs (#2253, #2433) that add a new query param q to tag value search. The param accepts a TraceQL query that filters down the returned values.
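
To make the q param concrete, here is a minimal sketch of how a client might build such a request. The q param comes from the PRs above; the host, the endpoint path /api/v2/search/tag/<tag>/values, and the helper name tag_values_url are assumptions for illustration and should be checked against the Tempo API docs.

```python
from urllib.parse import quote, urlencode


def tag_values_url(base_url: str, tag: str, traceql_filter: str) -> str:
    """Build a tag value search URL that carries a TraceQL filter in `q`.

    The path layout is an assumption for illustration, not confirmed
    by this issue.
    """
    path = f"/api/v2/search/tag/{quote(tag)}/values"
    # urlencode percent-encodes the braces, quotes and '=' of the filter.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'q': traceql_filter})}"


# Hypothetical host; the filter is the condition already typed by the user.
url = tag_values_url(
    "http://tempo:3200",
    "resource.service",
    '{ resource.environment = "prod" }',
)
print(url)
```

Only the conditions that are already complete go into q; the attribute being completed is the tag in the path.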

The implementation heavily reuses the TraceQL code for building iterators and fetching results, and then collects only the values for the desired attributes. While this approach works well, it doesn't scale as well as expected in big production clusters.

Design optimisations

As a result of reusing TraceQL code, autocomplete filtering does a lot more work than is necessary for the feature. The TraceQL engine is built around the concept of spansets, which require fetching data to build the spans that are returned to the user. That same logic is used in autocomplete filtering, which then throws away most of the retrieved data.

The autocomplete filtering code path can be optimised to do only the work that's necessary, improving performance.

  • Create new collectors that gather values for the wanted attribute, instead of building high-level objects (i.e. spans, spansets).
  • Simplify the iterator builder logic and create only the required iterators.
    • Because it needs to build spansets, Tempo retrieves data (e.g. span start/end time) that is of no use for autocomplete. For example, it fetches span.kind even when there are no span conditions, because the engine still needs to build spans.
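
The collector idea above can be sketched as follows. This is a simplified illustration, not Tempo's implementation: the flattened Row mapping and the function collect_tag_values are hypothetical, and only equality conditions are handled.

```python
from typing import Iterable, Mapping, Set

# Hypothetical flattened view of stored data: attribute name -> value.
Row = Mapping[str, str]


def collect_tag_values(
    rows: Iterable[Row], wanted: str, conditions: Mapping[str, str]
) -> Set[str]:
    """Collect distinct values of `wanted` from rows matching all conditions.

    No span or spanset objects are built: only the condition attributes
    and the wanted attribute are ever read.
    """
    values: Set[str] = set()
    for row in rows:
        if all(row.get(key) == value for key, value in conditions.items()):
            found = row.get(wanted)
            if found is not None:
                values.add(found)
    return values


rows = [
    {"resource.environment": "prod", "resource.service": "checkout"},
    {"resource.environment": "dev", "resource.service": "sandbox"},
    {"resource.environment": "prod", "resource.service": "cart"},
    {"resource.environment": "prod", "resource.service": "checkout"},
]
services = collect_tag_values(
    rows, "resource.service", {"resource.environment": "prod"}
)
print(sorted(services))
```

The result is a plain set of strings, so nothing like start/end times or span.kind needs to be fetched to produce it.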

Other paths

  • Reduce the number of inspected blocks/data: a config parameter controls how many blocks should be inspected for tag value queries, reducing the amount of data read.
  • Stop search faster by calculating cardinality of collected values.
    • Algorithms like HyperLogLog could be used to quit early if we're not getting new values after a while.
  • Return after a configured amount of time (e.g. 2s) with whatever data has been fetched. In most use cases, searching for longer is useless as the user won't wait for the results.
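
The three ideas above (a block limit, quitting when no new values appear, and a time budget) could combine roughly as in the sketch below. All names and defaults (search_tag_values, max_blocks, max_stale_blocks, timeout_s) are hypothetical. An exact set stands in for HyperLogLog to keep the example short; a real implementation might use a cardinality estimator instead.

```python
import time
from typing import Callable, Iterable, Set


def search_tag_values(
    blocks: Iterable[Iterable[str]],
    max_blocks: int = 100,
    max_stale_blocks: int = 3,
    timeout_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Collect distinct values block by block, stopping as early as possible."""
    deadline = clock() + timeout_s
    seen: Set[str] = set()
    stale = 0  # consecutive blocks that contributed no new value
    for inspected, block_values in enumerate(blocks):
        # Stop on the block limit or once the time budget is spent.
        if inspected >= max_blocks or clock() >= deadline:
            break
        before = len(seen)
        seen.update(block_values)
        stale = stale + 1 if len(seen) == before else 0
        # Stop when the distinct count has not grown for a while.
        if stale >= max_stale_blocks:
            break
    return seen


# Each inner list holds the tag values found in one block (made-up data).
blocks = [["checkout", "cart"], ["cart"], ["checkout"], ["cart"], ["billing"]]
result = search_tag_values(blocks)
print(sorted(result))  # "billing" is never reached: three stale blocks in a row
```

The trade-off is completeness for latency: a rare value that only appears in late blocks can be missed, which is usually acceptable for autocomplete suggestions.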

Tasklist

Tasks

  1. area/tracing datasource/Tempo type/bug
    adrapereira
@mapno mapno self-assigned this Sep 22, 2023
@mapno mapno moved this from Todo to In Progress in Tempo squad Sep 22, 2023
@mapno mapno mentioned this issue Oct 3, 2023
16 tasks
@mapno
Member Author

mapno commented Nov 9, 2023

Follow-up: #3127

@glamcoder glamcoder moved this from In Progress to In Review in Tempo squad Nov 21, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in Tempo squad Nov 23, 2023