[storage] limitations of Cassandra search on LIMIT and complex queries #166

vprithvi · 2017-05-16T18:27:25Z

When querying for traces using serviceName, operationName and a tag with the default LIMIT of 20, some results might be omitted.

This is because of this logic which does the following:

Retrieve all traceIDs matching the operation name
Retrieve all traceIDs matching tags
Intersect 1 & 2

Because Cassandra doesn't guarantee ordering, this could eliminate results.

I propose that we do the following instead (or in addition to what we do now),

Retrieve all traceIds matching tags
Filter by operation name

The reason for retrieving traceIds matching tags first targets the use case when somebody is searching for a jaeger-debug-id or some other tag with low cardinality, guaranteeing them a result when it exists.

The text was updated successfully, but these errors were encountered:

yurishkuro · 2017-05-16T18:32:35Z

This is a hack that will work for very sparse tags like jaeger-debug-id, but will not work in other cases, e.g. when searching by a tag like "error=true" or http.status_code, because "retrieve all traces" becomes impossible due to volume.

vprithvi · 2017-05-16T19:31:54Z

I see value in having this hack with the limit parameter (in addition to the current behavior) so that it still retrieves results for low cardinality tags.

Dieterbe · 2017-11-29T16:52:49Z

I have a problem and I'm not sure if it's the same as what's discussed here. (the descriptions of current and desired logic don't mention how the limit comes into play) but what i'm seeing is that much fewer results are being returned when doing a tag search, i have to "artificially" raise the limit to get the amount i want. (e.g. with limit 20 i may get 1 result, with limit 200 i get 26 results), but the problem is only when doing tag searching, it's fine if i don't have a tag clause in the query.

yurishkuro · 2017-11-29T17:10:29Z

this is a known (and hard to solve) issue in Cassandra storage implementation

rbtcollins · 2017-11-29T22:27:51Z

So, this may be hard to solve, but I want to suggest that its critical to usability: its a non-obvious limitation that will cause lots of head-scratching and push-back from users of a deployment.

Can you perhaps detail the Cassandra limitations that drive this behaviour somewhere? Also, what is the recommended backend? We went with Cassandra because the Uber blog post suggests Uber is running Cassandra :)

yurishkuro · 2017-11-29T23:00:55Z

"its critical to usability" - fwiw Zipkin lives with the same limitation for years. You need to bump the number if you need more exotic searches. We could rename it from LIMIT to something amorphous like "search depth" in the UI.

I wouldn't say Cassandra is the recommended backend, it's mostly an operational preference for people. But Elastic doesn't have that LIMIT problem because of how ES itself implements it (fanout to all nodes where each node returns LIMIT results). The benefit of Cassandra is higher throughput.

The main issue for say a query with two tags is that we're maintaining exact-match Cassandra indices, e.g. {service-name}-{tag-key}-{tag-value} => {trace-id}. So if you do a search by two tags, we execute two queries, both with the LIMIT provided in the input, and then intersect the resulting sets of trace IDs. Cassandra 3.4+ supports SASI indices that we thought would address this issue (they sort of work similar to ES and you need to fan out request to all nodes in the cluster), but their performance turned out to be even worse than ES, and not just on writes, but on reads (update: it is possible we didn't use it correctly).

We've discussed a possible hack of repeating queries for each tag by gradually increasing LIMIT for each query until the intersection is also of LIMIT size. Never had a chance to try to implement it, not even sure how well it would work.

So in summary - we have no plans to fix this just yet. Silver lining - we're looking into other solutions based on aggregations that could make the whole point of searching for individual traces less important.

Dieterbe · 2018-06-07T13:28:49Z

I do agree that it's a usability problem.
It's easy to forget about this limitation and then people run into issues like:

un-filtered search -> get results, look at a trace. copy paste a tag
search for the tag
no results ??

this makes jaeger look like an unreliable piece of software and people don't want to use it.

gouthamve · 2018-08-23T11:45:53Z

Hi, this is causing a lot of inconsistent results, and not giving me what I want. See the behaviour here: https://youtu.be/m7qZJIyCmGY

Essentially, this is giving me only the last 3-4 traces and the inconsistency b/w queries is worrying especially because all the spans might not yet be added to the traces being shown and I'd want to see older traces.

Could we atleast show a warning that the results will be off and point to this ticket maybe? Quite frankly, I wouldn't have been able to find this issue (thanks @Dieterbe for the pointer).

tiffon · 2018-08-31T02:17:46Z

@gouthamve, thanks to you and the others on this thread for calling out this issue.

This is definitely a severe issue, and it's great to know the extent it's affecting you.

I'll break the problems described into two broad categories:

Challenges with search in Cassandra
Challenges with results containing incomplete traces

For # 1, we have two tracks for addressing it. For the longer-term, we're currently prototyping a more robust (and expressive) search, and we expect to be able go live with it by the end of the year. It should be able to address 1 as well as lay the ground work for looking at aggregated data. In the shorter term, we're looking at ways to keep users more informed about the limitations of the Cassandra search. To this end, we created Inform users of jaegertracing/jaeger#166 when Cassandra is the backing store.

The UI ticket (ui-243) is definitely not a solution, but would you say it would have been helpful to be aware that it's a known issue?

Determining a resolution to # 2 is still a work in progress. One of the main challenges is, it's impossible to know, with 100% certainty, when a trace is complete. One approach we're considering is to show the number of spans associated with a trace when viewing search results and to update that number, in real time. The idea being if that number goes up while a user is viewing the search results, then the trace is probably not complete. But, whether this is the right approach or not is still TBD. I wish I had better news, on this front.

Lastly, your feedback is super useful; thanks again for letting us know this came up in a severe fashion.

rbtcollins · 2018-08-31T02:30:32Z

Re: 2 - is there a separate ticket for that? I have some thoughts but don't think this is the right ticket.

tiffon · 2018-08-31T15:55:39Z

@rbtcollins Great! Currently, we don't have a ticket for issues around incomplete traces. Can you start one to capture your thoughts?

yurishkuro · 2018-11-10T17:40:11Z

I can see three things we could do here (higher priority first):

Make it clear in the documentation the limitation of Cassandra search and recommend Elasticsearch to people who are interested in the search use case (as opposed to data-mining driven navigation)
For Cassandra users, implement Inform users of jaegertracing/jaeger#166 when Cassandra is the backing store jaeger-ui#243
Consider a different Cassandra storage implementation that uses SASI indices (we had an earlier experience with SASI that resulted in unusable query latency, but I think we did it incorrectly. There has been improvements in Zipkin on that front, including using a new tokenizer specifically implemented in Cassandra to support Zipkin tag search).

We should discuss it at the next project call next Friday.

dobegor · 2019-05-03T15:11:43Z

Is there any progress regarding this? ES users still can't specify tags alongside minDuration.

yurishkuro · 2019-05-07T15:17:05Z

this might be fixed by #1477, once released

black-adder mentioned this issue Jan 17, 2018

Incorrect trace count returned when in UI when not using filters with Cassandra storage #652

Closed

Dieterbe mentioned this issue Aug 9, 2018

slow query log grafana/metrictank#982

Closed

tiffon mentioned this issue Aug 30, 2018

Inform users of jaegertracing/jaeger#166 when Cassandra is the backing store jaegertracing/jaeger-ui#243

Open

This was referenced Sep 1, 2018

Search returns results in different order every time #892

Closed

cannot query for minduration and tags simultaneously #690

Closed

Unnecessary restriction on tags + min-duration queries #1047

Closed

tiffon mentioned this issue Sep 25, 2018

Performance for large number of traces jaegertracing/jaeger-ui#247

Open

pavolloffay added the area/storage label Sep 25, 2018

yurishkuro changed the title ~~jaeger-query might not return search results due to LIMIT parameter~~ [storage] limitations of Cassandra search on LIMIT and complex queries Nov 10, 2018

This was referenced Nov 10, 2018

[cassandra] does not handle limit parameter correctly #411

Closed

Strange limit behavior #1167

Closed

Queries with operation and tags fail with Cassandra as storage #768

Closed

vprithvi mentioned this issue Nov 10, 2018

Improve log/tag search #238

Open

yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label Nov 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[storage] limitations of Cassandra search on LIMIT and complex queries #166

[storage] limitations of Cassandra search on LIMIT and complex queries #166

vprithvi commented May 16, 2017

yurishkuro commented May 16, 2017

vprithvi commented May 16, 2017

Dieterbe commented Nov 29, 2017

yurishkuro commented Nov 29, 2017

rbtcollins commented Nov 29, 2017

yurishkuro commented Nov 29, 2017 •

edited

Loading

Dieterbe commented Jun 7, 2018

gouthamve commented Aug 23, 2018 •

edited

Loading

tiffon commented Aug 31, 2018

rbtcollins commented Aug 31, 2018

tiffon commented Aug 31, 2018

yurishkuro commented Nov 10, 2018

dobegor commented May 3, 2019

yurishkuro commented May 7, 2019

[storage] limitations of Cassandra search on LIMIT and complex queries #166

[storage] limitations of Cassandra search on LIMIT and complex queries #166

Comments

vprithvi commented May 16, 2017

yurishkuro commented May 16, 2017

vprithvi commented May 16, 2017

Dieterbe commented Nov 29, 2017

yurishkuro commented Nov 29, 2017

rbtcollins commented Nov 29, 2017

yurishkuro commented Nov 29, 2017 • edited Loading

Dieterbe commented Jun 7, 2018

gouthamve commented Aug 23, 2018 • edited Loading

tiffon commented Aug 31, 2018

rbtcollins commented Aug 31, 2018

tiffon commented Aug 31, 2018

yurishkuro commented Nov 10, 2018

dobegor commented May 3, 2019

yurishkuro commented May 7, 2019

yurishkuro commented Nov 29, 2017 •

edited

Loading

gouthamve commented Aug 23, 2018 •

edited

Loading