Span with lots of Logs #571

phal0r · 2017-11-29T18:56:54Z

Hey Guys,

First of all, this is not a feature request, but I would like some more opinions. We want to use Jaeger to centralize the logs of our process executions. The data model is a perfect fit, but we want to store 2,000-5,000 logs in one span, since some steps generate a lot of important logs.

I can think of three ways to solve this problem:
a) this is a concern of Jaeger and should be supported in future versions
b) these spans should be split into subspans with fewer logs per span
c) spans should be stored without logs in Jaeger and a third-party system should store the logs with a reference to the span ID

Now I am curious about your thoughts. Thanks in advance for thinking this through :)

yurishkuro · 2017-12-17T01:30:15Z

All three are possible solutions, all have pros and cons, there's no single "correct answer".

For (a), you might run into the UDP max packet size trying to export that many logs out of process in a single span. Also, indexing all those logs will be quite expensive, so it depends on what you're trying to do with them aside from making them accessible in the UI in a contextual way. Also, the UI might not be happy trying to display that many logs per span.
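To put rough numbers on the UDP concern: a UDP datagram over IPv4 can carry at most 65,507 bytes of payload (65,535 minus the 20-byte IP header and the 8-byte UDP header). The sketch below estimates how many log entries would fit into a single datagram. The average log size and the span overhead are illustrative assumptions, not measurements of Jaeger's Thrift encoding, and the function names are made up for this example.

```python
# Back-of-the-envelope estimate only; this does not model Jaeger's wire format.
MAX_UDP_PAYLOAD = 65_535 - 20 - 8  # IPv4 limit minus IP and UDP headers


def max_logs_per_datagram(avg_log_bytes: int, span_overhead_bytes: int = 500) -> int:
    """Return how many logs of the given average size fit next to the span itself.

    Both avg_log_bytes and span_overhead_bytes are assumed values.
    """
    if avg_log_bytes <= 0:
        raise ValueError("avg_log_bytes must be positive")
    return max(0, (MAX_UDP_PAYLOAD - span_overhead_bytes) // avg_log_bytes)


def fits_in_one_datagram(log_count: int, avg_log_bytes: int,
                         span_overhead_bytes: int = 500) -> bool:
    """Check whether a span with log_count logs would fit into one datagram."""
    return log_count <= max_logs_per_datagram(avg_log_bytes, span_overhead_bytes)


if __name__ == "__main__":
    for count in (2_000, 5_000):
        print(count, fits_in_one_datagram(count, avg_log_bytes=100))
```

Under these assumptions (about 100 bytes per log), only a few hundred logs fit, so a span with thousands of logs would have to be exported over a different transport or split up.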

I would be curious to know what your use case is that requires so many log lines per single span. Are you talking about some long-running operation?

tiffon · 2017-12-19T14:08:50Z

One tangential consideration is to support log levels as a collector configuration. This does not address the OP's issue directly, but it could be part of a concerted effort to enable larger-scale logging.

@yurishkuro Are logs indexed? If so, we can expose that in the search UI.

yurishkuro · 2017-12-19T15:33:28Z

By default, logs are indexed by their fields.

phal0r · 2017-12-26T20:57:36Z

@yurishkuro
I really enjoy repos with a good discussion culture, so thanks for that.

The use case is a modular processing system, where the processes themselves are also modular and split up into steps that can run in series or in parallel. We have an older UI, which renders these as a tree panel (known from Sencha's Ext JS). We want to replace this with Jaeger, since the data model would be a perfect fit. The running process is a trace and every step is a span. Sequential and parallel steps can be modeled with the parent relations. There is a log4j appender that controls the generation of the spans and redirects logging calls to the span.log interface. These processes can run for a longer time, like several minutes (this is why we are also working on jaegertracing/jaeger-client-java#231).

Yes, the plan is to index the logs so they are searchable. It wouldn't make much sense to display all 5,000 logs at once. In the old UI it was possible to page through the logs. The logs were saved in an Oracle database and had the ID of the process run (in Jaeger terms, a trace) as a foreign key.

The Jaeger UI would need pagination and a log search endpoint by span ID and/or trace ID, if we consider this a good feature for Jaeger.

Also, in my current understanding, a span is the smallest piece of data that is sent via Thrift. I guess we would need to change that in order to send this many logs to a collector, maybe in small sets of logs. But I didn't think this through, I am just thinking out loud.

@tiffon The question is, do we really need log levels? In more recent software I am more used to tags, which allow fine-grained filtering of logs for appenders, but I am curious about your experience with tags so far. I guess the place for this functionality would be the client, which decides to include or drop a log based on (preconfigured) filter rules.
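A minimal sketch of what such client-side, tag-based filtering could look like. Everything here is hypothetical: the rule set, the `tags` field on a log entry, and the function names are assumptions for illustration, not part of any Jaeger client.

```python
from typing import Dict, Iterable, List

# Hypothetical preconfigured rule: drop any log carrying one of these tags.
DROP_TAGS = frozenset({"debug", "heartbeat"})


def should_keep(tags: Iterable[str], drop_tags: frozenset = DROP_TAGS) -> bool:
    """Keep a log unless at least one of its tags matches a drop rule."""
    return drop_tags.isdisjoint(tags)


def filter_logs(logs: List[Dict], drop_tags: frozenset = DROP_TAGS) -> List[Dict]:
    """Return only the logs that pass the filter; untagged logs are kept."""
    return [log for log in logs if should_keep(log.get("tags", ()), drop_tags)]


if __name__ == "__main__":
    logs = [
        {"message": "row 1 imported", "tags": ["import", "debug"]},
        {"message": "row 2 failed", "tags": ["import", "error"]},
        {"message": "step finished"},
    ]
    for log in filter_logs(logs):
        print(log["message"])
```

Filtering before the log reaches span.log keeps dropped entries off the wire entirely, which is the main argument for doing this in the client rather than in the collector.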

yurishkuro · 2017-12-26T21:39:33Z

> The use case is a modular processing system, where the processes themselves are also modular and split up into steps that can run in series or in parallel.

My first instinct is that it would make a lot more sense to represent each "step" as a separate span. This indirectly addresses the question of many logs per span, but more importantly it seems like a better representation of the application and its transaction semantics. After all, a span is supposed to represent an operation within the application, which sounds very much like those "steps". The idea is that you should be able to reason easily about what happened within a span, and I don't see how that's possible if the span contains up to 5000 events (logs). And if your "steps" can run in parallel, it makes even more sense to split them into different spans, using adequate span references between them to properly describe the causality relationships, say in order to calculate critical path through a larger transaction.

phal0r · 2017-12-26T22:09:33Z

Yeah, maybe I was unclear. This is exactly our intention. Every span represents a step of this process. Jaeger is a perfect fit to store and visualize this structure. Even a failing step would be easily visible through the Jaeger mechanics, since a span can be marked as erroneous.

Maybe the most important information I forgot: the reason one span has that many logs is that there are loops inside a step, which generate this mass of logs.

phal0r · 2018-01-10T16:01:29Z

@yurishkuro
Would love to get your feedback on this.

yurishkuro · 2018-01-10T17:44:12Z

Sorry, I don't think I have any more insight. Like we said, there are multiple possible solutions, and you need to evaluate the pros and cons in your specific case. Personally, I am having a hard time seeing how 5000 logs per span are useful; it sounds like information overload.

Do you have a specific question I can help with?

Side note: at Uber we collect logs separately, usually with a Kafka/ELK stack, and we have plans to build an integration into the Jaeger UI to pull logs for a given span from an external source.

phal0r · 2018-01-13T11:54:03Z

@yurishkuro
OK, I understand. My feeling is that (a), as a Jaeger feature with pagination and everything, would change a lot in Jaeger itself, since it would be necessary to stream the logs somehow. This does not seem like the way to go.

Generating a span for each loop seems like something we could try, since a span could also hold the context information of every loop. You mentioned in the v1 release post that it's possible to work with traces containing 50,000 spans and more, so this should be sufficient.
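As a rough sketch of that idea, the snippet below models one child span per loop iteration, each holding its own context and its own small set of logs. It uses a plain dataclass as a stand-in for a real tracer; the `Span` type, the `run_step` function, and the tag names are all assumptions made for illustration, not the Jaeger client API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

_ids = count(1)  # stand-in for real span ID generation


@dataclass
class Span:
    """Simplified, hypothetical span model (not the Jaeger client's Span)."""
    operation: str
    parent_id: Optional[int] = None
    tags: Dict[str, object] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)
    span_id: int = field(default_factory=lambda: next(_ids))


def run_step(step_name: str, items: List[str]) -> List[Span]:
    """Run one step and record each loop iteration as a child span of the step."""
    step = Span(operation=step_name)
    spans = [step]
    for index, item in enumerate(items):
        iteration = Span(
            operation=f"{step_name}.iteration",
            parent_id=step.span_id,
            tags={"iteration": index, "item": item},  # per-loop context
        )
        # Logs that would otherwise pile up on the step span go here instead.
        iteration.logs.append(f"processing {item}")
        iteration.logs.append(f"finished {item}")
        spans.append(iteration)
    return spans


if __name__ == "__main__":
    for span in run_step("import", ["a", "b", "c"]):
        print(span.span_id, span.parent_id, span.operation, len(span.logs))
```

With this layout the step span itself stays small, and the number of logs per span is bounded by what a single iteration produces.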

Otherwise I would go for (c), and I would be interested in whether you have more precise plans for what pulling external logs could look like. If we decide to go this route, it would be good to align our implementation with your envisioned structure :)

yurishkuro · 2018-01-13T20:13:12Z

> Otherwise I would go for (c), and I would be interested in whether you have more precise plans for what pulling external logs could look like.

We're currently in the design phase for our internal unified log query service that is meant to abstract away the actual location of the service logs (e.g. Elasticsearch, Kafka, HDFS, even files on the hosts). Once we have that we can look into integrating Jaeger with it (#649).

otisg · 2018-04-12T18:25:07Z

> c) spans should be stored without logs in Jaeger and a third-party system should store the logs with a reference to the span ID

+1 for this approach and +11 for #649

jpkrohling · 2019-04-02T13:59:06Z

@phal0r what did you end up doing? Would be nice to have your insights after all this time :)

phal0r · 2019-04-04T07:29:27Z

@jpkrohling
Yeah, that was a long time ago :)

We ended up doing (c) and defined a log format for all our applications. We didn't want to store log messages as plain strings, but as more structured records. So we put the data into Elasticsearch and defined a mapping for these indexes. As suggested, spanId and traceId are attributes of the log messages. The first implementation displays these logs in a Grafana table (e.g. filtered by errors). In Grafana it is easy to define an additional column that creates clickable Jaeger links. This is straightforward, as we have the traceId and can just construct these links to the Jaeger UI.
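For readers who want to try something similar, here is a minimal sketch of such a structured log record and the link construction. Only `traceId` and `spanId` come from the description above; the other field names, the helper functions, and the base URL are assumptions (16686 is the default port of the Jaeger UI, and the UI serves a trace under `/trace/<traceId>`).

```python
import json
from datetime import datetime, timezone

JAEGER_UI = "http://localhost:16686"  # assumed base URL of the Jaeger UI


def make_log_record(message: str, level: str, trace_id: str, span_id: str,
                    **fields) -> dict:
    """Build a structured log record that carries the trace and span IDs."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "traceId": trace_id,
        "spanId": span_id,
    }
    record.update(fields)  # extra structured attributes, e.g. step name
    return record


def jaeger_trace_link(record: dict, base_url: str = JAEGER_UI) -> str:
    """Construct a clickable link to the trace the log record belongs to."""
    return f"{base_url.rstrip('/')}/trace/{record['traceId']}"


if __name__ == "__main__":
    rec = make_log_record("row import failed", "ERROR", "abc123", "def456", step="load")
    print(json.dumps(rec))
    print(jaeger_trace_link(rec))
```

The JSON document is what would be indexed in the log store, and the link is what a table column in a dashboard would render per row.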

So far, we don't have a convenient way for the other direction, from trace to logs, but it is easy to copy the traceId and search for it in Kibana, for example. We also had in mind that this leaves the route open for #649 to integrate third-party logs into Jaeger in the future.

jpkrohling · 2019-04-04T07:49:05Z

Thanks for your comment. My question was indeed in relation to #649 and your experience might help us shape that feature.

stale · 2022-01-09T04:04:41Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale · 2022-04-16T07:22:13Z

This issue has been automatically closed due to inactivity.

jpkrohling added the question label Apr 2, 2019

stale bot added the stale label Jan 9, 2022

stale bot closed this as completed Apr 16, 2022
