Span with lots of Logs #571

Closed
phal0r opened this issue Nov 29, 2017 · 16 comments
Labels
stale The issue/PR has become stale and may be auto-closed

Comments

@phal0r

phal0r commented Nov 29, 2017

Hey guys,

First of all, this is not a feature request; I would like to hear some more opinions. We want to use Jaeger to centralize the logs of our process executions. The data model is a perfect fit, but we want to store 2,000-5,000 logs in one span, since some steps generate a lot of important logs.

I can think of three ways to solve this problem:
a) this is a concern of Jaeger and should be supported in future versions
b) these spans should be split into subspans with fewer logs per span
c) spans should be stored without logs in Jaeger, and a third-party system should store the logs with a reference to the span ID

Now I am curious about your thoughts. Thanks in advance for thinking this through :)

@yurishkuro
Member

All three are possible solutions. Each has pros and cons, and there's no single "correct answer".

For (a), you might run into the UDP max packet size when trying to export that many logs out of process in a single span. Indexing all those logs will also be quite expensive, so it depends on what you're trying to do with them aside from making them accessible in the UI in a contextual way. Finally, the UI might not be happy trying to display that many logs per span.
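
To put rough numbers on the packet-size concern, here is a back-of-the-envelope sketch. It assumes a client-side packet limit of about 65,000 bytes; the average log size and span overhead are invented figures for illustration, not measurements.

```python
# Rough size estimate for a span that carries many log entries.
# ASSUMPTIONS: the 65,000-byte limit and both byte sizes below are
# illustrative values, not measured ones.

MAX_UDP_PACKET_BYTES = 65_000  # assumed client-side packet limit


def estimated_span_bytes(num_logs: int,
                         avg_log_bytes: int = 120,
                         span_overhead_bytes: int = 300) -> int:
    """Approximate encoded size of one span with `num_logs` log entries."""
    return span_overhead_bytes + num_logs * avg_log_bytes


def max_logs_per_span(avg_log_bytes: int = 120,
                      span_overhead_bytes: int = 300,
                      limit: int = MAX_UDP_PACKET_BYTES) -> int:
    """Largest number of logs that still fits under the packet limit."""
    return max(0, (limit - span_overhead_bytes) // avg_log_bytes)


if __name__ == "__main__":
    print(estimated_span_bytes(5000))  # far above the assumed limit
    print(max_logs_per_span())         # a few hundred logs at most
```

Under these assumptions a span with 5,000 logs would be roughly nine times larger than a single packet, which is why options (b) and (c) come up at all.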

I would be curious to know what your use case is that requires so many log lines per single span. Are you talking about some long-running operation?

@tiffon
Member

tiffon commented Dec 19, 2017

One tangential consideration is to support log levels as a collector configuration. It does not address the OP's issue directly, but it could be part of a concerted effort to enable larger-scale logging.

@yurishkuro Are logs indexed? If so, we can expose that in the search UI.

@yurishkuro
Member

By default, logs are indexed by their fields.

@phal0r
Author

phal0r commented Dec 26, 2017

@yurishkuro
I really enjoy repos with a good discussion culture, so thanks for that.

The use case is a modular processing system in which the processes themselves are also modular and split into steps that can run in series or in parallel. We have an older UI that renders these as a tree panel (known from Sencha's Ext JS). We want to replace this with Jaeger, since the data model would be a perfect fit. The running process is a trace and every step is a span. Steps that run sequentially or in parallel can be modeled with the parent relations. There is a log4j appender that controls the generation of the spans and redirects logging calls to the span.log interface. These processes can run for a longer time, such as several minutes (this is why we are also working on jaegertracing/jaeger-client-java#231).

Yes, the plan is to index the logs so that they are searchable. It wouldn't make much sense to display all 5,000 logs at once. In the old UI it was possible to page through the logs. The logs were saved in an Oracle database and had the ID of the process run (in Jaeger terms, a trace) as a foreign key.

The Jaeger UI would need pagination and a log search endpoint by span ID and/or trace ID, if we consider this a good feature for Jaeger.
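
As a rough illustration of what paging through the logs of one span could mean, here is a minimal in-memory sketch. `LogEntry` and `page_of_logs` are hypothetical names invented for this example; nothing here is an existing Jaeger endpoint or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEntry:
    # Hypothetical record shape, for illustration only.
    span_id: str
    timestamp_us: int
    message: str


def page_of_logs(logs: List[LogEntry], span_id: str,
                 page: int, page_size: int = 100) -> List[LogEntry]:
    """Return one zero-based page of a span's logs, ordered by timestamp."""
    matching = sorted((entry for entry in logs if entry.span_id == span_id),
                      key=lambda entry: entry.timestamp_us)
    start = page * page_size
    return matching[start:start + page_size]
```

A real implementation would push the filter and the offset into the storage query instead of sorting everything in memory.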

Also, in my current understanding, a span is the smallest piece of data that is sent via Thrift. I guess we would need to change that in order to send this many logs to a collector, maybe by sending small sets of logs. But I haven't thought this through; I am just thinking out loud.

@tiffon The question is, do we really need log levels? In more recent software I am more used to tags, which allow fine-grained filtering of logs for appenders, but I am curious about your experience with tags so far. I guess the place for this functionality would be the client, which decides to include or drop a log based on (preconfigured) filter rules.
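
A client-side tag filter of the kind described above could look roughly like this. It is only a sketch: `should_keep` and its include/exclude rule sets are hypothetical and do not correspond to any existing Jaeger client option.

```python
from typing import Set


def should_keep(log_tags: Set[str], include: Set[str], exclude: Set[str]) -> bool:
    """Decide on the client whether a log entry is attached to the span.

    Hypothetical rules: any excluded tag drops the entry; otherwise the
    entry is kept if no include rules are configured or if it carries at
    least one included tag.
    """
    if log_tags & exclude:
        return False
    return not include or bool(log_tags & include)
```

For example, `should_keep({"loop", "debug"}, include=set(), exclude={"debug"})` drops the entry, while the same call with an empty exclude set keeps it.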

@yurishkuro
Member

The use case is a modular processing system in which the processes themselves are also modular and split into steps that can run in series or in parallel.

My first instinct is that it would make a lot more sense to represent each "step" as a separate span. This indirectly addresses the question of many logs per span, but more importantly it seems like a better representation of the application and its transaction semantics. After all, a span is supposed to represent an operation within the application, which sounds very much like those "steps". The idea is that you should be able to reason easily about what happened within a span, and I don't see how that's possible if the span contains up to 5000 events (logs). And if your "steps" can run in parallel, it makes even more sense to split them into different spans, using adequate span references between them to properly describe the causality relationships, say in order to calculate critical path through a larger transaction.

@phal0r
Author

phal0r commented Dec 26, 2017

Yeah, maybe I was unclear. This is exactly our intention. Every span represents a step of this process. Jaeger is a perfect fit to store and visualize this structure. Even a failing step would be easily visible through the Jaeger mechanics, since a span can be marked as erroneous.

Maybe the most important information, which I forgot: the reason one span has that many logs is that there are loops inside a step, and these generate this mass of logs.

@phal0r
Author

phal0r commented Jan 10, 2018

@yurishkuro
Would love to get your feedback on this.

@yurishkuro
Member

Sorry, I don't think I have any more insight. Like we said, there are multiple possible solutions, and you need to evaluate the pros and cons in your specific case. Personally I am having a hard time seeing how 5000 logs per span are useful; it sounds like information overload.

Do you have a specific question I can help with?

Side note: at Uber we collect logs separately, usually with a Kafka/ELK stack, and we have plans to build an integration into the Jaeger UI to pull logs for a given span from an external source.

@phal0r
Author

phal0r commented Jan 13, 2018

@yurishkuro
OK, I understand. My feeling is that (a), as a Jaeger feature with pagination and everything, would change a lot in Jaeger itself, since it would be necessary to stream the logs somehow. This does not seem like the way to go.

Generating a span for each loop seems like something we could try, since a span could also hold the context information of every loop. You mentioned in the v1 release post that it's possible to work with traces containing 50,000 spans and more, so this should be sufficient.
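
To make the "span per loop" idea concrete, here is a minimal sketch that reads it as one child span per loop iteration. The `Span` class is a toy stand-in invented for this example, not the Jaeger or OpenTracing API; only the parent/child shape is the point.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

_span_ids = itertools.count(1)  # toy ID generator for the sketch


@dataclass
class Span:
    # Hypothetical stand-in for a tracer's span type.
    operation: str
    parent_id: Optional[int] = None
    span_id: int = field(default_factory=lambda: next(_span_ids))
    logs: List[str] = field(default_factory=list)


def split_step_into_child_spans(
        step_name: str,
        logs_by_iteration: List[List[str]]) -> Tuple[Span, List[Span]]:
    """Create one parent span for the step and one child span per iteration.

    The parent keeps no logs of its own; each child carries only the logs
    produced inside its iteration.
    """
    parent = Span(step_name)
    children = []
    for index, iteration_logs in enumerate(logs_by_iteration):
        child = Span(f"{step_name}[{index}]", parent_id=parent.span_id)
        child.logs.extend(iteration_logs)
        children.append(child)
    return parent, children
```

With 50 iterations of 100 logs each, the step becomes 51 spans of at most 100 logs instead of one span with 5,000.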

Otherwise I would go for (c), and I would be interested to know whether you have more precise plans for what pulling external logs could look like. If we decide to go this route, it would be good to align our implementation with your envisioned structure :)

@yurishkuro
Member

Otherwise I would go for (c), and I would be interested to know whether you have more precise plans for what pulling external logs could look like.

We're currently in the design phase for our internal unified log query service that is meant to abstract away the actual location of the service logs (e.g. Elasticsearch, Kafka, HDFS, even files on the hosts). Once we have that we can look into integrating Jaeger with it (#649).

@otisg

otisg commented Apr 12, 2018

c) spans should be stored without logs in Jaeger, and a third-party system should store the logs with a reference to the span ID

+1 for this approach and +11 for #649

@jpkrohling
Contributor

@phal0r what did you end up doing? Would be nice to have your insights after all this time :)

@phal0r
Author

phal0r commented Apr 4, 2019

@jpkrohling
Yeah, that was a long time ago :)

We ended up doing (c) and defined a log format for all our applications. We didn't want to store log messages as plain strings, but as more structured records. So we put the data into Elasticsearch and defined a mapping for these indexes. As suggested, spanId and traceId are attributes of the log messages. The first implementation displays these logs in a Grafana table (e.g. filtered by errors). In Grafana it is easy to define an additional column that creates clickable Jaeger links. This is straightforward, as we have the traceId and can just construct these links to the Jaeger UI.
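
A minimal sketch of such a structured log record and of the link construction follows. Only `traceId` and `spanId` come from the description above; the remaining field names, the helper names, and the host are assumptions made for illustration. The link assumes the Jaeger UI serves traces under `/trace/<traceId>`.

```python
from typing import Any, Dict
from urllib.parse import quote


def make_log_record(trace_id: str, span_id: str, level: str,
                    message: str, **fields: Any) -> Dict[str, Any]:
    """Build one structured log document ready to be indexed.

    Field names other than traceId and spanId are hypothetical.
    """
    return {
        "traceId": trace_id,
        "spanId": span_id,
        "level": level,
        "message": message,
        "fields": fields,
    }


def jaeger_trace_url(base_url: str, trace_id: str) -> str:
    """Construct a clickable link to a trace in the Jaeger UI."""
    return f"{base_url.rstrip('/')}/trace/{quote(trace_id)}"


if __name__ == "__main__":
    record = make_log_record("abc123", "def456", "ERROR",
                             "step failed", step="import")
    # jaeger.example is a placeholder host.
    print(jaeger_trace_url("http://jaeger.example:16686", record["traceId"]))
```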

So far, we don't have a convenient way for the other direction, from trace to logs, but it is easy to copy the traceId and search for it in Kibana, for example. We also had in mind that this leaves the route open for #649 to integrate third-party logs into Jaeger in the future.
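
For the trace-to-logs direction, the manual Kibana search could in principle be replaced by a query like the one sketched below. It builds an Elasticsearch query body as a plain dictionary. It assumes that `traceId` and `spanId` are mapped as keyword fields and that documents carry an `@timestamp` field; neither detail is stated in the thread.

```python
from typing import Any, Dict, Optional


def logs_for_trace_query(trace_id: str, span_id: Optional[str] = None,
                         size: int = 100) -> Dict[str, Any]:
    """Build a query body selecting the logs of one trace (optionally one span).

    ASSUMPTIONS: traceId/spanId are keyword fields and @timestamp exists.
    """
    filters = [{"term": {"traceId": trace_id}}]
    if span_id is not None:
        filters.append({"term": {"spanId": span_id}})
    return {
        "size": size,
        "sort": [{"@timestamp": "asc"}],
        "query": {"bool": {"filter": filters}},
    }
```

The resulting dictionary can be sent as the JSON body of a search request against the log indexes.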

@jpkrohling
Contributor

Thanks for your comment. My question was indeed in relation to #649 and your experience might help us shape that feature.

@stale

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jan 9, 2022
@stale

stale bot commented Apr 16, 2022

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Apr 16, 2022

5 participants