Supportability: improve own logs #2102
Can we please discuss this before continuing? I'm not sure the first item in the list is a good move, especially without a heads-up to current users of the Collector. I think moving from structured JSON logs to a mix of tab-separated and JSON logs is a step back. People who want human-readable logs should be able to get them with existing tooling: before this change, I could simply pipe the console output through a JSON pretty-printer. At the very least, this new (breaking) change should be behind a flag so that people can control which behavior they want. Before #2106/#2109, here's how the logs looked:
And here's how it looks now:
@jpkrohling Yes, let's discuss. I understand what you're saying, but I disagree. The Collector is a unique piece of infrastructure: it needs to be observable and debuggable even when the observability system is broken. This means we cannot rely only on traditional approaches to observe the Collector. One of the important areas where I think we have to differ from traditional practice is readability of logs. For virtually every other service, JSON is highly desirable since it is machine-readable and the logs can be collected into the logging system, precisely where they are searchable and queryable and where most people will be looking at them. Tab-delimited logs, however, are vastly more readable when all you have is the console.
I agree. I think non-JSON logs should be the default, and we can have a command line option to output JSON logs. We can also add this change to the CHANGELOG to bring more visibility to it.
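For illustration only, here is a minimal sketch of what such an option could look like on top of zap, which ships with both a "console" and a "json" encoder. The `--log-format` flag name and the wiring are assumptions for the sketch, not the Collector's actual implementation:

```go
package main

import (
	"flag"

	"go.uber.org/zap"
)

func main() {
	// Hypothetical flag; the real Collector flag name and plumbing may differ.
	logFormat := flag.String("log-format", "console", "Encoding of the Collector's own logs: console or json")
	flag.Parse()

	cfg := zap.NewProductionConfig()
	cfg.Encoding = *logFormat // zap provides "console" and "json" encoders out of the box

	logger, err := cfg.Build()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	logger.Info("Everything is ready. Begin running and processing data.")
}
```

With this shape, human-readable console output stays the default while `--log-format=json` restores machine-readable logs for those who pipe them into a logging backend.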
@jpkrohling see #2177
Thanks for addressing my concerns, @tigrannajaryan!
Agreed, but I think it ends up depending on how we see and deal with the Collector in production. If we have only a few instances (pets), we are likely to look at the logs of individual instances. However, for highly elastic scenarios (cattle), we'd rather have the logs sent elsewhere and post-processed, in which case having them easy to parse by machines is preferable. Based on the same train of thought, #2098 might not bring many benefits.
My issue was closed as a duplicate, so I'd like to comment here. I have a question about this point.
I think there is a simple solution: add a global rate limit for the Collector's own logs, with a corresponding flag. Here is why I think it's enough for most purposes:
Even if you don't like the solution I'm offering, the critical point is that today we cannot limit the total number of logs. Some of the implemented components log on every request (maybe only on failed requests, but still), and it's very hard to get rid of all those places.
I agree with your proposal as a whole, and I think our logger (zap) does support rate limiting. In any case, it would be good to have bug reports against those components. Logging on error is desirable, but we might be able to optimize hot paths...
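For reference, zap's built-in sampler can act as such a rate limiter. A minimal sketch, with arbitrary numbers (the Collector would presumably expose them via a flag or config rather than hard-code them):

```go
package main

import (
	"go.uber.org/zap"
)

func main() {
	cfg := zap.NewProductionConfig()
	// zap's sampler works per one-second window: it keeps the first `Initial`
	// entries with the same level and message, then only every `Thereafter`-th one.
	cfg.Sampling = &zap.SamplingConfig{
		Initial:    10,
		Thereafter: 100,
	}

	logger, err := cfg.Build()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	for i := 0; i < 100000; i++ {
		// Most of these repeated entries are dropped by the sampler
		// instead of flooding the console.
		logger.Error("failed to push data to the backend", zap.Int("attempt", i))
	}
}
```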
Yes, it is actually possible to tune the logger. It's a pity I cannot do it out of the box; I need to modify the code. Is there a position against exposing more flags?
Not that I'm aware of. Each new flag comes with the need to document and maintain it, but nothing prevents us from adding one if it is justifiable.
As for formatting, I think logfmt should be an option alongside JSON (and perhaps reuse the zap log structure). I find it far more readable raw, but still parsable (e.g. within Grafana).
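Since zap only ships "json" and "console" encoders, a logfmt option would likely rely on a third-party encoder. A rough sketch, assuming the community `github.com/jsternberg/zap-logfmt` package (that package choice is an assumption, not something the Collector uses today):

```go
package main

import (
	"os"

	zaplogfmt "github.com/jsternberg/zap-logfmt"
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func main() {
	encCfg := zap.NewProductionEncoderConfig()
	encCfg.EncodeTime = zapcore.ISO8601TimeEncoder

	core := zapcore.NewCore(
		zaplogfmt.NewEncoder(encCfg), // emits key=value pairs instead of JSON
		zapcore.Lock(os.Stdout),
		zapcore.InfoLevel,
	)
	logger := zap.New(core)
	defer logger.Sync()

	// Output looks roughly like:
	// ts=2021-01-12T10:00:00Z level=info msg="Everything is ready." component=service
	logger.Info("Everything is ready.", zap.String("component", "service"))
}
```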
The Collector's own logs are an important source of information for troubleshooting. In some cases the own logs, available locally, are the only information available for troubleshooting. Other sources, such as the Collector's own metrics, require the Collector to be correctly configured to scrape itself and send the metrics to the backend, and require that backend to be available. Even zPages, which are exposed locally by the Collector, may not be available if, for example, the Collector crashes. In such cases logs are the only useful source of troubleshooting information.
In order to increase the value of the Collector's logs I suggest making a few improvements:
Additional ideas that may be worth doing:
Note: we need to be careful not to flood the logs.