-
Notifications
You must be signed in to change notification settings - Fork 899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTEP: Recording exceptions as log based events #4333
base: main
Are you sure you want to change the base?
Conversation
I think this is a related issue: |
b06a09f
to
76c7d85
Compare
1a1ea49
to
5ddfd05
Compare
db27087
to
e9f38aa
Compare
A small doubt:
Although (I think) it's not called out, I'm understanding exceptions should now be explicitly reported as both 1) Span.Event and 2) Log/Event? i.e. coding wise you should do this: currentSpan.recordException(e);
logger.logRecordBuilder
.addException(e); Is this the case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm very supportive. Just some nits and one mitigation I'd like to see called out/addressed.
In my opinion what matters is not what's possible, what matters is what the real world experience is going to be for users and backend implementers. And the reality is that this change may have a serious negative impact on both. |
I don't think Otel ever recommended Exceptions MUST/SHOULD be reported via SpanEvents. It had conventions for reporting exception via SpanEvent and Logs. (logs convention came later than Span), but never recommended one over other. This OTEP would be the first time Otel officially makes a recommendation on the preferred way of reporting exception. (That is my read. Happy to be corrected!) |
I agree with your interpretation. My point is that recording events via logs does have downsides. Right now you have to opt into it so it's okay to have to do something like the |
I think this depends on the backend/vendor. Recording exceptions via logs has downsides. Recording exceptions via SpanEvents also has downsides too. Having no recommendation from Otel is also not good. It is definitely possible to have something in the Otel Spec for SDKs, to have a feature flag to control this based on user preference. (The feature flag, can do the conversion of exceptions from LogRecord to SpanEvents OR vice-versa, with or without duplicating.). It is also possible to have feature-flag for instrumentations, but it may be easier to have an SDK level thing that'll ensure consistent behavior, irrespective of the instrumentation used. |
to @alexmojaki
You could bring them on the SDK level with a events-to-span-events processor. It's a great question how the span-events -> logs migration would look like and whether such processor should be provided by contrib component, individual distros or by default.
Correlated logs with errors and other details without parent span are pretty valuable regardless of tracing or sampling.
this is part of the previous OTEP on Events - https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0265-event-vision.md#relationship-to-span-events. In the long term, we hope to replace span events with log-based events with migration story for consumers.
We report
Does it include the stack trace? The rest is (or can be) stored for a single 'terminal' exception that the span ends with. Or is your goal to store exception chain and/or handled exceptions that happened during span lifetime? The counter-argument I have is that there are a lot of exceptions that happen during span execution and most of them are already recorded as logs by runtimes, client libs, frameworks, etc. I.e. you're already in the world where exceptions are not exported along with spans. It's a fair ask though that user app may want to associate arbitrary exceptions with a span today and we're taking it away when moving to logs. |
Co-authored-by: Trask Stalnaker <[email protected]>
Co-authored-by: Trask Stalnaker <[email protected]>
Co-authored-by: Trask Stalnaker <[email protected]>
@adriangb @alexmojaki @samuelcolvin would you mind summarizing where we are with each of your concerns above as separate review comments (on the Files changed tab at the top, pick a relevant-ish line and add the summary of where we're at there) so we can have "threaded" discussions on each one of them? I really want to follow along and see what needs to be addressed, but am having trouble following with everything as mainline comments 😅 thank you ❤️ |
2. Recording exceptions as logs will result in UX degradation for users | ||
leveraging trace-only backends such as Jaeger. | ||
|
||
3. Having exceptions exported and stored along with span is beneficial for some backends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting here to follow the request of commenting somewhere vaguely relevant for a threaded discussion.
Let me check if I understand things correctly. Currently this code:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from requests import ConnectionError
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = provider.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
raise ConnectionError("bar")
prints a span containing:
{
"status": {
"status_code": "ERROR",
"description": "ConnectionError: bar"
},
"attributes": {},
"events": [
{
"name": "exception",
"timestamp": "2025-02-07T10:35:52.726398Z",
"attributes": {
"exception.type": "requests.exceptions.ConnectionError",
"exception.message": "bar",
"exception.stacktrace": "Traceback...",
"exception.escaped": "False"
}
}
]
}
Am I correct that the goal is to instead emit the following?
{
"status": {
"status_code": "ERROR",
"description": "bar"
},
"attributes": {
"error.type": "requests.exceptions.ConnectionError"
}
}
The differences being:
- Span events are gone, and the stacktrace will only be found in a child event-log.
- The span event attribute
exception.type
containing the fully qualified exception type name is now in the span attributeerror.type
- The status description no longer contains the (unqualified) exception name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that the goal is to instead emit the following?
Not the goal for this OTEP. This OTEP tells how to record exception and errors on logs and does not change how the span status or attributes are populated.
Span events are gone, and the stacktrace will only be found in a child event-log.
yes
The span event attribute exception.type containing the fully qualified exception type name is now in the span attribute error.type
It's already in the spec, not part of this OTEP - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/recording-errors.md.
Error.type
on span can as well be something more specific depending on what tracing instrumentation decided to put into error.type
. Error logs and traces are likely to be captured at different layers and there is no assumption that all exceptions will make it to the tracing layer. Tracing layer may get nothing, or aggregated exception, or a different one suppressing the original one.
The status description no longer contains the (unqualified) exception name.
This OTEP does not change anything about span status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the goal for this OTEP.
I understand that, but this OTEP is removing the span events so I want to make sure that the spec (or the plan for the spec) ensures that at least exception.type
and exception.message
are still directly in the span.
The span event attribute exception.type containing the fully qualified exception type name is now in the span attribute error.type
It's already in the spec, not part of this OTEP - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/recording-errors.md.
But before seeing this issue I'd never heard of error.type
before. That page is apparently less than a month old but the semantic convention seems to be significantly older so I'm trying to figure out what happened. The Python SDK currently doesn't set that attribute based on the snippet above, is that just because it's a bit behind? https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/recording-errors.md#recording-exceptions shows error.type
and the span status being recorded manually. Are they not supposed to be recorded automatically by the appropriate tracer/span API methods when an exception bubbles through? BTW the snippet doesn't actually end the span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attribute is mentioned in every semantic conventions, e.g. https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md and the recording-errors page is a generalization. Python should set error.type
in specific instrumentations that report spans.
If python supports ending span with exception in a generic way, it should start setting error.type
there - please create an issue for them if they don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarification on "I understand that, but this OTEP is removing the span events "
It does not remove span events. It's a design proposal for the future when events are stabilized to stop using span events. The actionable part of your feedback that I see is that we should consider having something like Span.end(exception)
instead of Span.recordException(exception)
that would record the exception.*
attributes (except the stacktrace) on span and ensure all the details are populated. Did I get it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that, yes.
What's the plan for Span.recordException
? Will that method be removed, or will it switch to generating a child event-log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no concrete plan yet. It'd be great if you could create an issue in this repo to start the discussion on migration details.
Hi, I have a question regarding the proposed In a more traditional logging based stack, when logging an exception, the developer should add a human readable error message explaining the underlying exception's effect on the current process. Without these messages, an observer has a very hard time figuring out what broke from a business perspective. {
"level": "ERROR",
"message": "Could not generate invoice for order XYZ.",
"exception": "java.lang.NullPointerException",
"stack_trace": "..."
} If I understand this OTEP correctly, this log message would not change much and the corresponding trace would look like this: {
"status": {
"status_code": "ERROR",
"description": "Could not generate invoice for order XYZ."
},
"attributes": {
"error.type": "java.lang.NullPointerException"
}
} Is this correct? |
@ChristopherKlinge please check out https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/recording-errors.md which covers existing This OTEP is about exception and errors on logs and encourages to add additional information to logs. |
In their next major version, log bridges SHOULD start using `setException` method or | ||
follow defaults documented for OTel SDK when explicitly recording `exception.*` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In their next major version, log bridges SHOULD start using `setException` method or | |
follow defaults documented for OTel SDK when explicitly recording `exception.*` | |
In their next major version, log bridges SHOULD start using the `setException` method or | |
follow the defaults documented for OTel SDK when explicitly recording `exception.*` |
Related to open-telemetry/semantic-conventions#1536
Changes
Recording exceptions as span events is problematic since it
This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.
If accepted, the follow-up spec changes are expected to replace existing (stable) documents:
Related OTEP(s) #CHANGELOG.md
file updated for non-trivial changesspec-compliance-matrix.md
updated if necessary