Exporterhelper does not respect context deadlines #11183
Describe the bug
In a gateway configuration of the collector, it is likely for an OTLP receiver and exporter to be configured together. Any gRPC exporter/receiver pair in which both sides propagate timeout information is likely to fall into this problem scenario. (For example, the OTel-Arrow components will behave the same after open-telemetry/opentelemetry-collector-contrib#34742 merges.)
In a synchronous pipeline, this means the caller's context is likely to have a deadline by the time it reaches an exporter, typically added by the pipeline's receiver or propagated from an earlier pipeline segment's exporter.
Note that the timeout sender does recognize the incoming deadline, but not intentionally: the Go context package will never increase a deadline, though it will shorten one.
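A minimal standalone Go illustration of that context behavior (not collector code; the 5s/30s values mirror the example below):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// The arriving request context already carries a 5s deadline.
	parent, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// The timeout sender layers its configured 30s timeout on top, but
	// context.WithTimeout can only keep or shorten the parent's deadline.
	child, cancel2 := context.WithTimeout(parent, 30*time.Second)
	defer cancel2()

	deadline, _ := child.Deadline()
	fmt.Println(time.Until(deadline).Round(time.Second)) // prints ~5s, not 30s
}
```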
The result is that exporterhelper will pass through a timeout shorter than its configured timeout. This may be working as intended for some, and a bug for others. When I set a 30s timeout on the exporter and my receiver observes a 5s timeout instead, it's likely to cause confusion.
The behavior of the retry sender is even more problematic. The retry sender is likely to create the problem described above by ignoring the arriving context timeout. When the retry sender's max_elapsed setting is greater than the arriving context timeout, we are virtually assured of issuing one short-timeout request at the end of a series of retries.
Say you've configured your exporterhelper like this:
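(The values below are illustrative placeholders, not the exact settings from the original report; any exporter timeout combined with a larger retry window reproduces the scenario.)

```yaml
exporters:
  otlp:
    endpoint: backend:4317   # placeholder endpoint
    timeout: 30s             # the 30s timeout discussed above
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s # much longer than a typical arriving deadline
```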
If there is an incoming context deadline that is shorter than the configured max_elapsed setting, retry sending will continue until the arriving deadline expires. Since the arriving context deadline is preserved through all of this, the final export request is likely to be issued with a timeout much shorter than 30s. This leads to confusion.
Steps to reproduce
To reproduce this I created a client application that would send as fast as it can through a collector pipeline with both the timeout and retry senders configured.
What did you expect to see?
I was expecting to see what I saw, because I've seen this in production and wanted to reproduce it.
What did you wish to see instead?
I have several wishes here.
First, the retry sender should consider the context deadline. When backoffDelay is computed, it should be compared against the context deadline (if set). If the deadline is less than the backoff delay, we should not fall into the select statement and wait for cancelation; we should fail fast at that point with a gRPC Aborted code and the message "insufficient context deadline for retry".
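A rough sketch of that first wish, using a hypothetical waitForRetry helper (the names and structure are illustrative, not the actual exporterhelper retry sender code):

```go
package retrysketch

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// waitForRetry is a hypothetical helper: before sleeping for the computed
// backoff delay, it checks whether the remaining context deadline leaves
// enough time for another attempt and fails fast when it does not.
func waitForRetry(ctx context.Context, backoffDelay time.Duration) error {
	if deadline, ok := ctx.Deadline(); ok && time.Until(deadline) < backoffDelay {
		// Waiting would only end in cancelation; return a clear error now.
		return status.Error(codes.Aborted, "insufficient context deadline for retry")
	}
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(backoffDelay):
		return nil // enough time remains; proceed with the next attempt
	}
}
```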
Second, the exporter timeout sender would ideally have a way to impose a minimum timeout. When the timeout sender is configured with a 5s timeout and the arriving context has a 1s deadline, I think I'd like two optional behaviors:
Third, I would like the OTLP receiver to support the same feature: a configuration that allows the receiver to ignore arriving timeouts when they are unreasonably short. This will have to be done on a per-receiver basis, but possibly a receiverhelper library could help receivers behave uniformly.
What version did you use?
v0.108.x
What config did you use?
as shown above.
Environment
Additional context