Incompatibility between default retry settings and timeout settings #30305

rapphil · 2024-01-05T01:38:06Z

Component(s)

exporter/prometheusremotewrite

What happened?

Description

The Prometheus remote write exporter implements its own retry logic (implemented in this PR) and does not use the queued_retry from the exporter helper. This is necessary to avoid out-of-order samples: data is split into smaller chunks, which are then submitted to workers that send them to the backend and retry on failure. Each time series is guaranteed to be in only a single chunk, so there won't be out-of-order samples.
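The chunking-and-workers scheme described above can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the exporter's code: `Sample`, `TimeSeries`, `partition` and `sendConcurrently` are hypothetical names, chunk size is counted in samples for simplicity (the issue does not say how the exporter sizes its chunks), and `send` stands in for one request plus its retries.

```go
package main

import (
	"fmt"
	"sync"
)

// Sample is a hypothetical (timestamp, value) pair.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// TimeSeries is a hypothetical stand-in for one labeled series.
type TimeSeries struct {
	Labels  string
	Samples []Sample
}

// partition groups whole series into chunks of at most maxSamples samples.
// A series is never split across chunks, so all of its samples travel in one
// request. A series larger than maxSamples gets a chunk of its own.
func partition(series []TimeSeries, maxSamples int) [][]TimeSeries {
	var chunks [][]TimeSeries
	var current []TimeSeries
	count := 0
	for _, ts := range series {
		n := len(ts.Samples)
		// Start a new chunk if this series would overflow the current one.
		if len(current) > 0 && count+n > maxSamples {
			chunks = append(chunks, current)
			current, count = nil, 0
		}
		current = append(current, ts)
		count += n
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}

// sendConcurrently hands chunks to a fixed pool of workers and collects the
// errors returned by send (hypothetically: one request with its retries).
func sendConcurrently(chunks [][]TimeSeries, workers int, send func([]TimeSeries) error) []error {
	jobs := make(chan []TimeSeries)
	var mu sync.Mutex
	var errs []error
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range jobs {
				if err := send(chunk); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}
		}()
	}
	for _, c := range chunks {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	return errs
}

func main() {
	series := []TimeSeries{
		{Labels: `{__name__="a"}`, Samples: make([]Sample, 3)},
		{Labels: `{__name__="b"}`, Samples: make([]Sample, 2)},
		{Labels: `{__name__="c"}`, Samples: make([]Sample, 4)},
	}
	chunks := partition(series, 5) // [a b] and [c]
	fmt.Println(len(chunks))
	errs := sendConcurrently(chunks, 2, func([]TimeSeries) error { return nil })
	fmt.Println(len(errs))
}
```

Because chunks are sent concurrently, ordering between different chunks is not guaranteed; the property the scheme relies on is only that samples of the same series never end up in two different requests.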

This component uses the default timeout setting of 5s. However, the retry settings are not consistent with this value: each request performed by a worker can be retried for up to 1 minute.

Therefore, there is a high chance of timeout errors when consecutive retries occur, because the 5s timeout can expire long before the 1 minute retry budget is used up.

Expected Result

We expect the timeout setting to be consistent with the retry logic implemented inside the component.

Actual Result

Consecutive retries can generate timeout errors.

Proposal

We would like to propose removing the timeout from the exporter helper and instead setting a timeout on the context just before requests are sent to the backend.

Collector version

v0.90.1

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions · 2024-01-05T02:00:01Z

Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.

jmacd · 2024-01-10T17:30:16Z

we would like to propose to remove the timeout from the exporter helper and instead set a timeout on the context just before requests are sent to the backend

I support this idea.

github-actions · 2024-03-11T03:31:16Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

exporter/prometheusremotewrite: @Aneurysm9 @rapphil

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions · 2024-05-10T05:19:26Z

This issue has been closed as inactive because it has been stale for 120 days with no activity.

rapphil added the bug and needs triage labels Jan 5, 2024

bryan-aguilar added the priority:p2 and exporter/prometheusremotewrite labels and removed the needs triage label Jan 5, 2024

github-actions bot mentioned this issue in Weekly Report: 2024-01-02 - 2024-01-09 #30334 (Closed) Jan 9, 2024

github-actions bot added the Stale label Mar 11, 2024

github-actions bot added the closed as inactive label May 10, 2024

github-actions bot closed this as not planned May 10, 2024
