Timeout config key collision #37332

Open
vigr opened this issue Jan 20, 2025 · 3 comments · May be fixed by #37593

vigr commented Jan 20, 2025

Component(s)

exporter/prometheusremotewrite

What happened?

Description

There is a collision of config keys here:

  • exporter/prometheusremotewriteexporter - the timeout for the whole PushMetrics operation collides with the timeout applied to each individual HTTP request to the remote write endpoint (Victoria Metrics in my case). The collision makes it impossible to configure a different timeout for the whole operation than for each request, which prevents retrying requests that time out (see the sketch below).
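
As far as I can tell, both timeout-carrying structs are squashed into the exporter's Config, so their Timeout fields are unmarshalled from the same top-level "timeout" key. A simplified sketch of the relevant part of the struct (most fields omitted; the embedded types are the upstream exporterhelper and confighttp ones, the surrounding package is just a placeholder):

package sketch

import (
	"go.opentelemetry.io/collector/config/confighttp"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// Simplified sketch, not the exporter's full Config: both squashed structs
// carry a Timeout field bound to the "timeout" key, so one YAML key feeds both.
type Config struct {
	// Timeout for the whole PushMetrics operation (exporterhelper).
	exporterhelper.TimeoutConfig `mapstructure:",squash"`

	// Timeout applied to each individual HTTP request (confighttp).
	ClientConfig confighttp.ClientConfig `mapstructure:",squash"`

	// ... remaining fields (remote_write_queue, resource_to_telemetry_conversion, ...) omitted.
}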

Steps to Reproduce

Try to configure both Config.Timeout and Config.ClientConfig.Timeout to different values.
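
A rough way to observe this outside the collector, using only confmap and the two upstream structs (collidingConfig is a stand-in I made up to mirror the squash layout above, not the exporter's real Config):

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/config/confighttp"
	"go.opentelemetry.io/collector/confmap"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// collidingConfig mirrors the two squashed timeout-carrying structs.
type collidingConfig struct {
	exporterhelper.TimeoutConfig `mapstructure:",squash"`
	ClientConfig                 confighttp.ClientConfig `mapstructure:",squash"`
}

func main() {
	// A config containing a single exporter-level "timeout" key, as in the YAML below.
	conf := confmap.NewFromStringMap(map[string]any{"timeout": "5s"})

	var cfg collidingConfig
	if err := conf.Unmarshal(&cfg); err != nil {
		panic(err)
	}

	// Both fields receive the same value; there is no key left to set them apart.
	fmt.Println("operation timeout:  ", cfg.TimeoutConfig.Timeout) // 5s
	fmt.Println("per-request timeout:", cfg.ClientConfig.Timeout)  // 5s
}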

Expected Result

The two timeouts can be configured independently and end up with different values.

Actual Result

Both timeouts are equal and take the value given by the "timeout" key in the config file.

Collector version

v0.117.0, v0.111.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
      grpc:
        endpoint: "0.0.0.0:4317"

exporters:
  prometheusremotewrite:
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 3s
      max_elapsed_time: 30s

    timeout: 5s
    endpoint: "http://127.0.0.1:8080/vm_insert"
    resource_to_telemetry_conversion:
      enabled: true

service:
  telemetry:
    metrics:
      level: detailed
      address: "0.0.0.0:8888"
  pipelines:
    metrics:
      receivers:
        - otlp
      exporters:
        - prometheusremotewrite

Log output

2025-02-12T14:58:17.100+0300	info	[email protected]/service.go:164	Setting up own telemetry...
2025-02-12T14:58:17.100+0300	warn	[email protected]/service.go:213	service::telemetry::metrics::address is being deprecated in favor of service::telemetry::metrics::readers
2025-02-12T14:58:17.100+0300	info	telemetry/metrics.go:70	Serving metrics	{"address": "0.0.0.0:8888", "metrics level": "Detailed"}
2025-02-12T14:58:17.101+0300	info	[email protected]/service.go:230	Starting otelcontribcol...	{"Version": "0.117.0-dev", "NumCPU": 8}
2025-02-12T14:58:17.101+0300	info	extensions/extensions.go:39	Starting extensions...
2025-02-12T14:58:17.101+0300	info	[email protected]/otlp.go:112	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4317"}
2025-02-12T14:58:17.102+0300	info	[email protected]/otlp.go:169	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4318"}
2025-02-12T14:58:17.102+0300	info	[email protected]/service.go:253	Everything is ready. Begin running and processing data.
2025-02-12T14:58:27.511+0300	error	internal/queue_sender.go:105	Exporting failed. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 1}
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/queue_sender.go:105
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
	/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43

Additional context

@vigr added the bug and needs triage labels on Jan 20, 2025
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole (Contributor)

Do you actually need different timeouts for the whole operation and for each request? Or do you actually just want a timeout for each request (and the overall timeout is preventing retries)?

vigr (Author) commented Feb 12, 2025

I need each request to be retried individually if it times out, but the overall operation timeout prevents that.

Each call to prwExporter.PushMetrics receives a chunk of metrics data, which is sorted by timestamp (batchTimeSeries) and split into batches of up to max_batch_size_bytes bytes. The batches are then sent sequentially over HTTP to the remote write endpoint, one request per batch. If the operation itself times out, all remaining batches are lost (see the example in the logs attached, and the sketch below).
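
To make the failure mode concrete, here is a toy simulation (my own sketch, not the exporter's code) of a single operation-level deadline covering several sequential batch sends; once one batch stalls past the deadline, every remaining batch is dropped and nothing is retried:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sendBatch stands in for one remote-write HTTP request.
func sendBatch(ctx context.Context, n int, latency time.Duration) error {
	select {
	case <-time.After(latency):
		return nil
	case <-ctx.Done():
		return fmt.Errorf("batch %d: %w", n, ctx.Err())
	}
}

func main() {
	// The operation timeout and the per-request timeout both come from the
	// same "timeout" key; scaled down from 5s to 50ms for the simulation.
	opCtx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	// Per-batch latencies: the second batch is slow enough to hit the deadline.
	latencies := []time.Duration{10 * time.Millisecond, 60 * time.Millisecond, 10 * time.Millisecond}
	for i, l := range latencies {
		if err := sendBatch(opCtx, i, l); err != nil {
			// Like the exporter, treat a deadline error as permanent:
			// the remaining batches are never sent or retried.
			fmt.Println("dropping data:", err)
			if errors.Is(err, context.DeadlineExceeded) {
				return
			}
			continue
		}
		fmt.Printf("batch %d sent\n", i)
	}
}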
