Timeout config key collision #37332

Open
vigr opened this issue Jan 20, 2025 · 3 comments · May be fixed by #37593

vigr commented Jan 20, 2025

Component(s)

exporter/prometheusremotewrite

What happened?

Description

There is a collision of config keys here:

  • exporter/prometheusremotewriteexporter - the timeout for the whole PushMetrics operation collides with the timeout applied to each individual HTTP request to the remote write endpoint (Victoria Metrics in my case). The collision makes it impossible to configure a different timeout for the whole operation than for each request, which prevents retrying requests that time out (see the sketch below).
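
As far as I can tell, both timeout-carrying structs are squashed into the exporter's Config, so their Timeout fields are unmarshalled from the same top-level "timeout" key. A simplified sketch of the relevant part of the struct (most fields omitted; the embedded types are the upstream exporterhelper and confighttp ones, the surrounding package is just a placeholder):

package sketch

import (
	"go.opentelemetry.io/collector/config/confighttp"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// Simplified sketch, not the exporter's full Config: both squashed structs
// carry a Timeout field bound to the "timeout" key, so one YAML key feeds both.
type Config struct {
	// Timeout for the whole PushMetrics operation (exporterhelper).
	exporterhelper.TimeoutConfig `mapstructure:",squash"`

	// Timeout applied to each individual HTTP request (confighttp).
	ClientConfig confighttp.ClientConfig `mapstructure:",squash"`

	// ... remaining fields (remote_write_queue, resource_to_telemetry_conversion, ...) omitted.
}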

Steps to Reproduce

Try to configure both Config.Timeout and Config.ClientConfig.Timeout to different values.
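
A rough way to observe this outside the collector, using only confmap and the two upstream structs (collidingConfig is a stand-in I made up to mirror the squash layout above, not the exporter's real Config):

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/config/confighttp"
	"go.opentelemetry.io/collector/confmap"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// collidingConfig mirrors the two squashed timeout-carrying structs.
type collidingConfig struct {
	exporterhelper.TimeoutConfig `mapstructure:",squash"`
	ClientConfig                 confighttp.ClientConfig `mapstructure:",squash"`
}

func main() {
	// A config containing a single exporter-level "timeout" key, as in the YAML below.
	conf := confmap.NewFromStringMap(map[string]any{"timeout": "5s"})

	var cfg collidingConfig
	if err := conf.Unmarshal(&cfg); err != nil {
		panic(err)
	}

	// Both fields receive the same value; there is no key left to set them apart.
	fmt.Println("operation timeout:  ", cfg.TimeoutConfig.Timeout) // 5s
	fmt.Println("per-request timeout:", cfg.ClientConfig.Timeout)  // 5s
}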

Expected Result

The two timeouts can be configured independently and end up with different values.

Actual Result

Both timeouts are equal and take the value given by the "timeout" key in the config file.

Collector version

v0.117.0, v0.111.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
      grpc:
        endpoint: "0.0.0.0:4317"

exporters:
  prometheusremotewrite:
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 3s
      max_elapsed_time: 30s

    timeout: 5s
    endpoint: "http://127.0.0.1:8080/vm_insert"
    resource_to_telemetry_conversion:
      enabled: true

service:
  telemetry:
    metrics:
      level: detailed
      address: "0.0.0.0:8888"
  pipelines:
    metrics:
      receivers:
        - otlp
      exporters:
        - prometheusremotewrite

Log output

2025-02-12T14:58:17.100+0300	info	[email protected]/service.go:164	Setting up own telemetry...
2025-02-12T14:58:17.100+0300	warn	[email protected]/service.go:213	service::telemetry::metrics::address is being deprecated in favor of service::telemetry::metrics::readers
2025-02-12T14:58:17.100+0300	info	telemetry/metrics.go:70	Serving metrics	{"address": "0.0.0.0:8888", "metrics level": "Detailed"}
2025-02-12T14:58:17.101+0300	info	[email protected]/service.go:230	Starting otelcontribcol...	{"Version": "0.117.0-dev", "NumCPU": 8}
2025-02-12T14:58:17.101+0300	info	extensions/extensions.go:39	Starting extensions...
2025-02-12T14:58:17.101+0300	info	[email protected]/otlp.go:112	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4317"}
2025-02-12T14:58:17.102+0300	info	[email protected]/otlp.go:169	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "metrics", "endpoint": "0.0.0.0:4318"}
2025-02-12T14:58:17.102+0300	info	[email protected]/service.go:253	Everything is ready. Begin running and processing data.
2025-02-12T14:58:27.511+0300	error	internal/queue_sender.go:105	Exporting failed. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 1}
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/queue_sender.go:105
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
	/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43

Additional context

@vigr added the bug and needs triage labels on Jan 20, 2025
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole (Contributor)

Do you actually need different timeouts for the whole operation and for each request? Or do you actually just want a timeout for each request (and the overall timeout is preventing retries)?

vigr (Author) commented Feb 12, 2025

I need each request to be retried individually if it times out, but the overall operation timeout prevents that.

Each call to prwExporter.PushMetrics receives a chunk of metrics data, which is sorted by timestamp (batchTimeSeries) and split into batches of up to max_batch_size_bytes bytes. The batches are then sent sequentially over HTTP to the remote write endpoint, one request per batch. If the operation itself times out, all remaining batches are lost (see the example in the logs attached, and the sketch below).
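
To make the failure mode concrete, here is a toy simulation (my own sketch, not the exporter's code) of a single operation-level deadline covering several sequential batch sends; once one batch stalls past the deadline, every remaining batch is dropped and nothing is retried:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sendBatch stands in for one remote-write HTTP request.
func sendBatch(ctx context.Context, n int, latency time.Duration) error {
	select {
	case <-time.After(latency):
		return nil
	case <-ctx.Done():
		return fmt.Errorf("batch %d: %w", n, ctx.Err())
	}
}

func main() {
	// The operation timeout and the per-request timeout both come from the
	// same "timeout" key; scaled down from 5s to 50ms for the simulation.
	opCtx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	// Per-batch latencies: the second batch is slow enough to hit the deadline.
	latencies := []time.Duration{10 * time.Millisecond, 60 * time.Millisecond, 10 * time.Millisecond}
	for i, l := range latencies {
		if err := sendBatch(opCtx, i, l); err != nil {
			// Like the exporter, treat a deadline error as permanent:
			// the remaining batches are never sent or retried.
			fmt.Println("dropping data:", err)
			if errors.Is(err, context.DeadlineExceeded) {
				return
			}
			continue
		}
		fmt.Printf("batch %d sent\n", i)
	}
}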
