
[receiver/mongodbatlas] default collection interval results in some metrics not being populated #18032

Closed
egeturgay opened this issue Jan 25, 2023 · 5 comments · Fixed by #20439

@egeturgay

Component(s)

receiver/mongodbatlas

What happened?

Description

Currently, this receiver takes its default collection interval from the scraper library, which sets it to 1 minute.

The MongoDB Atlas API appears to return null values for the metrics listed below when the start/end time window is 1 minute. Increasing the collection interval to 3 minutes seems to allow these values to be populated correctly.

Could you please consider setting the default to 3 minutes and documenting the "collection_interval" setting for the receiver? If increasing the default is not ideal, it might be worth adding a note to the receiver documentation, as it took us considerable time to troubleshoot this and find out how to override the default interval. (A minimal override example follows the metric list below.)

Metrics that are not populated when a low interval is set:

mongodbatlas.disk.partition.latency.average
mongodbatlas.process.asserts
mongodbatlas.process.cache.io
mongodbatlas.process.cpu.children.normalized.usage.average
mongodbatlas.process.cpu.children.usage.average
mongodbatlas.process.cpu.normalized.usage.average
mongodbatlas.process.cpu.usage.average
mongodbatlas.process.db.document.rate
mongodbatlas.process.db.operations.rate
mongodbatlas.process.db.operations.time
mongodbatlas.process.db.query_executor.scanned
mongodbatlas.process.db.query_targeting.scanned_per_returned
mongodbatlas.process.network.io
mongodbatlas.process.network.requests
mongodbatlas.process.page_faults
mongodbatlas.system.cpu.normalized.usage.average
mongodbatlas.system.cpu.usage.average
mongodbatlas.system.network.io.average
mongodbatlas.system.paging.io.average
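
For reference, a minimal sketch of the override in collector configuration (the receiver name and the 3m value are illustrative; collection_interval comes from the shared scraper settings, as in the commented-out line in the full configuration further down):

receivers:
  mongodbatlas:
    granularity: PT1M
    collection_interval: 3m  # overrides the 1m default inherited from the scraper library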

Steps to Reproduce

Monitoring the metrics via the file exporter should show that a metric like "mongodbatlas.process.db.document.rate" is not exported as consistently as the other metrics. You should also be able to reproduce this by querying the MongoDB Atlas process measurements API with the start/end times set 1 minute apart (see the example request below).
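
For example, a request like the following against the Atlas Administration API (v1.0) with a 1-minute window should show null data points; the group ID, host, port, and API keys are placeholders, and the specific measurement is selected via the m parameter:

curl --user "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest \
  "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/processes/{HOST}:{PORT}/measurements?granularity=PT1M&start=2023-01-24T16:37:00Z&end=2023-01-24T16:38:00Z&m=DOCUMENT_METRICS_RETURNED"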

Expected Result

All metrics to be populated with values

Actual Result

Only 31 out of 50 metrics are populated without the interval change.

Collector version

0.70

Environment information

Environment

OS: Amazon Linux

OpenTelemetry Collector configuration

exporters:
  file:
    path: /log/exporter.log
receivers:
  mongodbatlas/redacted:
    logs:
      enabled: false
      projects:
      - collect_audit_logs: false
        name: redacted
    granularity: PT1M
    # collection_interval: 3m # this undocumented setting (in the receiver) seems to fix the problem for us
service:
  pipelines:
    metrics/mongodbatlas:
      exporters:
      - file
      receivers:
      - mongodbatlas/redacted
  telemetry:
    metrics:
      address: 0.0.0.0:8889
    logs:
      level: debug

Log output

No response

Additional context

Example output from the MongoDB Atlas API when the start/end window (interval) is set to 1 minute:

    "dataPoints" : [ {
      "timestamp" : "2023-01-24T16:37:35Z",
      "value" : null
    } ],
    "name" : "DOCUMENT_METRICS_RETURNED",
    "units" : "SCALAR_PER_SECOND"
  }, {
    "dataPoints" : [ {
      "timestamp" : "2023-01-24T16:37:35Z",
      "value" : null
    } ],
    "name" : "DOCUMENT_METRICS_INSERTED",
    "units" : "SCALAR_PER_SECOND"
  }
egeturgay added the bug and needs triage labels on Jan 25, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme removed the needs triage label on Jan 26, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

The github-actions bot added the Stale label on Mar 28, 2023
@djaglowski
Member

cc @schmikei, WDYT?

@schmikei
Contributor

schmikei commented Mar 28, 2023

> cc @schmikei, WDYT?

I think at a bare minimum we should document usage of the collection_interval parameter, since it was not documented in the README.md before. I also think that upping the default collection_interval to 3m seems appropriate, given it's somewhat common for users to experience issues OOTB. I was able to validate that this behavior matches my test environment as well for short collection_intervals.
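
For illustration, a rough sketch of what raising the default could look like in the receiver's factory, assuming the config embeds scraperhelper's scraper controller settings; the Config fields, typeStr, and defaultGranularity identifiers are assumptions and may not match the actual mongodbatlasreceiver code:

package mongodbatlasreceiver

import (
	"time"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/receiver/scraperhelper"
)

// Hypothetical sketch only: identifier names are assumed, not taken from the
// actual receiver source.
func createDefaultConfig() component.Config {
	scs := scraperhelper.NewDefaultScraperControllerSettings(typeStr)
	// Raise the default from the library's 1m so the Atlas API returns
	// non-null data points for the affected measurements.
	scs.CollectionInterval = 3 * time.Minute
	return &Config{
		ScraperControllerSettings: scs,
		Granularity:               defaultGranularity, // e.g. "PT1M"
	}
}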

@djaglowski
Member

Seems worth changing to me. I think we can fairly call this a bug fix since these metrics were never meant to be missing.
