Log flooded with string to non-string with op EQ #14467

Closed

atoulme opened this issue Sep 24, 2022 · 13 comments
Labels: bug (Something isn't working), priority:p2 (Medium), processor/transform (Transform processor)

Comments

@atoulme (Contributor) commented Sep 24, 2022

What happened?

Description

After upgrading to 0.60.0, I am using a transform processor with a clause such as:

    transform:
      metrics:
        queries:
          - set(attributes["foo"], "bar") where attributes["foo"] == "foobar"

Steps to Reproduce

Set up a pipeline with a transform processor whose where clause uses an equality (==) comparison on an attribute that is not always present in the incoming data.

Expected Result

The pipeline should process data correctly without flooding the logs.

Actual Result

Logs are flooded with:

string to non-string with op EQ
string to non-string with op EQ
string to non-string with op EQ
string to non-string with op EQ
string to non-string with op EQ
string to non-string with op EQ
string to non-string with op EQ
string to non-string with op EQ

Collector version

0.60.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
    splunk_hec:

exporters:
    splunk_hec/metrics:
        # Splunk HTTP Event Collector token.
        token: "00000000-0000-0000-0000-0000000000000"
        # URL to a Splunk instance to send data to.
        endpoint: "https://splunk:8088/services/collector"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "app"
        # Optional Splunk source type: https://docs.splunk.com/Splexicon:Sourcetype
        sourcetype: "telegraf"
        # Splunk index, optional name of the Splunk index targeted.
        index: "metrics"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 10s
        # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
        # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
        insecure_skip_verify: true

processors:
    batch:
    transform:
      metrics:
        queries:
          - set(attributes["foo"], "bar") where attributes["foo"] == "foobar"
extensions:
    health_check:
      endpoint: 0.0.0.0:13133
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679

service:
    telemetry:
      logs:
        level: "debug"
    extensions: [pprof, zpages, health_check]
    pipelines:
      metrics:
        receivers: [splunk_hec]
        processors: [batch, transform]
        exporters: [splunk_hec/metrics]

Log output

No response

Additional context

No response

@atoulme added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Sep 24, 2022
@dmitryax added the priority:p2 (Medium) and processor/transform (Transform processor) labels and removed the needs triage label on Sep 26, 2022
@github-actions (Contributor) commented:

Pinging code owners: @TylerHelmuth @kentquirk @bogdandrutu. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth (Member) commented:

Ha, in trying to help users we destroyed their logs. This should definitely be logger.Debug.

@dmitryax (Member) commented Sep 26, 2022

@atoulme please add more context about the incoming data. I assume the metrics just don't always have the "foo" attribute, right? I don't think the output is incorrect in that case. Moving all of these logs to the debug level can also hide other potential issues that users should be aware of, so I'm not sure that's the right solution.

I believe the expression should have an additional condition verifying that metrics have foo attribute. Something like this:

set(attributes["foo"], "bar") where "foo" in attributes && attributes["foo"] == "foobar"

@TylerHelmuth does OTTL currently support something like this?

@atoulme (Contributor, Author) commented Sep 26, 2022

Yes: to reproduce the issue, specifically set up a pipeline whose incoming data does not have this attribute set.
At the very least, this is a breaking behavior change that wasn't documented as part of the release, and it produces surprising output for end users by flooding their logs. I have not identified a workaround to address this behavior; please help.

@kentquirk (Member) commented:

OTTL does short-circuit evaluation, and so something like

    set(attributes["foo"], "bar") where attributes["foo"] != nil && attributes["foo"] == "foobar"

should prevent the error for the time being.

@TylerHelmuth (Member) commented:

@dmitryax In this situation the invalidComparison func exists to ensure that a boolean always bubbles up based on the comparison rules. We didn't want to silently use the "default" return value of false in these invalid situations, which is where the print statement came from.

I was initially thinking this log, and others like it, would be Debug level so that customers can set their log level to debug when trying to troubleshoot issues and then see this problem. I thought Debug log level printed all log statements; does it exclude Error, Warning, and Info?

@dmitryax (Member) commented:

> @dmitryax In this situation the invalidComparison func exists to ensure that a boolean always bubbles up based on the comparison rules. We didn't want to silently use the "default" return value of false in these invalid situations, which is where the print statement came from.
>
> I was initially thinking this log, and others like it, would be Debug level so that customers can set their log level to debug when trying to troubleshoot issues and then see this problem.

Just using print is not the best option. I agree that this particular error should be at debug level. But we need to bring a logger in there, as @atoulme did.

> I thought Debug log level printed all log statements; does it exclude Error, Warning, and Info?

If a user sets the telemetry log level to Debug in their config, logs of all levels equal to Debug or higher will be printed; that's everything except the Trace level.
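
For illustration only, here is a minimal sketch of what routing this message through a zap logger at debug level could look like. The Parser type, its logger field, and the invalidComparison signature below are assumptions made for the sake of the example, not the actual collector code:

package ottl

import "go.uber.org/zap"

// Parser is a stand-in type for this sketch; the piece under discussion
// in this thread is the logger field.
type Parser struct {
	logger *zap.Logger
}

// invalidComparison reports a comparison between mismatched operand types
// at debug level instead of printing to stdout, then falls back to false
// as the comparison result. Hypothetical sketch, not the real implementation.
func (p *Parser) invalidComparison(msg string) bool {
	// Only emitted when service.telemetry.logs.level is set to debug,
	// so normal runs are not flooded.
	p.logger.Debug(msg)
	return false
}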

@TylerHelmuth (Member) commented:

> Just using print is not the best option. I agree that this particular error should be at debug level. But we need to bring a logger in there, as @atoulme did.

Agreed, his PR is what we need. @kentquirk and I have already provided feedback on next steps for that PR (member functions).

@atoulme (Contributor, Author) commented Sep 27, 2022

> OTTL does short-circuit evaluation and so something like set(attributes["foo"], "bar") where attributes["foo"] != nil && attributes["foo"] == "foobar" should prevent the error for the time being.

We tried this; it appears && is not supported and must be replaced with and.

With set(attributes["foo"], "bar") where attributes["foo"] != nil and attributes["foo"] == "foobar", the workaround works properly. Thanks!
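
For reference, a minimal sketch of how this workaround slots into the transform processor configuration from the original report (the processor, attribute names, and values are the ones used above):

processors:
    transform:
      metrics:
        queries:
          # The nil guard short-circuits before the == comparison runs on missing attributes.
          - set(attributes["foo"], "bar") where attributes["foo"] != nil and attributes["foo"] == "foobar"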

@atoulme (Contributor, Author) commented Sep 27, 2022

A further update on the workaround: with it applied, the test case above works and no more of these logs are generated.

However, please see this example:

receivers:
    splunk_hec:

exporters:
    logging:

processors:
    batch:
    attributes/example:
      actions:
        - key: foo
          action: upsert
          value: "bar"
    transform:
      logs:
        queries:
          - set(attributes["foo"], "foobar") where attributes["foo"] != nil and attributes["foo"] == "bar"

extensions:
    health_check:
      endpoint: 0.0.0.0:13133
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679

service:
    extensions: [pprof, zpages, health_check]
    pipelines:
      logs:
        receivers: [splunk_hec]
        processors: [batch, attributes/example, transform]
        exporters: [logging]

In this example, we make sure that the value of attributes["foo"] is not nil, yet this still generates logs:

string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE
string to non-string with op NE

This is because the first check now compares a string against nil, which is treated as an invalid comparison between types.

@dmitryax (Member) commented:

OK, it looks like != nil is not the right way to handle this use case. We need "key in map" or "map has key" functionality in OTTL.
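
Purely as a sketch of the idea, reusing the condition dmitryax suggested earlier in the thread; this is hypothetical syntax that OTTL did not support at the time of this issue:

          # Hypothetical syntax only; OTTL had no key-existence check at this point.
          - set(attributes["foo"], "bar") where "foo" in attributes and attributes["foo"] == "foobar"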

@TylerHelmuth (Member) commented:

Yeah, or we could relax what we consider an invalid comparison. @kentquirk, what do you think?

@atoulme (Contributor, Author) commented Sep 30, 2022

This is now fixed in 0.61.0. Closing.

@atoulme closed this as completed on Sep 30, 2022