
[SplunkHecReceiver] Add resource configuration for custom metadata #21466

Closed
splunkericl opened this issue May 2, 2023 · 20 comments
@splunkericl
Contributor

Is your feature request related to a problem? Please describe.
As a user of SplunkHecReceiver, I want to add my own custom metadata as part of the resources for log records and metrics. Our application requires this metadata to identify which receiver is sending data in the processor and to observe metrics on our processor.

Describe the solution you'd like
A new config option is added:

resources map[string]any{}

When log records are created, the key/value pairs specified in resources are appended to the resource attributes.

Describe alternatives you've considered
Adding stanza operators similar to filelogreceiver and syslogreceiver. However, these operators can't be added easily on top of the existing HTTP server.
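If the proposed option landed as described above, the receiver configuration might look like the following sketch (the resources key and its values are hypothetical, taken from this proposal, not an existing setting):

```yaml
receivers:
  splunk_hec:
    endpoint: 0.0.0.0:8088
    # hypothetical option from this proposal; not an existing setting
    resources:
      datasetName: a
      receiverName: hec_main
```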

@splunkericl
Contributor Author

@atoulme @dmitryax I submitted a PR to show my suggested approach. Please let me know if there is a better way.

@dmitryax
Member

dmitryax commented May 2, 2023

@splunkericl not sure why we need to add this functionality to the receiver. You can add any attributes with an additional processor in your pipeline:

  resource:
    attributes:
      - action: insert
        key: extra_attr
        value: attr_value

That's the purpose of the resource processor.
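A minimal collector config wiring that snippet into a pipeline might look like this (a sketch; the receiver and exporter names are placeholders):

```yaml
processors:
  resource:
    attributes:
      - action: insert
        key: extra_attr
        value: attr_value

service:
  pipelines:
    logs:
      receivers: [splunk_hec]
      processors: [resource]
      exporters: [splunk_hec]
```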

@splunkericl
Contributor Author

However, the resource processor will apply to every event. We want to know which events come from which receivers. Our pipeline can have data from different receivers, and we want a unique resource attribute for each receiver. The ultimate use case is showing the topology of the data flowing through.

@atoulme
Contributor

atoulme commented May 2, 2023

Can you define different pipelines with different processors?

@splunkericl
Contributor Author

So an example pipeline:
Pipeline 1:

  • receivers: [HEC, Syslog, S2S]
    • Each receiver sends an identifier of name in resources
  • processors: a custom processor. It will:
    • extract the name identifier and observe metrics.
    • extract the kind of receiver that sent these events and observe metrics.
    • do some other processing as well
  • exporters: [HEC]

@atoulme
Contributor

atoulme commented May 2, 2023

There is no difference for us between having a pipeline with 3 receivers or 3 pipelines with one receiver each. It helps with metrics, since each pipeline gets its own metrics, IIRC.

The resource processor extends the attributes processor, so you should be able in any case to express a match on the data flowing through the pipeline to tag selectively based on the data properties: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/attributesprocessor/README.md

@dmitryax dmitryax transferred this issue from open-telemetry/opentelemetry-collector May 3, 2023
@dmitryax
Member

dmitryax commented May 3, 2023

Most of the receivers set the Scope Name field on the emitted data, like otelcol/apache (the format will likely change once #21382 is resolved). We can make sure the Splunk HEC receiver sets the Scope Name as well. Then you can use your custom processor to create any resource attributes from that scope name. How does that sound?

@atoulme
Contributor

atoulme commented May 3, 2023

The resource processor extends the attributes processor, so you should be able in any case to express a match on the data flowing through the pipeline to tag selectively based on the data properties: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/attributesprocessor/README.md

I was wrong, sorry. The resource processor does not allow matching. It's likely the transform processor would be more versatile for your use case.

We can make sure the Splunk HEC receiver also sets the Scope Name as well.

I checked, and we don't currently set the scope name in the HEC receiver; that can certainly be added.

@splunkericl
Contributor Author

Most of the receivers set Scope Name field on the emitted data like otelcol/apache (the format will likely be changed based once #21382 is resolved). We can make sure the Splunk HEC receiver also sets the Scope Name as well. Then you can use your custom processor to create any resource attributes from that scope name. How does it sound?

This helps to identify which receiver it comes from. But we still need one piece of metadata, datasetName, that is specific to our business logic. We can have multiple pipelines, each with a HEC receiver, but they would have different datasetName values.

@dmitryax
Member

dmitryax commented May 3, 2023

This helps to identify which receiver it comes from. But we still need one piece of metadata, datasetName, that is specific to our business logic. We can have multiple pipelines, each with a HEC receiver, but they would have different datasetName values.

So you want to have different instances of HEC receivers in one pipeline?

@splunkericl
Contributor Author

So you want to have different instances of HEC receivers in one pipeline?

That is also a possibility in the future. But for today's use case, our processor is shared among different pipelines, so it needs a way to identify which HEC receiver is sending the data. For example:

log/pipeline_1:
  receivers:
    - splunkhecreceiver/id1:
        datasetName: a
  processors:
    - customProcessor
log/pipeline_2:
  receivers:
    - splunkhecreceiver/id2:
        datasetName: b
  processors:
    - customProcessor

If the processor only checks the scope name, it wouldn't know whether id1 or id2 is sending the data, correct? Unless there is other metadata that lets us identify them that I am missing?

@dmitryax
Member

dmitryax commented May 3, 2023

If you already have several pipelines, why not just add a different resource processor in each of them?

log/pipeline_1:
  receivers:
    - splunkhec/id1
  processors:
    - resource/a
    - customProcessor
log/pipeline_2:
  receivers:
    - splunkhec/id2
  processors:
    - resource/b
    - customProcessor

where

processors:
  resource/a:
    attributes:
      - action: insert
        key: datasetName
        value: a
  resource/b:
    attributes:
      - action: insert
        key: datasetName
        value: b

@splunkericl
Contributor Author

This would work if we only have one receiver, but the pipeline can have other receivers too:

log/pipeline_1:
  receivers:
    - splunkhec/id1
      - datasetName: a
    - syslog
      - datasetName: b
    - s2s
      - datasetName: c 
  processors:
    - resource/a
    - customProcessor

And since the resource processor doesn't support conditional statements, it would set datasetName on all log records.

@dmitryax
Member

dmitryax commented May 3, 2023

Ok, then instead of the resource processor you can use the transform processor:

transform:
  log_statements:
  - context: log
    statements:
      - set(attributes["datasetName"], "a") where instrumentation_scope.name == "otelcol/splunkhec"
      - set(attributes["datasetName"], "b") where instrumentation_scope.name == "otelcol/syslog"
      - set(attributes["datasetName"], "c") where instrumentation_scope.name == "otelcol/s2s"

@splunkericl
Contributor Author

ok. This should work.

Just to double check, set(attributes["datasetName"], "a") sets the resource attributes of plog.ResourceLogs but not the attributes of a plog.LogRecord, correct?

And will the instrumentation scope change for the HEC receiver as part of #21382?

@dmitryax
Member

dmitryax commented May 4, 2023

Just to double check, set(attributes["datasetName"], "a") sets the resource attributes of plog.ResourceLogs but not the attributes of a plog.LogRecord, correct?

It sets attributes on a plog.LogRecord, but it doesn't matter if you use the HEC exporter. If you need them on resource attributes for any other reason, you can add https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor
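Per the groupbyattrs processor README, promoting a log-record attribute to a resource attribute would look roughly like this (a sketch; datasetName is the attribute from the examples above):

```yaml
processors:
  groupbyattrs:
    keys:
      - datasetName
```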

@splunkericl
Contributor Author

It sets attributes on a plog.LogRecord, but it doesn't matter if you use hec exporter.

If we put these on attributes inside plog.LogRecord, we have to remove them somewhere inside the processor. Otherwise these attributes will make it to the destinations (e.g., the Splunk HEC exporter puts attributes on fields) and show up for customers.

The groupbyattrs processor doesn't seem to remove log attributes but simply groups these records?

@dmitryax
Copy link
Member

Yes, you would have to remove them anyway. It doesn't matter whether they are set on resources or log records. transform can help with dropping them.
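For reference, dropping the attribute at the end of the pipeline with the transform processor could look like the following sketch, using the OTTL delete_key function (datasetName is the attribute from the examples above):

```yaml
transform:
  log_statements:
    - context: log
      statements:
        - delete_key(attributes, "datasetName")
```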

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jul 26, 2023
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.
