
[SplunkHecReceiver] Add resource configuration for custom metadata #21466

Closed
splunkericl opened this issue May 2, 2023 · 20 comments
@splunkericl
Contributor

Is your feature request related to a problem? Please describe.
As a user of SplunkHecReceiver, I want to add my own custom metadata as part of the resources for log records and metrics. Our application requires this metadata to identify which receiver is sending data in the processor and to observe metrics on our processor.

Describe the solution you'd like
A new config option is added:

resources map[string]any{}

When log records are created, the key/value pairs specified in resources are appended to the resource attributes.

Describe alternatives you've considered
Adding stanza operators similar to filelogreceiver and syslogreceiver. However, these operators can't be added easily on top of the existing HTTP server.
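If the proposed option landed as described above, the receiver configuration might look like the following sketch (the resources key and its values are hypothetical, taken from this proposal, not an existing setting):

```yaml
receivers:
  splunk_hec:
    endpoint: 0.0.0.0:8088
    # hypothetical option from this proposal; not an existing setting
    resources:
      datasetName: a
      receiverName: hec_main
```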

@splunkericl
Contributor Author

@atoulme @dmitryax I submitted a PR to show my suggested approach. Please let me know if there is a better way.

@dmitryax
Member

dmitryax commented May 2, 2023

@splunkericl not sure why we need to add this functionality to the receiver. You can add any attributes with an additional processor in your pipeline:

  resource:
    attributes:
      - action: insert
        key: extra_attr
        value: attr_value

That's the purpose of the resource processor.
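A minimal collector config wiring that snippet into a pipeline might look like this (a sketch; the receiver and exporter names are placeholders):

```yaml
processors:
  resource:
    attributes:
      - action: insert
        key: extra_attr
        value: attr_value

service:
  pipelines:
    logs:
      receivers: [splunk_hec]
      processors: [resource]
      exporters: [splunk_hec]
```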

@splunkericl
Contributor Author

However, the resource processor will apply to every event. We want to know which events come from which receivers. Our pipeline can have data from different receivers, and we want a unique resource attribute for each receiver. The ultimate use case is showing the topology of the data flowing through.

@atoulme
Contributor

atoulme commented May 2, 2023

Can you define different pipelines with different processors?

@splunkericl
Contributor Author

So an example pipeline:
Pipeline 1:

  • receivers: [HEC, Syslog, S2S]
    • Each receiver sends an identifier of name in resources
  • processors: a custom processor. It will:
    • extract the name identifier and observe metrics.
    • extract the kind of receiver that sent these events and observe metrics.
    • do some other processing as well
  • exporters: [HEC]

@atoulme
Contributor

atoulme commented May 2, 2023

There is no difference for us between having a pipeline with 3 receivers or 3 pipelines with one receiver each. It helps with metrics, since each pipeline gets its own metrics, IIRC.

The resource processor extends the attributes processor, so you should be able in any case to express a match on the data flowing through the pipeline to tag selectively based on the data properties: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/attributesprocessor/README.md

@dmitryax dmitryax transferred this issue from open-telemetry/opentelemetry-collector May 3, 2023
@dmitryax
Member

dmitryax commented May 3, 2023

Most of the receivers set the Scope Name field on the emitted data, like otelcol/apache (the format will likely change once #21382 is resolved). We can make sure the Splunk HEC receiver sets the Scope Name as well. Then you can use your custom processor to create any resource attributes from that scope name. How does that sound?

@atoulme
Contributor

atoulme commented May 3, 2023

The resource processor extends the attributes processor, so you should be able in any case to express a match on the data flowing through the pipeline to tag selectively based on the data properties: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/attributesprocessor/README.md

I was wrong, sorry. The resource processor does not allow matching. It's likely the transform processor would be more versatile for your use case.

We can make sure the Splunk HEC receiver also sets the Scope Name as well.

I checked, and we don't currently set the scope name in the HEC receiver; that can certainly be added.

@splunkericl
Contributor Author

Most of the receivers set Scope Name field on the emitted data like otelcol/apache (the format will likely be changed based once #21382 is resolved). We can make sure the Splunk HEC receiver also sets the Scope Name as well. Then you can use your custom processor to create any resource attributes from that scope name. How does it sound?

This helps to identify which receiver it comes from. But we still need one piece of metadata, datasetName, that is specific to our business logic. We can have multiple pipelines, each with a HEC receiver, but they would have different datasetName values.

@dmitryax
Member

dmitryax commented May 3, 2023

This helps to identify which receiver it comes from. But we still need one piece of metadata, datasetName, that is specific to our business logic. We can have multiple pipelines, each with a HEC receiver, but they would have different datasetName values.

So you want to have different instances of HEC receivers in one pipeline?

@splunkericl
Contributor Author

So you want to have different instances of HEC receivers in one pipeline?

That is also a possibility in the future. But for today's use case, our processor is shared among different pipelines, so it needs a way to identify which HEC receiver is sending the data. For example:

log/pipeline_1:
  receivers:
    - splunkhecreceiver/id1:
        datasetName: a
  processors:
    - customProcessor
log/pipeline_2:
  receivers:
    - splunkhecreceiver/id2:
        datasetName: b
  processors:
    - customProcessor

If the processor only checks the scope name, it wouldn't know whether id1 or id2 is sending the data, correct? Unless there is other metadata that lets us identify them that I am missing?

@dmitryax
Member

dmitryax commented May 3, 2023

If you already have several pipelines, why not just add a different resource processor in each of them?

log/pipeline_1:
  receivers:
    - splunkhec/id1
  processors:
    - resource/a
    - customProcessor
log/pipeline_2:
  receivers:
    - splunkhec/id2
  processors:
    - resource/b
    - customProcessor

where

processors:
  resource/a:
    attributes:
      - action: insert
        key: datasetName
        value: a
  resource/b:
    attributes:
      - action: insert
        key: datasetName
        value: b

@splunkericl
Contributor Author

This would work if we only have one receiver, but the pipeline can have other receivers too:

log/pipeline_1:
  receivers:
    - splunkhec/id1
      - datasetName: a
    - syslog
      - datasetName: b
    - s2s
      - datasetName: c 
  processors:
    - resource/a
    - customProcessor

And since the resource processor doesn't support conditional statements, it would set datasetName on all log records.

@dmitryax
Member

dmitryax commented May 3, 2023

Ok, then instead of the resource processor you can use the transform processor:

transform:
  log_statements:
  - context: log
    statements:
      - set(attributes["datasetName"], "a") where instrumentation_scope.name == "otelcol/splunkhec"
      - set(attributes["datasetName"], "b") where instrumentation_scope.name == "otelcol/syslog"
      - set(attributes["datasetName"], "c") where instrumentation_scope.name == "otelcol/s2s"

@splunkericl
Contributor Author

ok. This should work.

Just to double check, set(attributes["datasetName"], "a") sets the resource attributes of plog.ResourceLogs but not the attributes of a plog.LogRecord, correct?

And will the instrumentation scope change for the HEC receiver as part of #21382?

@dmitryax
Member

dmitryax commented May 4, 2023

Just to double check, set(attributes["datasetName"], "a") sets the resource attributes of plog.ResourceLogs but not the attributes of a plog.LogRecord, correct?

It sets attributes on a plog.LogRecord, but it doesn't matter if you use the HEC exporter. If you need them on resource attributes for any other reason, you can add https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor
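Per the groupbyattrs processor README, promoting a log-record attribute to a resource attribute would look roughly like this (a sketch; datasetName is the attribute from the examples above):

```yaml
processors:
  groupbyattrs:
    keys:
      - datasetName
```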

@splunkericl
Contributor Author

It sets attributes on a plog.LogRecord, but it doesn't matter if you use hec exporter.

If we put these on attributes inside plog.LogRecord, we have to remove them somewhere inside the processor. Otherwise these attributes will make it to the destinations (e.g., the Splunk HEC exporter puts attributes on fields) and show up for customers.

The groupbyattrs processor doesn't seem to remove log attributes but simply groups these records?

@dmitryax
Copy link
Member

Yes, you would have to remove them anyway. It doesn't matter whether they are set on resources or log records. transform can help with dropping them.
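For reference, dropping the attribute at the end of the pipeline with the transform processor could look like the following sketch, using the OTTL delete_key function (datasetName is the attribute from the examples above):

```yaml
transform:
  log_statements:
    - context: log
      statements:
        - delete_key(attributes, "datasetName")
```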

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jul 26, 2023
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.
