
DSET-4558: Add metrics #61

Merged: 9 commits, Nov 22, 2023


@martin-majlis-s1 (Collaborator) commented Nov 14, 2023

Jira Link: https://sentinelone.atlassian.net/browse/DSET-4558

🥅 Goal

Introduce metrics so that we have better visibility into what is going on inside the library.

🛠️ Solution

There is a customer request to provide metrics related to the processing: open-telemetry/opentelemetry-collector-contrib#27650. Initially I tried to solve it by implementing the metrics in the datasetexporter (open-telemetry/opentelemetry-collector-contrib#27487), but I believe it's better to implement this directly in the library.

Statistics are now kept in a separate structure that holds both the raw numbers shown in the logs and the metric counters (if metrics should be collected). OTel metrics are collected only if the optional meter parameter is non-nil. Whether a meter is passed may be made configurable in the exporter itself.

To make it work, it's also necessary to pass an additional parameter to the OTel collector: --feature-gates=telemetry.useOtelForInternalMetrics.

🏫 Testing

To test it:

  1. Modify opentelemetry-collector-contrib to use the modified library and to build the Docker image without cached layers:
```diff
diff --git a/Makefile b/Makefile
index 30594c1b7a..6419c4d0db 100644
--- a/Makefile
+++ b/Makefile
@@ -208,7 +208,7 @@ run:
 docker-component: check-component
        GOOS=linux GOARCH=amd64 $(MAKE) $(COMPONENT)
        cp ./bin/$(COMPONENT)_linux_amd64 ./cmd/$(COMPONENT)/$(COMPONENT)
-       docker build -t $(COMPONENT) ./cmd/$(COMPONENT)/
+       docker build --no-cache -t $(COMPONENT) ./cmd/$(COMPONENT)/
        rm ./cmd/$(COMPONENT)/$(COMPONENT)

 .PHONY: check-component
diff --git a/cmd/otelcontribcol/go.mod b/cmd/otelcontribcol/go.mod
index 73c2146146..fda73dc204 100644
--- a/cmd/otelcontribcol/go.mod
+++ b/cmd/otelcontribcol/go.mod
@@ -1149,3 +1149,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator
 replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/skywalking => ../../pkg/translator/skywalking

 replace github.com/open-telemetry/opentelemetry-collector-contrib/internal/collectd => ../../internal/collectd
+
+replace github.com/scalyr/dataset-go => ../../../dataset-go
\ No newline at end of file
diff --git a/exporter/datasetexporter/go.mod b/exporter/datasetexporter/go.mod
index c8181b324d..2b15007d46 100644
--- a/exporter/datasetexporter/go.mod
+++ b/exporter/datasetexporter/go.mod
@@ -67,3 +67,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatatest
 replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil => ../../pkg/pdatautil

 replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/golden => ../../pkg/golden
+
+replace github.com/scalyr/dataset-go => ../../../dataset-go
diff --git a/go.mod b/go.mod
index b845553002..a0ead72a6b 100644
--- a/go.mod
+++ b/go.mod
@@ -1133,3 +1133,6 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/azure
 replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/golden => ./pkg/golden

 replace github.com/open-telemetry/opentelemetry-collector-contrib/internal/collectd => ./internal/collectd
+
+
+replace github.com/scalyr/dataset-go => ../dataset-go/
```
  2. Build the Docker image with make docker-otelcontribcol.
  3. Modify opentelemetry-demo to use the built image instead of the official one:
```diff
diff --cc docker-compose.yml
index 001f7c8,f00e675..0000000
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@@ -646,16 -644,18 +646,19 @@@ services

    # OpenTelemetry Collector
    otelcol:
-    image: otel/opentelemetry-collector-contrib:0.86.0
+    image: otelcontribcol:latest
     container_name: otel-col
            memory: 125M
      restart: unless-stopped
-    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml" ]
+    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml", "--feature-gates=telemetry.useOtelForInternalMetrics" ]
      volumes:
        - ./src/otelcollector/otelcol-config.yml:/etc/otelcol-config.yml
```
  4. Modify src/otelcollector/otelcol-config.yml to include the datasetexporter configuration.
  5. Run it: docker compose up --abort-on-container-exit
  6. Check the logs to confirm that OTel metrics are collected:
otel-col                          | 2023-11-21T14:22:18.993Z    info    [email protected]/datasetexporter.go:34   Creating new DataSetExporter    {"kind": "exporter", "data_type": "traces", "name": "dataset", "config": "DatasetURL: https://foo.scalyr.com; BufferSettings: {MaxLifetime:5s GroupBy:[resource_name resource_type] RetryInitialInterval:5s RetryMaxInterval:30s RetryMaxElapsedTime:5m0s RetryShutdownTimeout:30s}; LogsSettings: {ExportResourceInfo:true ExportResourcePrefix:resource.attributes. ExportScopeInfo:true ExportScopePrefix:scope.attributes. DecomposeComplexMessageField:false DecomposedComplexMessagePrefix:body.map. exportSettings:{ExportSeparator:. ExportDistinguishingSuffix:_}}; TracesSettings: {exportSettings:{ExportSeparator:. ExportDistinguishingSuffix:_}}; ServerHostSettings: {UseHostName:true ServerHost:martin}; RetrySettings: {Enabled:true InitialInterval:5s RandomizationFactor:0.5 Multiplier:1.5 MaxInterval:30s MaxElapsedTime:5m0s}; QueueSettings: {Enabled:true NumConsumers:10 QueueSize:1000 StorageID:<nil>}; TimeoutSettings: {Timeout:5s}", "entity": "logs"}
otel-col                          | 2023-11-21T14:22:18.993Z    info    client/client.go:159    Using User-Agent:       {"kind": "exporter", "data_type": "traces", "name": "dataset", "User-Agent": "dataset-go;0.16.0;2023-11-21;cad2b32c-81db-4690-b4bd-a5a53a11c2ec;linux;amd64;10;OtelCollector;0.88.0-dev;logs"}
otel-col                          | 2023-11-21T14:22:19.016Z    info    [email protected]/exporter.go:275  Development component. May change in the future.        {"kind": "exporter", "data_type": "logs", "name": "debug"}
otel-col                          | 2023-11-21T14:22:19.016Z    info    [email protected]/datasetexporter.go:34   Creating new DataSetExporter    {"kind": "exporter", "data_type": "logs", "name": "dataset", "config": "DatasetURL: https://foo.scalyr.com; BufferSettings: {MaxLifetime:5s GroupBy:[resource_name resource_type] RetryInitialInterval:5s RetryMaxInterval:30s RetryMaxElapsedTime:5m0s RetryShutdownTimeout:30s}; LogsSettings: {ExportResourceInfo:true ExportResourcePrefix:resource.attributes. ExportScopeInfo:true ExportScopePrefix:scope.attributes. DecomposeComplexMessageField:false DecomposedComplexMessagePrefix:body.map. exportSettings:{ExportSeparator:. ExportDistinguishingSuffix:_}}; TracesSettings: {exportSettings:{ExportSeparator:. ExportDistinguishingSuffix:_}}; ServerHostSettings: {UseHostName:true ServerHost:martin}; RetrySettings: {Enabled:true InitialInterval:5s RandomizationFactor:0.5 Multiplier:1.5 MaxInterval:30s MaxElapsedTime:5m0s}; QueueSettings: {Enabled:true NumConsumers:10 QueueSize:1000 StorageID:<nil>}; TimeoutSettings: {Timeout:5s}", "entity": "logs"}
otel-col                          | 2023-11-21T14:22:19.017Z    info    client/client.go:109    Using config:   {"kind": "exporter", "data_type": "logs", "name": "dataset", "config": "Endpoint: https://foo.scalyr.com, Tokens: (WriteLog: true, ReadLog: false, WriteConfig: false, ReadConfig: false), BufferSettings: (MaxLifetime: 5s, MaxSize: 6225920, GroupBy: [resource_name resource_type], RetryRandomizationFactor: 0.500000, RetryMultiplier: 1.500000, RetryInitialInterval: 5s, RetryMaxInterval: 30s, RetryMaxElapsedTime: 5m0s, RetryShutdownTimeout: 30s), ServerHostSettings: (UseHostName: true, ServerHost: martin), Debug: (false)", "version": "0.16.0", "releaseDate": "2023-11-21"}
otel-col                          | 2023-11-21T14:22:19.017Z    info    client/client.go:124    Adjusted config:        {"kind": "exporter", "data_type": "logs", "name": "dataset", "config": "Endpoint: https://foo.scalyr.com, Tokens: (WriteLog: true, ReadLog: false, WriteConfig: false, ReadConfig: false), BufferSettings: (MaxLifetime: 5s, MaxSize: 6225920, GroupBy: [resource_name resource_type logfile serverHost], RetryRandomizationFactor: 0.500000, RetryMultiplier: 1.500000, RetryInitialInterval: 5s, RetryMaxInterval: 30s, RetryMaxElapsedTime: 5m0s, RetryShutdownTimeout: 30s), ServerHostSettings: (UseHostName: true, ServerHost: martin), Debug: (false)"}
otel-col                          | 2023-11-21T14:22:19.017Z    info    client/client.go:159    Using User-Agent:       {"kind": "exporter", "data_type": "logs", "name": "dataset", "User-Agent": "dataset-go;0.16.0;2023-11-21;3cea1685-8354-40fa-99bd-eb3bac341b41;linux;amd64;10;OtelCollector;0.88.0-dev;logs"}
otel-col                          | 2023-11-21T14:22:19.017Z    info    statistics/statistics.go:71     Initialising statistics {"kind": "exporter", "data_type": "logs", "name": "dataset"}
otel-col                          | 2023-11-21T14:22:19.017Z    info    statistics/statistics.go:106    OTel metrics WILL be collected  {"kind": "exporter", "data_type": "logs", "name": "dataset"}
otel-col                          | 2023-11-21T14:22:19.017Z    info    statistics/statistics.go:156    Histogram buckets for payload size:     {"kind": "exporter", "data_type": "logs", "name": "dataset", "buckets": [6225.92, 62259.200000000004, 311296, 622592, 1245184, 2490368, 3735552, 5292032, 6225920, 6848512.000000001, 12451840]}
otel-col                          | 2023-11-21T14:22:19.017Z    info    statistics/statistics.go:173    Histogram buckets for response times:   {"kind": "exporter", "data_type": "logs", "name": "dataset", "buckets": [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]}
  7. Go to Grafana: http://localhost:8080/grafana/explore?
  8. Check that the metrics are there:
    [Screenshots: 2023-11-21 at 10 02 32, 10 07 46 and 10 01 59]

@codecov-commenter commented Nov 14, 2023

Codecov Report

Merging #61 (bac4a51) into main (29d4782) will decrease coverage by 0.28%.
The diff coverage is 80.62%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
- Coverage   75.84%   75.56%   -0.28%     
==========================================
  Files          11       11              
  Lines        1763     1923     +160     
==========================================
+ Hits         1337     1453     +116     
- Misses        359      391      +32     
- Partials       67       79      +12     
Files                          Coverage Δ
pkg/client/client.go           85.60% <72.22%> (-1.76%) ⬇️
pkg/client/add_events.go       81.56% <75.86%> (+0.16%) ⬆️
pkg/statistics/statistics.go   81.82% <81.82%> (ø)

@martin-majlis-s1 (Collaborator, Author) commented Nov 21, 2023

Update: It's here!!!

It looks like there is some issue with GitHub: https://github.com/scalyr/dataset-go/commits/DSET-4558-use-otel-metrics shows the commits, but they are not updated here. :/

[Screenshot 2023-11-21 at 11 06 17]

@martin-majlis-s1 merged commit 3113629 into main on Nov 22, 2023
10 checks passed
@martin-majlis-s1 deleted the DSET-4558-use-otel-metrics branch on November 22, 2023 08:52
@tomaz-s1 (Collaborator) left a comment

Thanks 👍


```go
// ProcessingTime is the duration of the processing
func (stats QueueStats) ProcessingTime() time.Duration {
	return stats.processingTime
}
```
Collaborator:

I assume this is an overall aggregate value (a counter)? Or is it a gauge, for example the value since the last processing run?

@martin-majlis-s1 (Collaborator, Author):

It's the amount of time that has passed since the first event was received.

codeboten pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Nov 28, 2023
Upgrade to the new version of the library.

This PR implements the following issues:

* #27650 - metrics are now collected via OpenTelemetry, so they can be
monitored. It's a better version of the previous PR #27487, which was not
working.
* #27652 - the `debug` option controls whether `session_key` is included
or not

Another change is that fields specified in the `group_by`
configuration are now transferred as part of the session info.
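A hypothetical Go sketch of copying `group_by` fields into session info. The field names match the `group_by` values used in the test configuration (`resource_name`, `resource_type`); `sessionInfoFor`, `groupingKey`, the key format and the sample attributes are illustrative assumptions, not the library's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sessionInfoFor copies the configured group_by fields that are present
// on an event into a session-info map (missing fields are skipped).
func sessionInfoFor(groupBy []string, attrs map[string]any) map[string]any {
	info := make(map[string]any, len(groupBy))
	for _, key := range groupBy {
		if v, ok := attrs[key]; ok {
			info[key] = v
		}
	}
	return info
}

// groupingKey renders session info deterministically (sorted keys), so
// events with equal session info map to the same key.
func groupingKey(info map[string]any) string {
	keys := make([]string, 0, len(info))
	for k := range info {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%v;", k, info[k])
	}
	return b.String()
}

func main() {
	groupBy := []string{"resource_name", "resource_type"} // as in the demo config
	event := map[string]any{ // sample attributes, made up for illustration
		"resource_name": "checkout",
		"resource_type": "service",
		"message":       "order placed",
	}
	info := sessionInfoFor(groupBy, event)
	fmt.Println(groupingKey(info)) // prints resource_name=checkout;resource_type=service;
}
```

Sorting the keys matters because Go map iteration order is randomized; without it, equal session info could produce different keys.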

**Link to tracking Issue:** #27650, #27652

**Testing:** 

1. Build the Docker image: `make docker-otelcontribcol`
2. Checkout https://github.com/open-telemetry/opentelemetry-demo
3. Update the configuration in `docker-compose.yml` and in
`src/otelcollector/otelcol-config.yml`:
* In `docker-compose.yml`, switch the image to the one built in step 1
* In `docker-compose.yml`, enable the feature gate for collecting metrics:
`--feature-gates=telemetry.useOtelForInternalMetrics`
* In `src/otelcollector/otelcol-config.yml`, enable scraping of the
collector's own metrics with the prometheus receiver
* In `src/otelcollector/otelcol-config.yml`, add the configuration for
DataSet
```diff
diff --git a/docker-compose.yml b/docker-compose.yml
index 001f7c8..d7edd0d 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -646,14 +646,16 @@ services:

   # OpenTelemetry Collector
   otelcol:
-    image: otel/opentelemetry-collector-contrib:0.86.0
+    image: otelcontribcol:latest
     container_name: otel-col
     deploy:
       resources:
         limits:
           memory: 125M
     restart: unless-stopped
-    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml" ]
+    command: [ "--config=/etc/otelcol-config.yml", "--config=/etc/otelcol-config-extras.yml", "--feature-gates=telemetry.useOtelForInternalMetrics" ]
     volumes:
       - ./src/otelcollector/otelcol-config.yml:/etc/otelcol-config.yml
       - ./src/otelcollector/otelcol-config-extras.yml:/etc/otelcol-config-extras.yml
diff --git a/src/otelcollector/otelcol-config.yml b/src/otelcollector/otelcol-config.yml
index f2568ae..9944562 100644
--- a/src/otelcollector/otelcol-config.yml
+++ b/src/otelcollector/otelcol-config.yml
@@ -15,6 +15,14 @@ receivers:
     targets:
       - endpoint: http://frontendproxy:${env:ENVOY_PORT}

+  prometheus:
+    config:
+      scrape_configs:
+        - job_name: 'otel-collector'
+          scrape_interval: 5s
+          static_configs:
+            - targets: ['0.0.0.0:8888']
+
 exporters:
   debug:
   otlp:
@@ -29,6 +37,22 @@ exporters:
     endpoint: "http://prometheus:9090/api/v1/otlp"
     tls:
       insecure: true
+  logging:
+  dataset:
+    api_key: API_KEY
+    dataset_url: https://SERVER.scalyr.com
+    debug: true
+    buffer:
+      group_by:
+        - resource_name
+        - resource_type
+    logs:
+      export_resource_info_on_event: true
+    server_host:
+      server_host: Martin
+      use_hostname: false
+  dataset/aaa:
+    api_key: API_KEY
+    dataset_url: https://SERVER.scalyr.com
+    debug: true
+    buffer:
+      group_by:
+        - resource_name
+        - resource_type
+    logs:
+      export_resource_info_on_event: true
+    server_host:
+      server_host: MartinAAA
+      use_hostname: false

 processors:
   batch:
@@ -47,6 +71,11 @@ processors:
           - set(description, "") where name == "queueSize"
           # FIXME: remove when this issue is resolved: open-telemetry/opentelemetry-python-contrib#1958
           - set(description, "") where name == "http.client.duration"
+  attributes:
+    actions:
+      - key: otel.demo
+        value: 29446
+        action: upsert

 connectors:
   spanmetrics:
@@ -55,13 +84,13 @@ service:
   pipelines:
     traces:
       receivers: [otlp]
-      processors: [batch]
-      exporters: [otlp, debug, spanmetrics]
+      processors: [batch, attributes]
+      exporters: [otlp, debug, spanmetrics, dataset, dataset/aaa]
     metrics:
-      receivers: [httpcheck/frontendproxy, otlp, spanmetrics]
+      receivers: [httpcheck/frontendproxy, otlp, spanmetrics, prometheus]
       processors: [filter/ottl, transform, batch]
       exporters: [otlphttp/prometheus, debug]
     logs:
       receivers: [otlp]
-      processors: [batch]
-      exporters: [otlp/logs, debug]
+      processors: [batch, attributes]
+      exporters: [otlp/logs, debug, dataset, dataset/aaa]
```
4. Run the demo: `docker compose up --abort-on-container-exit`
5. Check that the metrics are in Grafana:
http://localhost:8080/grafana/explore?
<img width="838" alt="Screenshot 2023-11-27 at 12 29 29"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/43d365dd-37d8-4528-b768-1d7f0ac34989">
6. Check some metrics:
![Screenshot 2023-11-22 at 14 06 56](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/81306486-eb5e-49b1-87ed-25d1eb8afcf8)
<img width="1356" alt="Screenshot 2023-11-27 at 12 59 10"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/34c36e45-850e-4e74-a18a-0a54ce97cbd3">
7. Check that the data are available in DataSet:
![Screenshot 2023-11-22 at 13 33 50](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/122797378/77cb2f31-be14-463b-91a7-fd10f8dbfe3a)

**Documentation:** 

**Library changes:**
* Group By & Debug - scalyr/dataset-go#62
* Metrics  - scalyr/dataset-go#61

---------

Co-authored-by: Andrzej Stencel <[email protected]>
3 participants