Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/dockerstats] support pids_stats metrics #21111

Merged
merged 9 commits into from
May 19, 2023

Conversation

sigilioso
Copy link
Contributor

Description:

Adds support for pids_stats metrics.

pids_stats metrics were introduced in docker engine API v1.23 and that's the reason why docker's API default version has been bumped to 1.23. However, the minimum supported version has not been updated because the metrics are optional and reported only when pids cgroup is supported and the kernel is >= 4.3. On top of that, pids_stats may be reported even if the API request is performed using previous versions. Example:

curl --unix-socket /var/run/docker.sock -H "Content-Type: application/json" http://1.22/containers/78de07328afff50a9777b07dd36a28c709dffe081baaf67235db618843399643/stats?stream=false |jq
{
(...)
  "pids_stats": {
    "current": 1,
    "limit": 2192
  },
(...)
}

Link to tracking Issue: #21041

Testing:

ScarapeV2 test has been updated:

  • Expectation files for the previous tests now include pids.count metric (already being there in the corresponding stats json files)
  • Included new ScrapeV2 test case where pids_stastmetrics are not reported.
  • Included new ScrapeV2 tests case where both new metrics container.pids.count and container.pids.max are reported.

Tests where executed as follows:

❯ make test
go test -race -timeout 300s -parallel 4 --tags="" ./...
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver  0.805s
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver/internal/metadata        0.500s

❯ go test --tags integration ./...
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver  33.664s
ok      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver/internal/metadata        0.404s

Documentation:
New metrics documentation was generated using mdatagen

@@ -126,6 +126,7 @@ func (r *receiver) recordContainerStats(now pcommon.Timestamp, containerStats *d
r.recordMemoryMetrics(now, &containerStats.MemoryStats)
r.recordBlkioMetrics(now, &containerStats.BlkioStats)
r.recordNetworkMetrics(now, &containerStats.Networks)
r.recordPidsMetrics(now, &containerStats.PidsStats)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behaviour on API 1.22, or <=4.3 kernel? I would like to ensure there is no nil pointer crash. I would expect it just does not record the metric. But I can't see any explicit handling of these cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since PidStats is not defined as a pointer in docker types it will be an empty struct, therefore this pointer cannot be nil even if pids_stats are not reported.

The stats response in the new test case pids_no_stats does not include pids_stats metrics, showing this use-case.

})
assert.NoError(t, err)
defer pidsStatsMaxEngineMock.Close()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly related to your features, but I think we should probably refactor the test case struct so it defines the engine mocks. Otherwise these tests are going to get messy as more testcases are added.
Like this:

mockDockerEngine: func () *httptest.Server {
  mockServer, err := dockerMockServer(...)
  assert.NoError(t, err)
  return mockServer
}()

Also you may need to loop over the test cases, and defer mockServer.Close() (before t.Run()) to ensure they all get closed.

Copy link
Contributor Author

@sigilioso sigilioso Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I didn't consider changing the test structure but it makes a lot of sense. I moved the mock definition inside the test case struct as suggested. Since it is in a function and the server is created inside of each t.Run I think it is safe closing it inside as well. Please, let me know your thoughts.
Changed here: f9b58f3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving that, it looks good.

assert.NoError(t, err)
defer noPidsStatsEngineMock.Close()

pidsStatsMaxEngineMock, err := dockerMockServer(&map[string]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this test case differ from the existing test cases? You've added the pid stats into those too (which is good). Is it possible we can remove this test case, or does it provide something different to the others?

Copy link
Contributor Author

@sigilioso sigilioso Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test case both metrics are returned in the stats response:

  "pids_stats": {
    "current": 1,
    "limit": 2192
  },

(limit is only reported if a limit is defined, when it is unlimited it is not reported)

Then, the corresponding expectation file also include both metrics.

On the other hand, the existing test cases' stats response returned the current value only, which is the only value added in the expectation file.

@@ -628,3 +628,19 @@ metrics:
aggregation: cumulative
attributes:
- interface

# Pids
container.pids.count:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comment about the semantic convention: also consider the naming of the metric and try and somewhat align it to the semantic convention. Do you think container.processes.count is better for alignment? I'll let you make that call, but it's something to consider.

Copy link
Contributor Author

@sigilioso sigilioso Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently processes is more widely used and is more aligned with the semantic convention, but taking into account that the value is named pids in cgroups and docker API does not change it, maybe keeping pids here makes these new metrics easier to identify and understand. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tend to agree. It's reasonable to keep it as pids here.

@sigilioso sigilioso force-pushed the include-pids-stats branch from 74f6e9a to f9b58f3 Compare April 26, 2023 08:58
@sigilioso sigilioso requested a review from jamesmoessis April 26, 2023 15:48
aggregation: cumulative
monotonic: false

container.pids.max:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you've gone with "max" as the terminology here? Seeing as docker and cgroups report it as "limit" it might make sense for the metric name to be container.pids.limit. This is also consistent with other metrics in this component such as container.memory.usage.limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it from cgroups docs:

In order to use the pids controller, set the maximum number of tasks in pids.max (this is not available in the root cgroup for obvious reasons). The number of processes currently in the cgroup is given by pids.current.

But, as you stated, it is reported as limit and that name would be consistent with other metrics and the conventions, so I'm changing it. Thanks again! 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here: 653ad44

@sigilioso sigilioso requested a review from jamesmoessis April 27, 2023 06:34
Copy link
Contributor

@jamesmoessis jamesmoessis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Depending on whether #21166 gets merged before this, you may have some conflicts to resolve.

@sigilioso sigilioso force-pushed the include-pids-stats branch 2 times, most recently from 75946c9 to 9d71853 Compare May 9, 2023 06:15
@sigilioso sigilioso force-pushed the include-pids-stats branch from 9d71853 to 3f67fc1 Compare May 9, 2023 06:31
@jamesmoessis
Copy link
Contributor

@codeboten as the codeowner this looks good - just needs approver/maintainer review.

@atoulme atoulme added the ready to merge Code review completed; ready to merge by maintainers label May 19, 2023
@codeboten codeboten merged commit f61d138 into open-telemetry:main May 19, 2023
@github-actions github-actions bot added this to the next release milestone May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Code review completed; ready to merge by maintainers receiver/dockerstats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants