-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/dockerstats] support pids_stats metrics #21111
Conversation
@@ -126,6 +126,7 @@ func (r *receiver) recordContainerStats(now pcommon.Timestamp, containerStats *d | |||
r.recordMemoryMetrics(now, &containerStats.MemoryStats) | |||
r.recordBlkioMetrics(now, &containerStats.BlkioStats) | |||
r.recordNetworkMetrics(now, &containerStats.Networks) | |||
r.recordPidsMetrics(now, &containerStats.PidsStats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behaviour on API 1.22, or <=4.3 kernel? I would like to ensure there is no nil pointer crash. I would expect it just does not record the metric. But I can't see any explicit handling of these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since PidStats is not defined as a pointer in docker types it will be an empty struct, therefore this pointer cannot be nil even if pids_stats
are not reported.
The stats response in the new test case pids_no_stats does not include pids_stats
metrics, showing this use-case.
}) | ||
assert.NoError(t, err) | ||
defer pidsStatsMaxEngineMock.Close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly related to your features, but I think we should probably refactor the test case struct so it defines the engine mocks. Otherwise these tests are going to get messy as more testcases are added.
Like this:
mockDockerEngine: func () *httptest.Server {
mockServer, err := dockerMockServer(...)
assert.NoError(t, err)
return mockServer
}()
Also you may need to loop over the test cases, and defer mockServer.Close()
(before t.Run()
) to ensure they all get closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I didn't consider changing the test structure but it makes a lot of sense. I moved the mock definition inside the test case struct as suggested. Since it is in a function and the server is created inside of each t.Run
I think it is safe closing it inside as well. Please, let me know your thoughts.
Changed here: f9b58f3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving that, it looks good.
assert.NoError(t, err) | ||
defer noPidsStatsEngineMock.Close() | ||
|
||
pidsStatsMaxEngineMock, err := dockerMockServer(&map[string]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this test case differ from the existing test cases? You've added the pid stats into those too (which is good). Is it possible we can remove this test case, or does it provide something different to the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test case both metrics are returned in the stats response:
"pids_stats": {
"current": 1,
"limit": 2192
},
(limit
is only reported if a limit is defined, when it is unlimited it is not reported)
Then, the corresponding expectation file also include both metrics.
On the other hand, the existing test cases' stats response returned the current
value only, which is the only value added in the expectation file.
@@ -628,3 +628,19 @@ metrics: | |||
aggregation: cumulative | |||
attributes: | |||
- interface | |||
|
|||
# Pids | |||
container.pids.count: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment about the semantic convention: also consider the naming of the metric and try and somewhat align it to the semantic convention. Do you think container.processes.count
is better for alignment? I'll let you make that call, but it's something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently processes
is more widely used and is more aligned with the semantic convention, but taking into account that the value is named pids
in cgroups and docker API does not change it, maybe keeping pids
here makes these new metrics easier to identify and understand. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I tend to agree. It's reasonable to keep it as pids
here.
74f6e9a
to
f9b58f3
Compare
aggregation: cumulative | ||
monotonic: false | ||
|
||
container.pids.max: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you've gone with "max" as the terminology here? Seeing as docker and cgroups report it as "limit" it might make sense for the metric name to be container.pids.limit
. This is also consistent with other metrics in this component such as container.memory.usage.limit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took it from cgroups docs:
In order to use the
pids
controller, set the maximum number of tasks in pids.max (this is not available in the root cgroup for obvious reasons). The number of processes currently in the cgroup is given by pids.current.
But, as you stated, it is reported as limit and that name would be consistent with other metrics and the conventions, so I'm changing it. Thanks again! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed here: 653ad44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Depending on whether #21166 gets merged before this, you may have some conflicts to resolve.
653ad44
to
9a877df
Compare
75946c9
to
9d71853
Compare
9d71853
to
3f67fc1
Compare
@codeboten as the codeowner this looks good - just needs approver/maintainer review. |
Description:
Adds support for
pids_stats
metrics.pids_stats metrics were introduced in docker engine API v1.23 and that's the reason why docker's API default version has been bumped to 1.23. However, the minimum supported version has not been updated because the metrics are optional and reported only when pids cgroup is supported and the kernel is >= 4.3. On top of that,
pids_stats
may be reported even if the API request is performed using previous versions. Example:Link to tracking Issue: #21041
Testing:
ScarapeV2
test has been updated:pids.count
metric (already being there in the corresponding stats json files)ScrapeV2
test case wherepids_stast
metrics are not reported.ScrapeV2
tests case where both new metricscontainer.pids.count
andcontainer.pids.max
are reported.Tests where executed as follows:
Documentation:
New metrics documentation was generated using
mdatagen