Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Procstat: collect connections and listener for each proc #8755

Closed

Conversation

adrianlzt
Copy link
Contributor

@adrianlzt adrianlzt commented Jan 27, 2021

Procstat modified to add a new metric "procstat_tcp" with to values
"conn" and "listen".
It also adds number of connections for each TCP state.

Those metrics will be added only if it they are activated specifically.

Those values will contain a comma separated value of the endpoints where
the proc is connecting to or listening (IPv4 and IPv6).
Only for linux.

Local, virtual and docker interfaces are ignored.

If some proc is listening on 0.0.0.0, an endpoint for each of the
"public" (those not ignored as internal) IPv4 IPs is created.
If it is listening on :: (IPv6) an endpoint is created for each IPv4 and
IPv6.

For programs with one parent and several children, all listening in some
endpoint, only the parent process is taken into account. Child endpoints
are ignored.

Connections made to this cost (local port is one of the listening ports)
are ignored. To avoid having servers with thousands of connections.
We prefer to collect that info in the clients.

It is also added connection info (number of connections in each of the
TCP states) for each proc. Improving PR #5402

To gather tcp connections netlink is used, to avoid the cost of parsing
/proc/net/tcp(6), but /proc should be readed to get the mapping between
inodes and pids.

Required for all PRs:

  • Associated README.md updated.
  • Has appropriate unit tests.

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@adrianlzt
Copy link
Contributor Author

!signed-cla

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@adrianlzt adrianlzt force-pushed the feature/procstat_network_info branch 6 times, most recently from 4a0d64e to b5e0415 Compare January 27, 2021 11:49
@adrianlzt adrianlzt changed the title Procstat: collect connections and listener for each proc WIP: Procstat: collect connections and listener for each proc Jan 27, 2021
@adrianlzt
Copy link
Contributor Author

I want to add someway to only gather the procs with connections, to avoid the extra cost of parsing all procs.

@sspaink
Copy link
Contributor

sspaink commented Jan 27, 2021

!signed-cla

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤝 ✅ CLA has been signed. Thank you!

@sspaink
Copy link
Contributor

sspaink commented Jan 27, 2021

@adrianlzt Sorry for the bot being a nuisance, there was a bug. Thanks for working on this PR!

Procstat modified to add a new metric "procstat_tcp" with to values
"conn" and "listen".
It also adds number of connections for each TCP state.

Those metrics will be added only if it they are activated specifically.

Those values will contain a comma separated value of the endpoints where
the proc is connecting to or listening (IPv4 and IPv6).
Only for linux.

Local, virtual and docker interfaces are ignored.

If some proc is listening on 0.0.0.0, an endpoint for each of the
"public" (those not ignored as internal) IPv4 IPs is created.
If it is listening on :: (IPv6) an endpoint is created for each IPv4 and
IPv6.

For programs with one parent and several children, all listening in some
endpoint, only the parent process is taken into account. Child endpoints
are ignored.

Connections made to this cost (local port is one of the listening ports)
are ignored. To avoid having servers with thousands of connections.
We prefer to collect that info in the clients.

It is also added connection info (number of connections in each of the
TCP states) for each proc. Improving PR influxdata#5402

To gather tcp connections netlink is used, to avoid the cost of parsing
/proc/net/tcp(6), but /proc should be readed to get the mapping between
inodes and pids.
@adrianlzt adrianlzt force-pushed the feature/procstat_network_info branch from b5e0415 to 4698e5b Compare January 28, 2021 14:24
@adrianlzt
Copy link
Contributor Author

Added an option to skip some metrics as suggested

@adrianlzt adrianlzt changed the title WIP: Procstat: collect connections and listener for each proc Procstat: collect connections and listener for each proc Jan 28, 2021
If we want to collect only the metric procstat_lookup or procstat_tcp is
a big overhead having to parse lots of files in /proc to get number of
fds, CPU time, memory usage, etc.

Added a new configuration option to be able to select which metrics to
collect.
If the configuration parameter, 'metrics_include', is undefined, the
plugin behaves like before adding this parameter.

The difference (measured with ~1100 procs) between gathering the
procstat_tcp metrics alone or with the default collection of metrics is
around one tenth (~50ms alone, ~500ms with default collection).
@sjwang90 sjwang90 added area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin labels Jan 29, 2021
/proc/n/fd dirs were not being closed till the end of mapInodesToPid
function, leaving a lot of file descriptors open.
This could reach the hard limit in Linux breaking the functionality of
Telegraf.
@adrianlzt adrianlzt force-pushed the feature/procstat_network_info branch from 4698e5b to 81657b5 Compare February 9, 2021 12:17
netInfo.Fetch() should read all files in /proc/n/fd dirs to map
connections to pids.
This operation should only be executed if we are going to use that
information.
Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty thorough. I think it needs a bit of cleanup on error handling and logging, and some flow control around that. if you add Log telegraf.Logger to the struct the plugin system will populate it, and you can use testutils.Logger{} in tests.

Q: does github.com/elastic/gosigar/ use cgo?

Comment on lines +113 to +114
// TODO allow passing a list of MAC headers to ignore
// TODO allow passing exclude list of regexp to match interface names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider removing this, I think you could accomplish it with tagexclude or a processor

Comment on lines +177 to +178
fmt.Printf("Error opening /proc: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be using the struct's Log telegraf.Logger to log this.
Missing return


dirContents, err := fd.Readdirnames(0)
if err != nil {
fmt.Printf("Error reading files in /proc: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be using the struct's Log telegraf.Logger to log this.
Missing return

Comment on lines +195 to +207
pid, err := strconv.ParseUint(pidStr, 10, 32)
if err != nil {
// exclude files with a not numeric name. We only want to access pid directories
return
}

pidDir, err := os.Open("/proc/" + pidStr + "/fd/")
if err != nil {
// ignore errors:
// - missing directory, pid has already finished
// - permission denied
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice to handle the errors specifically and return others

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be really good to log all of the errors in this function at the Debug level

@@ -10,6 +10,7 @@ import (

type Process interface {
PID() PID
Ppid() (PID, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider uppercase

Suggested change
Ppid() (PID, error)
PPID() (PID, error)

if (p.metricEnabled(MetricsTCPStats) || p.metricEnabled(MetricsConnectionsEndpoints)) && len(p.procs) > 0 {
err = netInfo.Fetch()
if err != nil {
acc.AddError(fmt.Errorf("E! [inputs.procstat] getting TCP network info: %v", err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
acc.AddError(fmt.Errorf("E! [inputs.procstat] getting TCP network info: %v", err))
acc.AddError(fmt.Errorf("getting TCP network info: %w", err))

} else {
// Ignore errors because pid was not found. It is normal to have procs without connections
if !errors.Is(err, ErrorPIDNotFound) {
acc.AddError(fmt.Errorf("D! [inputs.procstat] not able to get connections for pid=%v: %v", proc.PID(), err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add Log telegraf.Logger to the struct and use that instead for debug.

Suggested change
acc.AddError(fmt.Errorf("D! [inputs.procstat] not able to get connections for pid=%v: %v", proc.PID(), err))
p.Log.Debug("not able to get connections for pid=%v: %v", proc.PID(), err)

func addConnectionStats(pidConnections []ConnInfo, fields map[string]interface{}, prefix string) {
}

func addConnectionEnpoints(acc telegraf.Accumulator, proc Process, netInfo NetworkInfo) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing d change to addConnectionEndpoints and update uses of it

@@ -60,6 +60,35 @@ Processes can be selected for monitoring using one of several methods:
## the native finder performs the search directly in a manor dependent on the
## platform. Default is 'pgrep'
# pid_finder = "pgrep"

## Select wich extra metrics should be added:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing h update to which

@@ -90,6 +142,35 @@ var sampleConfig = `
## the native finder performs the search directly in a manor dependent on the
## platform. Default is 'pgrep'
# pid_finder = "pgrep"

## Select wich extra metrics should be added:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing h update to which

fields[prefix+"tcp_closing"] = counts[linux.TCP_CLOSING]
}

// addConnectionEnpoints add listen and connection endpoints to the procstat_tcp metric.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing d change toaddConnectionEndpoints

// If listen is 0.0.0.0 or ::, it will be added one value for each of the IP addresses of the host.
// Listeners in private IPs are ignored (maybe a flag could be added, but now the reasoning is matching connections between hosts).
// Connections made to this server are ignored (the local port is one of the listening ports).
func addConnectionEnpoints(acc telegraf.Accumulator, proc Process, netInfo NetworkInfo) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing d change to addConnectionEndpoints

@@ -157,11 +186,49 @@ the `win_perf_counters` input plugin as a more mature alternative.
- running (int)
- result_code (int, success = 0, lookup_error = 1)

If ``connections_stats`` enabled, added fields:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be tcp_stats?

@sspaink
Copy link
Contributor

sspaink commented Jun 21, 2022

Closing as there hasn't been any response to the latest reviews, but if you are still seeing this pull request merged please comment or re-open and we can help work through any outstanding issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants