Procstat: collect connections and listener for each proc #8755

adrianlzt · 2021-01-27T09:47:17Z

Procstat modified to add a new metric "procstat_tcp" with to values
"conn" and "listen".
It also adds number of connections for each TCP state.

Those metrics will be added only if it they are activated specifically.

Those values will contain a comma separated value of the endpoints where
the proc is connecting to or listening (IPv4 and IPv6).
Only for linux.

Local, virtual and docker interfaces are ignored.

If some proc is listening on 0.0.0.0, an endpoint for each of the
"public" (those not ignored as internal) IPv4 IPs is created.
If it is listening on :: (IPv6) an endpoint is created for each IPv4 and
IPv6.

For programs with one parent and several children, all listening in some
endpoint, only the parent process is taken into account. Child endpoints
are ignored.

Connections made to this cost (local port is one of the listening ports)
are ignored. To avoid having servers with thousands of connections.
We prefer to collect that info in the clients.

It is also added connection info (number of connections in each of the
TCP states) for each proc. Improving PR #5402

To gather tcp connections netlink is used, to avoid the cost of parsing
/proc/net/tcp(6), but /proc should be readed to get the mapping between
inodes and pids.

Required for all PRs:

Associated README.md updated.
Has appropriate unit tests.

telegraf-tiger

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

adrianlzt · 2021-01-27T09:48:47Z

!signed-cla

telegraf-tiger

🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

adrianlzt · 2021-01-27T16:11:39Z

I want to add someway to only gather the procs with connections, to avoid the extra cost of parsing all procs.

sspaink · 2021-01-27T18:04:57Z

!signed-cla

telegraf-tiger

🤝 ✅ CLA has been signed. Thank you!

sspaink · 2021-01-27T18:06:08Z

@adrianlzt Sorry for the bot being a nuisance, there was a bug. Thanks for working on this PR!

Procstat modified to add a new metric "procstat_tcp" with to values "conn" and "listen". It also adds number of connections for each TCP state. Those metrics will be added only if it they are activated specifically. Those values will contain a comma separated value of the endpoints where the proc is connecting to or listening (IPv4 and IPv6). Only for linux. Local, virtual and docker interfaces are ignored. If some proc is listening on 0.0.0.0, an endpoint for each of the "public" (those not ignored as internal) IPv4 IPs is created. If it is listening on :: (IPv6) an endpoint is created for each IPv4 and IPv6. For programs with one parent and several children, all listening in some endpoint, only the parent process is taken into account. Child endpoints are ignored. Connections made to this cost (local port is one of the listening ports) are ignored. To avoid having servers with thousands of connections. We prefer to collect that info in the clients. It is also added connection info (number of connections in each of the TCP states) for each proc. Improving PR influxdata#5402 To gather tcp connections netlink is used, to avoid the cost of parsing /proc/net/tcp(6), but /proc should be readed to get the mapping between inodes and pids.

adrianlzt · 2021-01-28T14:26:07Z

Added an option to skip some metrics as suggested

If we want to collect only the metric procstat_lookup or procstat_tcp is a big overhead having to parse lots of files in /proc to get number of fds, CPU time, memory usage, etc. Added a new configuration option to be able to select which metrics to collect. If the configuration parameter, 'metrics_include', is undefined, the plugin behaves like before adding this parameter. The difference (measured with ~1100 procs) between gathering the procstat_tcp metrics alone or with the default collection of metrics is around one tenth (~50ms alone, ~500ms with default collection).

/proc/n/fd dirs were not being closed till the end of mapInodesToPid function, leaving a lot of file descriptors open. This could reach the hard limit in Linux breaking the functionality of Telegraf.

netInfo.Fetch() should read all files in /proc/n/fd dirs to map connections to pids. This operation should only be executed if we are going to use that information.

ssoroka

looks pretty thorough. I think it needs a bit of cleanup on error handling and logging, and some flow control around that. if you add Log telegraf.Logger to the struct the plugin system will populate it, and you can use testutils.Logger{} in tests.

Q: does github.com/elastic/gosigar/ use cgo?

ssoroka · 2021-04-13T21:42:45Z

plugins/inputs/procstat/connections_linux.go

+		// TODO allow passing a list of MAC headers to ignore
+		// TODO allow passing exclude list of regexp to match interface names


consider removing this, I think you could accomplish it with tagexclude or a processor

ssoroka · 2021-04-13T21:44:30Z

plugins/inputs/procstat/connections_linux.go

+		fmt.Printf("Error opening /proc: %v", err)
+	}


should be using the struct's Log telegraf.Logger to log this.
Missing return

ssoroka · 2021-04-13T21:44:35Z

plugins/inputs/procstat/connections_linux.go

+
+	dirContents, err := fd.Readdirnames(0)
+	if err != nil {
+		fmt.Printf("Error reading files in /proc: %v", err)


should be using the struct's Log telegraf.Logger to log this.
Missing return

ssoroka · 2021-04-13T21:45:33Z

plugins/inputs/procstat/connections_linux.go

+	pid, err := strconv.ParseUint(pidStr, 10, 32)
+	if err != nil {
+		// exclude files with a not numeric name. We only want to access pid directories
+		return
+	}
+
+	pidDir, err := os.Open("/proc/" + pidStr + "/fd/")
+	if err != nil {
+		// ignore errors:
+		//   - missing directory, pid has already finished
+		//   - permission denied
+		return
+	}


might be nice to handle the errors specifically and return others

Would be really good to log all of the errors in this function at the Debug level

ssoroka · 2021-04-13T21:47:23Z

plugins/inputs/procstat/process.go

@@ -10,6 +10,7 @@ import (

 type Process interface {
 	PID() PID
+	Ppid() (PID, error)


consider uppercase

Suggested change

Ppid() (PID, error)

PPID() (PID, error)

ssoroka · 2021-04-13T21:51:53Z

plugins/inputs/procstat/procstat.go

+	if (p.metricEnabled(MetricsTCPStats) || p.metricEnabled(MetricsConnectionsEndpoints)) && len(p.procs) > 0 {
+		err = netInfo.Fetch()
+		if err != nil {
+			acc.AddError(fmt.Errorf("E! [inputs.procstat] getting TCP network info: %v", err))


Suggested change

acc.AddError(fmt.Errorf("E! [inputs.procstat] getting TCP network info: %v", err))

acc.AddError(fmt.Errorf("getting TCP network info: %w", err))

ssoroka · 2021-04-13T21:57:52Z

plugins/inputs/procstat/procstat.go

+		} else {
+			// Ignore errors because pid was not found. It is normal to have procs without connections
+			if !errors.Is(err, ErrorPIDNotFound) {
+				acc.AddError(fmt.Errorf("D! [inputs.procstat] not able to get connections for pid=%v: %v", proc.PID(), err))


add Log telegraf.Logger to the struct and use that instead for debug.

Suggested change

acc.AddError(fmt.Errorf("D! [inputs.procstat] not able to get connections for pid=%v: %v", proc.PID(), err))

p.Log.Debug("not able to get connections for pid=%v: %v", proc.PID(), err)

helenosheaa · 2021-04-13T21:58:00Z

plugins/inputs/procstat/procstat_fallback.go

+func addConnectionStats(pidConnections []ConnInfo, fields map[string]interface{}, prefix string) {
+}
+
+func addConnectionEnpoints(acc telegraf.Accumulator, proc Process, netInfo NetworkInfo) error {


missing d change to addConnectionEndpoints and update uses of it

helenosheaa · 2021-04-13T21:58:22Z

plugins/inputs/procstat/README.md

@@ -60,6 +60,35 @@ Processes can be selected for monitoring using one of several methods:
  ## the native finder performs the search directly in a manor dependent on the
  ## platform.  Default is 'pgrep'
  # pid_finder = "pgrep"
+
+  ## Select wich extra metrics should be added:


missing h update to which

helenosheaa · 2021-04-13T21:58:42Z

plugins/inputs/procstat/procstat.go

@@ -90,6 +142,35 @@ var sampleConfig = `
  ## the native finder performs the search directly in a manor dependent on the
  ## platform.  Default is 'pgrep'
  # pid_finder = "pgrep"
+
+  ## Select wich extra metrics should be added:


missing h update to which

helenosheaa · 2021-04-13T21:59:53Z

plugins/inputs/procstat/procstat_linux.go

+	fields[prefix+"tcp_closing"] = counts[linux.TCP_CLOSING]
+}
+
+// addConnectionEnpoints add listen and connection endpoints to the procstat_tcp metric.


missing d change toaddConnectionEndpoints

helenosheaa · 2021-04-13T22:00:14Z

plugins/inputs/procstat/procstat_linux.go

+// If listen is 0.0.0.0 or ::, it will be added one value for each of the IP addresses of the host.
+// Listeners in private IPs are ignored (maybe a flag could be added, but now the reasoning is matching connections between hosts).
+// Connections made to this server are ignored (the local port is one of the listening ports).
+func addConnectionEnpoints(acc telegraf.Accumulator, proc Process, netInfo NetworkInfo) error {


missing d change to addConnectionEndpoints

helenosheaa · 2021-04-13T22:03:51Z

plugins/inputs/procstat/README.md

@@ -157,11 +186,49 @@ the `win_perf_counters` input plugin as a more mature alternative.
    - running (int)
    - result_code (int, success = 0, lookup_error = 1)

+If ``connections_stats`` enabled, added fields:


should this be tcp_stats?

sspaink · 2022-06-21T15:28:51Z

Closing as there hasn't been any response to the latest reviews, but if you are still seeing this pull request merged please comment or re-open and we can help work through any outstanding issues.

telegraf-tiger bot requested changes Jan 27, 2021

View reviewed changes

adrianlzt mentioned this pull request Jan 27, 2021

Add connection info per process to procstat #5402

Closed

3 tasks

adrianlzt force-pushed the feature/procstat_network_info branch 6 times, most recently from 4a0d64e to b5e0415 Compare January 27, 2021 11:49

adrianlzt changed the title ~~Procstat: collect connections and listener for each proc~~ WIP: Procstat: collect connections and listener for each proc Jan 27, 2021

telegraf-tiger bot approved these changes Jan 27, 2021

View reviewed changes

adrianlzt force-pushed the feature/procstat_network_info branch from b5e0415 to 4698e5b Compare January 28, 2021 14:24

adrianlzt changed the title ~~WIP: Procstat: collect connections and listener for each proc~~ Procstat: collect connections and listener for each proc Jan 28, 2021

sjwang90 added area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin labels Jan 29, 2021

Procstat: close /proc/n/fd dirs earlier

81657b5

/proc/n/fd dirs were not being closed till the end of mapInodesToPid function, leaving a lot of file descriptors open. This could reach the hard limit in Linux breaking the functionality of Telegraf.

adrianlzt force-pushed the feature/procstat_network_info branch from 4698e5b to 81657b5 Compare February 9, 2021 12:17

procstat: avoid collection inode info if not being used

660f86f

netInfo.Fetch() should read all files in /proc/n/fd dirs to map connections to pids. This operation should only be executed if we are going to use that information.

ssoroka suggested changes Apr 13, 2021

View reviewed changes

helenosheaa reviewed Apr 13, 2021

View reviewed changes

sspaink closed this Jun 21, 2022

adrianlzt mentioned this pull request Aug 22, 2023

feat(inputs.procstat): new option to select fields and tcp info #13814

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Procstat: collect connections and listener for each proc #8755

Procstat: collect connections and listener for each proc #8755

adrianlzt commented Jan 27, 2021 •

edited

Loading

telegraf-tiger bot left a comment

adrianlzt commented Jan 27, 2021

telegraf-tiger bot left a comment

adrianlzt commented Jan 27, 2021

sspaink commented Jan 27, 2021

telegraf-tiger bot left a comment

sspaink commented Jan 27, 2021

adrianlzt commented Jan 28, 2021

ssoroka left a comment

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

ssoroka Apr 13, 2021

helenosheaa Apr 13, 2021

helenosheaa Apr 13, 2021

helenosheaa Apr 13, 2021

helenosheaa Apr 13, 2021

helenosheaa Apr 13, 2021

helenosheaa Apr 13, 2021

sspaink commented Jun 21, 2022

		// TODO allow passing a list of MAC headers to ignore
		// TODO allow passing exclude list of regexp to match interface names

	acc.AddError(fmt.Errorf("E! [inputs.procstat] getting TCP network info: %v", err))
	acc.AddError(fmt.Errorf("getting TCP network info: %w", err))

	acc.AddError(fmt.Errorf("D! [inputs.procstat] not able to get connections for pid=%v: %v", proc.PID(), err))
	p.Log.Debug("not able to get connections for pid=%v: %v", proc.PID(), err)

Procstat: collect connections and listener for each proc #8755

Procstat: collect connections and listener for each proc #8755

Conversation

adrianlzt commented Jan 27, 2021 • edited Loading

Required for all PRs:

telegraf-tiger bot left a comment

Choose a reason for hiding this comment

adrianlzt commented Jan 27, 2021

telegraf-tiger bot left a comment

Choose a reason for hiding this comment

adrianlzt commented Jan 27, 2021

sspaink commented Jan 27, 2021

telegraf-tiger bot left a comment

Choose a reason for hiding this comment

sspaink commented Jan 27, 2021

adrianlzt commented Jan 28, 2021

ssoroka left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sspaink commented Jun 21, 2022

adrianlzt commented Jan 27, 2021 •

edited

Loading