[system/process] add support for multierr #166

Merged
merged 28 commits into from
Jul 23, 2024
Changes from 17 commits
9374d23
chore: initial commit, use errors.join
VihasMakwana Jul 18, 2024
f27b495
chore: introduce new helper, nits
VihasMakwana Jul 18, 2024
ed559b2
fix: clean the functino
VihasMakwana Jul 18, 2024
0e93586
fix: add wrpped error
VihasMakwana Jul 18, 2024
1a83875
fix: update tests
VihasMakwana Jul 18, 2024
ced452b
Merge branch 'main' into multierror-enhancement
VihasMakwana Jul 18, 2024
63f95b8
fix: fix argument order
VihasMakwana Jul 18, 2024
1b4ad46
fix: tests
VihasMakwana Jul 18, 2024
6e2ab10
fix: update ListStates
VihasMakwana Jul 18, 2024
3ccb849
fix: windows support
VihasMakwana Jul 18, 2024
557c766
fix: verbose
VihasMakwana Jul 18, 2024
00fd657
fix: verbose
VihasMakwana Jul 18, 2024
13e6bf0
chore: update container tests
VihasMakwana Jul 18, 2024
8aa9dd0
chore: add helper
VihasMakwana Jul 18, 2024
5049500
chore: remame function
VihasMakwana Jul 18, 2024
82526c3
fix: comments
VihasMakwana Jul 18, 2024
33f4f40
chore: add comments
VihasMakwana Jul 18, 2024
a4b22de
simplify build tags
VihasMakwana Jul 19, 2024
c1f9abd
fix: don't exported canIgnore
VihasMakwana Jul 19, 2024
a98038a
Update metric/system/process/process.go
VihasMakwana Jul 22, 2024
6a7cb8a
chore: simplify code. remove helpers
VihasMakwana Jul 22, 2024
6c2f29e
fix: add wrappers and unwrap for recusive lookup
VihasMakwana Jul 22, 2024
ceb1dff
fix: fix bug, nil pointer
VihasMakwana Jul 22, 2024
4cada94
fix: bug, nil pointer
VihasMakwana Jul 22, 2024
0d0ff39
fix: nits
VihasMakwana Jul 22, 2024
30a6fa5
chore: add test cases
VihasMakwana Jul 22, 2024
7db28ef
chore: comments
VihasMakwana Jul 22, 2024
b020a9d
fix: add license
VihasMakwana Jul 23, 2024
9 changes: 9 additions & 0 deletions metric/system/process/helpers.go
@@ -18,7 +18,9 @@
package process

import (
+ "errors"
"math"
+ "syscall"
"time"

"github.com/elastic/elastic-agent-libs/opt"
@@ -98,3 +100,10 @@ func GetProcCPUPercentage(s0, s1 ProcState) ProcState {
return s1

}

+ func CanIgnore(err error) bool {
Contributor
Does this need to be exported, as it seems to be used from within the same package?

Suggested change
- func CanIgnore(err error) bool {
+ func canIgnore(err error) bool {

Same goes for CanDegrade.

Contributor Author
Hi,
Regarding CanDegrade, we use this helper in metricbeat here.

Contributor Author
Also, @ycombinator, we will use CanDegrade here to ignore non-fatal errors.

Contributor Author
I've made CanIgnore local to this package.

+ // While monitoring a set of processes, some of them may exit after the PID list has been collected,
+ // so there is no need to report a "process not found" (ESRCH) error.
+ return errors.Is(err, syscall.ESRCH)
+ }
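
A note on why the `errors.Is` check above still works once callers add context with `%w`: the sketch below is illustrative only. `canIgnore` mirrors the helper in this diff, while `wrapForPID`, `sampleGone`, and `sampleDenied` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// canIgnore mirrors the helper in the diff: a process that has already
// exited (ESRCH) is not worth reporting.
func canIgnore(err error) bool {
	return errors.Is(err, syscall.ESRCH)
}

// wrapForPID is a hypothetical helper that adds context with %w,
// the same wrapping verb the diff uses in its fmt.Errorf calls.
func wrapForPID(pid int, err error) error {
	return fmt.Errorf("error fetching PID %d: %w", pid, err)
}

// sampleGone and sampleDenied build illustrative wrapped errors.
func sampleGone() error   { return wrapForPID(4242, syscall.ESRCH) }
func sampleDenied() error { return wrapForPID(1, syscall.EACCES) }

func main() {
	fmt.Println(canIgnore(sampleGone()))   // true: ESRCH is found through the wrapper
	fmt.Println(canIgnore(sampleDenied())) // false: EACCES is a different errno
	fmt.Println(canIgnore(nil))            // false: errors.Is(nil, target) is false
}
```

Because `errors.Is(nil, ...)` is false, a nil error is never "ignorable" here; callers are expected to check `err != nil` first, as `pidIter` does.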
36 changes: 36 additions & 0 deletions metric/system/process/helpers_others.go
@@ -0,0 +1,36 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//go:build darwin || freebsd || linux || aix || netbsd || openbsd

package process

import (
"errors"
"syscall"
)

func CanDegrade(err error) bool {
// Check for errors which aren't fatal in nature and would be only used to change status to DEGRADED by metricbeat
if err == nil {
return true
}
return (errors.Is(err, syscall.EACCES) ||
errors.Is(err, syscall.EPERM) ||
errors.Is(err, syscall.EINVAL) ||
errors.Is(err, NonFatalErr{}))
}
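
For readers unfamiliar with matching on a sentinel type such as `NonFatalErr{}`, here is a minimal sketch of how that kind of check can work. `partialErr` is a hypothetical stand-in (the real `NonFatalErr` definition is only partly visible in this diff), and `canDegrade` is a simplified copy of the function above.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// partialErr is a hypothetical stand-in for the package's NonFatalErr type.
type partialErr struct{ Err error }

func (e partialErr) Error() string {
	if e.Err == nil {
		return "partial metrics"
	}
	return "partial metrics: " + e.Err.Error()
}

func (e partialErr) Unwrap() error { return e.Err }

// Is reports a match for any partialErr target, whatever cause it wraps.
func (e partialErr) Is(target error) bool {
	_, ok := target.(partialErr)
	return ok
}

// canDegrade is a simplified copy of CanDegrade from the diff.
func canDegrade(err error) bool {
	if err == nil {
		return true
	}
	return errors.Is(err, syscall.EACCES) ||
		errors.Is(err, syscall.EPERM) ||
		errors.Is(err, syscall.EINVAL) ||
		errors.Is(err, partialErr{})
}

// Illustrative inputs.
func samplePartial() error {
	return fmt.Errorf("pid 7: %w", partialErr{Err: errors.New("cgroup unreadable")})
}
func samplePerm() error  { return fmt.Errorf("pid 1: %w", syscall.EPERM) }
func sampleFatal() error { return errors.New("procfs not mounted") }

func main() {
	fmt.Println(canDegrade(nil))             // true
	fmt.Println(canDegrade(samplePartial())) // true
	fmt.Println(canDegrade(samplePerm()))    // true
	fmt.Println(canDegrade(sampleFatal()))   // false
}
```

Note that a nil error counts as "can degrade", which is why the call sites in this PR guard with `err != nil && !CanDegrade(err)`.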
39 changes: 39 additions & 0 deletions metric/system/process/helpers_windows.go
@@ -0,0 +1,39 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//go:build windows

package process

import (
"errors"
"syscall"

"golang.org/x/sys/windows"
)

func CanDegrade(err error) bool {
// Check for errors which aren't fatal in nature and would be only used to change status to DEGRADED by metricbeat
if err == nil {
return true
}
return (errors.Is(err, windows.ERROR_ACCESS_DENIED) ||
errors.Is(err, syscall.EPERM) ||
errors.Is(err, syscall.EINVAL) ||
errors.Is(err, windows.ERROR_INVALID_PARAMETER) ||
errors.Is(err, NonFatalErr{}))
}
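
One way a consumer might act on `CanDegrade` is sketched below. This is an assumption about usage, not code from metricbeat: the `status` type and its `HEALTHY` and `FAILED` values are hypothetical (only DEGRADED is mentioned in the comments above), and `canDegrade` is a reduced stand-in for the real helper.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// status is a hypothetical health value; the real reporting API is not shown in this PR.
type status string

const (
	healthy  status = "HEALTHY"
	degraded status = "DEGRADED"
	failed   status = "FAILED"
)

// canDegrade is a reduced stand-in for CanDegrade.
func canDegrade(err error) bool {
	return err == nil || errors.Is(err, syscall.EACCES) || errors.Is(err, syscall.EPERM)
}

// classify maps an error returned next to (possibly partial) results to a
// status, and reports whether the results should still be published.
func classify(err error) (status, bool) {
	switch {
	case err == nil:
		return healthy, true
	case canDegrade(err):
		return degraded, true
	default:
		return failed, false
	}
}

// Illustrative inputs.
func sampleDenied() error { return fmt.Errorf("pid 1: %w", syscall.EACCES) }
func sampleBroken() error { return errors.New("cannot list processes") }

func main() {
	for _, err := range []error{nil, sampleDenied(), sampleBroken()} {
		st, publish := classify(err)
		fmt.Println(st, publish)
	}
}
```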
36 changes: 21 additions & 15 deletions metric/system/process/process.go
@@ -54,11 +54,11 @@ func ListStates(hostfs resolve.Resolver) ([]ProcState, error) {

// actually fetch the PIDs from the OS-specific code
_, plist, err := init.FetchPids()
- if err != nil {
+ if err != nil && !CanDegrade(err) {
return nil, fmt.Errorf("error gathering PIDs: %w", err)
}

- return plist, nil
+ return plist, err
}

// GetPIDState returns the state of a given PID
@@ -90,10 +90,10 @@ func (procStats *Stats) Get() ([]mapstr.M, []mapstr.M, error) {
}

// actually fetch the PIDs from the OS-specific code
- pidMap, plist, err := procStats.FetchPids()
+ pidMap, plist, wrappedErr := procStats.FetchPids()

- if err != nil {
- return nil, nil, fmt.Errorf("error gathering PIDs: %w", err)
+ if wrappedErr != nil && !CanDegrade(wrappedErr) {
+ return nil, nil, fmt.Errorf("error gathering PIDs: %w", wrappedErr)
}
// We use this to track processes over time.
procStats.ProcsMap.SetMap(pidMap)
@@ -133,13 +133,13 @@ func (procStats *Stats) Get() ([]mapstr.M, []mapstr.M, error) {
rootEvents = append(rootEvents, rootMap)
}

- return procs, rootEvents, nil
+ return procs, rootEvents, wrappedErr
}

// GetOne fetches process data for a given PID if its name matches the regexes provided from the host.
func (procStats *Stats) GetOne(pid int) (mapstr.M, error) {
pidStat, _, err := procStats.pidFill(pid, false)
- if err != nil {
+ if err != nil && !CanDegrade(err) {
return nil, fmt.Errorf("error fetching PID %d: %w", pid, err)
}

@@ -152,7 +152,7 @@ func (procStats *Stats) GetOne(pid int) (mapstr.M, error) {
// event formatted as expected by ECS
func (procStats *Stats) GetOneRootEvent(pid int) (mapstr.M, mapstr.M, error) {
pidStat, _, err := procStats.pidFill(pid, false)
- if err != nil {
+ if err != nil && !CanDegrade(err) {
return nil, nil, fmt.Errorf("error fetching PID %d: %w", pid, err)
}

@@ -180,7 +180,7 @@ func (procStats *Stats) GetSelf() (ProcState, error) {
}

pidStat, _, err := procStats.pidFill(self, false)
- if err != nil {
+ if err != nil && !CanDegrade(err) {
return ProcState{}, fmt.Errorf("error fetching PID %d: %w", self, err)
}

@@ -191,23 +191,28 @@ func (procStats *Stats) GetSelf() (ProcState, error) {

// pidIter wraps a few lines of generic code that all OS-specific FetchPids() functions must call.
// this also handles the process of adding to the maps/lists in order to limit the code duplication in all the OS implementations
- func (procStats *Stats) pidIter(pid int, procMap ProcsMap, proclist []ProcState) (ProcsMap, []ProcState) {
+ func (procStats *Stats) pidIter(pid int, procMap ProcsMap, proclist []ProcState) (ProcsMap, []ProcState, error) {
status, saved, err := procStats.pidFill(pid, true)
+ var nonFatalErr error
if err != nil {
if !errors.Is(err, NonFatalErr{}) {
procStats.logger.Debugf("Error fetching PID info for %d, skipping: %s", pid, err)
- return procMap, proclist
+ if CanIgnore(err) {
+ return procMap, proclist, nil
+ }
+ return procMap, proclist, err
}
+ nonFatalErr = fmt.Errorf("non fatal error fetching PID some info for %d, metrics are valid, but partial: %w", pid, err)
procStats.logger.Debugf("Non fatal error fetching PID some info for %d, metrics are valid, but partial: %s", pid, err)
}
if !saved {
procStats.logger.Debugf("Process name does not match the provided regex; PID=%d; name=%s", pid, status.Name)
- return procMap, proclist
+ return procMap, proclist, nonFatalErr
}
procMap[pid] = status
proclist = append(proclist, status)

- return procMap, proclist
+ return procMap, proclist, nonFatalErr
}
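
The branching in the new `pidIter` can be read as a four-way triage of the error returned by `pidFill`. The sketch below restates that decision order under stated assumptions: `errPartial` is a hypothetical sentinel standing in for `NonFatalErr{}`, and the `action` names are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// action names the outcome of the triage; the names are illustrative.
type action string

const (
	keepClean    action = "keep, no error"
	keepPartial  action = "keep, return non-fatal error"
	skipSilently action = "skip, return nil"
	skipReport   action = "skip, return error"
)

// errPartial is a hypothetical sentinel standing in for NonFatalErr{}.
var errPartial = errors.New("partial metrics")

// triage follows the order used in pidIter: the non-fatal check comes first,
// then the "process already gone" check, then everything else.
func triage(err error) action {
	switch {
	case err == nil:
		return keepClean
	case errors.Is(err, errPartial):
		return keepPartial
	case errors.Is(err, syscall.ESRCH):
		return skipSilently
	default:
		return skipReport
	}
}

// Illustrative inputs.
func samplePartial() error { return fmt.Errorf("pid 7: %w", errPartial) }
func sampleGone() error    { return fmt.Errorf("pid 8: %w", syscall.ESRCH) }
func sampleOther() error   { return fmt.Errorf("pid 9: %w", syscall.EIO) }

func main() {
	for _, err := range []error{nil, samplePartial(), sampleGone(), sampleOther()} {
		fmt.Printf("%v -> %s\n", err, triage(err))
	}
}
```

In the real function the "keep" outcomes are still subject to the process-name filter (`saved`), which this sketch leaves out.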

// NonFatalErr is returned when there was an error
@@ -238,7 +243,7 @@ func (c NonFatalErr) Is(other error) bool {
// The second return value will only be false if an event has been filtered out.
func (procStats *Stats) pidFill(pid int, filter bool) (ProcState, bool, error) {
// Fetch proc state so we can get the name for filtering based on user's filter.
-
+ var wrappedErr error
// OS-specific entrypoint, get basic info so we can at least run matchProcess
status, err := GetInfoForPid(procStats.Hostfs, pid)
if err != nil {
@@ -265,6 +270,7 @@ func (procStats *Stats) pidFill(pid int, filter bool) (ProcState, bool, error) {
if !errors.Is(err, NonFatalErr{}) {
return status, true, fmt.Errorf("FillPidMetrics: %w", err)
}
+ wrappedErr = errors.Join(wrappedErr, fmt.Errorf("non-fatal error fetching PID metrics for %d, metrics are valid, but partial: %w", pid, err))
procStats.logger.Debugf("Non-fatal error fetching PID metrics for %d, metrics are valid, but partial: %s", pid, err)
}

@@ -320,7 +326,7 @@ func (procStats *Stats) pidFill(pid int, filter bool) (ProcState, bool, error) {
}
}

- return status, true, nil
+ return status, true, wrappedErr
}

// cacheCmdLine fills out Env and arg metrics from any stored previous metrics for the pid
6 changes: 4 additions & 2 deletions metric/system/process/process_aix.go
@@ -46,20 +46,22 @@ func (procStats *Stats) FetchPids() (ProcsMap, []ProcState, error) {
pid := C.pid_t(0)

procMap := make(ProcsMap, 0)
+ var wrappedErr error
var plist []ProcState
for {
// getprocs first argument is a void*
num, err := C.getprocs(unsafe.Pointer(&info), C.sizeof_struct_procsinfo64, nil, 0, &pid, 1)
if err != nil {
return nil, nil, fmt.Errorf("error fetching PIDs: %w", err)
}
- procMap, plist = procStats.pidIter(int(info.pi_pid), procMap, plist)
+ procMap, plist, err = procStats.pidIter(int(pid), procMap, plist)
+ wrappedErr = errors.Join(wrappedErr, err)

if num == 0 {
break
}
}
- return procMap, plist, nil
+ return procMap, plist, wrappedErr
}

// GetInfoForPid returns basic info for the process
3 changes: 2 additions & 1 deletion metric/system/process/process_container_test.go
@@ -18,6 +18,7 @@
package process

import (
+ "fmt"
"os"
"os/user"
"runtime"
@@ -112,7 +113,7 @@ func TestSystemHostFromContainer(t *testing.T) {
validateProcResult(t, result)
} else {
_, roots, err := testStats.Get()
- require.NoError(t, err)
+ require.True(t, CanDegrade(err), fmt.Sprintf("Fatal error: %s", err))

for _, proc := range roots {
t.Logf("proc: %d: %s", proc["process"].(map[string]interface{})["pid"],
12 changes: 8 additions & 4 deletions metric/system/process/process_darwin.go
@@ -34,6 +34,7 @@ import "C"
import (
"bytes"
"encoding/binary"
+ "errors"
"fmt"
"io"
"os"
@@ -73,6 +74,8 @@ func (procStats *Stats) FetchPids() (ProcsMap, []ProcState, error) {

procMap := make(ProcsMap, num)
plist := make([]ProcState, 0, num)
+ var wrappedErr error
+ var err error

for i := 0; i < num; i++ {
if err := binary.Read(bbuf, binary.LittleEndian, &pid); err != nil {
@@ -82,10 +85,11 @@ func (procStats *Stats) FetchPids() (ProcsMap, []ProcState, error) {
if pid == 0 {
continue
}
- procMap, plist = procStats.pidIter(int(pid), procMap, plist)
+ procMap, plist, err = procStats.pidIter(int(pid), procMap, plist)
+ wrappedErr = errors.Join(wrappedErr, err)
}

- return procMap, plist, nil
+ return procMap, plist, wrappedErr
}

// GetInfoForPid returns basic info for the process
@@ -98,9 +102,9 @@ func GetInfoForPid(_ resolve.Resolver, pid int) (ProcState, error) {
// For docs, see the link below. Check the `proc_taskallinfo` struct, which
// is a composition of `proc_bsdinfo` and `proc_taskinfo`.
// https://opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/sys/proc_info.h.auto.html
- n := C.proc_pidinfo(C.int(pid), C.PROC_PIDTASKALLINFO, 0, ptr, size)
+ n, err := C.proc_pidinfo(C.int(pid), C.PROC_PIDTASKALLINFO, 0, ptr, size)
if n != size {
- return ProcState{}, fmt.Errorf("could not read process info for pid %d: proc_pidinfo returned %d", pid, int(n))
+ return ProcState{}, fmt.Errorf("could not read process info for pid %d: proc_pidinfo returned %d, err: %w", pid, int(n), err)
}

status := ProcState{}
6 changes: 4 additions & 2 deletions metric/system/process/process_linux_common.go
@@ -85,6 +85,7 @@ func (procStats *Stats) FetchPids() (ProcsMap, []ProcState, error) {

procMap := make(ProcsMap, len(names))
plist := make([]ProcState, 0, len(names))
+ var wrappedErr error

// Iterate over the directory, fetch just enough info so we can filter based on user input.
logger := logp.L()
@@ -99,10 +100,11 @@ func (procStats *Stats) FetchPids() (ProcsMap, []ProcState, error) {
logger.Debugf("Error converting PID name %s", name)
continue
}
- procMap, plist = procStats.pidIter(pid, procMap, plist)
+ procMap, plist, err = procStats.pidIter(pid, procMap, plist)
+ wrappedErr = errors.Join(wrappedErr, err)
}

- return procMap, plist, nil
+ return procMap, plist, wrappedErr
}
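
The accumulation pattern used in every `FetchPids` implementation relies on two properties of `errors.Join` (Go 1.20 and later): nil arguments are dropped, so the result stays nil when nothing failed, and `errors.Is` can still find a cause inside the joined value. A small sketch follows; `collect`, `sampleProbe`, and `hasDenied` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// collect runs probe for every PID and joins whatever errors come back.
// probe is a hypothetical stand-in for a per-PID call such as pidIter.
func collect(pids []int, probe func(int) error) error {
	var joined error
	for _, pid := range pids {
		// errors.Join discards nil values, so successful PIDs add nothing,
		// and the result remains nil if every call succeeded.
		joined = errors.Join(joined, probe(pid))
	}
	return joined
}

// sampleProbe fails with EACCES for even PIDs and succeeds for odd ones.
func sampleProbe(pid int) error {
	if pid%2 == 0 {
		return fmt.Errorf("pid %d: %w", pid, syscall.EACCES)
	}
	return nil
}

// hasDenied reports whether any joined error wraps EACCES.
func hasDenied(err error) bool {
	return errors.Is(err, syscall.EACCES)
}

func main() {
	clean := collect([]int{1, 3, 5}, sampleProbe)
	fmt.Println(clean == nil) // true

	mixed := collect([]int{1, 2, 3, 4}, sampleProbe)
	fmt.Println(mixed != nil, hasDenied(mixed)) // true true
	fmt.Println(mixed)
}
```

This is why `CanDegrade` can be applied to the aggregated `wrappedErr` directly: `errors.Is` walks into joined errors, so a single matching cause is enough for it to return true.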

func FillPidMetrics(hostfs resolve.Resolver, pid int, state ProcState, filter func(string) bool) (ProcState, error) {
6 changes: 3 additions & 3 deletions metric/system/process/process_test.go
@@ -169,12 +169,12 @@ func TestGetOne(t *testing.T) {
assert.NoError(t, err, "Init")

_, _, err = testConfig.Get()
- assert.NoError(t, err, "GetOne")
+ assert.True(t, CanDegrade(err), fmt.Sprintf("Fatal Error: %s", err))

time.Sleep(time.Second * 2)

procData, _, err := testConfig.Get()
- assert.NoError(t, err, "GetOne")
+ assert.True(t, CanDegrade(err), fmt.Sprintf("Fatal Error: %s", err))

t.Logf("Proc: %s", procData[0].StringToPrint())
}
@@ -267,7 +267,7 @@ func TestFilter(t *testing.T) {

func TestProcessList(t *testing.T) {
plist, err := ListStates(resolve.NewTestResolver("/"))
- assert.NoError(t, err, "ListStates")
+ assert.True(t, CanDegrade(err), fmt.Sprintf("Fatal Error: %s", err))

for _, proc := range plist {
assert.NotEmpty(t, proc.State)
6 changes: 4 additions & 2 deletions metric/system/process/process_windows.go
@@ -41,14 +41,16 @@ func (procStats *Stats) FetchPids() (ProcsMap, []ProcState, error) {

procMap := make(ProcsMap, len(pids))
plist := make([]ProcState, 0, len(pids))
+ var wrappedErr error
// This is probably the only implementation that doesn't benefit from our
// little fillPid callback system. We'll need to iterate over everything
// manually.
for _, pid := range pids {
- procMap, plist = procStats.pidIter(int(pid), procMap, plist)
+ procMap, plist, err = procStats.pidIter(int(pid), procMap, plist)
+ wrappedErr = errors.Join(wrappedErr, err)
}

- return procMap, plist, nil
+ return procMap, plist, wrappedErr
}

// GetSelfPid is the darwin implementation; see the linux version in