NETOBSERV-1625: (follow-up) mention other possible cause for ebpf drops #640
@jotak: This pull request references NETOBSERV-1625 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hey @msherif1234: I think #632 missed some potential causes for drops. In the agent, the drops metric is used in several scenarios:
On the first one, I see another problem: the alert would be triggered if there are filtered-out flows with the new filter feature. I think the filter feature should use a different metric, wdyt?
The alert uses https://github.com/netobserv/network-observability-operator/blob/main/controllers/ebpf/agent-metrics.go#L130 unlike filter metrics uses
I am fine with the API edits, but as I said above, there is no issue with the filter metrics. Regarding the limiter drops, I don't recall adding this, and I'm not sure what real cases can lead to it.
// Possible values are:<br>
// - `NetObservDroppedFlows`, which is triggered when eBPF agent hashmap table is full.<br>
// `NetObservDroppedFlows`, which is triggered when the eBPF agent is dropping flows, such as when the BPF hashmap is full.<br>
Can we state the other possible reason for drops here, i.e. limiter capacity exceeded, and a possible recovery config, if any? From what I see, I don't think anything can be done to avoid the limiter; is it more a matter of internal Go limits?
done
the capacity limiter is not related to go limits (if you're talking about the GOMEMLIMIT), it's some sort of backpressure management, cf the warning text that is logged when it's triggered: https://github.com/netobserv/netobserv-ebpf-agent/blob/bf91cbef0008f6d6bc8b1c748729c18ec6b14d35/pkg/flow/limiter.go#L50-L53
Cool Thanks!!
cc @skrthomas: this might need to be documented under the eBPF agent alerting, WDYT?
Oh yes, good point, I didn't notice.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #640 +/- ##
==========================================
- Coverage 67.10% 66.96% -0.15%
==========================================
Files 68 68
Lines 7804 7804
==========================================
- Hits 5237 5226 -11
- Misses 2192 2199 +7
- Partials 375 379 +4
Flags with carried forward coverage won't be shown.
/lgtm
Thanks @msherif1234
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jotak The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.