-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NETOBSERV-1112: This patch fixes a bug where RTT was not visible for flow logs at times. #159
Conversation
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=3f822c7 make set-agent-image |
Codecov Report
@@ Coverage Diff @@
## main #159 +/- ##
==========================================
- Coverage 38.82% 38.60% -0.23%
==========================================
Files 31 31
Lines 2246 2259 +13
==========================================
Hits 872 872
- Misses 1325 1338 +13
Partials 49 49
Flags with carried forward coverage won't be shown. Click here to find out more.
|
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=3f822c7 make set-agent-image |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=0ffaeef make set-agent-image |
@dushyantbehl: This pull request references NETOBSERV-1112 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
bpf/types.h
Outdated
// No need to emit this struct. It's used only in kernel space | ||
typedef struct flow_seq_id_t { | ||
u16 src_port; | ||
u16 dst_port; | ||
u8 src_ip[IP_MAX_LEN]; | ||
u8 dst_ip[IP_MAX_LEN]; | ||
u32 seq_id; | ||
u8 transport_protocol; | ||
u32 if_index; // OS interface index | ||
u8 __padding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add manual padding attribute will takecare of it, also seq_id is becoming more like flow_id
I feel we can use the global hash map to calculate RTT directly none tcp fields like icmp ones can be set to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added transport_protocol
to make it future proof, to address your comment of seq id collision. and to make it similar to the dns seq_id
to combine them in future if needed.
Padding is something I can remove.
if index is needed because flow agent connects to many interfaces in the same path which can affect time stamp when SYN and ACK were recorded to a. large extent if the same flow passes through eth0 -> bridge -> container veth -> container namespace ethernet etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah its just more and more becoming similar to flow_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There won't be more additions to it in my opinion. This map also stores seq_id which is the primary identifier here which does not have to be part of flow_id.
pkg/ebpf/tracer.go
Outdated
@@ -101,6 +101,8 @@ func NewFlowFetcher(cfg *FlowFetcherConfig) (*FlowFetcher, error) { | |||
if enableRtt == 0 { | |||
// Cannot set the size of map to be 0 so set it to 1. | |||
spec.Maps[flowSequencesMap].MaxEntries = uint32(1) | |||
} else { | |||
log.Infof("RTT calculations are enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for debugging ? we can know what is enabled or not from looking at the configs env var correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just meant as info to see this via logs yeah...can be removed safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only problem I see is that if both directions (Ingress and Egress are not enabled) RTT calculations will be disabled and message will be present only in logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it even possible to disable tc on one side but not the other ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well do you mean tc queues? I am not sure but don't think its possible.
What I had meant is if ebpf agent somehow is set to attach only at one of Ingress or Egress, the DIRECTION configuration for agent. I believe right now this option is not exposed to the operator so we shouldn't worry too much.
@dushyantbehl: This pull request references NETOBSERV-1112 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
@dushyantbehl it might be a naive question, but the problem that it is solving, couldn't it be "just" solved by matching the interface on ACK? (ie. making sure the ACK we get is on the same interface as the initial SYN) ? I guess by adding the interface as a seq map values? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dushyantbehl @msherif1234 Was it tested end to end in ocp?
If so, I'm fine to merge if you're both ok with it (I just had a comment, wondering if there was a simpler approach possible - as this is a quite different algorithm now)
dst->if_index = src->if_index; | ||
|
||
// Fields which should be reversed | ||
dst->direction = (src->direction == INGRESS) ? EGRESS : INGRESS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal but:
dst->direction = (src->direction == INGRESS) ? EGRESS : INGRESS; | |
dst->direction = 1 - src->direction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did an end to end testing at my level and @jpinsonneau and @ronensc (who reported the issue at their end) did the testing too.
on your suggestion, I fear if we write it like that and in future change INGRESS and EGRESS to be something other than 0
and 1
then the trick might fail silently, hence maybe better to write it explictily.
@jotak yes what you suggest is also part of the solution to match interface when matching SYN and ACK and it is done by adding interface to the seq_id map structure. netobserv-ebpf-agent/bpf/types.h Line 133 in 5f9a07f
|
I have the feeling that this new algorithm will still cause troubles especially when there is sampling. When you say:
That works when we capture all the flows, because there is either "SYN(X) / ACK(X)" or "SYN(Y) / ACK(Y)" that corresponds to the actual RTT, whereas the other one is just process time and must be ignored. So taking the biggest should, in general, match the RTT and that's fine... But that doesn't work with sampling if you don't capture the flows that are relevant for RTT, then you would store the wrong value, which could screw up all the metrics computed from that. So I have the feeling we need to find something else. It sounds like the root cause of these problem is that we're never sure what INGRESS or EGRESS really mean (and we solved that problem in FLP with something called "reinterpret_direction", but we can't have the same solution here, we need to find something else). I'm not sure how we could do that. Maybe by reversing the meaning of INGRESS and EGRESS depending on which interface we're on? Alternatively maybe we should restrict on which interfaces we are computing RTT, e.g. in your graph above we would use only eth0 and ignore veth. The downside is that it decreases the chances to get the RTT when sampling is on. wdyt? |
This flow makes 3 packets, SAMPLING the way it is done right now interacts with all packets so we have to implement a workaround or change SAMPLING for this.
We cannot directly determine from ebpf which type of interface we are connected on, a veth interface could either be
There will be a high chance to miss some flows in the system if we take this approach in my opinion. Traffic from one pod on one host to another pod on a different host is encapsulated in host IP so if we only calculate at interfaces like
|
at somepoint I was working on new hook that just triggered when tcp state change independent off tc hook https://github.com/netobserv/netobserv-ebpf-agent/pull/106/files#diff-f256390f160e8c3dfdbea43540fc179790fed2b28286efdf60bea2a4711f26c1R53 I feel that would get us out of the issue with sampling and missing state update ? I am planning to bring that PR back to life at somepoint as its quiet out of date but instead of generating events we will just create flows in the hashmap |
So in that case I think this would be on the agent to not show values of microseconds magnitude, rather than later in the pipeline: because that's due to implementation details of the agent, and the other components should be able to trust what the agent sends.
Maybe we should try asking around ... ovn, ovs folks might have an idea that we're missing, about how we could solve that? We get the @msherif1234 yeah that would be an interesting thing to try, I agree! |
@dushyantbehl: This pull request references NETOBSERV-1112 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Actually I feel that doing this filtering at agent will be much more hardcoded behavior, a very simple fix is to mark in SAMPLING to not throw away SYN/ACK packets which are needed for many other behavior as well.
This is not doable based on if_index or anything else, it just gives name and id of the interface while the direction of capture depends on where the traffic is coming from. With the current patch only 2 packets are being looked at and RTT is being calculated based on them which is the standard approach. |
bpf/configs.h
Outdated
@@ -6,5 +6,6 @@ | |||
volatile const u32 sampling = 0; | |||
volatile const u8 trace_messages = 0; | |||
volatile const u8 enable_rtt = 0; | |||
volatile const u64 min_rtt = 50000; //50 micro seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dushyantbehl ! this will avoid having inconsistent values. We can iterate later to find a better solution. I don't despair of finding a good solution closer to your first implementation, which I found simpler but would need a twist, as you put it in this PR description :-)
If that's good for @msherif1234 , that's good for me too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jotak
bpf/configs.h
Outdated
@@ -6,5 +6,6 @@ | |||
volatile const u32 sampling = 0; | |||
volatile const u8 trace_messages = 0; | |||
volatile const u8 enable_rtt = 0; | |||
volatile const u64 min_rtt = 50000; //50 micro seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are u planning to make this configurable ? because I don't see the userspace pieces to do that if not then u can define in it flows.c where its actually used and no need to make volatile in that case
bpf/rtt_tracker.h
Outdated
u64 *prev_ts = (u64 *)bpf_map_lookup_elem(&flow_sequences, seq_id); | ||
if (prev_ts != NULL) { | ||
u64 rtt = pkt->current_ts - *prev_ts; | ||
// Because of SAMPLING the way it is done if we miss one of SYN/SYN+ACK/ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also add FIXME here and indicate this a temp workaround for this case till we have better solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. And since this is fixme I would make the above variable non volatile.
Signed-off-by: Dushyant Behl <[email protected]> Signed-off-by: Dushyant Behl <[email protected]>
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dushyantbehl The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The calculation of RTT in ebpf agent was initially implemented to be doing something like this,
We were storing the timestamp of outgoing SYN and incoming ACK, calculating the RTT as Incoming ACK - Outgoing SYN.
This works for calculations done on both client and server endpoints as both have one SYN which is sent out and one ack which is received in response to that to start a TCP connection.
ebpf agent on the other hand does not connect only at a single interface in Openshift environment. If we look at the same communication being done from inside a container it would look like this.
In this case the container contains a
veth
pair and ebpf agent attaches to theveth
interface in the root namespace, let's callveth
for avpeer
in the container namespace.Any packet emerging from
vpeer
which is put at theegress
queue of container will come at theingress
queue of theveth
interface. Hence if we try to calculate RTT using the previous logic of outgoing SYN and incoming ACK then we would end up calculating the timestamp ofSYN(Y) and ACK(Y)
which will be just the response time of the container and not a correct value of the RTT.Also note that the ebpf agent attaches to many interfaces in Openshift and can attach to interfaces like
eth0
which always act like a Client or Server endpoints and hence the RTT calculation of Incoming ACK - Outgoing SYN will work for such endpoints.So to account for both the cases we changed the RTT calculation in a simple way, we simply take the RTT of SYN/SYN+ACK/ACK separately, i.e. we will calculate RTT of
SYN(X)
andACK(X)
andSYN(Y)
andACK(Y)
and simply take the maximum of both values.This is needed because ebpf agent attaches at both type of interfaces, root veth interfaces which act like middleboxes and
eth0
interfaces which act like final connection endpoints.Note that the flows are in reverse direction so we need to calculate RTT for reverse flows in ebpf agent but its pretty easy to do.
Also, this calculation is interface specific so we need to add interface
id
to the sequence identifier.