-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connection tracking - hash mechanism #201
Conversation
Codecov Report
@@ Coverage Diff @@
## main #201 +/- ##
==========================================
+ Coverage 58.37% 58.49% +0.12%
==========================================
Files 58 59 +1
Lines 3351 3409 +58
==========================================
+ Hits 1956 1994 +38
- Misses 1267 1282 +15
- Partials 128 133 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pkg/pipeline/conntrack/hash.go
Outdated
|
||
// Compute the total hash | ||
hash := fnv.New32a() | ||
for _, fgName := range keyFields.Hash.FieldGroups { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain somewhere why we need different FieldGroups in KeyFields and in Hash? What is the purpose of these additional FieldGroups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One is of type []FieldGroup and one is of type []string. Using the same name for these 2 different fields is a bit confusing. I assume that the []string is derived from Name field of the []FieldGroup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct.
KeyFields.FieldGroups
is where you define the field groups
and
KeyFields.Hash.FieldGroups
is where you reference the defined field groups (by name). I'll rename it and add some comments to make it a bit clearer.
Please take a look at these configuration examples:
https://gist.github.com/ronensc/0136cd1c3a761e2d6067679362fa72c6
https://gist.github.com/ronensc/c6d0a07b0503bbd7a36bc4ef0b6b5b29
pkg/api/conn_track.go
Outdated
type ConnTrackHash struct { | ||
FieldGroups []string `yaml:"fieldGroups" doc:"list of field groups"` | ||
FieldGroupA string `yaml:"fieldGroupA" doc:"field group A"` | ||
FieldGroupB string `yaml:"fieldGroupB" doc:"field group B"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a description somewhere of what is meant by A and B?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment to describe their meaning
pkg/pipeline/conntrack/hash_test.go
Outdated
}, | ||
Hash: api.ConnTrackHash{ | ||
FieldGroups: []string{"protocol"}, | ||
FieldGroupA: "src", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of FieldGroupA field (non-blank) should be explained somewhere.
pkg/pipeline/conntrack/hash.go
Outdated
// ComputeHash computes the hash of a flow log according to keyFields. | ||
// Two flow logs will have the same hash if they belong to the same connection. | ||
func ComputeHash(flowLog config.GenericMap, keyFields api.KeyFields) ([]byte, error) { | ||
type hashType []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ronensc If you externalize the hashType
type you will be able to also return hashType
and not []byte
pkg/pipeline/conntrack/hash.go
Outdated
func computeHashFields(flowLog config.GenericMap, fieldNames []string) ([]byte, error) { | ||
h := fnv.New32a() | ||
for _, fn := range fieldNames { | ||
// TODO: How should we handle a missing fieldName? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Info log and skip ( for now) --- this will cause in different Hash anyway ... maybe later we will make that a parameter
pkg/pipeline/conntrack/hash.go
Outdated
} | ||
|
||
// Compute the total hash | ||
hash := fnv.New32a() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we "trick" somehow (maybe interface?) so that it will be a parameter to the functions what length of Hash to use >???
@mariomac maybe you have an idea ??? ^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, actually the fnv.New32a
and rest of hashers implement the io.Writer
interface. You can change the function signature to something like:
func computeHashFields(flowLog config.GenericMap, fieldNames []string, hasher io.Writer) ([]byte, error) {
and invoke it like:
h, err := computeHashFields(flowLog, fg.Fields, fnv.New32a())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if you want to be even more restrictive in what you can pass as argument, you can use the hash.Hash
interface instead of io.Writer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good to me ... more than good actually :-) I added some comments but nothing really prevents merging.,
f := flowLog[fn] | ||
f, ok := flowLog[fn] | ||
if !ok { | ||
log.Warningf("Missing field %v", fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be risky. It could end up flooding the log file. Is there a way to avoid repeated logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can replace it with a prometheus metric. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this needs to be tracked as a metric. It doesn't seem like it's likely to occur but can if someone wants to be malicious. For now, maybe let it go, but we should have a general solution to avoid spamming the log file.
Name: "dst", | ||
Fields: []string{ | ||
"DstAddr", | ||
"DstPort", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you include the ephemeral port in the hash, it will count a lot of connections. We need to get an agreement on what a "connection" is. Example: Let's say you access a web page. Is that one connection? With this implementation, it could be anywhere from 1 to 6 connections.
A typical web page will refer to JavaScript files, CSS files, images, etc. The browser will need to fetch these files. Pretty much all modern browsers today will allow up to 6 simultaneous connections to the same domain, and will reuse these connections to fetch all the files. Each of these connections will have a different ephemeral port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I tried to make the decision of what is a connection configurable. But it's still not flexible enough to support this use-case.
This specific unit test configures the classic 5-tuple to distinguish between connections. The other unit test uses a slightly different configuration. It uses the same 5-tuple but includes both flow directions in the same connection.
This is the first PR of the new Connection tracking module. Only the hash mechanism is implemented here. many more PRs are to come...