"Intern" map keys #2865
report/map_helpers.go
Outdated
// Try to avoid an allocation by looking the key up
var ok bool
key, ok = commonKeys[string(b)]
if !ok {
report/map_keys.go
Outdated
"reverse_dns_names": "reverse_dns_names",
"snooped_dns_names": "snooped_dns_names",
"threads": "threads",
}
Ideas: Maybe the lookup should be done via a perfect hash function. I also experimented with putting all the values into
7befe97
to
e188685
Compare
I have rebased this.
6ea9f86
to
501df6c
Compare
In my benchmarks I am only seeing a ~3.2% drop in allocations during report reading.
master
branch
Moreover, nearly all of that comes from just 8 keys:
Looking into this further, it appears that the change in this PR doesn't affect latest-map decoding because we have a custom decoder for that.
Fixed. With that change I am seeing an allocation reduction of ~26% compared to master
I applied a quick hack to ascertain the possible gain from interning all keys:
diff --git a/extras/generate_latest_map b/extras/generate_latest_map
index 48d44863..2bc78d12 100755
--- a/extras/generate_latest_map
+++ b/extras/generate_latest_map
@@ -244,6 +244,7 @@ function generate_latest_map() {
var ok bool
if key, ok = commonKeys[string(b)]; !ok {
key = string(b)
+ commonKeys[key] = key
}
}
i := m.locate(key)
diff --git a/report/latest_map_generated.go b/report/latest_map_generated.go
index 2956447d..ddc136ca 100644
--- a/report/latest_map_generated.go
+++ b/report/latest_map_generated.go
@@ -212,6 +212,7 @@ func (m *StringLatestMap) CodecDecodeSelf(decoder *codec.Decoder) {
var ok bool
if key, ok = commonKeys[string(b)]; !ok {
key = string(b)
+ commonKeys[key] = key
}
}
i := m.locate(key)
@@ -434,6 +435,7 @@ func (m *NodeControlDataLatestMap) CodecDecodeSelf(decoder *codec.Decoder) {
var ok bool
if key, ok = commonKeys[string(b)]; !ok {
key = string(b)
+ commonKeys[key] = key
}
}
i := m.locate(key)
diff --git a/report/map_helpers.go b/report/map_helpers.go
index 79dd278e..0d2b0311 100644
--- a/report/map_helpers.go
+++ b/report/map_helpers.go
@@ -115,6 +115,7 @@ func mapRead(decoder *codec.Decoder, decodeValue func(isNil bool) interface{}) p
var ok bool
if key, ok = commonKeys[string(b)]; !ok {
key = string(b)
+ commonKeys[key] = key
}
}
This actually led to only a modest improvement, to ~28% compared to master
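One caveat with the quick hack above is that it writes to the shared `commonKeys` map during decoding, and Go maps are not safe for concurrent writes. If "intern everything" were pursued beyond a measurement, some synchronisation would be needed. The following is a minimal sketch of a grow-on-miss table guarded by a `sync.RWMutex`; the `interner` type and its methods are hypothetical and are not part of this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// interner is a hypothetical grow-on-miss string table (not code from the PR).
type interner struct {
	mu   sync.RWMutex
	keys map[string]string
}

func newInterner() *interner {
	return &interner{keys: make(map[string]string)}
}

// intern returns the canonical copy of b, storing it on first sight.
func (in *interner) intern(b []byte) string {
	in.mu.RLock()
	key, ok := in.keys[string(b)] // map lookup with string(b) does not allocate
	in.mu.RUnlock()
	if ok {
		return key
	}

	in.mu.Lock()
	defer in.mu.Unlock()
	// Re-check under the write lock: another goroutine may have added the key.
	if key, ok = in.keys[string(b)]; ok {
		return key
	}
	key = string(b) // the only allocation, paid once per distinct key
	in.keys[key] = key
	return key
}

// size reports how many distinct keys have been stored.
func (in *interner) size() int {
	in.mu.RLock()
	defer in.mu.RUnlock()
	return len(in.keys)
}

func main() {
	in := newInterner()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				in.intern([]byte(fmt.Sprintf("key-%d", i%10)))
			}
		}()
	}
	wg.Wait()
	fmt.Println("distinct keys stored:", in.size())
}
```

The table never shrinks, so keys that appear only once stay in it forever. That, together with the locking cost, is the trade-off against a fixed map.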
In fact since #2870
Here's how I get a count of the top keys from a scope report
ATM this is massively skewed for weaveworks clusters by #2662
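The command used for that count did not survive in this thread. As a rough illustration only, and not the original command, here is a sketch in Go that walks a JSON document and tallies how often each object key occurs. The function names and the tiny sample document are made up for the example and are not taken from a real report.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// keyCount pairs an object key with the number of times it was seen.
type keyCount struct {
	Key   string
	Count int
}

// countKeys recursively tallies every object key found in v.
func countKeys(v interface{}, counts map[string]int) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			counts[k]++
			countKeys(child, counts)
		}
	case []interface{}:
		for _, child := range t {
			countKeys(child, counts)
		}
	}
}

// topKeys returns the n most frequent keys in doc, most frequent first,
// with ties broken alphabetically.
func topKeys(doc []byte, n int) ([]keyCount, error) {
	var v interface{}
	if err := json.Unmarshal(doc, &v); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	countKeys(v, counts)

	out := make([]keyCount, 0, len(counts))
	for k, c := range counts {
		out = append(out, keyCount{k, c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Key < out[j].Key
	})
	if len(out) > n {
		out = out[:n]
	}
	return out, nil
}

func main() {
	// Made-up sample input, only loosely shaped like a report.
	doc := []byte(`{"Process":{"nodes":{
		"a":{"latest":{"threads":{"value":"1"},"name":{"value":"x"}}},
		"b":{"latest":{"threads":{"value":"2"}}}}}}`)
	top, err := topKeys(doc, 3)
	if err != nil {
		panic(err)
	}
	for _, kc := range top {
		fmt.Printf("%6d %s\n", kc.Count, kc.Key)
	}
}
```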
9b28c36
to
4232f5f
Compare
@bboreham please take a look at my last commit and let me know what you think. I am constructing the map from constants declared in our code. That is less brittle but means we won't catch common labels or env vars, or deprecated keys in old reports. We could add these manually, but given the benchmark results below I don't think it's worth it. I minimised the diff in that commit. There are some extra changes I want to make:
I reckon that's a good rearrangement of the dependencies.
Here's a comparison of running
against master, this branch prior to my commit, and this branch after my commit
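To illustrate the "map built from constants" idea in the comment above, here is a minimal sketch. `Name` and `Domain` take the values shown in the `probe/docker/tagger.go` snippet quoted in this thread; the value of `DockerContainerID` is an assumption. The `lookupCommonKey` signature matches the reviewed snippet, but its body here is a guess rather than the code in the commit.

```go
package main

import "fmt"

// Key constants shared by the probe and the app. The string value of
// DockerContainerID is assumed for this example.
const (
	Name              = "name"
	Domain            = "domain"
	DockerContainerID = "docker_container_id"
)

// commonKeys is built from the constants, so the table cannot drift
// out of sync with a key whose string value is later changed.
var commonKeys = map[string]string{
	Name:              Name,
	Domain:            Domain,
	DockerContainerID: DockerContainerID,
}

// lookupCommonKey sets *key to the shared constant when b matches a
// known key, and to a freshly allocated copy otherwise.
func lookupCommonKey(key *string, b []byte) {
	if k, ok := commonKeys[string(b)]; ok {
		*key = k
		return
	}
	*key = string(b)
}

func main() {
	var key string
	for _, raw := range []string{"name", "some_env_var"} {
		lookupCommonKey(&key, []byte(raw))
		_, known := commonKeys[key]
		fmt.Printf("%q known=%v\n", key, known)
	}
}
```

As the comment notes, keys that are not declared as constants (labels, env vars, deprecated keys in old reports) simply fall through to the allocating path.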
This seems to be two separate PRs in one - refactoring the string constants and interning keys. I guess it would be ok to do it as two commits in that order.
The commit description "intern LatestMap keys we know" seems a bit off - it's a refactor of duplicated strings.
report/map_keys.go
Outdated
// lookupCommonKey tries to avoid an allocation in populating the key
// by looking it up
func lookupCommonKey(key *string, b []byte) {
report/map_keys.go
Outdated
ECSScaleDown: ECSScaleDown,
}

// lookupCommonKey tries to avoid an allocation in populating the key
probe/docker/tagger.go
Outdated
Domain = "domain" // TODO this is ambiguous, be more specific
Name = "name" // TODO this is ambiguous, be more specific
ContainerID = report.DockerContainerID
Name = report.Name // TODO this is ambiguous, be more specific |
4a46cfd
to
944587a
Compare
Both the probe and the app (for rendering) need to know about them.
944587a
to
e24d3e9
Compare
@bboreham PTAL
LGTM. Since this was my PR originally I can't Approve it.
Somewhat speculative approach; it works well in micro-benchmarks.
This is based on #2863 because it needs the DecodeStringAsBytes (re-)introduced there.
I created a static map so we don't have to lock access from multiple threads and don't have to worry about it getting clogged with values that are only used once or twice.
Relates to #1010
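For readers unfamiliar with the trick, the sketch below shows why a static map can save an allocation. The Go compiler does not allocate for `string(b)` when the conversion is used directly as a map index, so a hit returns the string already stored in the map. The `internKey` helper is a hypothetical name for this example; the three keys are the ones visible in the reviewed snippet, and `testing.AllocsPerRun` is used only to observe the effect.

```go
package main

import (
	"fmt"
	"testing"
)

// commonKeys maps each frequently seen key to itself. It is never
// written after initialisation, so concurrent readers need no lock.
var commonKeys = map[string]string{
	"reverse_dns_names": "reverse_dns_names",
	"snooped_dns_names": "snooped_dns_names",
	"threads":           "threads",
}

// internKey returns the shared copy of a known key, or a freshly
// allocated string for anything else.
func internKey(b []byte) string {
	// string(b) used directly as a map index avoids the usual copy.
	if key, ok := commonKeys[string(b)]; ok {
		return key
	}
	return string(b)
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

func main() {
	known := []byte("snooped_dns_names")
	unknown := []byte("some_rarely_seen_label")

	hit := testing.AllocsPerRun(1000, func() { sink = internKey(known) })
	miss := testing.AllocsPerRun(1000, func() { sink = internKey(unknown) })
	fmt.Printf("allocations per lookup: known=%v unknown=%v\n", hit, miss)
}
```

Because the map is read-only after package initialisation, decoders running in several goroutines can share it without a mutex, which is the design choice described above.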