Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Intern" map keys #2865

Merged
merged 2 commits into from
Dec 14, 2017
Merged

"Intern" map keys #2865

merged 2 commits into from
Dec 14, 2017

Conversation

bboreham
Copy link
Collaborator

Somewhat speculative approach; it works well in micro-benchmarks.

This is based after #2863 because it needs the DecodeStringAsBytes (re-)introduced there.

I created a static map so we don't have to lock access from multiple threads and don't have to worry about it getting clogged with values that are only used once or twice.

Relates to #1010

@bboreham bboreham changed the title WIP: Intern ps.Map keys WIP: "Intern" ps.Map keys Sep 25, 2017
// Try to avoid an allocation by looking the key up
var ok bool
key, ok = commonKeys[string(b)]
if !ok {

This comment was marked as abuse.

"reverse_dns_names": "reverse_dns_names",
"snooped_dns_names": "snooped_dns_names",
"threads": "threads",
}

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@bboreham
Copy link
Collaborator Author

bboreham commented Oct 2, 2017

Ideas:

Maybe the lookup should be done via a perfect hash function.

I also experimented with putting all the values into case statements on the grounds that the compiler ought to know how to do it; so far I don't understand what I am seeing.

@rade
Copy link
Member

rade commented Nov 29, 2017

I have re-based this.

@rade rade force-pushed the intern-map-keys branch 2 times, most recently from 6ea9f86 to 501df6c Compare November 30, 2017 07:37
@rade
Copy link
Member

rade commented Nov 30, 2017

In my benchmarks I am only seeing a ~3.2% drop in allocations during report reading.

go test -tags 'netgo unsafe' -run=x -cpu=2 -bench=ReportUnmarshal --bench-report-path=/tmp/prod-20171129/

master

BenchmarkReportUnmarshal-2   	       1	12567724701 ns/op	2679082008 B/op	19529959 allocs/op

branch

BenchmarkReportUnmarshal-2   	       1	11961347915 ns/op	2666462296 B/op	18906998 allocs/op

Moreover, nearly all of that comes from just 8 keys: deployment, host, container, container_image, pod, deployment, reverse_dns_names, snooped_dns_names. None of these are latest-map keys. The first four are entries in parent sets (and deployment shows up frequently there because of #2662). The latter two are common "sets" entries for endpoints.

Looking into this further, it appears that the change in this PR doesn't affect latest-map decoding because we have a custom decoder for that.

@rade
Copy link
Member

rade commented Nov 30, 2017

it appears that the change in this PR doesn't affect latest-map decoding because we have a custom decoder for that.

Fixed. With that change I am seeing an allocation reduction of ~26% compared to master

BenchmarkReportUnmarshal-2   	       1	11665006452 ns/op	2532008296 B/op	14407494 allocs/op

@rade
Copy link
Member

rade commented Nov 30, 2017

I applied a quick hack to ascertain the possible gain from interning all keys:

diff --git a/extras/generate_latest_map b/extras/generate_latest_map
index 48d44863..2bc78d12 100755
--- a/extras/generate_latest_map
+++ b/extras/generate_latest_map
@@ -244,6 +244,7 @@ function generate_latest_map() {
                var ok bool
                if key, ok = commonKeys[string(b)]; !ok {
                        key = string(b)
+                       commonKeys[key] = key
                }
             }
             i := m.locate(key)
diff --git a/report/latest_map_generated.go b/report/latest_map_generated.go
index 2956447d..ddc136ca 100644
--- a/report/latest_map_generated.go
+++ b/report/latest_map_generated.go
@@ -212,6 +212,7 @@ func (m *StringLatestMap) CodecDecodeSelf(decoder *codec.Decoder) {
                        var ok bool
                        if key, ok = commonKeys[string(b)]; !ok {
                                key = string(b)
+                               commonKeys[key] = key
                        }
                }
                i := m.locate(key)
@@ -434,6 +435,7 @@ func (m *NodeControlDataLatestMap) CodecDecodeSelf(decoder *codec.Decoder) {
                        var ok bool
                        if key, ok = commonKeys[string(b)]; !ok {
                                key = string(b)
+                               commonKeys[key] = key
                        }
                }
                i := m.locate(key)
diff --git a/report/map_helpers.go b/report/map_helpers.go
index 79dd278e..0d2b0311 100644
--- a/report/map_helpers.go
+++ b/report/map_helpers.go
@@ -115,6 +115,7 @@ func mapRead(decoder *codec.Decoder, decodeValue func(isNil bool) interface{}) p
                        var ok bool
                        if key, ok = commonKeys[string(b)]; !ok {
                                key = string(b)
+                               commonKeys[key] = key
                        }
                }

This actually only led to a modest improvement to ~28% compared to master

BenchmarkReportUnmarshal-2   	       1	11494403535 ns/op	2518734696 B/op	14039184 allocs/op

@rade
Copy link
Member

rade commented Nov 30, 2017

it appears that the change in this PR doesn't affect latest-map decoding because we have a custom decoder for that.

In fact since #2870 LatestMap is based on a sorted slice, rather than on ps.Map. Allocations for keys are a major cost their too though.

@rade
Copy link
Member

rade commented Nov 30, 2017

Here's how I get a count of the top keys from a scope report

cat report.json | jq '.[].nodes? | .[]? | (.sets,.parents,.latest,.latestControls) | select(. != null) | keys | .[]' | sort | uniq -c | sort -rn | head -n 30

ATM this is massively skewed for weaveworks clusters by #2662

@rade rade changed the title WIP: "Intern" ps.Map keys WIP: "Intern" map keys Nov 30, 2017
@rade rade force-pushed the intern-map-keys branch 3 times, most recently from 9b28c36 to 4232f5f Compare December 10, 2017 11:43
@rade
Copy link
Member

rade commented Dec 10, 2017

@bboreham please take a look at my last commit and let me know what you think. I am constructing the map from constants declared in our code. That is less brittle but means we won't catch common labels or env vars, or deprecated keys in old reports. We could add these manually, but given the benchmark results below I don't think it's worth it.

I minimised the diff in that commit. There are some extra changes I want to make:

  • remove the constant aliases in the probe packages, and instead use the report constants directly
  • remove the imports of probe packages from render, and instead use the report constants

I reckon that's a good rearrangements of the dependencies: report defines the protocol between probes and the app/renderer.

Here's a comparison of running

go test -tags 'netgo unsafe' -run=x -cpu=2 -benchmem -bench='ReportUnmarshal' --bench-report-path=/tmp/prod-20171204-15s

against master, this branch prior to my commit, and this branch after my commit

BenchmarkReportUnmarshal-2   	       1	2542708115 ns/op	458429904 B/op	 3477396 allocs/op
BenchmarkReportUnmarshal-2   	       1	2419631234 ns/op	447161120 B/op	 2947783 allocs/op
BenchmarkReportUnmarshal-2   	       1	2445833014 ns/op	447220928 B/op	 2941192 allocs/op

Copy link
Collaborator Author

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be two separate PRs in one - refactoring the string constants and interning keys. I guess it would be ok to do it as two commits in that order.

The commit description "intern LatestMap keys we know" seems a bit off - it's a refactor of duplicated strings.


// lookupCommonKey tris to avoid an allocation in populating the key
// by looking it up
func lookupCommonKey(key *string, b []byte) {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

ECSScaleDown: ECSScaleDown,
}

// lookupCommonKey tris to avoid an allocation in populating the key

This comment was marked as abuse.

Domain = "domain" // TODO this is ambiguous, be more specific
Name = "name" // TODO this is ambiguous, be more specific
ContainerID = report.DockerContainerID
Name = report.Name // TODO this is ambiguous, be more specific

This comment was marked as abuse.

This comment was marked as abuse.

@rade rade force-pushed the intern-map-keys branch 2 times, most recently from 4a46cfd to 944587a Compare December 11, 2017 19:34
rade added 2 commits December 11, 2017 20:26
Both the probe and the app (for rendering) need to know about them.
@rade
Copy link
Member

rade commented Dec 11, 2017

@bboreham PTAL

@bboreham bboreham changed the title WIP: "Intern" map keys "Intern" map keys Dec 14, 2017
@bboreham
Copy link
Collaborator Author

LGTM. Since this was my PR originally I can't Approve it.

@rade rade merged commit eaa3942 into master Dec 14, 2017
@rade rade mentioned this pull request Dec 15, 2017
@rade rade deleted the intern-map-keys branch December 25, 2017 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants