
pool the transient maps used for Msg Pack, Truncate and Len #1006

Merged: 1 commit into miekg:master on Sep 23, 2019

Conversation

charlievieth
Contributor

This improves runtime by 20-40% and, more importantly, significantly reduces memory usage and allocations; reducing allocations is the primary goal here.
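
For illustration, a minimal sketch of the pooling approach, assuming pool and helper names that are not the identifiers in the actual diff:

package dns

import "sync"

// compressionPool recycles the transient map[string]struct{} that Len and
// Truncate build to track compression pointers; Pack would use an analogous
// pool for its map[string]uint16.
var compressionPool = sync.Pool{
    New: func() interface{} { return make(map[string]struct{}) },
}

func acquireCompressionMap() map[string]struct{} {
    return compressionPool.Get().(map[string]struct{})
}

func releaseCompressionMap(m map[string]struct{}) {
    for k := range m { // clear the map before returning it to the pool
        delete(m, k)
    }
    compressionPool.Put(m)
}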

Benchmarks:

benchmark                                          old ns/op     new ns/op     delta
BenchmarkMsgLength-12                              337           263           -21.96%
BenchmarkMsgLengthNoCompression-12                 48.0          49.6          +3.33%
BenchmarkMsgLengthPack-12                          1010          963           -4.65%
BenchmarkMsgLengthMassive-12                       25409         14551         -42.73%
BenchmarkMsgLengthOnlyQuestion-12                  13.3          13.6          +2.26%
BenchmarkMsgLengthEscapedName-12                   108           112           +3.70%
BenchmarkPackDomainName-12                         140           143           +2.14%
BenchmarkUnpackDomainName-12                       118           119           +0.85%
BenchmarkUnpackDomainNameUnprintable-12            97.0          97.5          +0.52%
BenchmarkUnpackDomainNameLongest-12                510           507           -0.59%
BenchmarkUnpackDomainNameLongestUnprintable-12     1270          1276          +0.47%
BenchmarkCopy-12                                   222           228           +2.70%
BenchmarkPackA-12                                  48.7          48.7          +0.00%
BenchmarkUnpackA-12                                122           123           +0.82%
BenchmarkPackMX-12                                 88.1          88.4          +0.34%
BenchmarkUnpackMX-12                               146           145           -0.68%
BenchmarkPackAAAAA-12                              41.3          41.0          -0.73%
BenchmarkUnpackAAAA-12                             132           131           -0.76%
BenchmarkPackMsg-12                                965           935           -3.11%
BenchmarkPackMsgMassive-12                         42512         31269         -26.45%
BenchmarkPackMsgOnlyQuestion-12                    191           200           +4.71%
BenchmarkUnpackMsg-12                              1038          1037          -0.10%
BenchmarkIdGeneration-12                           13.5          13.5          +0.00%
BenchmarkReverseAddr/IP4-12                        103           104           +0.97%
BenchmarkReverseAddr/IP6-12                        110           112           +1.82%
BenchmarkGenerate-12                               156365        155104        -0.81%
BenchmarkSplitLabels-12                            40.7          40.2          -1.23%
BenchmarkLenLabels-12                              26.2          26.4          +0.76%
BenchmarkCompareDomainName-12                      108           108           +0.00%
BenchmarkIsSubDomain-12                            350           351           +0.29%
BenchmarkUnpackString-12                           78.0          78.5          +0.64%
BenchmarkMsgTruncate-12                            8926          5953          -33.31%
BenchmarkHashName/150-12                           20891         21504         +2.93%
BenchmarkHashName/2500-12                          338180        339422        +0.37%
BenchmarkHashName/5000-12                          659772        670657        +1.65%
BenchmarkHashName/10000-12                         1330291       1348682       +1.38%
BenchmarkHashName/65535-12                         8839862       8698766       -1.60%
BenchmarkDedup-12                                  1553          1568          +0.97%
BenchmarkNewRR-12                                  2090          2089          -0.05%
BenchmarkReadRR-12                                 2358          2339          -0.81%
BenchmarkParseZone-12                              37784         37818         +0.09%
BenchmarkZoneParser-12                             10718         11111         +3.67%
BenchmarkMuxMatch/lowercase-12                     61.6          58.9          -4.38%
BenchmarkMuxMatch/uppercase-12                     100           105           +5.00%
BenchmarkServe-12                                  130004        141492        +8.84%
BenchmarkServe6-12                                 143604        137140        -4.50%
BenchmarkServeCompress-12                          134134        139924        +4.32%
BenchmarkSprintName-12                             171           174           +1.75%
BenchmarkSprintTxtOctet-12                         116           120           +3.45%
BenchmarkSprintTxt-12                              206           203           -1.46%

benchmark                                          old allocs     new allocs     delta
BenchmarkMsgLength-12                              2              0              -100.00%
BenchmarkMsgLengthNoCompression-12                 0              0              +0.00%
BenchmarkMsgLengthPack-12                          3              1              -66.67%
BenchmarkMsgLengthMassive-12                       11             0              -100.00%
BenchmarkMsgLengthOnlyQuestion-12                  0              0              +0.00%
BenchmarkMsgLengthEscapedName-12                   0              0              +0.00%
BenchmarkPackDomainName-12                         0              0              +0.00%
BenchmarkUnpackDomainName-12                       1              1              +0.00%
BenchmarkUnpackDomainNameUnprintable-12            1              1              +0.00%
BenchmarkUnpackDomainNameLongest-12                1              1              +0.00%
BenchmarkUnpackDomainNameLongestUnprintable-12     1              1              +0.00%
BenchmarkCopy-12                                   7              7              +0.00%
BenchmarkPackA-12                                  0              0              +0.00%
BenchmarkUnpackA-12                                3              3              +0.00%
BenchmarkPackMX-12                                 0              0              +0.00%
BenchmarkUnpackMX-12                               4              4              +0.00%
BenchmarkPackAAAAA-12                              0              0              +0.00%
BenchmarkUnpackAAAA-12                             3              3              +0.00%
BenchmarkPackMsg-12                                2              0              -100.00%
BenchmarkPackMsgMassive-12                         12             1              -91.67%
BenchmarkPackMsgOnlyQuestion-12                    0              0              +0.00%
BenchmarkUnpackMsg-12                              12             12             +0.00%
BenchmarkIdGeneration-12                           0              0              +0.00%
BenchmarkReverseAddr/IP4-12                        2              2              +0.00%
BenchmarkReverseAddr/IP6-12                        2              2              +0.00%
BenchmarkGenerate-12                               1429           1429           +0.00%
BenchmarkSplitLabels-12                            1              1              +0.00%
BenchmarkLenLabels-12                              0              0              +0.00%
BenchmarkCompareDomainName-12                      2              2              +0.00%
BenchmarkIsSubDomain-12                            6              6              +0.00%
BenchmarkUnpackString-12                           2              2              +0.00%
BenchmarkMsgTruncate-12                            6              0              -100.00%
BenchmarkHashName/150-12                           6              6              +0.00%
BenchmarkHashName/2500-12                          6              6              +0.00%
BenchmarkHashName/5000-12                          6              6              +0.00%
BenchmarkHashName/10000-12                         6              6              +0.00%
BenchmarkHashName/65535-12                         6              6              +0.00%
BenchmarkDedup-12                                  31             31             +0.00%
BenchmarkNewRR-12                                  14             14             +0.00%
BenchmarkReadRR-12                                 16             16             +0.00%
BenchmarkParseZone-12                              92             92             +0.00%
BenchmarkZoneParser-12                             81             81             +0.00%
BenchmarkMuxMatch/lowercase-12                     0              0              +0.00%
BenchmarkMuxMatch/uppercase-12                     1              1              +0.00%
BenchmarkServe-12                                  53             53             +0.00%
BenchmarkServe6-12                                 50             50             +0.00%
BenchmarkServeCompress-12                          55             53             -3.64%
BenchmarkSprintName-12                             2              2              +0.00%
BenchmarkSprintTxtOctet-12                         2              2              +0.00%
BenchmarkSprintTxt-12                              2              2              +0.00%

benchmark                                          old bytes     new bytes     delta
BenchmarkMsgLength-12                              192           0             -100.00%
BenchmarkMsgLengthNoCompression-12                 0             0             +0.00%
BenchmarkMsgLengthPack-12                          528           320           -39.39%
BenchmarkMsgLengthMassive-12                       10865         0             -100.00%
BenchmarkMsgLengthOnlyQuestion-12                  0             0             +0.00%
BenchmarkMsgLengthEscapedName-12                   0             0             +0.00%
BenchmarkPackDomainName-12                         0             0             +0.00%
BenchmarkUnpackDomainName-12                       64            64            +0.00%
BenchmarkUnpackDomainNameUnprintable-12            48            48            +0.00%
BenchmarkUnpackDomainNameLongest-12                256           256           +0.00%
BenchmarkUnpackDomainNameLongestUnprintable-12     1024          1024          +0.00%
BenchmarkCopy-12                                   288           288           +0.00%
BenchmarkPackA-12                                  0             0             +0.00%
BenchmarkUnpackA-12                                100           100           +0.00%
BenchmarkPackMX-12                                 0             0             +0.00%
BenchmarkUnpackMX-12                               116           116           +0.00%
BenchmarkPackAAAAA-12                              0             0             +0.00%
BenchmarkUnpackAAAA-12                             112           112           +0.00%
BenchmarkPackMsg-12                                208           0             -100.00%
BenchmarkPackMsgMassive-12                         18981         6787          -64.24%
BenchmarkPackMsgOnlyQuestion-12                    0             0             +0.00%
BenchmarkUnpackMsg-12                              592           592           +0.00%
BenchmarkIdGeneration-12                           0             0             +0.00%
BenchmarkReverseAddr/IP4-12                        48            48            +0.00%
BenchmarkReverseAddr/IP6-12                        96            96            +0.00%
BenchmarkGenerate-12                               33887         33887         +0.00%
BenchmarkSplitLabels-12                            32            32            +0.00%
BenchmarkLenLabels-12                              0             0             +0.00%
BenchmarkCompareDomainName-12                      64            64            +0.00%
BenchmarkIsSubDomain-12                            192           192           +0.00%
BenchmarkUnpackString-12                           48            48            +0.00%
BenchmarkMsgTruncate-12                            2398          52            -97.83%
BenchmarkHashName/150-12                           468           468           +0.00%
BenchmarkHashName/2500-12                          468           468           +0.00%
BenchmarkHashName/5000-12                          468           468           +0.00%
BenchmarkHashName/10000-12                         468           468           +0.00%
BenchmarkHashName/65535-12                         468           468           +0.00%
BenchmarkDedup-12                                  624           624           +0.00%
BenchmarkNewRR-12                                  688           688           +0.00%
BenchmarkReadRR-12                                 1696          1696          +0.00%
BenchmarkParseZone-12                              84417         84417         +0.00%
BenchmarkZoneParser-12                             1968          1968          +0.00%
BenchmarkMuxMatch/lowercase-12                     0             0             +0.00%
BenchmarkMuxMatch/uppercase-12                     32            32            +0.00%
BenchmarkServe-12                                  3265          3265          +0.00%
BenchmarkServe6-12                                 3106          3105          -0.03%
BenchmarkServeCompress-12                          3473          3266          -5.96%
BenchmarkSprintName-12                             48            48            +0.00%
BenchmarkSprintTxtOctet-12                         80            80            +0.00%
BenchmarkSprintTxt-12                              80            80            +0.00%

This improves runtime by 20-40% and more importantly significantly
reduces memory usage and allocations.
@codecov-io

codecov-io commented Sep 21, 2019

Codecov Report

Merging #1006 into master will increase coverage by 0.11%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1006      +/-   ##
==========================================
+ Coverage   54.63%   54.75%   +0.11%     
==========================================
  Files          41       41              
  Lines        9842     9859      +17     
==========================================
+ Hits         5377     5398      +21     
+ Misses       3438     3435       -3     
+ Partials     1027     1026       -1
Impacted Files Coverage Δ
msg_truncate.go 80.43% <100%> (+1.36%) ⬆️
msg.go 78.01% <100%> (+1.17%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2acbc9e...88359b0.

@miekg
Owner

miekg commented Sep 21, 2019

thanks!

this is somewhat concerning:

BenchmarkMuxMatch/uppercase-12                     100           105           +5.00%
BenchmarkServe-12                                  130004        141492        +8.84%
BenchmarkServe6-12                                 143604        137140        -4.50%

As this does include some (localhost) networking it might not be conclusive, but is Serve-12 really almost 10% slower?

@tmthrgd
Collaborator

tmthrgd commented Sep 21, 2019

I don’t believe this is a correct usage of sync.Pool, see golang/go#23199.

@charlievieth
Contributor Author

As this does include some (localhost) networking it might not be conclusive, but is Serve-12 really almost 10% slower?

There is a ton of variability in timing of the Serve benchmarks (MBP 2018, macOS 10.14.6, Core i9 @2.9 GHz). TBH, they're only good for memory/alloc stats.

Results from benchstat (thanks @tmthrgd for showing me this tool) below:

name              time/op
Serve-12           234µs ±59%
Serve6-12          231µs ±49%
ServeCompress-12   219µs ±68%

name              alloc/op
Serve-12          3.27kB ± 0%
Serve6-12         3.11kB ± 0%
ServeCompress-12  3.27kB ± 0%

name              allocs/op
Serve-12            53.0 ± 0%
Serve6-12           50.0 ± 0%
ServeCompress-12    53.0 ± 0%

@charlievieth
Contributor Author

I don’t believe this is a correct usage of sync.Pool, see golang/go#23199.

I think the specific issue that you're referring to is that we are not setting an upper bound on the size of objects that we store in the pool.

I'm not terribly familiar with this part of DNS, but maybe @miekg could weigh in on what an appropriate upper limit would be (for reference, BenchmarkPackMsgMassive stores 164 elements in the map)?
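
For reference, one way to bound what the pool retains; this is a sketch only, and maxPooledEntries is a made-up threshold rather than a value from the PR:

package dns

import "sync"

// maxPooledEntries is an illustrative cap: maps that grew beyond it are
// dropped instead of pooled, so one huge message cannot pin a large
// allocation for the lifetime of the pool (the golang/go#23199 concern).
const maxPooledEntries = 1024

var compressionPool = sync.Pool{
    New: func() interface{} { return make(map[string]struct{}) },
}

func releaseCompressionMap(m map[string]struct{}) {
    if len(m) > maxPooledEntries {
        return // leave oversized maps to the garbage collector
    }
    for k := range m {
        delete(m, k)
    }
    compressionPool.Put(m)
}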

@miekg
Owner

miekg commented Sep 22, 2019

This does look to have real-world impact...?

name              time/op
Serve-12           234µs ±59%
Serve6-12          231µs ±49%
ServeCompress-12   219µs ±68%

We're storing compression pointers in the Pool; there is an upper limit to this (before we know we need them), so every different label (list) in a reply is stored. It's kind of hard to say what the exact limit is, but let's say:

  • a max size of 64 KB for the entire message
  • 3-octet labels
  • every 3-octet label in the message is different

So roughly 65,536 / 3 ≈ 21,845 elements in the map.


@charlievieth
Contributor Author

This does look to have real world impact... ?

That just shows that there is tremendous variability (+/- ~50-65%) in the timing of the Serve benchmarks. This existed before this change.

@charlievieth
Contributor Author

So using a max map size of 16,384 entries (64 KB / 4) could lead to storing maps of the following sizes (based on how much memory was allocated during map creation):

CompressionPackPool (map[string]uint16)    696358 B/op  680.03 KB
CompressionPool     (map[string]struct{})  630816 B/op  616 KB

That isn't terrible, but it could lead to a large amount of data residing in the sync.Pool in situations where there are many goroutines and an occasional need for a large compression map.
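
For reference, figures like the ones above could be reproduced with a small benchmark along these lines (a sketch; the 16,384-entry count and the key shape are illustrative, not values from the library):

package dns

import (
    "fmt"
    "testing"
)

// BenchmarkCompressionMapAlloc reports how many bytes building a
// 16,384-entry map[string]uint16 allocates; keys are pre-built so only
// the map growth is measured.
func BenchmarkCompressionMapAlloc(b *testing.B) {
    keys := make([]string, 16384)
    for i := range keys {
        keys[i] = fmt.Sprintf("label-%05d.example.org.", i)
    }
    b.ReportAllocs()
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        m := make(map[string]uint16)
        for i, k := range keys {
            m[k] = uint16(i)
        }
    }
}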

This could be handled by bucketing the Pools, but that would be difficult since we don't know how big a map each request will require. A compromise might be to use a limit of 8K instead, though that would decrease the effectiveness of the pool when and where it is needed most.

That said, it was CoreDNS that led me here and I don't think this will be an issue in its use case. The pool always presents the risk of caching large objects, but I think in a steady-state environment this really won't be an issue.

Additionally, re-running the benchmarks on Ubuntu 18.04 (instead of my Mac with its garbage networking) shows an increase in Serve performance:

benchmark                    old ns/op     new ns/op     delta
BenchmarkServe-4             95307         89116         -6.50%
BenchmarkServe6-4            85613         68566         -19.91%
BenchmarkServeCompress-4     96287         84568         -12.17%

@miekg
Owner

miekg commented Sep 23, 2019

Thanks for the benchmarking. Let's merge. The compression pointer map is something I wish I had a better solution for, but that's hard to do (if you don't want to rewrite the entire lib).

@miekg miekg merged commit 9578cae into miekg:master Sep 23, 2019
@tmthrgd
Collaborator

tmthrgd commented Sep 23, 2019

I really do think we'll hit both golang/go#23199 and golang/go#27735 with this change and end up causing memory issues. It's definitely not the correct usage of sync.Pool. I've actually personally moved away from using sync.Pool in many cases because it's quite finicky.

@miekg
Owner

miekg commented Sep 23, 2019

I'm fine w/ reverting...

@miekg
Owner

miekg commented Sep 24, 2019

@tmthrgd what are you proposing. Keep? Revert? Change?

@tmthrgd
Collaborator

tmthrgd commented Sep 26, 2019

@miekg I think revert. While we may be able to do something here to reduce allocations, this isn’t correct as it stands.

@charlievieth
Contributor Author

@tmthrgd The concerns around sync.Pool are real, but in my experience they require fairly broken code to exercise. I've been using sync.Pool in large, high-throughput systems for years without issue.

It really only becomes an issue when the system is heavily overloaded with goroutines (since this increases the size of the pool: items can't be stolen from poolLocalInternal.private), at which point the memory left lingering in the sync.Pool is the least of your worries (the service might not stay up long enough for it to become a problem).

There is a lot of benefit here, and it's hard to see how CoreDNS or any other responsible consumer of this library would run into the issue.


miekg added a commit that referenced this pull request Oct 2, 2019
…1006)"

This reverts #1006, see discussion on the PR. Def. worth exploring this
further and pushing a more correct approach.

This reverts commit 9578cae.
miekg added a commit that referenced this pull request Oct 3, 2019
…1006)" (#1017)

This reverts #1006, see discussion on the PR. Def. worth exploring this
further and pushing a more correct approach.

This reverts commit 9578cae.
@miekg
Owner

miekg commented Oct 13, 2019

I've made a local change where I have 3 maps (small, medium, and large), bucketed by length, so all elements in each pool should be of roughly the same size. Unless I'm holding it wrong, the benchcmp results are atrocious:

benchmark                     old allocs     new allocs     delta
BenchmarkPackMsg-4            2              4              +100.00%
BenchmarkPackMsgMassive-4     12             14             +16.67%

benchmark                     old bytes     new bytes     delta
BenchmarkPackMsg-4            208           800           +284.62%
BenchmarkPackMsgMassive-4     18980         19570         +3.11%

miekg added a commit that referenced this pull request Oct 13, 2019
This adds several pools to cache compression maps. It uses 3 buckets
to cache items of roughly the same size.

This improves upon #1006, specifically fixing the use of sync.Pool.

Signed-off-by: Miek Gieben <[email protected]>
@miekg
Owner

miekg commented Oct 13, 2019

see https://github.com/miekg/dns/pull/new/mappy (the mappy branch)
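
The branch itself isn't quoted here, but a rough sketch of the bucketed idea described above (bucket thresholds, names, and the size estimate are guesses, not the branch's code):

package dns

import "sync"

// Three pools so that pooled compression maps are of roughly the same
// size; the capacity hints and cut-offs below are illustrative only.
var compressionPools = [3]sync.Pool{
    {New: func() interface{} { return make(map[string]uint16, 32) }},   // small
    {New: func() interface{} { return make(map[string]uint16, 256) }},  // medium
    {New: func() interface{} { return make(map[string]uint16, 2048) }}, // large
}

func bucketFor(n int) *sync.Pool {
    switch {
    case n <= 32:
        return &compressionPools[0]
    case n <= 256:
        return &compressionPools[1]
    default:
        return &compressionPools[2]
    }
}

func acquireCompressionMap(sizeHint int) map[string]uint16 {
    return bucketFor(sizeHint).Get().(map[string]uint16)
}

func releaseCompressionMap(m map[string]uint16) {
    p := bucketFor(len(m)) // pick the bucket before clearing the map
    for k := range m {
        delete(m, k)
    }
    p.Put(m)
}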

aanm pushed a commit to cilium/dns that referenced this pull request Jul 29, 2022
This improves runtime by 20-40% and more importantly significantly
reduces memory usage and allocations.
aanm pushed a commit to cilium/dns that referenced this pull request Jul 29, 2022
…iekg#1006)" (miekg#1017)

This reverts miekg#1006, see discussion on the PR. Def. worth exploring this
further and pushing a more correct approach.

This reverts commit 9578cae.