Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise model transformation approach #3565

Closed
axw opened this issue Mar 27, 2020 · 10 comments · Fixed by #10792
Closed

Revise model transformation approach #3565

axw opened this issue Mar 27, 2020 · 10 comments · Fixed by #10792

Comments

@axw
Copy link
Member

axw commented Mar 27, 2020

Follow-on from #3551

In our model transform code, we're currently heavily relying on utility.Set and friends, which has a couple of performance-related issues:

  • blindly copies all maps, even when using a single-use temporary map
  • relies on interfaces/type reflection, which is slower than type-specific calls

We also create a lot of maps even when they're ultimately not used due to being empty.

We should investigate extending our model transform utilities to cut down on heap allocations and unnecessary copying.

I created a quick hack branch to highlight some of the gains we can expect to achieve: https://github.com/axw/apm-server/pull/new/optimise-fields

$ benchstat /tmp/old.txt /tmp/new.txt
name                      old time/op    new time/op    delta
TransactionEventDecode-8    18.0µs ± 5%    11.0µs ±11%  -39.16%  (p=0.016 n=4+5)

name                      old alloc/op   new alloc/op   delta
TransactionEventDecode-8    16.8kB ± 0%    10.7kB ± 0%  -36.36%  (p=0.000 n=4+5)

name                      old allocs/op  new allocs/op  delta
TransactionEventDecode-8       125 ± 0%        88 ± 0%  -29.60%  (p=0.008 n=5+5)
@axw
Copy link
Member Author

axw commented Mar 27, 2020

What would be particularly nice is if libbeat accepted an interface for beat.Event.Fields, along the lines of:

type Fielder interface {
    GetField(string) (interface{}, bool)
    SetField(string, interface{}) (bool)
}

// Recycler is an optional interface for Fielder to implement, to enable recycling memory.
type Recycler interface {
    Recycle()
}

Fielders could optionally implement go-structform/gotype.Folder to customise libbeat/output encoding.

We could then create a generated ECS type structure which implements this interface, into which we transform our model objects. e.g.

type Fields struct {
    Service ServiceFields
    Transaction TransactionFields
    ...
}

func (f *Fields) GetField(key string) (interface{}, bool) {
    var prefix, suffix string
    if i := strings.IndexRune(key, '.'); i >= 0 {
        prefix, suffix = key[:i], key[i+1:]
    }
    switch prefix {
    case "service":
        return f.Service.GetField(suffix)
    case "transaction":
        return f.Service.GetField(suffix)
    }
    return nil, false
}

type ServiceFields struct {
    Name string
    Version string
    ...
}

func (f *ServiceFields) GetField(key string) (interface{}, bool) {
    switch key {
    case "name":
        return f.Name, f.Name != ""
    case "version":
        return f.Version, f.Version != ""
    }
}

We would use a sync.Pool for creating and recycling these objects, implementing the Recyler interface.

All of that combined would enable us to significantly cut down on allocations, while still enabling processors to do their job. There's a good chance I've oversimplified the requirements though.

axw added a commit that referenced this issue Apr 6, 2020
The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for #3485.

We could merge metadata without these changes, but it would be more difficult and error prone. Making these change also provide some performance improvements – see below. Finally, there is also overlap between merging metadata and revising the decoders to enable memory use (#3551 (comment)).

In theory this could be a considered a breaking change, due to the fact that an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values.

Due to the use of empty strings, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of #3565 which is more explicit about omitting empty strings.

These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to:

- fewer allocations
-- no interface allocations, or unnecessary deep copying of maps, due to utility.Set
-- lazy map construction
- less reflection, due to not using utility.Set
- less pointer indirection

name                   old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8    1.16µs ± 6%    0.38µs ±11%  -67.59%  (p=0.008 n=5+5)
MetadataSet/full-8       11.9µs ± 4%     5.3µs ± 6%  -55.53%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         9.70µs ± 1%    9.30µs ±17%     ~     (p=0.690 n=5+5)

name                   old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      896B ± 0%      368B ± 0%  -58.93%  (p=0.008 n=5+5)
MetadataSet/full-8       14.0kB ± 0%     6.2kB ± 0%  -55.36%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         1.31kB ± 0%    1.06kB ± 0%  -18.96%  (p=0.000 n=5+4)

name                   old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.008 n=5+5)
MetadataSet/full-8          114 ± 0%        68 ± 0%  -40.35%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8           61.0 ± 0%      28.0 ± 0%  -54.10%  (p=0.008 n=5+5)

* model/modeldecoder: benchmark DecodeMetadata

* Benchmark recycled memory decoding

* model/modeldecoder: update decoding

* model/metadata: use non-pointer fields

* Adapt inputs to model changes

* model/metadata: benchmark Metadata.Set

* model: fix golint error (Id->ID)
axw added a commit to axw/apm-server that referenced this issue Apr 8, 2020
The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for elastic#3485.

We could merge metadata without these changes, but it would be more difficult and error prone. Making these change also provide some performance improvements – see below. Finally, there is also overlap between merging metadata and revising the decoders to enable memory use (elastic#3551 (comment)).

In theory this could be a considered a breaking change, due to the fact that an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values.

Due to the use of empty strings, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of elastic#3565 which is more explicit about omitting empty strings.

These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to:

- fewer allocations
-- no interface allocations, or unnecessary deep copying of maps, due to utility.Set
-- lazy map construction
- less reflection, due to not using utility.Set
- less pointer indirection

name                   old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8    1.16µs ± 6%    0.38µs ±11%  -67.59%  (p=0.008 n=5+5)
MetadataSet/full-8       11.9µs ± 4%     5.3µs ± 6%  -55.53%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         9.70µs ± 1%    9.30µs ±17%     ~     (p=0.690 n=5+5)

name                   old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      896B ± 0%      368B ± 0%  -58.93%  (p=0.008 n=5+5)
MetadataSet/full-8       14.0kB ± 0%     6.2kB ± 0%  -55.36%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         1.31kB ± 0%    1.06kB ± 0%  -18.96%  (p=0.000 n=5+4)

name                   old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.008 n=5+5)
MetadataSet/full-8          114 ± 0%        68 ± 0%  -40.35%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8           61.0 ± 0%      28.0 ± 0%  -54.10%  (p=0.008 n=5+5)

* model/modeldecoder: benchmark DecodeMetadata

* Benchmark recycled memory decoding

* model/modeldecoder: update decoding

* model/metadata: use non-pointer fields

* Adapt inputs to model changes

* model/metadata: benchmark Metadata.Set

* model: fix golint error (Id->ID)
axw added a commit that referenced this issue Apr 8, 2020
The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for #3485.

We could merge metadata without these changes, but it would be more difficult and error prone. Making these change also provide some performance improvements – see below. Finally, there is also overlap between merging metadata and revising the decoders to enable memory use (#3551 (comment)).

In theory this could be a considered a breaking change, due to the fact that an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values.

Due to the use of empty strings, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of #3565 which is more explicit about omitting empty strings.

These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to:

- fewer allocations
-- no interface allocations, or unnecessary deep copying of maps, due to utility.Set
-- lazy map construction
- less reflection, due to not using utility.Set
- less pointer indirection

name                   old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8    1.16µs ± 6%    0.38µs ±11%  -67.59%  (p=0.008 n=5+5)
MetadataSet/full-8       11.9µs ± 4%     5.3µs ± 6%  -55.53%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         9.70µs ± 1%    9.30µs ±17%     ~     (p=0.690 n=5+5)

name                   old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      896B ± 0%      368B ± 0%  -58.93%  (p=0.008 n=5+5)
MetadataSet/full-8       14.0kB ± 0%     6.2kB ± 0%  -55.36%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         1.31kB ± 0%    1.06kB ± 0%  -18.96%  (p=0.000 n=5+4)

name                   old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.008 n=5+5)
MetadataSet/full-8          114 ± 0%        68 ± 0%  -40.35%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8           61.0 ± 0%      28.0 ± 0%  -54.10%  (p=0.008 n=5+5)

* model/modeldecoder: benchmark DecodeMetadata

* Benchmark recycled memory decoding

* model/modeldecoder: update decoding

* model/metadata: use non-pointer fields

* Adapt inputs to model changes

* model/metadata: benchmark Metadata.Set

* model: fix golint error (Id->ID)
@axw axw mentioned this issue Aug 25, 2020
10 tasks
@axw
Copy link
Member Author

axw commented Nov 30, 2020

I have looked into the "Fielder" interface idea above a little bit more. Assumptions about Fields being a common.MapStr are fairly pervasive, and it would take quite some time to change over to an interface. https://github.com/axw/beats/tree/fielder has some changes to use an interface (which is a bit broader than the one described above), but which will fail if most processors are used and a common.MapStr is not used.

A more pragmatic approach would be to just do this with the field values that we set in APM Server. We would continue to have Fields as a common.MapStr, but top-level field values would be structs that implement structform.Folder for encoding.

@axw
Copy link
Member Author

axw commented Mar 9, 2021

I spent a little time investigating the Folder approach mentioned above: https://github.com/axw/apm-server/pull/new/optimise-allocs. It's not complete, but good enough to compare performance for indexing mostly empty transactions.

I've tested sending 100 ~empty transactions repeatedly to APM Server, on the master branch (09cbdea) and the above mentioned branch. In both cases I ran APM Server with the "console" output, piped to /dev/null.

$ benchstat /tmp/old.txt /tmp/new.txt 
name               old time/op           new time/op           delta
_100_Transactions           1.77ms ± 7%           1.21ms ± 6%  -31.56%  (p=0.000 n=10+9)

name               old transactions/sec  new transactions/sec  delta
_100_Transactions            56.5k ± 6%            82.6k ± 6%  +46.09%  (p=0.000 n=10+9)

name               old alloc/op          new alloc/op          delta
_100_Transactions           1.10MB ± 2%           0.49MB ± 1%  -55.46%  (p=0.000 n=8+8)

name               old allocs/op         new allocs/op         delta
_100_Transactions            9.84k ± 1%            4.10k ± 1%  -58.38%  (p=0.000 n=8+8)

Before anyone gets too excited, just remember that this is using the console, not elasticsearch, output. Naturally when using the elasticsearch output we'll be more network bound. Nevertheless, this still shows we're burning CPU unnecessarily.

@simitt
Copy link
Contributor

simitt commented Mar 9, 2021

Before anyone gets too excited

Too late, I am already hooked on that change! 😄

@felixbarny
Copy link
Member

Once we have this change in, we should also update data-security.asciidoc. See also https://github.com/elastic/apm-server/pull/5068/files#r613198321

@axw
Copy link
Member Author

axw commented Oct 20, 2021

master:

goos: linux                                    
goarch: amd64                                  
pkg: github.com/elastic/apm-server/model/modelindexer               
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkModelIndexer-12         1314964              3856 ns/op            1297 B/op         12 allocs/op                               

4df662a (JSON encoding using reflection + fastjson):

goos: linux                                                
goarch: amd64                                                                                                                            
pkg: github.com/elastic/apm-server/model/modelindexer
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz                       
BenchmarkModelIndexer-12         2509297              2245 ns/op            1132 B/op         10 allocs/op

d1794a0 (JSON encoding using fastjson generator):

goos: linux                                                                                                                              
goarch: amd64                                  
pkg: github.com/elastic/apm-server/model/modelindexer               
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz                                                                                            
BenchmarkModelIndexer-12         3661497              1544 ns/op              60 B/op          1 allocs/op

@axw
Copy link
Member Author

axw commented Oct 20, 2021

It's worth noting that that benchmark is encoding a minimal event. As the number of fields grows, the number of allocations grows when using the master branch and reflection approaches. This is not the case when using the generated fastjson approach; it has constant allocations. In fact the only allocation is unrelated to fastjson; it's due to our need to concatenate the data stream type/dataset/namespace into an index name.

@felixbarny
Copy link
Member

That's an impressive improvement! 👏

@stuartnelson3
Copy link
Contributor

Added this as 8.3-candidate, but maybe as a follow-up to #4120 would be more appropriate

@axw
Copy link
Member Author

axw commented Mar 15, 2022

Yes, I think this should come after #4120. See the two phases described in that issue's description. I'll remove from 8.3-candidate for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants