Revise model transformation approach #3565

axw · 2020-03-27T04:21:39Z

Follow-on from #3551

In our model transform code, we currently rely heavily on utility.Set and friends, which have a couple of performance-related issues:

- they blindly copy all maps, even when given a single-use temporary map
- they rely on interfaces/type reflection, which is slower than type-specific calls

We also create a lot of maps even when they're ultimately not used due to being empty.

We should investigate extending our model transform utilities to cut down on heap allocations and unnecessary copying.
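
As a rough illustration of the direction, here is a minimal sketch of a type-specific setter that skips empty values and only allocates the map once there is something to store. The names `fieldMap`, `setString` and `serviceFields` are hypothetical and are not part of apm-server's utility package.

```go
package main

import "fmt"

// fieldMap is a hypothetical stand-in for a document fields map.
type fieldMap map[string]interface{}

// setString records a non-empty string value. The map is allocated on
// first use only, so callers that set nothing never allocate.
func (m *fieldMap) setString(key, value string) {
	if value == "" {
		return
	}
	if *m == nil {
		*m = make(fieldMap)
	}
	(*m)[key] = value
}

// serviceFields builds the fields for a service. It returns a nil map
// when every input is empty.
func serviceFields(name, version string) fieldMap {
	var m fieldMap
	m.setString("name", name)
	m.setString("version", version)
	return m
}

func main() {
	fmt.Println(serviceFields("", "") == nil) // no map was allocated
	fmt.Println(serviceFields("opbeans", ""))
}
```

Because `setString` takes a `string` rather than an `interface{}`, there is no reflection involved, and the caller can check for a nil result before attaching the map to a parent document.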

I created a quick hack branch to highlight some of the gains we can expect to achieve: https://github.com/axw/apm-server/pull/new/optimise-fields

$ benchstat /tmp/old.txt /tmp/new.txt
name                      old time/op    new time/op    delta
TransactionEventDecode-8    18.0µs ± 5%    11.0µs ±11%  -39.16%  (p=0.016 n=4+5)

name                      old alloc/op   new alloc/op   delta
TransactionEventDecode-8    16.8kB ± 0%    10.7kB ± 0%  -36.36%  (p=0.000 n=4+5)

name                      old allocs/op  new allocs/op  delta
TransactionEventDecode-8       125 ± 0%        88 ± 0%  -29.60%  (p=0.008 n=5+5)


axw · 2020-03-27T06:35:14Z

What would be particularly nice is if libbeat accepted an interface for beat.Event.Fields, along the lines of:

type Fielder interface {
    GetField(string) (interface{}, bool)
    SetField(string, interface{}) (bool)
}

// Recycler is an optional interface for Fielder to implement, to enable recycling memory.
type Recycler interface {
    Recycle()
}

Fielders could optionally implement go-structform/gotype.Folder to customise libbeat/output encoding.

We could then create a generated ECS type structure which implements this interface, into which we transform our model objects. e.g.

type Fields struct {
    Service ServiceFields
    Transaction TransactionFields
    ...
}

func (f *Fields) GetField(key string) (interface{}, bool) {
    // Split e.g. "service.name" into prefix "service" and suffix "name".
    // Keys without a '.' leave prefix empty and are reported as not found.
    var prefix, suffix string
    if i := strings.IndexRune(key, '.'); i >= 0 {
        prefix, suffix = key[:i], key[i+1:]
    }
    switch prefix {
    case "service":
        return f.Service.GetField(suffix)
    case "transaction":
        return f.Transaction.GetField(suffix)
    }
    return nil, false
}

type ServiceFields struct {
    Name string
    Version string
    ...
}

func (f *ServiceFields) GetField(key string) (interface{}, bool) {
    switch key {
    case "name":
        return f.Name, f.Name != ""
    case "version":
        return f.Version, f.Version != ""
    }
    // Unknown keys are reported as not found.
    return nil, false
}

We would use a sync.Pool for creating and recycling these objects, implementing the Recycler interface.
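
A minimal sketch of how that pooling might look, assuming a reduced `ServiceFields` struct. The pool variable and the `acquireServiceFields` helper are hypothetical names introduced for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// ServiceFields mirrors the example struct above, reduced to two fields.
type ServiceFields struct {
	Name    string
	Version string
}

// fieldsPool hands out reusable *ServiceFields values.
var fieldsPool = sync.Pool{
	New: func() interface{} { return new(ServiceFields) },
}

// acquireServiceFields is a hypothetical constructor backed by the pool.
func acquireServiceFields() *ServiceFields {
	return fieldsPool.Get().(*ServiceFields)
}

// Recycle zeroes the struct and returns it to the pool.
// The caller must not use f after calling Recycle.
func (f *ServiceFields) Recycle() {
	*f = ServiceFields{}
	fieldsPool.Put(f)
}

func main() {
	f := acquireServiceFields()
	f.Name = "opbeans"
	fmt.Println(f.Name)
	f.Recycle()

	g := acquireServiceFields() // may or may not reuse the memory behind f
	fmt.Println(g.Name == "")   // recycled values are always zeroed
}
```

Zeroing inside `Recycle` rather than on acquisition means a pooled value never holds on to strings from a previous event.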

All of that combined would enable us to significantly cut down on allocations, while still enabling processors to do their job. There's a good chance I've oversimplified the requirements though.

The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for #3485. We could merge metadata without these changes, but it would be more difficult and error prone. Making these changes also provides some performance improvements (see below). Finally, there is also overlap between merging metadata and revising the decoders to enable memory reuse (#3551 (comment)).

In theory this could be considered a breaking change, because an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values.

Because unset fields are now represented by empty strings rather than nil pointers, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of #3565 which is more explicit about omitting empty strings.

These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to:

- fewer allocations
  - no interface allocations, or unnecessary deep copying of maps, due to utility.Set
  - lazy map construction
- less reflection, due to not using utility.Set
- less pointer indirection

name                   old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8    1.16µs ± 6%    0.38µs ±11%  -67.59%  (p=0.008 n=5+5)
MetadataSet/full-8       11.9µs ± 4%     5.3µs ± 6%  -55.53%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         9.70µs ± 1%    9.30µs ±17%     ~     (p=0.690 n=5+5)

name                   old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      896B ± 0%      368B ± 0%  -58.93%  (p=0.008 n=5+5)
MetadataSet/full-8       14.0kB ± 0%     6.2kB ± 0%  -55.36%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8         1.31kB ± 0%    1.06kB ± 0%  -18.96%  (p=0.000 n=5+4)

name                   old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64
MetadataSet/minimal-8      10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.008 n=5+5)
MetadataSet/full-8          114 ± 0%        68 ± 0%  -40.35%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64
DecodeMetadata-8           61.0 ± 0%      28.0 ± 0%  -54.10%  (p=0.008 n=5+5)

* model/modeldecoder: benchmark DecodeMetadata
* Benchmark recycled memory decoding
* model/modeldecoder: update decoding
* model/metadata: use non-pointer fields
* Adapt inputs to model changes
* model/metadata: benchmark Metadata.Set
* model: fix golint error (Id->ID)


axw · 2020-11-30T04:11:01Z

I have looked into the "Fielder" interface idea above a little more. Assumptions about Fields being a common.MapStr are fairly pervasive, and it would take quite some time to change over to an interface. https://github.com/axw/beats/tree/fielder has some changes to use an interface (which is a bit broader than the one described above), but it will fail if most processors are used while Fields is not a common.MapStr.

A more pragmatic approach would be to just do this with the field values that we set in APM Server. We would continue to have Fields as a common.MapStr, but top-level field values would be structs that implement structform.Folder for encoding.

axw · 2021-03-09T04:01:52Z

I spent a little time investigating the Folder approach mentioned above: https://github.com/axw/apm-server/pull/new/optimise-allocs. It's not complete, but good enough to compare performance for indexing mostly empty transactions.

I've tested sending 100 ~empty transactions repeatedly to APM Server, on the master branch (09cbdea) and the above mentioned branch. In both cases I ran APM Server with the "console" output, piped to /dev/null.

$ benchstat /tmp/old.txt /tmp/new.txt 
name               old time/op           new time/op           delta
_100_Transactions           1.77ms ± 7%           1.21ms ± 6%  -31.56%  (p=0.000 n=10+9)

name               old transactions/sec  new transactions/sec  delta
_100_Transactions            56.5k ± 6%            82.6k ± 6%  +46.09%  (p=0.000 n=10+9)

name               old alloc/op          new alloc/op          delta
_100_Transactions           1.10MB ± 2%           0.49MB ± 1%  -55.46%  (p=0.000 n=8+8)

name               old allocs/op         new allocs/op         delta
_100_Transactions            9.84k ± 1%            4.10k ± 1%  -58.38%  (p=0.000 n=8+8)

Before anyone gets too excited, just remember that this is using the console, not elasticsearch, output. Naturally when using the elasticsearch output we'll be more network bound. Nevertheless, this still shows we're burning CPU unnecessarily.

simitt · 2021-03-09T07:55:48Z

> Before anyone gets too excited

Too late, I am already hooked on that change! 😄

felixbarny · 2021-04-14T17:01:28Z

Once we have this change in, we should also update data-security.asciidoc. See also https://github.com/elastic/apm-server/pull/5068/files#r613198321

axw · 2021-10-20T06:57:59Z

master:

goos: linux                                    
goarch: amd64                                  
pkg: github.com/elastic/apm-server/model/modelindexer               
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkModelIndexer-12         1314964              3856 ns/op            1297 B/op         12 allocs/op

4df662a (JSON encoding using reflection + fastjson):

goos: linux                                                
goarch: amd64                                                                                                                            
pkg: github.com/elastic/apm-server/model/modelindexer
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz                       
BenchmarkModelIndexer-12         2509297              2245 ns/op            1132 B/op         10 allocs/op

d1794a0 (JSON encoding using fastjson generator):

goos: linux                                                                                                                              
goarch: amd64                                  
pkg: github.com/elastic/apm-server/model/modelindexer               
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz                                                                                            
BenchmarkModelIndexer-12         3661497              1544 ns/op              60 B/op          1 allocs/op

axw · 2021-10-20T07:04:03Z

It's worth noting that the benchmark above encodes a minimal event. As the number of fields grows, so does the number of allocations with both the master branch and the reflection-based approach. This is not the case with the generated fastjson approach, which has constant allocations. In fact the only allocation is unrelated to fastjson; it's due to our need to concatenate the data stream type/dataset/namespace into an index name.
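
To illustrate why a type-specific encoder can keep allocations constant, here is a hand-written sketch using only the standard library. It is not the output of the fastjson generator, and the `event` type and its JSON field names are hypothetical. The encoder appends directly to a caller-supplied buffer, so no intermediate maps or interface values are created regardless of how many fields are added.

```go
package main

import (
	"fmt"
	"strconv"
)

// event is a hypothetical, heavily reduced event type.
type event struct {
	ServiceName string
	DurationUS  int64
}

// appendString appends s as a JSON string. It escapes quotes, backslashes
// and control characters; other bytes are copied as-is, so the input is
// assumed to be valid UTF-8.
func appendString(buf []byte, s string) []byte {
	const hex = "0123456789abcdef"
	buf = append(buf, '"')
	for i := 0; i < len(s); i++ {
		switch c := s[i]; {
		case c == '"' || c == '\\':
			buf = append(buf, '\\', c)
		case c < 0x20:
			buf = append(buf, '\\', 'u', '0', '0', hex[c>>4], hex[c&0xf])
		default:
			buf = append(buf, c)
		}
	}
	return append(buf, '"')
}

// appendJSON appends the JSON encoding of e to buf and returns the
// extended slice.
func (e *event) appendJSON(buf []byte) []byte {
	buf = append(buf, `{"service":{"name":`...)
	buf = appendString(buf, e.ServiceName)
	buf = append(buf, `},"duration_us":`...)
	buf = strconv.AppendInt(buf, e.DurationUS, 10)
	return append(buf, '}')
}

func main() {
	buf := make([]byte, 0, 128) // reused across events
	e := event{ServiceName: "opbeans", DurationUS: 1500}
	buf = e.appendJSON(buf[:0])
	fmt.Println(string(buf))
}
```

As long as the buffer has enough capacity, encoding an event performs no heap allocations, which is the property a generated encoder can provide for every field without relying on reflection.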

felixbarny · 2021-10-20T07:18:27Z

That's an impressive improvement! 👏

stuartnelson3 · 2022-03-10T16:20:22Z

Added this as 8.3-candidate, but maybe as a follow-up to #4120 would be more appropriate

axw · 2022-03-15T12:26:59Z

Yes, I think this should come after #4120. See the two phases described in that issue's description. I'll remove from 8.3-candidate for now

axw mentioned this issue Apr 6, 2020

model/metadata: make all fields non-pointer types #3618

Merged

6 tasks

graphaelli added the [zube]: Backlog label Apr 22, 2020

axw mentioned this issue Aug 25, 2020

Span metrics #4077

Merged

10 tasks

axw mentioned this issue Nov 11, 2020

model: create a specification for valid event docs #4410

Open

axw mentioned this issue Dec 1, 2020

[docs] Add libbeat processor documentation #3624

Closed

axw mentioned this issue Feb 24, 2021

model: replace remaining *strings with strings #4868

Merged

axw mentioned this issue Mar 8, 2021

Move business logic out of model transformation #4927

Merged

bmorelli25 mentioned this issue Apr 13, 2021

docs: data security, filtering, and obfuscation #5068

Merged

axw mentioned this issue Apr 13, 2021

Deprecate libbeat processor support #5088

Closed

This was referenced Jul 6, 2021

model: introduce APMEvent; Batch is now []model.APMEvent #5613

Merged

Move source mapping to model processing #5631

Merged

This was referenced Jul 22, 2021

Move model value normalisation to modeldecoder #5784

Merged

Tidy up model to resembles ECS more closely #5785

Merged

simitt removed the [zube]: Backlog label Dec 31, 2021

axw mentioned this issue Jan 4, 2022

Introduce efficient codec for model events #4120

Closed

simitt added the enhancement label Jan 4, 2022

axw mentioned this issue Feb 3, 2022

Moving apmpackage to Kibana elastic/kibana#124342

Closed

axw mentioned this issue Feb 11, 2022

Set <event>.duration.us field in ingest pipeline #7261

Merged

2 tasks

stuartnelson3 added the 8.3-candidate label Mar 10, 2022

axw removed the 8.3-candidate label Mar 15, 2022

axw mentioned this issue Mar 31, 2022

Check ElasticSearch version before writing _doc_count #7704

Merged

3 tasks

This was referenced May 11, 2023

Update apm-data: set a default span.representative_count #10792

Merged

apmpackage: prepare for metrics output schema change #10793

Merged

axw closed this as completed in #10792 May 15, 2023
