-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace json schema validation with auto generated validation rules from go struct #3551
Comments
Something to keep in mind for the API, when designing the generated validation/decoding methods: we should aim to reuse more memory by pooling decoded objects. If we use the style |
The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for #3485. We could merge metadata without these changes, but it would be more difficult and error prone. Making these change also provide some performance improvements – see below. Finally, there is also overlap between merging metadata and revising the decoders to enable memory use (#3551 (comment)). In theory this could be a considered a breaking change, due to the fact that an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values. Due to the use of empty strings, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of #3565 which is more explicit about omitting empty strings. These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to: - fewer allocations -- no interface allocations, or unnecessary deep copying of maps, due to utility.Set -- lazy map construction - less reflection, due to not using utility.Set - less pointer indirection name old time/op new time/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 1.16µs ± 6% 0.38µs ±11% -67.59% (p=0.008 n=5+5) MetadataSet/full-8 11.9µs ± 4% 5.3µs ± 6% -55.53% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 9.70µs ± 1% 9.30µs ±17% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 896B ± 0% 368B ± 0% -58.93% (p=0.008 n=5+5) MetadataSet/full-8 14.0kB ± 0% 6.2kB ± 0% -55.36% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 1.31kB ± 0% 1.06kB ± 0% -18.96% (p=0.000 n=5+4) name old allocs/op new allocs/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 10.0 ± 0% 4.0 ± 0% -60.00% (p=0.008 n=5+5) MetadataSet/full-8 114 ± 0% 68 ± 0% -40.35% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 61.0 ± 0% 28.0 ± 0% -54.10% (p=0.008 n=5+5) * model/modeldecoder: benchmark DecodeMetadata * Benchmark recycled memory decoding * model/modeldecoder: update decoding * model/metadata: use non-pointer fields * Adapt inputs to model changes * model/metadata: benchmark Metadata.Set * model: fix golint error (Id->ID)
The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for elastic#3485. We could merge metadata without these changes, but it would be more difficult and error prone. Making these change also provide some performance improvements – see below. Finally, there is also overlap between merging metadata and revising the decoders to enable memory use (elastic#3551 (comment)). In theory this could be a considered a breaking change, due to the fact that an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values. Due to the use of empty strings, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of elastic#3565 which is more explicit about omitting empty strings. These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to: - fewer allocations -- no interface allocations, or unnecessary deep copying of maps, due to utility.Set -- lazy map construction - less reflection, due to not using utility.Set - less pointer indirection name old time/op new time/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 1.16µs ± 6% 0.38µs ±11% -67.59% (p=0.008 n=5+5) MetadataSet/full-8 11.9µs ± 4% 5.3µs ± 6% -55.53% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 9.70µs ± 1% 9.30µs ±17% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 896B ± 0% 368B ± 0% -58.93% (p=0.008 n=5+5) MetadataSet/full-8 14.0kB ± 0% 6.2kB ± 0% -55.36% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 1.31kB ± 0% 1.06kB ± 0% -18.96% (p=0.000 n=5+4) name old allocs/op new allocs/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 10.0 ± 0% 4.0 ± 0% -60.00% (p=0.008 n=5+5) MetadataSet/full-8 114 ± 0% 68 ± 0% -40.35% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 61.0 ± 0% 28.0 ± 0% -54.10% (p=0.008 n=5+5) * model/modeldecoder: benchmark DecodeMetadata * Benchmark recycled memory decoding * model/modeldecoder: update decoding * model/metadata: use non-pointer fields * Adapt inputs to model changes * model/metadata: benchmark Metadata.Set * model: fix golint error (Id->ID)
The primary motivation behind this change is to lay the groundwork for merging shared (i.e. stream) and per-event metadata at decode time, rather than transformation time, which we'll need for #3485. We could merge metadata without these changes, but it would be more difficult and error prone. Making these change also provide some performance improvements – see below. Finally, there is also overlap between merging metadata and revising the decoders to enable memory use (#3551 (comment)). In theory this could be a considered a breaking change, due to the fact that an empty string coming from an agent would no longer be recorded in output documents. In practice, it does not make sense for any of the metadata fields to have empty string values. Due to the use of empty strings, we would have to change the behaviour of utility.Set to not record empty strings. Because I have only modified metadata types, and not all model types, I instead changed the metadata types' Fields methods to stop using utility.Set and implemented a limited version of #3565 which is more explicit about omitting empty strings. These changes yield a significant performance improvement in micro-benchmarks, both in decoding and transformation. Decoding improvements can be attributed to fewer allocations, while transformation improvements can be attributed to: - fewer allocations -- no interface allocations, or unnecessary deep copying of maps, due to utility.Set -- lazy map construction - less reflection, due to not using utility.Set - less pointer indirection name old time/op new time/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 1.16µs ± 6% 0.38µs ±11% -67.59% (p=0.008 n=5+5) MetadataSet/full-8 11.9µs ± 4% 5.3µs ± 6% -55.53% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 9.70µs ± 1% 9.30µs ±17% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 896B ± 0% 368B ± 0% -58.93% (p=0.008 n=5+5) MetadataSet/full-8 14.0kB ± 0% 6.2kB ± 0% -55.36% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 1.31kB ± 0% 1.06kB ± 0% -18.96% (p=0.000 n=5+4) name old allocs/op new allocs/op delta pkg:github.com/elastic/apm-server/model/metadata goos:linux goarch:amd64 MetadataSet/minimal-8 10.0 ± 0% 4.0 ± 0% -60.00% (p=0.008 n=5+5) MetadataSet/full-8 114 ± 0% 68 ± 0% -40.35% (p=0.008 n=5+5) pkg:github.com/elastic/apm-server/model/modeldecoder goos:linux goarch:amd64 DecodeMetadata-8 61.0 ± 0% 28.0 ± 0% -54.10% (p=0.008 n=5+5) * model/modeldecoder: benchmark DecodeMetadata * Benchmark recycled memory decoding * model/modeldecoder: update decoding * model/metadata: use non-pointer fields * Adapt inputs to model changes * model/metadata: benchmark Metadata.Set * model: fix golint error (Id->ID)
When we get to decoding, I think we should look at using https://github.com/json-iterator/go, and possibly https://github.com/minio/simdjson-go on supported hardware. |
https://github.com/minio/simdjson-go looks very promising; I looked a bit at it during the research phase, but didn't try it out since it had special hardware requirements. It does offer a |
I think that this needs to be qualified. My impression is that it will run everywhere, it will just run faster when the CPU requirements are met. @fwessels might be able to tell us more. |
These are the requirements:
So essentially any modern amd64 CPU will do. |
Add support for metricset to model decoder/v2 part of elastic#3551
Add support for metricset to model decoder/v2 part of elastic#3551
Add support for metricset to model decoder/v2 part of #3551
Add support for metricset to model decoder/v2 part of elastic#3551
Motivation
Currently every event ingested via the Intake API v2 and v3 is sent as a new line in the http request body. The APM Server first decodes it into a
map[string]interface{}
. Themap
is passed aValidator
that uses an external library to validate the event against a manually crafted JSON schema (v7). Next the samemap
is sent to a manualDecoder
that creates ango struct
out of themap
for further processing.Creating the initial
map
for the validation and decoding step creates a lot of extra allocations putting more pressure on the garbage collector.The manual decoder is fast and relatively efficient allocation wise, but it operates on a map, that needs to be created upfront, which is expensive. The code for decoding an event field needs to be manually maintained per event field.
Whenever a new event field needs to be added there are several places that need to be manually updated: the JSON schema definition, event go struct, event decoder and the transformation logic. There is a relatively outdated automated test framework in place ensuring that the JSON schema is behaving as expected and plays well together with the structs.
The manual decoder needs some guard clauses for restrictions that are already defined in the schema. Since the schema validation and the decoding are two independent steps (created manually, not derived from each other and run independent of each other) such guard clauses are necessary as assumptions in the struct decoding are not automatically changed when changes on the schema occur, and discrepancies could lead to runtime issues.
There is no distinction between development and production validation of the schema.
Goal
map
first, and combining the decoding and validation steps into one step.Required steps
Implement custom logic or reuse existing logic, such as github.com/alecthomas/jsonschema. After doing some investigation on existing libraries the tendency is to create a custom logic, allowing for more flexibility, e.g. on requirements such as handling
additionalProperties
and custom types used for http headers.Compare performance and garbage collection overhead with the existing logic.
processor/stream/processor.go
logic forThis will mostly mean removing old and unnecessary test logic.(update: keep the tests around at least until the json schema is also auto-created)Optional
The text was updated successfully, but these errors were encountered: