[processor/deltatocumulative] partial linear pipeline #35048
Conversation
It looks good to me at first glance, but it would be awesome if we had some kind of end-to-end test with generated testdata, similar to what we have with intervalprocessor.
Not sure if I'm missing something, but I don't see a test that creates a new deltatocumulative processor through the Factory, calls ConsumeMetrics, and checks the result. I see one for the original processor, but not for linear. I can also see the Chain object you created to call two processors together, but I'm not understanding where exactly it's used xD. In summary, I think you covered the e2e tests with linear but I'm struggling to understand how exactly.
It's there, in the recently merged opentelemetry-collector-contrib/processor/deltatocumulativeprocessor/processor_test.go, lines 39 to 44 in 52937cf.
But if I'm reading things correctly, we're only calling the first processor of the Chain.
@RichieSams can you take a look?
Yes. Apologies. I've been meaning to take a look at this for a while now, but kept getting waylaid. I'll review this afternoon.
func (stale Tracker) Collect(max time.Duration) []identity.Stream {
Minor: Can we rename this to CollectStale()?
In processor.go I named the field `stale staleness.Tracker`, so using it reads as `p.stale.Collect()`. Renaming this to `CollectStale()` gives `p.stale.CollectStale()`, which I very slightly like less because it stutters. No strong opinion.
linear := newLinear(pcfg, ltel, proc)

return Chain{linear, proc}, nil
Do we still need to chain them? What isn't yet implemented in Linear? IMO it would be much simpler (for metrics, file structure, etc) to just switch wholesale, rather than trying to keep both around.
The `deltatocumulative-linear` branch didn't have Chain, so I was confused at first when reviewing the PR.
Linear only does sums on this branch.
Making linear do everything involves some fairly advanced generics usage, which I think deserves to be reviewed properly and probably separately :)
Given the code already exists on the non-partial branch, I expect to send the next patch right after merging this one.
We should take care to merge so that we only release after merging both.
Overall, I quite like the code. I personally would vote to do the change wholesale (which I believe is what the
Force-pushed from 84a4210 to 79a3988:
Adds a staleness.Tracker type to `internal/exp/metrics`, which does the same as `staleness.Staleness` but in a less coupled way.
Adds metrics for tracking operations of the linear pipeline.
Introduces a highly decoupled, linear processing pipeline. Instead of overloading `Map.Store()` to do aggregation, staleness, and limiting, this functionality is now explicitly handled in `ConsumeMetrics`. This greatly aids readability and makes understanding this processor a lot easier, as less mental context needs to be kept.
Datapoints are first processed by the linear pipeline, and then forwarded to the traditional one for anything not yet implemented.
Force-pushed from 79a3988 to e51c2f3.
I like this refactor!
…#35048)

**Description:** Partially introduces a highly decoupled, linear processing pipeline. Implemented as a standalone struct to make review easier; will refactor this later. Instead of overloading `Map.Store()` to do aggregation, staleness, and limiting, this functionality is now explicitly handled in `ConsumeMetrics`. This greatly aids readability and makes understanding this processor a lot easier, as less mental context needs to be kept.

*Notes to reviewer*: See [`68dc901`](open-telemetry@68dc901) for the main added logic. Compare `processor.go` (old, nested) to `linear.go` (new, linear). Replaces open-telemetry#34757

**Link to tracking Issue:** none

**Testing:** This is a refactor. Existing tests were not modified and still pass.

**Documentation:** not needed
#### Description
As an oversight, #35048 creates two `metadata.TelemetryBuilder` instances. It also introduces an async metric, but one `TelemetryBuilder` sets no callback for it, leading to a panic on `Collect()`. Fixes that by using the same `TelemetryBuilder` for both, properly setting the callback.
#### Testing
A test was added in the first commit; it passes after adding the second commit.
#### Description
The max_streams default value was changed in #35048 but it was not updated in the readme.
#### Description
Finishes work started in #35048. That PR only partially introduced a less complex processor architecture, by only using it for Sums. Back then I was not sure of the best way to do it for multiple datatypes, as generics seemed to introduce a lot of complexity regardless of usage. Since then I did a lot of perf analysis and, due to the way Go works (see gcshapes), we do not really gain anything at runtime from using generics, given method calls are still dynamic. This implementation uses regular Go interfaces and a good old type switch in the hot path (ConsumeMetrics), which lowers mental complexity quite a lot imo. The value of the new architecture is backed up by the following benchmark:
```
goos: linux
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor

                 │ sums.nested  │ sums.linear │
                 │    sec/op    │    sec/op     vs base               │
Processor/sums-8    56.35µ ± 1%    39.99µ ± 1%  -29.04% (p=0.000 n=10)

                 │ sums.nested  │ sums.linear │
                 │     B/op     │     B/op      vs base               │
Processor/sums-8   11.520Ki ± 0%  3.683Ki ± 0%  -68.03% (p=0.000 n=10)

                 │ sums.nested  │ sums.linear │
                 │  allocs/op   │  allocs/op    vs base               │
Processor/sums-8     365.0 ± 0%    260.0 ± 0%  -28.77% (p=0.000 n=10)
```
#### Testing
This is a refactor; existing tests pass unaltered.
#### Documentation
Not needed.
#### Description
Removes the nested (aka overloading `streams.Map`) implementation. This has been entirely replaced by a leaner, "linear" implementation:
- #35048
- #36486
#### Testing
Existing tests continue to pass unaltered.
#### Documentation
Not needed.