-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Target Allocator implementation (Part 2 - Target Allocator Image logic) #354
Conversation
@jpkrohling @Aneurysm9 can you please review. Look forward to any feedback you may have. ty! |
I'm planning on review this by the end of this week. |
@@ -0,0 +1,128 @@ | |||
package allocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these packages are used outside of this program, correct? Can we move them to cmd/otel-allocator/internal/*
? That way we can be confident their scope is contained and not worry about breaking any external packages that have depended on them.
This can be done separately and doesn't need to hold up this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been addressed?
Similar to the other PR: I left a couple of comments in the design doc. Let me know if I should go ahead and review this, or if you want to settle the open points first. |
@alexperez52 @Raul9595 can you rerun the tests to make sure the E2E tests are passing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm overall. Mostly suggestions to make the code easier to read.
Pull request has been modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that I couldn't complete the review, but will make another pass later today or tomorrow.
@@ -0,0 +1,128 @@ | |||
package allocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been addressed?
Pull request has been modified.
I'll review this again soon, but please, try to get a PR fixing the intermittent tests that the part 1 introduced. |
Seems like the PR merged today solved the issue with the e2e tests |
} | ||
|
||
// Tests that the delta in number of targets per collector is less than 15% of an even distribution | ||
func TestCollectorBalanceWhenAddingAndRemovingAtRandom(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What advantages is this bringing in comparison to the test TestAddingAndRemovingTargets
? Is this exercising a different code path than TestCollectorBalanceWhenAddingTargets
when it comes to the balancing itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestAddingAndRemovingTargets
tests that targets are successfully added with the expected with the expected key
TestCollectorBalanceWhenAddingTargets
tests the the balance on an easily divisible number for the purpose of even allocation
TestCollectorBalanceWhenAddingAndRemovingAtRandom
uses a more random approach where targets are both added and removed while verifying that at each step there isnt above a 15% difference in workload from collector to collector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get the value-add between the second and the third, but assuming there's enough value justifying the added complexity, isn't it covering the third case covering the second? Meaning, do we need the second? And the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a straight coverage perspective, I think the third test will cover what the first two do as well. But they may still have value in enabling testing and reasoning about each capability somewhat more independently.
I'd like to take another pass at this allocation logic in the future, but I think for now it is a solid starting point.
} | ||
} | ||
|
||
func TestCollectorBalanceWhenCollectorsChanged(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the previous comment: what's the difference between this and the previous test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation will do a re allocation from scratch when collector information is modified so this test is for the purposes of verifying that targets were re allocated from scratch. Using a divisible ratio of targets to collectors, its verified that the targets are allocated evenly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have been clearer, but do we still need both tests? Isn't one of them covering what the other is doing?
Ping me when this is ready for review again. |
Co-authored-by: Anthony Mirabella <[email protected]>
…rator into otel-allocator
dde7f07
to
f450e17
Compare
I understand that @Aneurysm9 will be maintaining this code, so, I'll wait for his review before doing a final round myself. |
Hi @jpkrohling thanks for taking a look. @Aneurysm9 can you do a final review and let's get this merged. Ty!!! |
Ping @Aneurysm9 please review so that @jpkrohling can merge. Thx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some things I'd like to refactor, particularly in the allocation
package and the main()
function, but I think this is a solid starting point and provides a usable, and useful, service.
@jpkrohling if you'd like to see that refactoring done before merging this I'd understand, but I also think it could be good to land it as it is to make the capability available and see what happens when people attempt to put it to use.
} | ||
|
||
// Tests that the delta in number of targets per collector is less than 15% of an even distribution | ||
func TestCollectorBalanceWhenAddingAndRemovingAtRandom(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a straight coverage perspective, I think the third test will cover what the first two do as well. But they may still have value in enabling testing and reasoning about each capability somewhat more independently.
I'd like to take another pass at this allocation logic in the future, but I think for now it is a solid starting point.
The e2e tests for Kubernetes 1.19 failed and I just restarted. I'll merge this as soon as the tests pass. |
…ic) (open-telemetry#354) * Target Allocation server logic Co-Authored-By: Alexis Perez <[email protected]> Co-Authored-By: JBD <[email protected]> * Added cmd to indicate executable * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Updated discovery manager, collector component and added testing file for collector.go Updated code to parse config using default Prometheus config and added testing file for collector component. * Removed unnecessary struct in config.go * Added load testing * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go * Update cmd/otel-allocator/allocation/allocator.go * Removed nextCollector and modified locks * Updated collector.go to reflect new namespace * Refactored display map logic & updated locking convention * Updated container port * Change initialized empty collector to nil collector Co-authored-by: Anthony Mirabella <[email protected]> * Updated collector test logic * Updated allocation files * Updated allocation import in main.go * Updated collector & discovery files * Updated unit tallocator unit tests * Updated runWatch to prevent panic * Seperated http logic from allocator logic * Integrated logr * Updated collector test to use channels * Update use of logger and fix error messages * Update test files Co-authored-by: Rahul Varma <[email protected]> Co-authored-by: JBD <[email protected]> Co-authored-by: Anthony Mirabella <[email protected]> Co-authored-by: Rahul Varma <[email protected]>
…ic) (open-telemetry#354) * Target Allocation server logic Co-Authored-By: Alexis Perez <[email protected]> Co-Authored-By: JBD <[email protected]> * Added cmd to indicate executable * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Updated discovery manager, collector component and added testing file for collector.go Updated code to parse config using default Prometheus config and added testing file for collector component. * Removed unnecessary struct in config.go * Added load testing * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go Co-authored-by: Anthony Mirabella <[email protected]> * Update cmd/otel-allocator/allocation/allocator.go * Update cmd/otel-allocator/allocation/allocator.go * Removed nextCollector and modified locks * Updated collector.go to reflect new namespace * Refactored display map logic & updated locking convention * Updated container port * Change initialized empty collector to nil collector Co-authored-by: Anthony Mirabella <[email protected]> * Updated collector test logic * Updated allocation files * Updated allocation import in main.go * Updated collector & discovery files * Updated unit tallocator unit tests * Updated runWatch to prevent panic * Seperated http logic from allocator logic * Integrated logr * Updated collector test to use channels * Update use of logger and fix error messages * Update test files Co-authored-by: Rahul Varma <[email protected]> Co-authored-by: JBD <[email protected]> Co-authored-by: Anthony Mirabella <[email protected]> Co-authored-by: Rahul Varma <[email protected]>
Description
Follows the changes in #351
This PR addresses issue open-telemetry/wg-prometheus#60. This is part 2 of a 3 part update in which enhancements are presented to the OTel Operator to support the deployment of a Target Allocator. This update contains the logic for the Operator Managed target allocator image which will be used to perform target sharding for the Prometheus scrape targets and expose them to the collector(s) for consumption using http sd config.
The Design document for this part 2 can be found here:
Target Allocator Design Document
Type of change
New feature
Testing
Unit tests were added to test the least connection algorithm flow as well as updating and removing targets.
More testing details can be found on the doc mentioned above.
cc: @Raul9595 @alolita