
Worker pool testing harness #4669

Merged: 4 commits merged into v3.15 on Sep 27, 2021

Conversation

@adriansmares (Contributor) commented Sep 25, 2021

Summary

References #4607

Changes

  • Remove the ability to generate handlers on demand (the handler factory concept)
    • This feature created ambiguity for the Publish call: the item may already be queued when the creation of a new worker fails, leaving the caller unsure whether to resubmit the item. It could also lead to situations in which workers can no longer be created while items are still in the queue
    • This simplifies most of the call sites, all of which used static handlers anyway (except the webhooks). A sketch of the resulting pool shape follows this list
  • Spawn a replacement worker when a worker panics, in order to avoid falling below the minimum number of workers
    • In practice, the replacement would have been spawned on the next Publish, but during testing I hit the case in which at most 1 worker can be spawned at a time, it panics, and the remaining items in the queue are not processed until Publish is called again. The respawn logic appears in the pool sketch after this list
  • Introduce a testing harness for pkg/workerpool
    • New tests for the conditional increments / decrements
    • A set of test cases that push the limits of the minimum/maximum worker counts and queue sizes
      • Every item for which Publish did not fail is processed exactly once
      • Every item for which Publish did fail is not processed
      • No other items are processed
    • We also test that the pool behaves correctly when some items panic: about 5% of the work items cause a panic as part of the tests (see the test sketch after this list)
  • Introduce a (basic) benchmarking harness for pkg/workerpool
    • Uses the default parameters of the pool
    • Varies the producer delay (and thus the entry PPS) and the consumer delay (and thus the exit PPS); see the benchmark sketch after this list
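
To make the first two changes concrete, below is a minimal sketch of a pool with a static handler and respawn-on-panic, simplified to a maximum worker count only. All of the names (Pool, New, Publish, maybeSpawn, errQueueFull) are illustrative assumptions, not the actual pkg/workerpool API:

```go
package workerpool

import (
	"context"
	"errors"
	"sync/atomic"
)

var errQueueFull = errors.New("queue full")

// Handler processes a single work item. It is fixed at pool creation;
// there is no factory that can fail after an item is already queued.
type Handler func(ctx context.Context, item interface{})

// Pool is a static-handler worker pool with a bounded queue.
type Pool struct {
	handler Handler
	queue   chan interface{}
	workers int32 // current worker count, updated via conditional CAS
	max     int32
}

func New(handler Handler, queueSize, maxWorkers int) *Pool {
	return &Pool{
		handler: handler,
		queue:   make(chan interface{}, queueSize),
		max:     int32(maxWorkers),
	}
}

// Publish enqueues the item, or fails fast if the queue is full. Because
// the handler is static, a successful return means the item will be
// processed; there is no later worker-construction step that can fail.
func (p *Pool) Publish(ctx context.Context, item interface{}) error {
	select {
	case p.queue <- item:
		p.maybeSpawn(ctx)
		return nil
	default:
		return errQueueFull
	}
}

// maybeSpawn conditionally increments the worker count and starts a worker.
func (p *Pool) maybeSpawn(ctx context.Context) {
	if n := atomic.LoadInt32(&p.workers); n < p.max &&
		atomic.CompareAndSwapInt32(&p.workers, n, n+1) {
		go p.run(ctx)
	}
}

func (p *Pool) run(ctx context.Context) {
	defer func() {
		atomic.AddInt32(&p.workers, -1)
		if r := recover(); r != nil {
			// Respawn immediately so queued items are not stranded
			// until the next Publish call.
			p.maybeSpawn(ctx)
		}
	}()
	for {
		select {
		case item := <-p.queue:
			p.handler(ctx, item)
		case <-ctx.Done():
			return
		}
	}
}
```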
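
The exactly-once / not-at-all guarantees can then be checked roughly as follows, again against the hypothetical Pool above. The sleep-based settling is a simplification for illustration; the real harness synchronizes more carefully:

```go
func TestExactlyOnce(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	const items = 10000
	var processed [items]int32

	pool := New(func(_ context.Context, item interface{}) {
		i := item.(int)
		atomic.AddInt32(&processed[i], 1)
		if i%20 == 0 {
			panic("simulated handler panic") // ~5% of the work items panic
		}
	}, 32, 4)

	// Record which items were accepted by Publish.
	published := make([]bool, items)
	for i := 0; i < items; i++ {
		published[i] = pool.Publish(ctx, i) == nil
	}

	time.Sleep(time.Second) // crude settling period, for illustration only

	for i := range published {
		switch n := atomic.LoadInt32(&processed[i]); {
		case published[i] && n != 1:
			t.Errorf("item %d published but processed %d times", i, n)
		case !published[i] && n != 0:
			t.Errorf("item %d dropped but processed %d times", i, n)
		}
	}
}
```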
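
Finally, the benchmark matrix can be driven by nested b.Run calls whose names mirror the output below. This is only the rough shape, with assumed queue and worker parameters, not the actual harness:

```go
func BenchmarkWorkerPool(b *testing.B) {
	for _, processingDelay := range []time.Duration{5, 10, 50} {
		for _, publishingDelay := range []time.Duration{5, 10, 50} {
			proc := processingDelay * time.Millisecond
			pub := publishingDelay * time.Millisecond
			name := fmt.Sprintf("processingDelay/%v/publishingDelay/%v", proc, pub)
			b.Run(name, func(b *testing.B) {
				ctx, cancel := context.WithCancel(context.Background())
				defer cancel()
				var published, dropped float64
				pool := New(func(context.Context, interface{}) {
					time.Sleep(proc) // simulated consumer work (exit PPS)
				}, 32, 4)
				for i := 0; i < b.N; i++ {
					time.Sleep(pub) // simulated producer rate (entry PPS)
					if pool.Publish(ctx, i) != nil {
						dropped++
					} else {
						published++
					}
				}
				b.ReportMetric(dropped, "dropped")
				b.ReportMetric(published, "published")
			})
		}
	}
}
```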

The results on my machine are:

goos: linux
goarch: amd64
pkg: go.thethings.network/lorawan-stack/v3/pkg/workerpool
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
BenchmarkWorkerPool/processingDelay/5ms/publishingDelay/5ms-8         	       1	5616981385 ns/op	        54.00 dropped	     63946 published	         1.008 queueDelayMS	23184808 B/op	  704369 allocs/op
BenchmarkWorkerPool/processingDelay/5ms/publishingDelay/10ms-8        	       1	10675325046 ns/op	         0 dropped	     64000 published	         0.0009687 queueDelayMS	20701048 B/op	  578360 allocs/op
BenchmarkWorkerPool/processingDelay/5ms/publishingDelay/50ms-8        	       1	50994860804 ns/op	         0 dropped	     64000 published	         0 queueDelayMS	20652408 B/op	  578277 allocs/op
BenchmarkWorkerPool/processingDelay/10ms/publishingDelay/5ms-8        	       1	5675990274 ns/op	     30223 dropped	     33777 published	         4.492 queueDelayMS	19977424 B/op	  614828 allocs/op
BenchmarkWorkerPool/processingDelay/10ms/publishingDelay/10ms-8       	       1	10690789991 ns/op	         8.000 dropped	     63992 published	         1.322 queueDelayMS	22586280 B/op	  701898 allocs/op
BenchmarkWorkerPool/processingDelay/10ms/publishingDelay/50ms-8       	       1	51108205618 ns/op	         0 dropped	     64000 published	         0 queueDelayMS	20680216 B/op	  578455 allocs/op
BenchmarkWorkerPool/processingDelay/50ms/publishingDelay/5ms-8        	       1	5694501109 ns/op	     56878 dropped	      7122 published	        24.34 queueDelayMS	17624888 B/op	  534881 allocs/op
BenchmarkWorkerPool/processingDelay/50ms/publishingDelay/10ms-8       	       1	10691858955 ns/op	     50541 dropped	     13459 published	        24.34 queueDelayMS	18192448 B/op	  553992 allocs/op
BenchmarkWorkerPool/processingDelay/50ms/publishingDelay/50ms-8       	       1	51052396468 ns/op	        46.00 dropped	     63954 published	        12.32 queueDelayMS	22616232 B/op	  704611 allocs/op

The way I interpret them is that the queue 'physics' check out: if the producers publish at a higher PPS than the pool can consume, the surplus messages are dropped, while the delay for the items that do get queued stays minimal.

Testing

Unit tests cover the first change, and new tests are introduced for pkg/workerpool.

Regressions

This reverts the webhooks to a shared *http.Client; no other changes are expected.

Checklist

  • Scope: The referenced issue is addressed, there are no unrelated changes.
  • Compatibility: The changes are backwards compatible with existing API, storage, configuration and CLI, according to the compatibility commitments in README.md for the chosen target branch.
  • Documentation: Relevant documentation is added or updated.
  • Changelog: Significant features, behavior changes, deprecations and fixes are added to CHANGELOG.md.
  • Commits: Commit messages follow guidelines in CONTRIBUTING.md, there are no fixup commits left.

@adriansmares added this to the v3.15.1 milestone Sep 25, 2021
@adriansmares self-assigned this Sep 25, 2021
@github-actions bot added the c/application server, c/gateway server and compat/config labels Sep 25, 2021
@adriansmares force-pushed the feature/workerpool-tests-benchmark branch from fd5c0a9 to af6d35d on September 25, 2021 14:16
@adriansmares marked this pull request as ready for review September 25, 2021 20:49
@adriansmares mentioned this pull request (5 tasks) Sep 26, 2021
@johanstokking (Member) left a comment

Great stuff

@adriansmares merged commit c6a0861 into v3.15 Sep 27, 2021
@adriansmares deleted the feature/workerpool-tests-benchmark branch September 27, 2021 10:19