
Hot shard metrics #4365

Merged
42 commits merged into temporalio:master from rate-limiter-metrics on May 26, 2023

Conversation

@pdoerner (Contributor) commented May 18, 2023

What changed?
Added new components for collecting persistence health signals.

Why?
To better understand why some shards/namespaces struggle during noisy-neighbor-type issues.

How did you test it?
Existing unit/functional tests

Potential risks
None

Is hotfix candidate?

@pdoerner marked this pull request as ready for review May 22, 2023 19:32
@pdoerner requested a review from a team as a code owner May 22, 2023 19:32
sync.RWMutex
windowSize time.Duration
maxBufferSize int
head *ring.Ring
Member

I'd want to use a ring buffer instead of a linked list. The code is a little more complicated (unless we find a library) but the memory usage patterns will be much nicer. How many of these do we expect to exist at once?

Contributor Author

What do you mean by a ring buffer? My understanding is that a circular linked list is an example of a ring buffer. Are you looking for something that can do dynamic resizing?

Contributor

I think David is referring to using an array to implement the ring instead of a linked list; that way we can scan a block of memory instead of chasing pointers.

Also, unless we have a good idea of the cap on maxBufferSize, it might be tricky to size this so it doesn't overflow. An alternative approach is a ring buffer where each item represents a unit of time. The size of the buffer would be our window size (in units of time); e.g., if we want the moving average over the last 60 seconds, we can have a buffer of size 60 where each item holds the sum and count for one second, and we keep track of the aggregated average. This can be a bit tricky to implement, though, since it requires some bookkeeping when recording and calculating the average. We can leave this for later, only if this becomes a problem.
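
For illustration only (not code from this PR): a minimal sketch of the per-second bucket idea described above, assuming the window is a whole number of seconds; all package, type, and function names here are hypothetical.

package persistence // hypothetical package name

import (
	"sync"
	"time"
)

// bucket aggregates all observations recorded within one second.
type bucket struct {
	sum   float64
	count int64
}

// timeBucketedAverage keeps one bucket per second over the window and
// averages across the buckets that are still inside the window.
type timeBucketedAverage struct {
	mu      sync.Mutex
	buckets []bucket
	starts  []time.Time // start of the second each bucket currently covers
	window  time.Duration
}

func newTimeBucketedAverage(window time.Duration) *timeBucketedAverage {
	n := int(window / time.Second)
	return &timeBucketedAverage{
		buckets: make([]bucket, n),
		starts:  make([]time.Time, n),
		window:  window,
	}
}

// Record adds an observation to the bucket for the current second,
// resetting that bucket first if it still holds data from an older second.
func (a *timeBucketedAverage) Record(v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	now := time.Now().Truncate(time.Second)
	i := int(now.Unix()) % len(a.buckets)
	if !a.starts[i].Equal(now) {
		a.buckets[i] = bucket{}
		a.starts[i] = now
	}
	a.buckets[i].sum += v
	a.buckets[i].count++
}

// Average returns the mean over all buckets that are newer than the window.
func (a *timeBucketedAverage) Average() float64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	cutoff := time.Now().Add(-a.window)
	var sum float64
	var count int64
	for i, start := range a.starts {
		if start.After(cutoff) {
			sum += a.buckets[i].sum
			count += a.buckets[i].count
		}
	}
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

This sketch recomputes the average on demand rather than maintaining the running aggregate mentioned above, which keeps the bookkeeping minimal at the cost of scanning at most windowSize/second buckets per call.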

Contributor Author

For the sake of completeness, I wrote an array-based implementation, and in our bench test it performed nearly the same (about 7 ns/op slower than ring-based). If you want to see it for yourself, it's in commit 98f2b66b9b7a711c427d5f55de82bc55d135ddfc. I see what you're saying about the contiguous memory of an array being preferable, but I'm not convinced this optimization is necessary at this point.

At the moment we are only using this to track averages for persistence latency and persistence error rate, so I do not expect to have a large number of them in memory at once. And for those metrics, we do not need a maxBufferSize that perfectly fits all observations in our time window; it just needs to be large enough to give an accurate picture of persistence health.

For now I'm going to leave it as is, unless there are strong feelings to the contrary. If we need to track other kinds of averages in the future, it may be worth revisiting this and storing elements based on time, as Saman suggests.

Contributor

That's right; it all depends on how the data structure is accessed. In the benchmark, for instance, we call Average after each Record, which means at most one array element might have expired by the time Average is called, so there is little benefit since there is no contiguous scanning of elements. If you change the benchmark to call Average only every 10th time, the array has 20% lower overhead (45 ns/op vs. 60 ns/op in my experiments).

So depending on the expected call pattern, this may or may not add much benefit. There is also the overhead of synchronization (the mutex), which can mask the benefits when the structure is updated concurrently.
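
A hypothetical illustration of the access pattern described above; Record and Average match the method names discussed in this thread, while the benchmark name, constructor, and parameters are made up and assume a sliding-window average with Record(int64) and Average() methods.

package persistence

import (
	"testing"
	"time"
)

// BenchmarkRecordWithPeriodicAverage exercises the "Average every 10th Record"
// pattern: expired entries pile up between calls, so Average has a contiguous
// run of elements to evict and scan, which is where an array layout helps.
func BenchmarkRecordWithPeriodicAverage(b *testing.B) {
	avg := newMovingWindowAverage(time.Minute, 4096) // hypothetical constructor and parameters
	for i := 0; i < b.N; i++ {
		avg.Record(int64(i))
		if i%10 == 9 {
			_ = avg.Average()
		}
	}
}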

Member

Yeah, I mean an array with head and tail indexes. Specifically something that can't do dynamic resizing, unlike the linked list.

I think this is one of those situations where microbenchmarks are misleading: if the whole thing fits in L1 and you're using it from a single thread, then chasing pointers is mostly free and everything will perform about the same. When either or both of those aren't true then the performance will start to diverge.

In general we have an IO-bound service, so you can say CPU optimizations don't matter, which is true up to a point. If we have just a few of these, then I agree. I was assuming we'd have one per namespace or more, where it'd be a bigger concern.
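
An illustrative sketch (not the PR's actual code) of the fixed-capacity array described above, with head/tail indexes and no dynamic resizing; field names mirror the diff snippet, but the types and methods are otherwise hypothetical.

package persistence

import (
	"sync"
	"time"
)

type timestampedValue struct {
	value     int64
	timestamp time.Time
}

// arrayWindowAverage stores observations in a preallocated slice of fixed
// capacity; head points at the oldest entry and tail at the next write slot.
type arrayWindowAverage struct {
	mu         sync.Mutex // write lock for both methods, since Average also evicts
	windowSize time.Duration
	buffer     []timestampedValue // fixed capacity, allocated once
	head, tail int
	size       int
}

func newArrayWindowAverage(windowSize time.Duration, maxBufferSize int) *arrayWindowAverage {
	return &arrayWindowAverage{
		windowSize: windowSize,
		buffer:     make([]timestampedValue, maxBufferSize),
	}
}

// Record appends an observation, overwriting the oldest entry when full.
func (a *arrayWindowAverage) Record(v int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.buffer[a.tail] = timestampedValue{value: v, timestamp: time.Now()}
	a.tail = (a.tail + 1) % len(a.buffer)
	if a.size == len(a.buffer) {
		// Buffer full: the tail just overwrote the oldest entry, so advance head too.
		a.head = (a.head + 1) % len(a.buffer)
	} else {
		a.size++
	}
}

// Average evicts entries older than the window, then averages what remains.
func (a *arrayWindowAverage) Average() float64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	cutoff := time.Now().Add(-a.windowSize)
	for a.size > 0 && a.buffer[a.head].timestamp.Before(cutoff) {
		a.head = (a.head + 1) % len(a.buffer)
		a.size--
	}
	if a.size == 0 {
		return 0
	}
	var sum int64
	for i, idx := 0, a.head; i < a.size; i, idx = i+1, (idx+1)%len(a.buffer) {
		sum += a.buffer[idx].value
	}
	return float64(sum) / float64(a.size)
}

Because the backing slice is allocated once at maxBufferSize, the structure never resizes, and both eviction and summation walk (at most two) contiguous runs of memory rather than chasing pointers.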

Contributor Author

Okay, I've been convinced; I swapped the linked-list implementation for the array one. There seems to be no reason not to, since we will most likely get better performance.


@pdoerner mentioned this pull request May 25, 2023
@yycptt merged commit 47d3fe5 into temporalio:master May 26, 2023
@yycptt changed the title from "Rate limiter metrics" to "Hot shard metrics" May 26, 2023
@pdoerner deleted the rate-limiter-metrics branch May 31, 2023 16:55