
Hot shard metrics #4365

Merged
42 commits merged into temporalio:master from rate-limiter-metrics on May 26, 2023

Conversation

@pdoerner (Contributor) commented May 18, 2023

What changed?
Added new components for collecting persistence health signals.

Why?
To better understand why some shards/namespaces struggle during noisy-neighbor-type issues.

How did you test it?
Existing unit/functional tests

Potential risks
None

Is hotfix candidate?

@pdoerner marked this pull request as ready for review May 22, 2023 19:32
@pdoerner requested a review from a team as a code owner May 22, 2023 19:32
sync.RWMutex
windowSize time.Duration
maxBufferSize int
head *ring.Ring
Member

I'd want to use a ring buffer instead of a linked list. The code is a little more complicated (unless we find a library) but the memory usage patterns will be much nicer. How many of these do we expect to exist at once?

Contributor Author

What do you mean by a ring buffer? My understanding is that a circular linked list is an example of a ring buffer. Are you looking for something that can do dynamic resizing?

Contributor

I think David is referring to using an array to implement the ring instead of a linked list; that way we can scan a block of memory instead of chasing pointers.

Also, unless we have a good idea of the cap on maxBufferSize, it might be tricky to size this so it doesn't overflow. An alternative approach is a ring buffer where each item represents a unit of time. The size of the buffer would be our window size (in units of time); e.g., if we want the moving average over the last 60 seconds, we can have a buffer of size 60 where each item holds the sum and count for one second, and we keep track of the aggregated average. This can be a bit tricky to implement, though, since it requires some bookkeeping when recording and calculating the average. We can leave this for later, only if this becomes a problem.
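
For illustration only (not code from this PR): a minimal sketch of the per-second bucket idea described above, assuming the window is a whole number of seconds; all package, type, and function names here are hypothetical.

package persistence // hypothetical package name

import (
	"sync"
	"time"
)

// bucket aggregates all observations recorded within one second.
type bucket struct {
	sum   float64
	count int64
}

// timeBucketedAverage keeps one bucket per second over the window and
// averages across the buckets that are still inside the window.
type timeBucketedAverage struct {
	mu      sync.Mutex
	buckets []bucket
	starts  []time.Time // start of the second each bucket currently covers
	window  time.Duration
}

func newTimeBucketedAverage(window time.Duration) *timeBucketedAverage {
	n := int(window / time.Second)
	return &timeBucketedAverage{
		buckets: make([]bucket, n),
		starts:  make([]time.Time, n),
		window:  window,
	}
}

// Record adds an observation to the bucket for the current second,
// resetting that bucket first if it still holds data from an older second.
func (a *timeBucketedAverage) Record(v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	now := time.Now().Truncate(time.Second)
	i := int(now.Unix()) % len(a.buckets)
	if !a.starts[i].Equal(now) {
		a.buckets[i] = bucket{}
		a.starts[i] = now
	}
	a.buckets[i].sum += v
	a.buckets[i].count++
}

// Average returns the mean over all buckets that are newer than the window.
func (a *timeBucketedAverage) Average() float64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	cutoff := time.Now().Add(-a.window)
	var sum float64
	var count int64
	for i, start := range a.starts {
		if start.After(cutoff) {
			sum += a.buckets[i].sum
			count += a.buckets[i].count
		}
	}
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

This sketch recomputes the average on demand rather than maintaining the running aggregate mentioned above, which keeps the bookkeeping minimal at the cost of scanning at most windowSize/second buckets per call.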

Contributor Author

For the sake of completeness, I wrote an array-based implementation, and in our bench test it performed nearly the same (about 7 ns/op slower than ring-based). If you want to see it for yourself, it's in commit 98f2b66b9b7a711c427d5f55de82bc55d135ddfc. I see what you're saying about the contiguous memory of an array being preferable, but I'm not convinced this optimization is necessary at this point.

At the moment we are only using this to track averages for persistence latency and persistence error rate, so I do not expect to have a large number of them in memory at once. And for those metrics, we do not need a maxBufferSize that perfectly fits all observations in our time window; it just needs to be large enough to give an accurate picture of persistence health.

For now I'm going to leave it as is, unless there are strong feelings to the contrary. If we need to track other kinds of averages in the future, it may be worth revisiting this and storing elements based on time, as Saman suggests.

Contributor

That's right; it all depends on how the data structure is accessed. In the benchmark, for instance, we call Average after each Record, which means at most one array element might have expired by the time Average is called, so there is little benefit since there is no contiguous scanning of elements. If you change the benchmark to call Average only every 10th time, the array has 20% lower overhead (45 ns/op vs. 60 ns/op in my experiments).

So depending on the expected call pattern, this may or may not add much benefit. There is also the overhead of synchronization (the mutex), which can mask the benefits when the structure is updated concurrently.
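
A hypothetical illustration of the access pattern described above; Record and Average match the method names discussed in this thread, while the benchmark name, constructor, and parameters are made up and assume a sliding-window average with Record(int64) and Average() methods.

package persistence

import (
	"testing"
	"time"
)

// BenchmarkRecordWithPeriodicAverage exercises the "Average every 10th Record"
// pattern: expired entries pile up between calls, so Average has a contiguous
// run of elements to evict and scan, which is where an array layout helps.
func BenchmarkRecordWithPeriodicAverage(b *testing.B) {
	avg := newMovingWindowAverage(time.Minute, 4096) // hypothetical constructor and parameters
	for i := 0; i < b.N; i++ {
		avg.Record(int64(i))
		if i%10 == 9 {
			_ = avg.Average()
		}
	}
}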

Member

Yeah, I mean an array with head and tail indexes. Specifically something that can't do dynamic resizing, unlike the linked list.

I think this is one of those situations where microbenchmarks are misleading: if the whole thing fits in L1 and you're using it from a single thread, then chasing pointers is mostly free and everything will perform about the same. When either or both of those aren't true then the performance will start to diverge.

In general we have an IO-bound service, so you can say CPU optimizations don't matter, which is true up to a point. If we have just a few of these, then I agree. I was assuming we'd have one per namespace or more, where it'd be a bigger concern.
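
An illustrative sketch (not the PR's actual code) of the fixed-capacity array described above, with head/tail indexes and no dynamic resizing; field names mirror the diff snippet, but the types and methods are otherwise hypothetical.

package persistence

import (
	"sync"
	"time"
)

type timestampedValue struct {
	value     int64
	timestamp time.Time
}

// arrayWindowAverage stores observations in a preallocated slice of fixed
// capacity; head points at the oldest entry and tail at the next write slot.
type arrayWindowAverage struct {
	mu         sync.Mutex // write lock for both methods, since Average also evicts
	windowSize time.Duration
	buffer     []timestampedValue // fixed capacity, allocated once
	head, tail int
	size       int
}

func newArrayWindowAverage(windowSize time.Duration, maxBufferSize int) *arrayWindowAverage {
	return &arrayWindowAverage{
		windowSize: windowSize,
		buffer:     make([]timestampedValue, maxBufferSize),
	}
}

// Record appends an observation, overwriting the oldest entry when full.
func (a *arrayWindowAverage) Record(v int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.buffer[a.tail] = timestampedValue{value: v, timestamp: time.Now()}
	a.tail = (a.tail + 1) % len(a.buffer)
	if a.size == len(a.buffer) {
		// Buffer full: the tail just overwrote the oldest entry, so advance head too.
		a.head = (a.head + 1) % len(a.buffer)
	} else {
		a.size++
	}
}

// Average evicts entries older than the window, then averages what remains.
func (a *arrayWindowAverage) Average() float64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	cutoff := time.Now().Add(-a.windowSize)
	for a.size > 0 && a.buffer[a.head].timestamp.Before(cutoff) {
		a.head = (a.head + 1) % len(a.buffer)
		a.size--
	}
	if a.size == 0 {
		return 0
	}
	var sum int64
	for i, idx := 0, a.head; i < a.size; i, idx = i+1, (idx+1)%len(a.buffer) {
		sum += a.buffer[idx].value
	}
	return float64(sum) / float64(a.size)
}

Because the backing slice is allocated once at maxBufferSize, the structure never resizes, and both eviction and summation walk (at most two) contiguous runs of memory rather than chasing pointers.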

Contributor Author

Okay, I've been convinced; I swapped the linked-list implementation for the array one. There seems to be no reason not to, since we will most likely get better performance.


@pdoerner mentioned this pull request May 25, 2023
@yycptt merged commit 47d3fe5 into temporalio:master May 26, 2023
@yycptt changed the title from "Rate limiter metrics" to "Hot shard metrics" May 26, 2023
@pdoerner deleted the rate-limiter-metrics branch May 31, 2023 16:55