Hot shard metrics #4365
Conversation
sync.RWMutex
windowSize time.Duration
maxBufferSize int
head *ring.Ring
I'd want to use a ring buffer instead of a linked list. The code is a little more complicated (unless we find a library) but the memory usage patterns will be much nicer. How many of these do we expect to exist at once?
What do you mean by a ring buffer? My understanding is that a circular linked list is an example of a ring buffer. Are you looking for something that can do dynamic resizing?
I think David is referring to using an array to implement the ring instead of a linked list; that way we can scan a block of memory instead of chasing pointers.
Also, unless we have a good idea of the cap on `maxBufferSize`, it might be tricky to get this right without overflows. An alternative approach is a ring buffer where each item represents a unit of time. The size of the buffer would be our window size (in units of time); e.g., if we want the moving average over the last 60 seconds, we can have a buffer of size 60 where each item holds one second's worth of observations as a `sum` and `count`, and we keep track of the aggregated average (see the sketch below). This can be a bit tricky to implement, though, since it requires some book-keeping when recording and calculating the average. We can leave this for later, only if it becomes a problem.
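A minimal sketch of that time-bucketed idea, assuming a window measured in whole seconds and one bucket per second; all names here (`timeBucket`, `bucketedAverage`, etc.) are illustrative, not the PR's actual types:

```go
package metrics

import (
	"sync"
	"time"
)

type timeBucket struct {
	sum   float64
	count int64
}

// bucketedAverage keeps one bucket per second; the running totals make
// Average O(1) at the cost of some book-keeping when buckets expire.
type bucketedAverage struct {
	mu         sync.Mutex
	buckets    []timeBucket // len(buckets) == window size in seconds
	lastSecond int64        // unix second of the most recent update
	totalSum   float64      // aggregate over all live buckets
	totalCount int64
}

func newBucketedAverage(windowSeconds int) *bucketedAverage {
	return &bucketedAverage{buckets: make([]timeBucket, windowSeconds)}
}

// expire clears buckets that have fallen out of the window since lastSecond,
// subtracting their contents from the running totals.
func (b *bucketedAverage) expire(nowSec int64) {
	steps := nowSec - b.lastSecond
	if steps <= 0 {
		return
	}
	if steps > int64(len(b.buckets)) {
		steps = int64(len(b.buckets))
	}
	for s := nowSec - steps + 1; s <= nowSec; s++ {
		bkt := &b.buckets[s%int64(len(b.buckets))]
		b.totalSum -= bkt.sum
		b.totalCount -= bkt.count
		*bkt = timeBucket{}
	}
	b.lastSecond = nowSec
}

func (b *bucketedAverage) Record(v float64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now().Unix()
	b.expire(now)
	bkt := &b.buckets[now%int64(len(b.buckets))]
	bkt.sum += v
	bkt.count++
	b.totalSum += v
	b.totalCount++
}

func (b *bucketedAverage) Average() float64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.expire(time.Now().Unix())
	if b.totalCount == 0 {
		return 0
	}
	return b.totalSum / float64(b.totalCount)
}
```

Note the trade-off mentioned above: memory is bounded by the window size regardless of the observation rate, but every `Record` and `Average` pays for the expiry book-keeping.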
For the sake of completeness, I wrote an array-based implementation, and in our bench test it performed nearly the same (about 7 ns/op slower than ring-based). If you want to see it for yourself, it's in commit 98f2b66b9b7a711c427d5f55de82bc55d135ddfc
I see what you're saying about the contiguous memory of an array being preferable, but I'm not convinced this optimization is necessary at this point.
At the moment we are only using this to track averages for persistence latency and persistence error rate, so I do not expect to have a large number of them in memory at once. And for those metrics, we do not need to guarantee a `maxBufferSize` that fits every observation in our time window; it just needs to be large enough to give an accurate picture of persistence health.
For now I'm going to leave it as is, unless there are strong feelings to the contrary. If we need to track other kinds of averages in the future, it may be worth it to revisit this and store elements based on time as Saman suggests.
That's right, it all depends on the access pattern of the data structure. In the benchmark, for instance, we call `Average` after each `Record`, which means at most one array element can have expired by the time `Average` runs. So there is little benefit, since there is no contiguous scanning of elements. If you change the benchmark to call `Average` only every 10th time, the array has 20% lower overhead (45 ns/op vs 60 ns/op in my experiments).
So depending on what we expect the call pattern to be, this might or might not add much benefit. There is also the overhead of synchronization (the mutex), which can mask the benefits when the structure is updated concurrently.
Yeah, I mean an array with head and tail indexes. Specifically something that can't do dynamic resizing, unlike the linked list.
I think this is one of those situations where microbenchmarks are misleading: if the whole thing fits in L1 and you're using it from a single thread, then chasing pointers is mostly free and everything will perform about the same. When either or both of those aren't true then the performance will start to diverge.
In general we have an IO-bound service, so you could say CPU optimizations don't matter, which is true up to a point. If we have just a few of these, then I agree. I was assuming we'd have one per namespace or more, where it'd be a bigger concern.
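For concreteness, here is a rough sketch of the fixed-capacity, head/tail-indexed array David describes, keeping the `windowSize`/`maxBufferSize` parameters from the diff; the type and field names are assumptions, not the PR's code:

```go
package metrics

import (
	"sync"
	"time"
)

type timestampedValue struct {
	value     float64
	timestamp time.Time
}

// arrayMovingAverage is a fixed-capacity ring: recording past capacity
// overwrites the oldest entry, so it never resizes or allocates per Record.
type arrayMovingAverage struct {
	mu         sync.Mutex
	windowSize time.Duration
	buffer     []timestampedValue // fixed capacity, never resized
	head       int                // index of the oldest live element
	size       int                // number of live elements
	sum        float64            // running sum of live elements
}

func newArrayMovingAverage(windowSize time.Duration, maxBufferSize int) *arrayMovingAverage {
	return &arrayMovingAverage{
		windowSize: windowSize,
		buffer:     make([]timestampedValue, maxBufferSize),
	}
}

func (a *arrayMovingAverage) Record(v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.size == len(a.buffer) {
		// Buffer full: drop the oldest observation.
		a.sum -= a.buffer[a.head].value
		a.head = (a.head + 1) % len(a.buffer)
		a.size--
	}
	tail := (a.head + a.size) % len(a.buffer)
	a.buffer[tail] = timestampedValue{value: v, timestamp: time.Now()}
	a.sum += v
	a.size++
}

func (a *arrayMovingAverage) Average() float64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	// Expire entries older than the window by advancing head; this is the
	// contiguous scan that the array layout makes cache-friendly.
	cutoff := time.Now().Add(-a.windowSize)
	for a.size > 0 && a.buffer[a.head].timestamp.Before(cutoff) {
		a.sum -= a.buffer[a.head].value
		a.head = (a.head + 1) % len(a.buffer)
		a.size--
	}
	if a.size == 0 {
		return 0
	}
	return a.sum / float64(a.size)
}
```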
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I have been convinced; I swapped the linked list implementation for the array one. There seems to be no reason not to, since we will most likely get better performance.
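An illustrative benchmark shape for the every-10th-call pattern Saman measured, reusing the hypothetical `arrayMovingAverage` sketch above (the constructor name and parameters are assumptions):

```go
package metrics

import (
	"testing"
	"time"
)

func BenchmarkAverageEvery10thRecord(b *testing.B) {
	avg := newArrayMovingAverage(time.Minute, 1024)
	for i := 0; i < b.N; i++ {
		avg.Record(float64(i))
		// Calling Average only occasionally lets expired entries pile up,
		// so the array implementation expires them in one contiguous scan.
		if i%10 == 9 {
			_ = avg.Average()
		}
	}
}
```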
What changed?
Adding new persistence health signal collection components
Why?
To better understand why some shards/namespaces struggle during noisy-neighbor issues
How did you test it?
Existing unit/functional tests
Potential risks
None
Is hotfix candidate?