kv: performance degradation in YCSB (A) between 20.1 and 20.2 #54515
Comments
Hi @dbist, please add a C-ategory label to your issue. Check out the label system docs. While you're here, please consider adding an A- label to help keep our repository tidy. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
@nvanbenschoten Can you triage for #kv? @dbist initially suspected Pebble was to blame for the regression, but he saw even worse performance on 20.2 beta 1 using RocksDB. So this seems to be outside of the #storage layer.
Thanks for filing @dbist. I was able to reproduce your results. I was concerned that the reproduction steps switch hardware between tests ("The new IP list is as follows"), but even without that, the ~5% reduction in throughput is visible on this workload. The other thing to note is that the reproduction steps use a

I began digging into this by looking at recent roachperf degradations. The most recent major drop in performance across many different workloads was on September 12th. I've since pinned this on 2d69b3d. That commit enabled jemalloc profiling on all roachtests. The comment added in that change says that this has "minimal performance impact", but our nightly testing begs to differ. Here's the impact on
A 7.6% hit to throughput is pretty serious. I'm tempted to revert that change, as I'm not aware of any instance where these new profiles have helped in debugging. What do you think @petermattis? Next, I'm going to look into the throughput dip around August 8th and the subsequent one around August 21st.

This was due to the revert back to Go 1.13. So the jump in throughput we see on kv95 between June 29th and August 8th was due to Go 1.14. We should expect to get this back once we upgrade again at the beginning of the next release cycle.
I believe I have pinned this regression down to 9039cb1:
I haven't dug deep enough into that commit to understand where the regression is coming from, but after a brief look, I suspect it's due to some combination of increased heap allocation and increased lock contention. Concretely, I'm concerned with how mutex-heavy the

Also, I wasn't explicit in #54515 (comment) that the jemalloc profiling could not be contributing to the YCSB regression, because it only affects roachtest results.
Glad you tracked this down, @nvanbenschoten!
That's a much more significant hit than I expected. I'm not sure what @knz did for testing here. Perhaps we can disable jemalloc profiles for performance-oriented roachtests. @knz can weigh in with thoughts about the importance of the profiles.
The need came up when troubleshooting failed bulk I/O tests and failing fixture imports, where nodes crashed due to insufficient memory. I agree we should revert this change in light of what we've now learned.
reverting in #54612
@arulajmani I took a look at a few performance profiles to see whether anything stood out between 9039cb1 and its parent commit. I didn't see much in the mutex profiles, but both the heap and cpu profiles painted a pretty clear picture that the changes made in that commit come with a pretty large cost, relative to the rest of the workload. For each profile, I filtered the samples down to those containing

On the parent commit (3e01420), I see no heap allocations at all (alloc_objects) that pass this filter. In terms of CPU, I see 0.32% of CPU time spent within the

On the commit in question (9039cb1), things are different. First, I see 3.05% of all allocated heap objects coming from within the

I also see 1.34% of CPU time spent here:

So it appears that our suspicion was correct that the

From those profiles, it appears that
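The profile views referenced above aren't reproduced here, but as background on where hashing allocations can come from, here is a minimal, self-contained sketch (not the CockroachDB code) contrasting the standard library `hash/fnv` path, which typically allocates a hasher and a byte slice per call, with an inlined FNV-1 loop that allocates nothing:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashStdlib hashes via the standard library. fnv.New64() returns its state
// behind the hash.Hash64 interface, so the hasher generally escapes to the
// heap, and []byte(s) copies the string into a fresh allocation.
func hashStdlib(s string) uint64 {
	h := fnv.New64()
	h.Write([]byte(s))
	return h.Sum64()
}

// hashInline computes the same FNV-1 hash with a plain loop over the bytes;
// nothing here forces a heap allocation.
func hashInline(s string) uint64 {
	const (
		offset64 = 14695981039346656037
		prime64  = 1099511628211
	)
	h := uint64(offset64)
	for i := 0; i < len(s); i++ {
		h *= prime64
		h ^= uint64(s[i])
	}
	return h
}

func main() {
	s := "SELECT k, v FROM kv WHERE k = $1"
	fmt.Println(hashStdlib(s) == hashInline(s)) // true: same FNV-1 result
}
```

This is the same trade-off that the hand-rolled `ConstructStatementID` proposed later in the thread is making.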
I think the key to avoiding this regression is going to be making
I've looked at the code a little. Assuming we can't re-organize the code to avoid re-computing an ID from scratch every time, here's what we can do with ConstructStatementID:
I get this:
func ConstructStatementID(anonymizedStmt string, failed bool, implicitTxn bool) StmtID {
// Magic constants as suitable for a FNV-64 hash.
s0 := uint64(14695981039346656037)
for _, c := range anonymizedStmt {
// These two assignments as well as those below should be moved to
// a helper function to share the code.
s0 *= 1099511628211
s0 ^= uint64(c)
}
if failed {
s0 *= 1099511628211
s0 ^= uint64('F')
} else {
s0 *= 1099511628211
s0 ^= uint64('S')
}
if implicitTxn {
s0 *= 1099511628211
s0 ^= uint64('I')
} else {
s0 *= 1099511628211
s0 ^= uint64('E')
}
b := make([]byte, 16)
const hex = "0123456789abcdef"
for i := 0; i < 16; i++ {
b[i] = hex[s0&0xf]
s0 = s0 >> 4
}
return StmtID(*(*string)(unsafe.Pointer(&b)))
}
Benchmarking as usual with this:
before/after comparison with (explaining for Arul in case you've never done that before)
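The benchmark source and comparison command referenced above aren't shown, but here is a minimal sketch of what such a microbenchmark could look like, assuming `ConstructStatementID` has the signature from the snippet above (the real benchmark in the repo may differ):

```go
package sqlstats // hypothetical package; placed alongside ConstructStatementID

import "testing"

// BenchmarkConstructStatementID exercises the hot path with a representative
// anonymized statement. ReportAllocs makes alloc/op and allocs/op show up in
// the output, which is what the before/after comparison is about.
func BenchmarkConstructStatementID(b *testing.B) {
	const stmt = "SELECT k, v FROM kv WHERE k = $1"
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = ConstructStatementID(stmt, false /* failed */, true /* implicitTxn */)
	}
}
```

A typical workflow is to run `go test -bench=ConstructStatementID -benchmem -count=5` on both commits and diff the outputs with `benchstat`, which matches the format of the deltas quoted later in the thread.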
You could remove that final allocation by changing the type of
It's used for a proto field. I'm not sure if it's possible to change the string in the proto to [16]byte.
Oh, what we could do instead is use a uint64. The FNV hash computes a uint64, so there's no need for a string.
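If the proto field can be changed, here is a sketch of what a uint64-based variant could look like, reusing the constants from the snippet above; the names (`StmtFingerprint`, `ConstructStatementFingerprint`, `fnvStep`) are hypothetical and not from the actual patch:

```go
// StmtFingerprint is a hypothetical uint64 replacement for the string-typed
// StmtID; it would require the corresponding proto field to become a fixed64.
type StmtFingerprint uint64

// Magic constants for a FNV-64 hash, as in the snippet above.
const (
	fnvOffset64 = 14695981039346656037
	fnvPrime64  = 1099511628211
)

// fnvStep folds one value into the running FNV-1 hash.
func fnvStep(h, c uint64) uint64 {
	h *= fnvPrime64
	return h ^ c
}

// ConstructStatementFingerprint mirrors ConstructStatementID above, but
// returns the raw 64-bit hash, so the 16-byte hex buffer and the string
// conversion disappear entirely.
func ConstructStatementFingerprint(anonymizedStmt string, failed, implicitTxn bool) StmtFingerprint {
	h := uint64(fnvOffset64)
	for _, c := range anonymizedStmt {
		h = fnvStep(h, uint64(c))
	}
	if failed {
		h = fnvStep(h, 'F')
	} else {
		h = fnvStep(h, 'S')
	}
	if implicitTxn {
		h = fnvStep(h, 'I')
	} else {
		h = fnvStep(h, 'E')
	}
	return StmtFingerprint(h)
}
```

The hashing itself is unchanged from the snippet above; only the final encoding goes away, and whether that's workable hinges on the proto question raised in the preceding comments.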
Thanks @nvanbenschoten for helping narrow down the allocations and @knz for the suggestions on how to improve this. I tried to validate the theory above by making
It seems to have clawed back some, but not all, of the regression. For comparison, these are the numbers on the previous commit (3e01420) and my commit (9039cb1) that I'm looking to close:
The next step for me is to re-organize the code a bit so that we don't always have to compute the statement ID from scratch. Then, I'll take the other suggestions on this thread, run the numbers again, and report back.
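As a rough illustration of that reorganization (hypothetical struct and field names, assuming the `StmtID` and `ConstructStatementID` from the snippet above), the idea is to compute the ID at most once per statement key rather than on every execution:

```go
// stmtKey is a hypothetical per-statement entry; the real appStats structures
// are more involved and are guarded by a mutex.
type stmtKey struct {
	anonymizedStmt string
	failed         bool
	implicitTxn    bool

	id StmtID // computed lazily, then reused
}

// ID returns the cached statement ID, constructing it only on first use so
// the hot execution path doesn't rehash the statement text every time.
// Not safe for concurrent use as written; callers would need to hold the
// appStats lock (or use sync.Once) around the lazy initialization.
func (k *stmtKey) ID() StmtID {
	if k.id == "" {
		k.id = ConstructStatementID(k.anonymizedStmt, k.failed, k.implicitTxn)
	}
	return k.id
}
```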
55062: kvserver: remove unused parameter r=TheSamHuang a=tbg

Release note: None

55214: sql: fix performance regression due to increased hash allocations r=nvanbenschoten,knz a=arulajmani

We recently increased allocations in the `appStats` structure when adding support for transaction level statistics in #52704. This was because the interactions with the `fnv` library were expensive in terms of allocations. This patch aims to claw back the regression by:

- Using our own implementation of the FNV algorithm instead of the library, which is significantly lighter weight (microbenchmarks below).
- Re-organizing the code to only construct the statement IDs (deemed the expensive operation) if required.

When comparing the difference between the commit that introduced the regression and the changes proposed by this diff, I got the following improvements on the KV workload:

```
name             old ops/s   new ops/s   delta
kv95-throughput  34.5k ± 6%  35.7k ± 4%  +3.42%  (p=0.023 n=10+10)
```

Microbenchmarks for the new hashing algorithm (written/run by @knz):

```
name                     old time/op    new time/op    delta
ConstructStatementID-32   405ns ±17%      39ns ±12%   -90.34%  (p=0.008 n=5+5)

name                     old alloc/op   new alloc/op   delta
ConstructStatementID-32    120B ± 0%       16B ± 0%   -86.67%  (p=0.008 n=5+5)

name                     old allocs/op  new allocs/op  delta
ConstructStatementID-32    6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.008 n=5+5)
```

Closes #54515

Release note: None

55277: server: create engines in NewServer r=irfansharif a=tbg

We were jumping through a number of hoops to create the engines only in `(*Server).Start` since that seems to be the "idiomatic" place to start moving parts. However, it creates a lot of complexity since various callbacks have to be registered with access to engines. Move engine creation to `NewServer`. This unblocks #54936.

Release note: None

55280: roachtest: recognize GEOS dlls on more platforms r=otan a=knz

This makes roachtest work on the BSDs again.

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: arulajmani <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
What is your situation?
Seeing about a 5-10k QPS dip in performance on the same machine shapes and an identical setup compared to 20.1.5.
Select all that apply:
Observed performance
What did you see? How did you measure it?
If you have already run tests, include your test details here:
Application profile
Performance depends on the application. Please help us understand how you use CockroachDB before we can discuss performance.
Requested resolution
When/how would you consider this issue resolved?
Select all that apply:
Repro steps are described in the following document