sql: zeroqps during schema change with reduced bulk_io_write.max_rate
#36806
Comments
I stopped the workload and restarted it to try to get this test working again, but it's still seeing zero QPS.
Followed by:
|
Looks like some stuff is blocked resolving intents? Shouldn't these calls have timeouts? Or some other way to cancel themselves?
|
Yes, they should probably have a timeout, especially given that intent resolution is best-effort. I can make a PR for that when I get back. That being said, this seems more like a symptom than a cause. Why is intent resolution blocked? Are there unavailable ranges? |
Agreed that a timeout here would be good. Also agree that this isn't the root cause of the problem. Here's closer to the root cause (from the goroutine dump):
An SST write has been waiting on the rate limiter for 46 minutes. Nobody else is using this limiter. So... is the limiter maybe broken? This doesn't seem super likely to me, but from the goroutines that looks to be the case. As a side effect of this stall, raft processing for this replica was blocked, so any requests entering it would get stuck as well. This could explain the zero qps, and I also assume that the blocked intent resolution requests had an intent on that range and got stuck there. |
cc @bdarnell -- this seems like another problem worth fixing before we release with dist backfill on. |
FYI, I'll be looking at checking whether setting |
@ajwerner There was a timeout on intent resolution before, but it got lost in #34803.
The |
Typing up a diff to add back timeouts now. |
Does this need to go on the release blockers list (#35554)? |
No. As far as controlling schema changes is concerned, we already have #36430, which @awoods187 is presently testing to see if it works. We'll either reopen that issue or mark it fixed. This issue is still being investigated. |
The subject line omits a crucial detail: the zero qps is a result of setting `bulk_io_write.max_rate`. The new rate limiter looks like it's in a better place, although I'm not sure that we've actually solved the problem unless we give it a good default. |
This PR adds a configuration option for a timeout for batch request sends in the request batcher and adopts this option in the intent resolver. Prior to the introduction of batching in cockroachdb#34803 intent resolution was performed with a maximum timeout of 30s. This PR uses that same constant and additionally applies it to batches of requests to gc transaction records. Informs cockroachdb#36806. Release note: None
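For illustration, here is a minimal sketch of the idea in Go: bound every batched send with a timeout so a stuck batch cancels itself instead of hanging. The names (`sendWithTimeout`, `sendBatchTimeout`, `sendFn`) are hypothetical stand-ins for this sketch, not the actual requestbatcher API.

```go
// Sketch: bound a batched send with a timeout so it cancels itself.
package main

import (
	"context"
	"fmt"
	"time"
)

// sendBatchTimeout mirrors the 30s bound intent resolution had before batching.
const sendBatchTimeout = 30 * time.Second

func sendWithTimeout(ctx context.Context, sendFn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, sendBatchTimeout)
	defer cancel()
	return sendFn(ctx)
}

func main() {
	// Simulate a batch that is blocked (e.g. intent resolution stuck behind a
	// wedged range): it now fails after ~30s instead of hanging forever.
	err := sendWithTimeout(context.Background(), func(ctx context.Context) error {
		<-ctx.Done()
		return ctx.Err()
	})
	fmt.Println(err) // context deadline exceeded
}
```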
I've updated #36430 with the latest test results. We can follow up there on what could be good defaults. |
36845: requestbatcher,intentresolver: add timeout in requestbatcher and adopt r=ajwerner a=ajwerner This PR adds a configuration option for a timeout for batch request sends in the request batcher and adopts this option in the intent resolver. Prior to the introduction of batching in #34803 intent resolution was performed with a maximum timeout of 30s. This PR uses that same constant and additionally applies it to batches of requests to gc transaction records. Informs #36806. Release note: None Co-authored-by: Andrew Werner <[email protected]>
@ajwerner feel free to close this issue if you're satisfied. |
@vivekmenezes I thought we identified that this knob doesn't work as intended? |
We should discuss whether we should take this knob out. @dt has thoughts on why this knob might still be needed despite having the new knob that controls the ADD SST rate. I believe it is to allow control over disk IOPS. |
We still need a disk IO limiter, not a file count limiter, since e.g. RESTORE ingests relatively fewer, bigger SSTs, and in IO-constrained environments we've seen a sustained, unlimited write starve out other traffic, while turning that knob down to 25MB or so was a big stability win. It needs to be below raft, at least for now, since it needs to a) slow down the application of a single 32MB file (so it sleeps between chunks) and b) slow down a node that, by following many leaders that might each have a healthy rate, ends up at an unhealthy aggregate rate. It's still better to fall behind on the one range doing the bulk ingestion than to have the whole store get stuck waiting for IO. I think we might need a lower bound on it though, to prevent slowing raft too far (e.g. a max wait of 5s or something), and ideally we'd somehow propagate information from followers to leaders so they could do better admission control rather than forcing the follower to just choose to fall behind. |
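For readers unfamiliar with the mechanism, here is a rough sketch of that chunked, below-raft throttling, using `golang.org/x/time/rate`. The names (`limitedWrite`, `writeChunk`, `chunkSize`) are illustrative assumptions, not CockroachDB's actual code.

```go
// Sketch: write a large ingested file in chunks, sleeping between chunks
// until the shared per-store limiter admits the next one.
package main

import (
	"context"

	"golang.org/x/time/rate"
)

const chunkSize = 512 << 10 // 512 KiB between limiter waits (illustrative)

// limitedWrite pushes data through the store-wide bulk-IO limiter. Because
// every replica on the store shares the limiter, a node following many
// leaders that each ingest at a "healthy" rate is still capped at a healthy
// aggregate rate.
func limitedWrite(ctx context.Context, lim *rate.Limiter, data []byte, writeChunk func([]byte) error) error {
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		// Block until the limiter releases n bytes of tokens; this is where
		// the ingesting range falls behind instead of starving the disk.
		if err := lim.WaitN(ctx, n); err != nil {
			return err
		}
		if err := writeChunk(data[:n]); err != nil {
			return err
		}
		data = data[n:]
	}
	return nil
}

func main() {
	// e.g. a 25 MB/s cap, the kind of setting that helped in IO-constrained clusters.
	lim := rate.NewLimiter(rate.Limit(25<<20), chunkSize)
	_ = limitedWrite(context.Background(), lim, make([]byte, 1<<20), func([]byte) error { return nil })
}
```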
I can easily reproduce the problem where the nodes are all blocked on the rate limiter. @tbg was right that it's a rate limiter problem: during the import, we are incurring an enormous deficit of tokens in the rate limiter due to floating-point underflows and integer casts. The problem is in this function: https://github.com/golang/time/blob/master/rate/rate.go#L365-L368 It rounds all intervals down to 0 whenever it's used, meaning that at the default rate, (1) there is no rate limiting, and (2) tokens never get replenished. So when we finally change the rate to 1 MB/s, there's a huge deficit of tokens in the rate limiter (on the order of 10e9 after writing a few thousand chunks while importing tpcc-100), and the rate limiter correctly computes that it would take 1-2 hours to recover. To fix this, I think we can just set the default for `kv.bulk_io_write.max_rate` to something lower. |
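To make the rounding concrete, here is a back-of-the-envelope sketch. The conversion function mirrors the tokens-to-duration arithmetic described above rather than copying the library, and the deficit figure is just the order of magnitude reported in this comment.

```go
// Sketch of the rounding problem at the old MaxInt64 default rate.
package main

import (
	"fmt"
	"math"
	"time"
)

// durationFromTokens: how long it takes to accumulate `tokens` at `limit`
// tokens per second. The cast to time.Duration truncates toward zero.
func durationFromTokens(tokens, limit float64) time.Duration {
	seconds := tokens / limit
	return time.Duration(1e9 * seconds)
}

func main() {
	oldDefault := float64(math.MaxInt64) // ~9.2e18 bytes/sec

	// At the old default, even a 32 MiB write converts to 0ns: no throttling,
	// and (per the analysis above) no token replenishment either, so every
	// write digs the bucket deeper into deficit.
	fmt.Println(durationFromTokens(32<<20, oldDefault)) // 0s

	// A deficit on the order of 1e10 tokens (as observed during the tpcc-100
	// import) then has to be paid back once the rate drops to 1 MB/s.
	deficit := 1e10
	fmt.Println(durationFromTokens(deficit, 1<<20)) // roughly 2h39m
}
```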
The rate limiter has a special case for MaxFloat64 (we're using MaxInt64 instead). We should change to use MaxFloat64 (or its alias, `rate.Inf`). |
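A tiny illustration of that special case, assuming `golang.org/x/time/rate`:

```go
// rate.Inf (an alias for Limit(math.MaxFloat64)) is treated as "no limit" and
// skips the token arithmetic, whereas Limit(math.MaxInt64) is just an
// ordinary, astronomically large rate and goes through the rounding path.
package main

import (
	"context"
	"fmt"
	"math"

	"golang.org/x/time/rate"
)

func main() {
	unlimited := rate.NewLimiter(rate.Inf, 0)
	fmt.Println(unlimited.Limit() == rate.Limit(math.MaxFloat64)) // true

	// Waits return immediately and no token bookkeeping is done while the
	// limit is Inf, so no deficit can accumulate.
	_ = unlimited.WaitN(context.Background(), 1<<30)

	almostUnlimited := rate.NewLimiter(rate.Limit(math.MaxInt64), 1<<30)
	fmt.Println(almostUnlimited.Limit() == rate.Inf) // false: not special-cased
}
```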
The problem is that |
36884: storage: decrease default kv.bulk_io_write.max_rate r=lucy-zhang a=lucy-zhang Previously `kv.bulk_io_write.max_rate` had a default value of `MaxInt64`, which caused rounding problems in the rate limiter due to very small time intervals. This lowers the default to 1 TB/s. Fixes #36806. (See #36806 (comment)) Release note (bug fix): The default value of `kv.bulk_io_write.max_rate` is now 1 TB/s, to help prevent incorrect rate limiting behavior due to rounding. Co-authored-by: Lucy Zhang <[email protected]>
Describe the problem
While testing out the new index backfill I ran `SET CLUSTER SETTING kv.bulk_io_write.max_rate = '1mb'`. Once the schema change started, it dropped QPS to zero on the cluster.
To Reproduce
Then run the test:
Environment:
19.1 branch
v19.1.0-rc.2-40-g0c83360