-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver, timeutil: fix some Timer user-after-Stops #61373
Conversation
48820fd
to
033c154
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the bug fix. Is there a way we would could have caught this easier? What if Stop
operated on a **Timer
and set the reference to nil
? That would prevent the use-after-free.
I'm not sure about the StopExclusive()
extension. I wouldn't want to complicate an already complicated API for limited benefit and I don't think we have any reason to believe that not adding to the pool during this small race is causing issues (unless you were seeing something in cpu/heap profiles). But can we do what you want unconditionally? Do we need to worry about concurrent writes to t.Read
? Either way, it may be better to pull it into a separate PR so we can iterate on it without holding up the bug fix portion.
Reviewed 2 of 4 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
Two guys were continuing to use a Timer after Stop()ing it, which is illegal. Release note: None Release justification: Bug fix.
033c154
to
1512a14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed everything but the fix.
Is there a way we would could have caught this easier? What if Stop operated on a **Timer and set the reference to nil? That would prevent the use-after-free.
Well but you have the same problem tho - a StopTimer(**Timer) could not be called concurrently with a receive (<-t.C
). So, you wouldn't be able to simply generally replace Stop()
with it (since Stop
can be called concurrently with a receive[*]. So, you'd end up with an extra API.
[*] The comments on Stop
say:
// Stop does not
// close the channel, to prevent a read from succeeding incorrectly.
// Note that a Timer must never be used again after calls to Stop as the timer
// object will be put into an object pool for reuse.
This comment is a bit ambiguous, because the two sentences seem to be at odds. The first sentence implies that it's possible for somebody to be receiving on t.C
concurrently. The second sentence says that you can't use the channel after Stop
, so depending on the definition of "after", one shouldn't be receiving on it. In any case, I believe we do have code Stopping and receiving at the same time.
I'm not sure about the StopExclusive() extension. I wouldn't want to complicate an already complicated API for limited benefit and I don't think we have any reason to believe that not adding to the pool during this small race is causing issues (unless you were seeing something in cpu/heap profiles). But can we do what you want unconditionally? Do we need to worry about concurrent writes to t.Read? Either way, it may be better to pull it into a separate PR so we can iterate on it without holding up the bug fix portion.
I think we do need to worry about concurrent t.Read = true
, yeah. What could we do about that, if we wanted a Stop
that always puts the timer back in the pool? We could make t.Read
an atomic (set through a new method), and have Stop
do:
res = t.timer.Stop()
if !res {
select {
case <-t.C:
drained = true
default:
}
if !drained {
<spin until t.Read == true, which should be soon>
}
}
WDYT? Doesn't make the heart joyous.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
I have regrets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gimme a stamp pls
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
Build succeeded: |
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
119901: timeutil: stack-allocate Timer, remove pooling r=nvanbenschoten a=nvanbenschoten Informs #119593. Closes #38055. This PR removes the pooling of `timeutil.Timer` structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (#61373 and #119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` ---- The PR then improves the memory pooling of the inner `time.Timer` so that it is always recycled. This was originally identified by `@andreimatei` in #13466 (review). Doing so has a positive impact on the microbenchmark introduced in the first commit, demonstrating that timers can be stack-allocated and require zero heap allocations: ``` name old time/op new time/op delta Timer-10 152µs ± 1% 153µs ± 1% ~ (p=0.133 n=9+10) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) ``` ---- cc. `@andreimatei` `@ajwerner` Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
Two guys were continuing to use a Timer after Stop()ing it, which is
illegal.
Release note: None
Release justification: Bug fix.