Implement actual timeouts for requests #552
Conversation
I’m not familiar with how rayon works, but does it try to keep the pool full even after a worker thread panics (does it spawn a new thread then)?
Rayon generally propagates the panic, though in this case I'm using a pool configured with a panic handler:

```rust
rayon::Configuration::default()
    .thread_name(|num| format!("request-worker-{}", num))
    .panic_handler(|err| warn!("{:?}", err))
```

I would be very surprised if panics handled this way depleted the thread pool. As the docs don't mention this directly, it's probably easiest to demonstrate it working on the playground.
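For what it's worth, a minimal self-contained sketch of such a demonstration (written against rayon's newer `ThreadPoolBuilder` API rather than the `Configuration` one quoted above; the pool size and messages are illustrative):

```rust
use rayon::ThreadPoolBuilder;
use std::time::Duration;

fn main() {
    // A small pool whose panic handler logs worker panics instead of
    // propagating them, mirroring the configuration quoted above.
    let pool = ThreadPoolBuilder::new()
        .num_threads(2)
        .thread_name(|num| format!("request-worker-{}", num))
        .panic_handler(|err| eprintln!("worker panicked: {:?}", err))
        .build()
        .expect("failed to build pool");

    // Spawn a task that panics...
    pool.spawn(|| panic!("boom"));
    std::thread::sleep(Duration::from_millis(100));

    // ...and the pool still runs subsequent work, i.e. handled panics
    // do not deplete it.
    let answer = pool.install(|| 1 + 1);
    println!("pool still alive, 1 + 1 = {}", answer);
}
```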
(force-pushed from b46ce2a to 52ec8b6)
```rust
#[cfg(not(test))]
const COMPILER_TIMEOUT: u64 = 1500;

// Timeout for potentially very slow CPU CI boxes
#[cfg(test)]
const COMPILER_TIMEOUT: u64 = 3_600_000;
```
A 1hr timeout per single request in the testing environment seems... excessive. I think we could bump it up to 10s or similar here, but for a test-level timeout I'd toy more with the `TEST_TIMEOUT_IN_SEC` value.
This is just to deal with potentially slow CI environments, as tests currently (and generally) won't deal with timeouts. And if you did test a timeout, you wouldn't do it by waiting 2 seconds.
I haven't actually spotted this being an issue; this is just me being defensive, since this PR makes the timeouts actually take effect.
I just used an arbitrarily large number because we expect this never to happen. Maybe I should have used `u64::MAX`.
src/actions/requests.rs
Outdated
```diff
         let ctx = ctx.inited();
         let file_path = parse_file_path!(&params.text_document.uri, "symbols")?;

         let analysis = ctx.analysis.clone();

-        let rustw_handle = thread::spawn(move || {
+        let receiver = receive_from_thread(move|| {
```
style nit: I'd keep the space-separated `move ||` version to be more consistent with the style used in other places in the codebase
Will do
src/actions/requests.rs
Outdated
```diff
             },
             None => Ok(vec![]),

+        match receiver.recv_timeout(max_wait - elapsed) {
```
If I'm not mistaken, this changes the semantics to treat both analysis data and Racer as equals and use the first one returned, whereas the previous version preferred the analysis data if possible. Probably not a bad thing, but something to keep in mind.
That's right. It seemed the only sensible thing to do. Waiting the entire timeout is not something we ever want to happen; it's very much a measure to cap the worst-case scenario.
I think it would be cleaner to just use one or the other if you want full clarity. Waiting the max request timeout for the racer results hardly seems worthwhile. Note, though, that if the compiler fast-fails then this logic will work as before.
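For concreteness, a minimal sketch of that "first result wins" pattern using a shared `mpsc` channel (the names and budget are illustrative, not the PR's actual code):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let max_wait = Duration::from_millis(1500);
    let started = Instant::now();

    // Both producers share one channel; whichever finishes first wins.
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone();
    thread::spawn(move || {
        // Stand-in for the compiler/analysis lookup.
        let _ = tx.send("analysis result");
    });
    thread::spawn(move || {
        // Stand-in for the Racer lookup.
        let _ = tx2.send("racer result");
    });

    // Wait only for whatever remains of the request budget.
    let remaining = max_wait.checked_sub(started.elapsed()).unwrap_or_default();
    match rx.recv_timeout(remaining) {
        Ok(first) => println!("using first responder: {}", first),
        Err(_) => println!("timed out; responding with the default"),
    }
}
```

The slower result is simply dropped, which is what makes this a race rather than a fallback.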
It's just that with the `goto_def_racer_fallback` cfg option enabled, Racer will stop acting as a fallback, and both threads will race to return the answer that gets used. Since Racer is more of a heuristic compared to analysis data, we might be sacrificing correctness for speed in some cases.
@nrc should chime in about that
I would like to preserve the existing behaviour (or at least the intention of the existing behaviour), sketched below:
- Start both threads at the same time.
- If the compiler thread completes, return a result immediately.
- If we hit the timeout, use the Racer result if it exists; otherwise return nothing.
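A minimal sketch of that intended behaviour, using two channels so the compiler result is always preferred (the lookups are hypothetical stand-ins, not the actual RLS code):

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

fn main() {
    let timeout = Duration::from_millis(1500);
    let (compiler_tx, compiler_rx) = channel();
    let (racer_tx, racer_rx) = channel();

    // Start both threads at the same time.
    thread::spawn(move || {
        let _ = compiler_tx.send(vec!["definition from analysis".to_string()]);
    });
    thread::spawn(move || {
        let _ = racer_tx.send(vec!["definition from racer".to_string()]);
    });

    let result: Vec<String> = match compiler_rx.recv_timeout(timeout) {
        // The compiler thread completed: return its result immediately.
        Ok(defs) => defs,
        // Timeout (or compiler thread death): use the Racer result if it
        // already exists, otherwise return nothing.
        Err(_) => racer_rx.try_recv().unwrap_or_default(),
    };
    println!("{:?}", result);
}
```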
src/actions/requests.rs
Outdated
```diff
             t.unpark();

-            result
+        let (sender, receiver) = mpsc::channel();
```
If we were to still prioritize analysis data, I think we could simplify this a great deal with `receive_from_thread`, as used elsewhere. We could start both threads, wait with a timeout on the analysis one (this should succeed or fail fast in practice), and if needed wait for the remainder for the Racer one (IIRC it can also take longer and is less reliable). It'd also remove the need to use `WORK_POOL` directly and do the explicit gymnastics with counters.
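Sketched out, that two-stage wait might look like the following; the `receive_from_thread` here is an assumed stand-in with a guessed signature, not necessarily the PR's actual helper:

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::{Duration, Instant};

// Assumed shape of the helper: run a closure on a worker thread and
// hand back a receiver for its eventual result.
fn receive_from_thread<T, F>(work: F) -> Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    rx
}

fn main() {
    let max_wait = Duration::from_millis(1500);
    let started = Instant::now();

    let analysis_rx = receive_from_thread(|| vec!["analysis".to_string()]);
    let racer_rx = receive_from_thread(|| vec!["racer".to_string()]);

    // Stage 1: the analysis lookup should succeed or fail fast.
    let result = analysis_rx.recv_timeout(max_wait).ok().or_else(|| {
        // Stage 2: give Racer only whatever time remains of the budget.
        let remaining = max_wait.checked_sub(started.elapsed()).unwrap_or_default();
        racer_rx.recv_timeout(remaining).ok()
    });
    println!("{:?}", result.unwrap_or_default());
}
```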
Tbf I would have generalised the racing logic if it were used more than once.
As I wrote above, I think you need to either race the computations or just use one or the other. If the compiler fails fast then it'll work fine in this case too, except here we can handle a compiler timeout and a fast Racer gracefully.
I see that this is a change in functionality, though; Racer could never beat the compiler before. Let me know what you guys want to do with this.
Thanks for the PR, this definitely looks like an improvement! Left some comments inline; @nrc should take a closer look as well, as the changes made may be somewhat delicate.
Thanks for reviewing the PR @Xanewok. You guys just let me know what you'd like to do about the racer-compiler race; that's just my idea of how to do it, not a big deal. As I said, this PR just puts a cap on the worst-case scenario of a request blocking RLS forever, so it's a small win. But since the code that tried to do this was already there, it seems important we make sure it works.
(force-pushed from 911eb45 to 089672f)
This PR can be closed if you guys want to merge #565, as that includes this. I actually reworked the `Definition` functionality back to resemble how it was, to cut down on the number of questions the change would raise.
Hey, I was looking at this PR and #565; they both look like great improvements! Given that they are both complex and fairly big, I think it would be good to keep the two separate, so I'll comment separately.
This basically looks good! I think it will need to change a little bit to accommodate the intended compiler/Racer semantics, so I haven't looked too closely at that part.
Have you observed a benefit from using a Rayon thread pool? I'm a bit worried about adding another dep + complexity. But if it makes things observably better, then it is worth it.
(force-pushed from 089672f to 90248a7)
I've reverted the compiler/racer behaviour to the previous style.
I think we benefit from having a thread pool: it prevents us from having an unbounded number of threads open in RLS (possible currently if tasks take a very long time to complete). Rayon is just a decent work-stealing pool, which in my opinion is a good fit for this situation.
(force-pushed from 44a66ea to 2f0b96c)
- Use rayon thread pool to avoid unbound thread count
- Use channels to provide timeouts
- `warn!` on thread panics
- Add long request-timeout for CI testing
- Use consistent `move ||` code style
- Use compiler-then-racer impl for `Definition`
(force-pushed from 2f0b96c to cf9c07d)
Cool, thanks for the PR and changes!
Related to the discussion in #540, this PR fixes `requests.rs` to time out after the set duration and respond with a default response. The code previously looked like it was doing this, but wasn't (see #540 for an explanation).
Since the main thread can now potentially create many concurrent threads, it makes sense to cap that number. I used a rayon thread pool to this effect.
Summary: