Thanks for the in-depth investigation!
How does returning the error fix the CPU issue? Is this a temporary error, in which case it made sense to ignore it? Terminating on a temporary error doesn't seem right.
The error I'm seeing (and, I think, all of them) is not a temporary error. Once you get it, checking that future again always gives back the same error (in my case, an Eof) immediately. The CPU issue is because the task gets stuck checking the socket and receiving that error over and over. Panic works fine for me, though, because I detect that my subscription exited and reconnect (code here).
-    // TODO: Log the error?
-    Some(Err(_)) => {},
+    Some(Err(_)) => {
+        return Err(ClientError::UnexpectedClose);
Maybe instead of dropping the error we can add a new variant to the enum, UnknownError(Err), that wraps the error?
That's a good idea. I'll try to get to that tomorrow.
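A minimal sketch of the suggestion above, assuming the underlying transport error can be represented by std::io::Error. The enum below is a hypothetical stand-in for the crate's ClientError; only UnexpectedClose appears in the diff, and both the UnknownError variant and the wrap_ws_error helper are illustrative names, not the crate's API.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for the crate's `ClientError`.
#[derive(Debug)]
enum ClientError {
    /// The variant returned in the diff under review.
    UnexpectedClose,
    /// Assumed new variant that keeps the underlying error instead of dropping it.
    UnknownError(io::Error),
}

impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClientError::UnexpectedClose => write!(f, "websocket closed unexpectedly"),
            ClientError::UnknownError(inner) => write!(f, "websocket error: {}", inner),
        }
    }
}

impl Error for ClientError {
    /// Expose the wrapped error so callers can inspect the root cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ClientError::UnexpectedClose => None,
            ClientError::UnknownError(inner) => Some(inner),
        }
    }
}

/// Hypothetical helper: wrap the socket error rather than discarding it.
fn wrap_ws_error(err: io::Error) -> ClientError {
    ClientError::UnknownError(err)
}
```

Wrapping keeps the original error reachable through source(), so a caller deciding whether to reconnect can still see that the cause was, for example, an unexpected EOF.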
@@ -401,8 +401,9 @@ where
     // Handle ws messages
     resp = self.ws.next() => match resp {
     Some(Ok(resp)) => self.handle(resp).await?,
-    // TODO: Log the error?
-    Some(Err(_)) => {},
+    Some(Err(_)) => {
That makes sense, but let's also log the error here.
+1 let's log the error and otherwise looks good
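As a rough illustration of "log, then return", here is a synchronous sketch in which a plain iterator stands in for the async websocket stream. The drive function, the eprintln! logging, the treatment of a finished stream as a clean exit, and the minimal ClientError enum are all assumptions made for the example; the real code runs inside an async select loop and may use a different logging facility.

```rust
use std::io;

/// Hypothetical, minimal stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum ClientError {
    UnexpectedClose,
}

/// Drain messages until the stream ends or yields an error.
/// Returns how many messages were handled before a clean end.
fn drive<I>(mut ws: I) -> Result<usize, ClientError>
where
    I: Iterator<Item = Result<String, io::Error>>,
{
    let mut handled = 0;
    loop {
        match ws.next() {
            // Normal message: handle it and keep going.
            Some(Ok(_msg)) => handled += 1,
            // Error: log it, then stop instead of asking the socket again.
            Some(Err(err)) => {
                eprintln!("websocket error: {}", err);
                return Err(ClientError::UnexpectedClose);
            }
            // Stream finished (assumed here to be a clean shutdown).
            None => return Ok(handled),
        }
    }
}
```

With the old `Some(Err(_)) => {}` arm, a socket that keeps yielding the same error would send this loop straight back to next() with no progress, which matches the spinning task described in the PR.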
* return error instead of swallow error
* changelog
Co-authored-by: Bryan Stitt <[email protected]>
Motivation
I noticed that occasionally my program would get stuck with one thread spinning at near 100% CPU. After a bunch of investigation with rust-gdb and tokio-console and stripping it down to a much smaller program, I narrowed it down to a websocket task being very active doing nothing.
Here you can see the spawned future stuck running:
It is being woken up ~253k times per second, so that explains the 100% CPU:
After some digging, I found this related code:
ethers-rs/ethers-providers/src/transports/ws.rs
Lines 402 to 408 in 3df1527
Dropping that error is bad because it might be an Io(Kind(UnexpectedEof)). If that's the case, the websocket needs to reconnect.
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Solution
The quick fix is to return an error whenever the socket gives an error.
I think a more robust fix might be to use something like https://docs.rs/stubborn-io/latest/stubborn_io/.
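To make the reconnect idea concrete, here is a hand-rolled sketch using only the standard library. It is not stubborn-io's API; run_with_reconnect, its parameters, and the fixed backoff are hypothetical, and the connect and session closures stand in for opening the websocket and running the subscription until it exits.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Connect and run a session, retrying after a failure.
/// Returns the attempt number that finished cleanly, or the last error
/// once `max_attempts` is exhausted.
fn run_with_reconnect<T, C, S>(
    mut connect: C,
    mut session: S,
    max_attempts: u32,
    backoff: Duration,
) -> io::Result<u32>
where
    C: FnMut() -> io::Result<T>,
    S: FnMut(T) -> io::Result<()>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no attempts were made");
    for attempt in 1..=max_attempts {
        // A failed connect and a failed session are treated the same way here.
        match connect().and_then(|conn| session(conn)) {
            Ok(()) => return Ok(attempt),
            Err(err) => {
                eprintln!("attempt {} failed: {}", attempt, err);
                last_err = err;
                // Wait before retrying, but not after the final attempt.
                if attempt < max_attempts {
                    thread::sleep(backoff);
                }
            }
        }
    }
    Err(last_err)
}
```

This only works if the session actually returns when the socket fails, which is what the quick fix provides; a swallowed error would leave the session spinning and the retry loop would never run.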
PR Checklist
I'm not sure how to write a test for this.