cargo-shuttle: fix address in use error when service panicked in a previous run #805
Conversation
Force-pushed from 64275c4 to b04ff51
Thanks @iulianbarbu! I'm wondering if this test failure in runtime is spurious or due to the change in proto. Did you see it fail locally too?
I am curious if they will fail again now. I am running them locally as well.
Force-pushed from 25c80dd to c5bf4f0
@oddgrd @chesedo, it seems like the previously failing tests pass now, but this time …
Seems good to go.
cargo-shuttle/src/lib.rs (Outdated)
```rust
if signal_received {
    let stop_request = StopRequest {};
    trace!(
        ?stop_request,
        "stopping service because it received a signal"
    );
    let _response =
        rt.1.stop(tonic::Request::new(stop_request))
            .or_else(|err| async {
                provisioner_server.abort();
                rt.0.kill().await?;
                Err(err)
            })
            .await?
            .into_inner();
    continue;
}
```
Have we not already handled this case above? I.e., if `signal_received` was false above, then it will still be false here.
The previous `signal_received` handling refers to receiving a signal during the runtimes' initialization. If `signal_received` is `true`, it should stop all the existing runtimes and return. Otherwise, it moves on to the logic of waiting on each runtime in a for loop, namely the case we're discussing. While waiting on these runtimes we must handle signals as well, hence we check `signal_received` before actually waiting on a runtime in a new iteration. If it's `true`, then we received a signal in a previous iteration and we must stop all the runtimes we haven't waited on yet, instead of continuing to wait on them.
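To make that flow concrete, here is a minimal sketch, not the actual cargo-shuttle code, of the pattern being described: waiting on runtimes one at a time while tracking a `signal_received` flag, so a signal caught in one iteration stops the runtimes that haven't been waited on yet. The `wait_on_runtimes` helper and the bare `kill()` fallback are assumptions for illustration; the real code sends a graceful `StopRequest` first.

```rust
use std::io;
use tokio::process::Child;
use tokio::signal::unix::{signal, SignalKind};

// Hypothetical helper illustrating the discussion above: check the
// `signal_received` flag before waiting on each runtime, and stop the
// remaining runtimes once a signal has been observed.
async fn wait_on_runtimes(runtimes: Vec<Child>) -> io::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;
    let mut signal_received = false;

    for mut runtime in runtimes {
        if signal_received {
            // A signal arrived in a previous iteration: stop this runtime
            // instead of continuing to wait on it.
            runtime.kill().await?;
            continue;
        }
        tokio::select! {
            _ = runtime.wait() => {}
            _ = sigterm.recv() => {
                signal_received = true;
                runtime.kill().await?;
            }
            _ = sigint.recv() => {
                signal_received = true;
                runtime.kill().await?;
            }
        }
    }
    Ok(())
}
```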
Force-pushed from 6b5baa5 to a501e6c
Thanks for figuring this out, Iulian; I understand why they have a warning about this being tricky in the tokio::signal docs 😄 I left a few comments, mostly nits, plus a note about us considering doing this for Windows too in the future.
We might also want to test this on Windows to see that it doesn't cause any regressions; IIRC, importing something that's only available on unix may be an issue. But I can check this quickly tomorrow. Edit: It looks like all uses of unix signals are inside `cfg(unix)`, so this should be fine.
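For reference, a minimal sketch of the `cfg(unix)` gating pattern mentioned here, assuming tokio's `signal` feature is enabled; the function name is made up for illustration:

```rust
// Unix-only signal handling compiles only for unix targets, so Windows
// builds never see the unix-only imports.
#[cfg(unix)]
async fn wait_for_stop_signal() {
    use tokio::signal::unix::{signal, SignalKind};
    let mut sigterm = signal(SignalKind::terminate()).expect("failed to install SIGTERM handler");
    let mut sigint = signal(SignalKind::interrupt()).expect("failed to install SIGINT handler");
    tokio::select! {
        _ = sigterm.recv() => {}
        _ = sigint.recv() => {}
    }
}

// On non-unix targets, fall back to ctrl_c, which tokio supports on all
// platforms.
#[cfg(not(unix))]
async fn wait_for_stop_signal() {
    let _ = tokio::signal::ctrl_c().await;
}
```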
```rust
if !response.success {
    error!(error = response.message, "failed to load your service");
    exit(1);
}
```

```rust
// If prior signal received is set to true we must stop all the existing runtimes and return.
```
Doesn't this loop stop the services (unless the stop request fails), and then the runtimes are stopped on return (drop)? I may be misreading this 😅
Hmm, yes, in a way, only that the runtimes being dropped doesn't necessarily mean the underlying processes are killed (https://docs.rs/tokio/latest/tokio/process/struct.Child.html#caveats). What I wanted to say is that if a signal is received during the previous loop of initializing the runtimes, then we should stop the existing runtimes (gracefully, through the `StopRequest`, or kill them only if that fails) and then we can return from `local_run` as an abrupt termination.
An addendum to the above comment: I had to adjust this logic a bit. Now, if a service panics during initialization, then all the workspace runtimes must be killed and `cargo-shuttle` exits with code 1. Because we forcefully exit, the destructors for the runtimes are not called, so we must forcefully kill each runtime besides suggesting it stop gracefully. The rest still applies.
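A minimal, self-contained sketch of that adjusted flow, with a hypothetical `kill_runtimes_and_exit` helper standing in for the real code (which sends a graceful `StopRequest` before the kill):

```rust
use std::process::exit;
use tokio::process::Child;

// Because `exit` never runs destructors, tokio's kill_on_drop would not
// fire, so each child must be killed explicitly before exiting.
async fn kill_runtimes_and_exit(runtimes: Vec<Child>) {
    for mut runtime in runtimes {
        // The real code first suggests a graceful stop via StopRequest;
        // the kill here is the forceful fallback.
        let _ = runtime.kill().await;
    }
    exit(1);
}
```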
Force-pushed from a501e6c to 3a7d277
Force-pushed from e6eb1fb to 3ee158d
Force-pushed from 4351ebb to 3bf6a0d
cargo-shuttle/src/lib.rs (Outdated)
```diff
@@ -657,6 +657,7 @@ impl Shuttle {
     .stop(tonic::Request::new(stop_request))
     .or_else(|err| async {
         runtime.kill().await?;
+        error!(status = ?err, "killed the runtime by force because stopping it errored out");
```
What if we use `trace!` here? Then the user won't see an error, which won't be relevant to them in the event of a CTRL-C, since as far as they are concerned it was a success, not an error, right?
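A sketch of what the suggestion amounts to, with `err` standing in for the failed stop response (the `tonic::Status` type is an assumption based on the snippets above):

```rust
use tracing::trace;

// Logging the forced kill at trace level keeps it out of the user's
// default output: after a CTRL-C, the shutdown was a success from their
// point of view, not an error.
fn log_forced_kill(err: &tonic::Status) {
    trace!(status = ?err, "killed the runtime by force because stopping it errored out");
}
```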
cargo-shuttle: fix address in use error when service panicked in a previous run (#805)
* runtime & proto: teardown idle runtime in case of service panics.
* cargo-shuttle: abort all runtimes when one fails during a local run.
* cargo-shuttle: added sigint and sigterm handlers
* proto: remove kill_on_drop flag
* cargo-shuttle: fix comments
* runtime: remove leftovers
* cargo-shuttle: extract stop_runtime logic in a function
* proto: add kill_on_drop flag again
* cargo-shuttle: fix again one service panic case
* runtime(tests): fix bind_panic
* cargo-shuttle: handle stop requests error gracefully
* cargo-shuttle: handle stopping error gracefully for service that panics
* cargo-shuttle: clarified logging in case runtime stopping errors
* cargo-shuttle: change log level
Description of change

Running user services locally through `cargo shuttle run` will run a runtime that loads/starts a service as a tokio task. If the task panics, the shuttle-runtime will live on because of tokio behavior: tokio-rs/tokio#2002. This is not the case anymore. I haven't seen this making any difference for our use cases. Removing it to have the `bind_panic` test succeed.

Extra: set the `kill_on_drop` flag to true for started shuttle-runtimes, to make sure we either `.wait()` or `.kill()` on them, or, if they are dropped, the kill is automatically issued: https://docs.rs/tokio/latest/tokio/process/struct.Command.html#method.kill_on_drop.
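As a sketch of what setting that flag looks like (the `sleep` command is a stand-in, not the actual shuttle-runtime invocation):

```rust
use tokio::process::{Child, Command};

// With kill_on_drop(true), tokio issues a kill for the child process when
// the Child handle is dropped without an explicit .wait() or .kill().
fn spawn_runtime() -> std::io::Result<Child> {
    Command::new("sleep")
        .arg("30")
        .kill_on_drop(true)
        .spawn()
}
```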
How Has This Been Tested (if applicable)?

Run the `hello-world-rocket-app` example locally with a panic in it. Without the fix, consecutive local runs hang with the `Address in use` error. The fix ensures the runtime and provisioner are killed in case the runtime response is not successful. Also, if the `cargo-shuttle` process receives `SIGINT` or `SIGTERM`, it successfully stops the existing runtimes.