cargo-shuttle: fix address in use error when service panicked in a previous run #805
Conversation
Force-pushed from 64275c4 to b04ff51
Thanks @iulianbarbu! I'm wondering if this test failure in runtime is spurious or due to the change in proto. Did you see it fail locally too?
I am curious if they will fail again now. I am running them locally as well.
Force-pushed from 25c80dd to c5bf4f0
@oddgrd @chesedo, it seems like the previously failing tests pass now, but this time …
Seems good to go.
cargo-shuttle/src/lib.rs (Outdated)
```rust
if signal_received {
    let stop_request = StopRequest {};
    trace!(
        ?stop_request,
        "stopping service because it received a signal"
    );
    let _response =
        rt.1.stop(tonic::Request::new(stop_request))
            .or_else(|err| async {
                provisioner_server.abort();
                rt.0.kill().await?;
                Err(err)
            })
            .await?
            .into_inner();
    continue;
}
```
Have we not already handled this case above? I.e., if `signal_received` was false above, then it will still be false here.
The previous `signal_received` handling refers to receiving a signal during the runtimes' initialization. If `signal_received` is `true`, it should stop all the existing runtimes and return. Otherwise, it moves on to the logic of waiting on each runtime in a for loop, namely the case we're discussing. While waiting on these runtimes we must handle signals as well, hence we check `signal_received` before actually waiting on a runtime in a new iteration. If it's `true`, then we received a signal in a previous iteration and we must stop all the runtimes we haven't waited on yet, instead of continuing to wait on them.
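To make that flow concrete, here is a minimal sketch, not the actual cargo-shuttle code, of the pattern being described: waiting on runtimes one at a time while tracking a `signal_received` flag, so a signal caught in one iteration stops the runtimes that haven't been waited on yet. The `wait_on_runtimes` helper and the bare `kill()` fallback are assumptions for illustration; the real code sends a graceful `StopRequest` first.

```rust
use std::io;
use tokio::process::Child;
use tokio::signal::unix::{signal, SignalKind};

// Hypothetical helper illustrating the discussion above: check the
// `signal_received` flag before waiting on each runtime, and stop the
// remaining runtimes once a signal has been observed.
async fn wait_on_runtimes(runtimes: Vec<Child>) -> io::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;
    let mut signal_received = false;

    for mut runtime in runtimes {
        if signal_received {
            // A signal arrived in a previous iteration: stop this runtime
            // instead of continuing to wait on it.
            runtime.kill().await?;
            continue;
        }
        tokio::select! {
            _ = runtime.wait() => {}
            _ = sigterm.recv() => {
                signal_received = true;
                runtime.kill().await?;
            }
            _ = sigint.recv() => {
                signal_received = true;
                runtime.kill().await?;
            }
        }
    }
    Ok(())
}
```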
Force-pushed from 6b5baa5 to a501e6c
Thanks for figuring this out, Iulian; I understand why they have a warning about this being tricky in the tokio::signal docs 😄 I left a few comments, mostly nits, plus a note about us considering doing this for Windows too in the future.
We might also want to test this on Windows to see that it doesn't cause any regressions; IIRC, importing something that's only available on unix may be an issue. But I can check this quickly tomorrow. Edit: It looks like all uses of unix signals are inside `cfg(unix)`, so this should be fine.
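For reference, a minimal sketch of the `cfg(unix)` gating pattern mentioned here, assuming tokio's `signal` feature is enabled; the function name is made up for illustration:

```rust
// Unix-only signal handling compiles only for unix targets, so Windows
// builds never see the unix-only imports.
#[cfg(unix)]
async fn wait_for_stop_signal() {
    use tokio::signal::unix::{signal, SignalKind};
    let mut sigterm = signal(SignalKind::terminate()).expect("failed to install SIGTERM handler");
    let mut sigint = signal(SignalKind::interrupt()).expect("failed to install SIGINT handler");
    tokio::select! {
        _ = sigterm.recv() => {}
        _ = sigint.recv() => {}
    }
}

// On non-unix targets, fall back to ctrl_c, which tokio supports on all
// platforms.
#[cfg(not(unix))]
async fn wait_for_stop_signal() {
    let _ = tokio::signal::ctrl_c().await;
}
```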
```rust
if !response.success {
    error!(error = response.message, "failed to load your service");
    exit(1);
}
```

```rust
// If prior signal received is set to true we must stop all the existing runtimes and return.
```
Doesn't this loop stop the services (unless the stop request fails), and then the runtimes are stopped on return (drop)? I may be misreading this 😅
Hmm, yes, in a way, only that the runtimes being dropped doesn't necessarily mean the underlying processes are killed (https://docs.rs/tokio/latest/tokio/process/struct.Child.html#caveats). What I wanted to say is that if a signal is received during the previous loop of initializing the runtimes, then we should stop the existing runtimes (gracefully, through the `StopRequest`, or kill them only if that fails) and then we can return from `local_run` as an abrupt termination.
An addendum to the above comment: I had to adjust this logic a bit. Now, if a service panics during initialization, then all the workspace runtimes must be killed and `cargo-shuttle` exits with code 1. Because we forcefully exit, the destructors for the runtimes are not called, so we must forcefully kill each runtime besides suggesting it stop gracefully. The rest still applies.
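A minimal, self-contained sketch of that adjusted flow, with a hypothetical `kill_runtimes_and_exit` helper standing in for the real code (which sends a graceful `StopRequest` before the kill):

```rust
use std::process::exit;
use tokio::process::Child;

// Because `exit` never runs destructors, tokio's kill_on_drop would not
// fire, so each child must be killed explicitly before exiting.
async fn kill_runtimes_and_exit(runtimes: Vec<Child>) {
    for mut runtime in runtimes {
        // The real code first suggests a graceful stop via StopRequest;
        // the kill here is the forceful fallback.
        let _ = runtime.kill().await;
    }
    exit(1);
}
```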
Force-pushed from a501e6c to 3a7d277
Force-pushed from e6eb1fb to 3ee158d
Force-pushed from 4351ebb to 3bf6a0d
cargo-shuttle/src/lib.rs (Outdated)
```diff
@@ -657,6 +657,7 @@ impl Shuttle {
     .stop(tonic::Request::new(stop_request))
     .or_else(|err| async {
         runtime.kill().await?;
+        error!(status = ?err, "killed the runtime by force because stopping it errored out");
```
What if we use `trace!` here? Then the user won't see an error, which won't be relevant to them in the event of a CTRL-C, since as far as they are concerned it was a success, not an error, right?
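A sketch of what the suggestion amounts to, with `err` standing in for the failed stop response (the `tonic::Status` type is an assumption based on the snippets above):

```rust
use tracing::trace;

// Logging the forced kill at trace level keeps it out of the user's
// default output: after a CTRL-C, the shutdown was a success from their
// point of view, not an error.
fn log_forced_kill(err: &tonic::Status) {
    trace!(status = ?err, "killed the runtime by force because stopping it errored out");
}
```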
cargo-shuttle: fix address in use error when service panicked in a previous run (#805)
* runtime & proto: teardown idle runtime in case of service panics.
* cargo-shuttle: abort all runtimes when one fails during a local run.
* cargo-shuttle: added sigint and sigterm handlers
* proto: remove kill_on_drop flag
* cargo-shuttle: fix comments
* runtime: remove leftovers
* cargo-shuttle: extract stop_runtime logic in a function
* proto: add kill_on_drop flag again
* cargo-shuttle: fix again one service panic case
* runtime(tests): fix bind_panic
* cargo-shuttle: handle stop requests error gracefully
* cargo-shuttle: handle stopping error gracefully for service that panics
* cargo-shuttle: clarified logging in case runtime stopping errors
* cargo-shuttle: change log level
Description of change

Running user services locally through `cargo shuttle run` will run a runtime that loads/starts a service as a tokio task. If the task panics, the shuttle-runtime will live on because of tokio behavior: tokio-rs/tokio#2002. This is not the case anymore. I haven't seen this making any difference for our use cases. Removing it to have the `bind_panic` test succeed.

Extra: set the `kill_on_drop` flag to true for started shuttle-runtimes, to make sure we either `.wait()` or `.kill()` on them, or, if they are dropped, the kill is automatically issued: https://docs.rs/tokio/latest/tokio/process/struct.Command.html#method.kill_on_drop.
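As a sketch of what setting that flag looks like (the `sleep` command is a stand-in, not the actual shuttle-runtime invocation):

```rust
use tokio::process::{Child, Command};

// With kill_on_drop(true), tokio issues a kill for the child process when
// the Child handle is dropped without an explicit .wait() or .kill().
fn spawn_runtime() -> std::io::Result<Child> {
    Command::new("sleep")
        .arg("30")
        .kill_on_drop(true)
        .spawn()
}
```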
How Has This Been Tested (if applicable)?

Run the `hello-world-rocket-app` example locally with a panic in it. Without the fix, consecutive local runs hang with the `Address in use` error. The fix ensures the runtime and provisioner are killed in case the runtime response is not successful. Also, if the `cargo-shuttle` process receives `SIGINT` or `SIGTERM`, it successfully stops the existing runtimes.