-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix broken grateful shutdown for tracker API #589
Merged
josecelano
merged 4 commits into
torrust:develop
from
josecelano:588-error-starting-tracker-api-failed-to-install-stop-signal-recverror
Jan 9, 2024
Merged
Fix broken grateful shutdown for tracker API #589
josecelano
merged 4 commits into
torrust:develop
from
josecelano:588-error-starting-tracker-api-failed-to-install-stop-signal-recverror
Jan 9, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The internal halt channel was nor working becuase the sender was being droped just after starting the server. That also made the `shutdown_signal` fail. ```rust pub async fn shutdown_signal(rx_halt: tokio::sync::oneshot::Receiver<Halted>) { let halt = async { match rx_halt.await { Ok(signal) => signal, Err(err) => panic!("Failed to install stop signal: {err}"), } }; tokio::select! { signal = halt => { info!("Halt signal processed: {}", signal) }, () = global_shutdown_signal() => { info!("Global shutdown signal processed") } } } ``` Since the signal branch in the `tokio::select!` was finishing the global_shutdown_signal did not work either. So you had to kill the process manually to stop the tracker. It seems Rust droped partially the `Running::halt_taks` attribute and that closed the channel.
josecelano
requested review from
da2ce7 and
mickvandijke
and removed request for
da2ce7
January 9, 2024 16:07
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #589 +/- ##
===========================================
- Coverage 80.68% 80.65% -0.03%
===========================================
Files 117 117
Lines 7816 7826 +10
===========================================
+ Hits 6306 6312 +6
- Misses 1510 1514 +4 ☔ View full report in Codecov by Sentry. |
Added the URL where the API is running. ``` 2024-01-09T16:29:46.911284166+00:00 [API][INFO] API server started on http://127.0.0.1:1212 ``` Some hosting services parse the application log output to discover services. For example, GitHub Codespaces can use that into to automatically do port forwarding. Besides, it's also useful for development when you are using random ports and to see what services are you running.
ACK 452b4a0 |
This was referenced Jan 9, 2024
josecelano
added a commit
that referenced
this pull request
Jan 11, 2024
…lt" channel 5fd0c84 chore: normalize log ouput (Jose Celano) 0c1f389 fix: [#591] panicking after starting UDP server due to close halt channel (Jose Celano) Pull request description: Relates to: #594 ### Context We use a oneshot channel between the main app that launches the UDP server and the running UDP server. It could be used to send the "halt" message to the UDP server. When the app starts the server this server waits for that signal. That code is not working because when the server starts waiting the channel is closed. That does not affect the server that is still running. And currently, we are not using that signal in production (only for test environments). ### Details The "halt" channel is immediately closed after starting the UDP tracker. So the app panics with: ```console thread 'tokio-runtime-worker' panicked at src/servers/signals.rs:52:25: Failed to install stop signal: channel closed ``` The UDP is started but it's not possible to shut it down gracefully. I've added a lot of debug lines in this PR: ```rust $ cargo run Compiling torrust-tracker v3.0.0-alpha.12-develop (/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker) Finished dev [optimized + debuginfo] target(s) in 7.05s Running `target/debug/torrust-tracker` Loading default configuration file: `./share/default/config/tracker.development.sqlite3.toml` ... 2024-01-10T17:46:40.348930411+00:00 [torrust_tracker::bootstrap::logging][INFO] logging initialized. 2024-01-10T17:46:40.349651636+00:00 [UDP Tracker][DEBUG] Launcher starting ... 2024-01-10T17:46:40.349677556+00:00 [UDP Tracker][INFO] Starting on: udp://0.0.0.0:6969 2024-01-10T17:46:40.349682046+00:00 [UDP Tracker][INFO] Started on: udp://0.0.0.0:6969 2024-01-10T17:46:40.349684556+00:00 [UDP Tracker][DEBUG] Launcher finished ... 2024-01-10T17:46:40.349687856+00:00 [UDP Tracker][DEBUG] Waiting for packets ... 2024-01-10T17:46:40.349687736+00:00 [torrust_tracker::bootstrap::jobs::http_tracker][INFO] Note: Not loading Http Tracker Service, Not Enabled in Configuration. 2024-01-10T17:46:40.349691306+00:00 [UDP Tracker][DEBUG] Wait for launcher (UDP service) to finish ... 2024-01-10T17:46:40.349689536+00:00 [UDP Tracker][DEBUG] Server on socket address: udp://0.0.0.0:6969 waiting for halt signal ... 2024-01-10T17:46:40.349700506+00:00 [torrust_tracker::bootstrap::jobs::health_check_api][INFO] Starting Health Check API server: http://127.0.0.1:1313 thread 'tokio-runtime-worker' panicked at src/servers/signals.rs:52:25: Failed to install stop signal: channel closed note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace 2024-01-10T17:46:40.349793665+00:00 [torrust_tracker::bootstrap::jobs::health_check_api][INFO] Torrust Health Check API server started on: http://127.0.0.1:1313 2024-01-10T17:46:40.349793225+00:00 [UDP Tracker][DEBUG] Halt signal spawned task stopped on address: udp://0.0.0.0:6969 ``` The function panicking is: ```rust pub async fn shutdown_signal(rx_halt: tokio::sync::oneshot::Receiver<Halted>) { let halt = async { match rx_halt.await { Ok(signal) => signal, Err(err) => panic!("Failed to install stop signal: {err}"), } }; tokio::select! { signal = halt => { info!("Halt signal processed: {}", signal) }, () = global_shutdown_signal() => { info!("Global shutdown signal processed") } } } ``` When the thread starts waiting for the signal it receives the error "channel closed". I fixed a similar problem for the Tracker API and the HTTP Tracker: - #593 - #589 Apparently the problem was the sender in the channel was dropped (I don't know why, it looks like a compiler optimization). In the end, I think the problem was we were not waiting for the right spawned task. I've reorganized the code but the change that fixed the problem is: ```rust let task = tokio::spawn(async move { debug!(target: "UDP Tracker", "Launcher starting ..."); let starting = launcher.start(tracker, tx_start, rx_halt).await; starting.await.expect("UDP server should have started running"); launcher }); ``` We needed to wait for the launcher to finish because in this case, the launcher is not the thread that runs the final server. So we were only waiting until the launcher was "launched" but not for the final task running the UDP server. That is the reason why the channel was closed prematurely. I have also normalized the log output: ``` $ cargo run Finished dev [optimized + debuginfo] target(s) in 0.07s Running `target/debug/torrust-tracker` Loading default configuration file: `./share/default/config/tracker.development.sqlite3.toml` ... 2024-01-11T17:12:11.134816964+00:00 [torrust_tracker::bootstrap::logging][INFO] logging initialized. 2024-01-11T17:12:11.135473883+00:00 [UDP Tracker][INFO] Starting on: udp://0.0.0.0:6969 2024-01-11T17:12:11.135494422+00:00 [UDP Tracker][INFO] Started on: udp://0.0.0.0:6969 2024-01-11T17:12:11.135503672+00:00 [torrust_tracker::bootstrap::jobs][INFO] TLS not enabled 2024-01-11T17:12:11.135587738+00:00 [HTTP Tracker][INFO] Starting on: http://0.0.0.0:7070 2024-01-11T17:12:11.135612497+00:00 [HTTP Tracker][INFO] Started on: http://0.0.0.0:7070 2024-01-11T17:12:11.135619586+00:00 [torrust_tracker::bootstrap::jobs][INFO] TLS not enabled 2024-01-11T17:12:11.135675454+00:00 [API][INFO] Starting on http://127.0.0.1:1212 2024-01-11T17:12:11.135688443+00:00 [API][INFO] Started on http://127.0.0.1:1212 2024-01-11T17:12:11.135701143+00:00 [Health Check API][INFO] Starting on: http://127.0.0.1:1313 2024-01-11T17:12:11.135718012+00:00 [Health Check API][INFO] Started on: http://127.0.0.1:1313 ^C2024-01-11T17:12:17.463967431+00:00 [torrust_tracker][INFO] Torrust shutting down.. 2024-01-11T17:12:17.463960892+00:00 [torrust_tracker::servers::signals][INFO] Global shutdown signal processed 2024-01-11T17:12:17.464011399+00:00 [torrust_tracker::servers::signals][INFO] Shutting down HTTP server on socket address: 0.0.0.0:7070 2024-01-11T17:12:17.464015379+00:00 [torrust_tracker::servers::signals][INFO] Sending graceful shutdown signal !! shuting down in 90 seconds !! 2024-01-11T17:12:17.464035618+00:00 [torrust_tracker::servers::signals][INFO] Global shutdown signal processed 2024-01-11T17:12:17.464055757+00:00 [torrust_tracker::servers::health_check_api::server][INFO] Stopping Torrust Health Check API server o http://127.0.0.1:1313 ... 2024-01-11T17:12:17.464064287+00:00 [torrust_tracker::servers::signals][INFO] Global shutdown signal processed 2024-01-11T17:12:17.464088366+00:00 [torrust_tracker::servers::signals][INFO] Shutting down UDP server on socket address: udp://0.0.0.0:6969 2024-01-11T17:12:17.464067327+00:00 [torrust_tracker::servers::signals][INFO] Shutting down tracker API server on socket address: 127.0.0.1:1212 2024-01-11T17:12:17.464097095+00:00 [torrust_tracker::servers::signals][INFO] Sending graceful shutdown signal !! shuting down in 90 seconds !! 2024-01-11T17:12:17.464105665+00:00 [Health Check API][INFO] Stopped server running on: http://127.0.0.1:1313 2024-01-11T17:12:17.464093226+00:00 [torrust_tracker::bootstrap::jobs::torrent_cleanup][INFO] Stopping torrent cleanup job.. 2024-01-11T17:12:17.464154493+00:00 [torrust_tracker][INFO] Torrust successfully shutdown. ``` I'm using targets for some reasons: 1. Easily identify service in logs 2. Easily filter log by service 3. Autodiscover services from logs ACKs for top commit: josecelano: ACK 5fd0c84 Tree-SHA512: 83d50ded031c8fe1505ca68a7b73dc0fac7dd9d5e414ac615f357daf3682e561fda5ddf9c8ae91dab811e2ecb3c8f540d1790ebbe469c218adcdc476c016df9f
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The internal halt channel was not working because the sender was being dropped just after starting the server.
That also made the
shutdown_signal
fail.Since the signal branch in the
tokio::select!
was finishing the global_shutdown_signal did not work either. So you had to kill the process manually to stop the tracker.It seems Rust partially dropped the
Running::halt_taks
attribute and that closed the channel.The issue was introduced in this commit: 13140f6