walreceiver: lifecycle traced events without any context #3330

koivunej · 2023-01-13T14:56:57Z

Noted while watching the production logs on zenith-1-ps-2: because of an ongoing cleanup, we now see randomly spaced lines consisting only of a timestamp and `INFO Connection cancelled`, which cannot be directly correlated to any timeline. They come from

neon/pageserver/src/walreceiver/walreceiver_connection.rs

Line 121 in 16baa91

_ = connection_cancellation.cancelled() => info!("Connection cancelled"),

Right above the previous callsite is a similar event, logged as a timestamp followed by `INFO Walreceiver db connection closed`, which also appears in the logs:

neon/pageserver/src/walreceiver/walreceiver_connection.rs

Line 113 in 16baa91

Ok(()) => info!("Walreceiver db connection closed"),

I am not saying these cannot be useful, but they need more context, and perhaps their overall usefulness should be reconsidered.

Looking around, however, in the same scope of handle_walreceiver_connection we have access to timeline and wal_source_connconf, so we should be able to add context by using #[instrument(...)] and propagating it to the task_mgr::spawn'd future, which will both run the connection and be the context for the two lifecycle events.

This might already be on someone's list, in which case this issue can be closed.
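To illustrate what the missing context costs, here is a minimal std-only sketch. It does not use the `tracing` crate or any pageserver code: `ConnectionContext` and `render_event` are hypothetical names, and the line layout only approximates how a span-aware formatter prefixes events. It shows the same event rendered without and with the identifiers that a span on the spawned future would carry.

```rust
use std::fmt;

/// Hypothetical stand-in for the fields a span would record. In the real code
/// these would be derived from `timeline` inside `handle_walreceiver_connection`.
#[derive(Debug, Clone)]
pub struct ConnectionContext {
    pub tenant_id: String,
    pub timeline_id: String,
}

impl fmt::Display for ConnectionContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumed layout: span name followed by its fields in braces.
        write!(
            f,
            "connection{{tenant_id={} timeline_id={}}}",
            self.tenant_id, self.timeline_id
        )
    }
}

/// Renders one log line. With `None` the line has no context, like the ones
/// described in this issue; with `Some(ctx)` it can be tied to a timeline.
pub fn render_event(ctx: Option<&ConnectionContext>, level: &str, message: &str) -> String {
    match ctx {
        Some(ctx) => format!("{level} {ctx}: {message}"),
        None => format!("{level} {message}"),
    }
}
```

Calling `render_event(None, "INFO", "Connection cancelled")` gives the uncorrelatable form, while passing `Some(&ctx)` prefixes the same message with the tenant and timeline identifiers.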

Cc: #3218


shanyp · 2023-05-08T08:56:17Z

@koivunej was this addressed by the walreceiver PR that you merged?

SomeoneToIgnore · 2023-05-08T13:42:17Z

#4090 was not supposed to address this issue, unless done accidentally.

koivunej · 2023-05-08T13:51:10Z

Agreed, it was not supposed to be closed.

walreceiver logs are a bit hard to understand because of partial span usage, extra messages, and ignored errors popping up as huge stacktraces.

Fixes #3330 (by spans, also demote info -> debug).

- arrange walreceiver spans into a hierarchy:
    - `wal_connection_manager{tenant_id, timeline_id}` -> `connection{node_id}` -> `poller`
- unifies the error reporting inside `wal_receiver`:
    - All ok errors are now `walreceiver connection handling ended: {e:#}`
    - All unknown errors are still stacktraceful task_mgr reported errors with context `walreceiver connection handling failure`
    - Remove `connect` special casing, was: `DB connection stream finished` for ok errors
    - Remove `done replicating` special casing, was `Replication stream finished` for ok errors
- lowered log levels for (non-exhaustive list):
    - `WAL receiver manager started, connecting to broker` (at startup)
    - `WAL receiver shutdown requested, shutting down` (at shutdown)
    - `Connection manager loop ended, shutting down` (at shutdown)
    - `sender is dropped while join handle is still alive` (at lucky shutdown, see #2885)
    - `timeline entered terminal state {:?}, stopping wal connection manager loop` (at shutdown)
    - `connected!` (at startup)
    - `Walreceiver db connection closed` (at disconnects?, was without span)
    - `Connection cancelled` (at shutdown, was without span)
    - `observed timeline state change, new state is {new_state:?}` (never after Timeline::activate was made infallible)
- changed:
    - `Timeline dropped state updates sender, stopping wal connection manager loop`
        - was out of date; sender is not dropped but `Broken | Stopping` state transition
        - also made `debug!`
    - `Timeline dropped state updates sender before becoming active, stopping wal connection manager loop`
        - was out of date: sender is again not dropped but `Broken | Stopping` state transition
        - also made `debug!`
- log fixes:
    - stop double reporting panics via JoinError
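The span hierarchy and the two error paths described above can be modelled roughly as follows. This is a std-only sketch under stated assumptions, not the code from the PR: `Span`, `span_prefix`, `ConnectionEnd` and `report` are hypothetical names, the `:`-joined prefix only approximates a span-aware log formatter, and the real change works with `tracing` spans and task_mgr error reporting rather than plain strings.

```rust
/// One level of the hierarchy: a span name plus the fields recorded on it.
pub struct Span {
    pub name: &'static str,
    pub fields: Vec<(&'static str, String)>,
}

/// Joins spans from outermost to innermost, e.g. `a{k=v}:b{k=v}:c`.
pub fn span_prefix(spans: &[Span]) -> String {
    spans
        .iter()
        .map(|span| {
            if span.fields.is_empty() {
                span.name.to_string()
            } else {
                let fields: Vec<String> = span
                    .fields
                    .iter()
                    .map(|(key, value)| format!("{key}={value}"))
                    .collect();
                format!("{}{{{}}}", span.name, fields.join(" "))
            }
        })
        .collect::<Vec<String>>()
        .join(":")
}

/// How a connection ended, in this simplified model.
pub enum ConnectionEnd {
    /// An "ok error": an expected way for the connection to finish.
    Expected(String),
    /// Anything else; stays an error for the caller to report.
    Unknown(String),
}

/// Expected endings become one uniform message to log (`Ok`); unknown ones
/// are returned as errors carrying the shared failure context (`Err`).
pub fn report(end: &ConnectionEnd) -> Result<String, String> {
    match end {
        ConnectionEnd::Expected(cause) => {
            Ok(format!("walreceiver connection handling ended: {cause}"))
        }
        ConnectionEnd::Unknown(cause) => {
            Err(format!("walreceiver connection handling failure: {cause}"))
        }
    }
}
```

With the three spans from the list above, `span_prefix` yields a prefix that identifies tenant, timeline and safekeeper node for every event emitted underneath, which is the context the two lifecycle events in this issue were missing.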

SomeoneToIgnore self-assigned this Jan 16, 2023

koivunej added a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver labels Feb 20, 2023

SomeoneToIgnore assigned koivunej and unassigned SomeoneToIgnore Apr 23, 2023

koivunej mentioned this issue Jun 2, 2023

Better walreceiver logging #4402

Merged

5 tasks

koivunej closed this as completed in #4402 Jun 5, 2023
