Pageserver allegedly takes a long time to restart when there are a lot of tenants #4183

Closed
kelvich opened this issue May 9, 2023 · 6 comments
Labels
c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug), triaged (bugs that were already triaged)

Comments

@kelvich
Contributor

kelvich commented May 9, 2023

It's more of an observation, so it should be verified first. Staging has pageservers with 40k+ tenants.

@kelvich kelvich added the t/bug (Issue Type: Bug) label May 9, 2023
@kelvich
Contributor Author

kelvich commented May 9, 2023

cc @hlinnaka

@koivunej
Member

koivunej commented May 11, 2023

With 40k+ tenants we probably do not get metrics anymore?

This is most likely related to #4025.

koivunej added a commit that referenced this issue May 29, 2023
Startup can take a long time. We suspect it's the initial logical size
calculations. The long-term solution is to not block the tokio executors but
to do most of the I/O in spawn_blocking.

See: #4025, #4183

Short-term solutions to the above:

- Delay global background tasks until initial tenant loads complete
- Limit how many initial logical size calculations we can have at the
same time to `cores / 2` (sketched below)

This PR is for trying in staging.
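
For illustration, a minimal sketch of the short-term concurrency cap, assuming tokio's `Semaphore`; the calculation function and timeline ids are hypothetical stand-ins, not the pageserver's actual code:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Hypothetical stand-in for the real initial logical size calculation.
async fn calculate_initial_logical_size(timeline_id: u32) {
    // In the pageserver this would inspect layer files; here we just simulate work.
    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
    println!("timeline {timeline_id}: initial logical size done");
}

#[tokio::main]
async fn main() {
    // Cap concurrency at roughly half the available cores, as proposed above.
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(2);
    let semaphore = Arc::new(Semaphore::new((cores / 2).max(1)));

    let mut handles = Vec::new();
    for timeline_id in 0..32u32 {
        // Each calculation holds a permit, so at most `cores / 2` run concurrently.
        let permit = Arc::clone(&semaphore)
            .acquire_owned()
            .await
            .expect("semaphore is never closed");
        handles.push(tokio::spawn(async move {
            calculate_initial_logical_size(timeline_id).await;
            drop(permit); // release the permit once the calculation is done
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }
}
```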
@LizardWizzard
Contributor

Discussion happens in this long thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1685012031795059

@koivunej
Member

koivunej commented May 30, 2023

I posted earlier attempts (#4366, plus its revert) on #4366. After #4372 it looks a bit more promising without overly intrusive changes.

After deploying #4366 on staging:

  • ps-0.eu-west-1 (10k): 100s => 37s, 6s
  • ps-1.eu-west-1 (8k): 73s => 5s, 5.5s
  • ps-99.us-east-2 (<2k?): 2.8s => 2.3s, 2s

So I think this looks at least not bad.

But I haven't been able to re-test these results yet. I suspect that the remaining problem is the blocking of the background runtime for initial logical size AND repartitioning. The "page_service connection pressure" idea has been brought up as a way to lower the activation time for timelines which are being re-connected to.

Designing and implementing such a prioritization system might not be straightforward. Basically it would have to act as a semaphore, but upon getting a notification of a page_service connection, it should allow those instances to jump the queue (rough sketch below). But what would this prioritization protect? The first initial logical size calculations?

Perhaps an easier step is to delay the initial repartitioning + compaction and garbage collection until we've attempted all initial logical size calculations. This should probably delay the timeline's eviction task as well, just to be sure. I'm unsure if this is the right path, because we might end up in a situation where some timelines never get an active walreceiver connection, and so their initial logical size calculation would never happen.
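
To make the queue-jumping idea concrete, here is a rough sketch only (not a design proposal and not existing pageserver code): a dispatcher that hands out semaphore permits, preferring a high-priority queue via a biased `select!`. All names are hypothetical, and tokio is assumed.

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, oneshot, Semaphore};

/// A permit request: the dispatcher answers by sending an owned permit back.
type Request = oneshot::Sender<tokio::sync::OwnedSemaphorePermit>;

/// Hands out permits one by one, always draining the high-priority queue
/// (timelines that just got a page_service connection) before the normal one.
async fn dispatcher(
    semaphore: Arc<Semaphore>,
    mut high: mpsc::UnboundedReceiver<Request>,
    mut normal: mpsc::UnboundedReceiver<Request>,
) {
    loop {
        let permit = match semaphore.clone().acquire_owned().await {
            Ok(permit) => permit,
            Err(_) => return, // semaphore closed, shut down
        };
        // `biased` makes select! check the high-priority queue first whenever
        // both queues have waiters, which is the "jump the queue" behaviour.
        let request = tokio::select! {
            biased;
            Some(req) = high.recv() => req,
            Some(req) = normal.recv() => req,
            else => return, // both queues closed
        };
        // If the requester went away, the permit is dropped and thus released.
        let _ = request.send(permit);
    }
}

#[tokio::main]
async fn main() {
    let semaphore = Arc::new(Semaphore::new(2));
    let (high_tx, high_rx) = mpsc::unbounded_channel();
    let (normal_tx, normal_rx) = mpsc::unbounded_channel();
    tokio::spawn(dispatcher(Arc::clone(&semaphore), high_rx, normal_rx));

    // A startup-queued initial logical size calculation asks normally...
    let (tx, rx) = oneshot::channel();
    normal_tx.send(tx).unwrap();
    let _permit = rx.await.unwrap();
    println!("queued calculation got a permit");

    // ...while a timeline that just received a page_service connection
    // goes through the high-priority queue and jumps ahead.
    let (tx, rx) = oneshot::channel();
    high_tx.send(tx).unwrap();
    let _permit2 = rx.await.unwrap();
    println!("prioritized calculation got a permit");
}
```

The biased `select!` only decides the order when both queues have waiters at the moment a permit frees up, which is exactly the prioritization being discussed.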

@shanyp shanyp added the c/storage/pageserver (Component: storage: pageserver) and triaged (bugs that were already triaged) labels Jun 1, 2023
@koivunej
Member

koivunej commented Jun 5, 2023

With #4397 staging startup times:

  • ps-0.eu-west-1 (8k): 4.6s, 4.0s
  • ps-1.eu-west-1 (8k): 3.4s, 3.5s
  • ps-99.us-east-2 (<2k?): 2.1s, 2.3s

These are not really comparable anymore, because ps-0 lost 2k tenants. However, the earlier high values are no longer expected.

#4399 would further help by delaying all initial logical size calculations to a phase which runs after we've completed activating all tenants. There will be no background jobs running until the timeout (10s by default). It is assumed that the 10s would be spent efficiently on the many queued-up initial logical size calculations before compactions are allowed to start.
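
A minimal sketch of that idea, i.e. spending a bounded budget on queued initial logical size calculations before any background jobs start; the work queue and the calculation function here are hypothetical simplifications (assuming tokio), not how the pageserver actually drives this work:

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

/// Hypothetical stand-in for one queued initial logical size calculation.
async fn initial_logical_size(timeline_id: u32) {
    tokio::time::sleep(Duration::from_millis(20)).await;
    println!("timeline {timeline_id}: initial logical size calculated");
}

#[tokio::main]
async fn main() {
    // Queue filled while tenants are activated (ids here are made up).
    let (tx, mut rx) = mpsc::unbounded_channel::<u32>();
    for id in 0..100u32 {
        tx.send(id).unwrap();
    }
    drop(tx); // activation finished, nothing more will be queued

    // Spend at most 10s draining the queue before any background jobs start.
    let drained = timeout(Duration::from_secs(10), async {
        while let Some(id) = rx.recv().await {
            initial_logical_size(id).await;
        }
    })
    .await
    .is_ok();

    println!(
        "initial sizes {}; releasing compaction / gc / eviction now",
        if drained { "all done" } else { "timed out" }
    );
}
```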

koivunej added a commit that referenced this issue Jun 7, 2023
Initial logical size calculation could still hinder our fast startup
efforts in #4397. See #4183. In the deployment of 2023-06-06,
about 200 initial logical sizes were calculated on the hosts which
took the longest to complete the initial load (12s).

Implements the three step/tier initialization ordering described in
#4397:
1. load local tenants
2. do initial logical sizes per walreceivers for 10s
3. background tasks

Ordering is controlled by:
- waiting on `utils::completion::Barrier`s on background tasks
- having one attempt for each Timeline to do initial logical size
calculation
- `pageserver/src/bin/pageserver.rs` releasing background jobs after
timeout or completion of initial logical size calculation

The timeout is there just as a safeguard in case a legitimate, non-broken
timeline's initial logical size calculation runs long. The timeout is
configurable, 10s by default, which I think would be fine for production
systems. In the test cases I've been looking at, these steps complete
as fast as possible.

Co-authored-by: Christian Schwarz <[email protected]>
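
For reference, a rough approximation of this gating, with `tokio::sync::watch` standing in for `utils::completion::Barrier` and the phases simulated; it sketches only the ordering, not the real `pageserver/src/bin/pageserver.rs` code:

```rust
use std::time::Duration;
use tokio::sync::watch;

/// Rough stand-in for waiting on `utils::completion::Barrier`: a background
/// task parks here until the startup code signals the release.
async fn wait_for_release(mut rx: watch::Receiver<bool>) {
    while !*rx.borrow() {
        if rx.changed().await.is_err() {
            break; // sender dropped; treat as released
        }
    }
}

#[tokio::main]
async fn main() {
    let (release_tx, release_rx) = watch::channel(false);

    // Step 3 tasks are spawned early but gated behind the barrier.
    for task in ["compaction", "gc", "eviction"] {
        let rx = release_rx.clone();
        tokio::spawn(async move {
            wait_for_release(rx).await;
            println!("{task} loop starting");
        });
    }

    // Step 1: load local tenants (simulated).
    println!("tenants loaded");

    // Step 2: initial logical size calculations, bounded by a timeout so a
    // single slow calculation cannot hold background jobs back forever.
    let initial_sizes = async {
        tokio::time::sleep(Duration::from_secs(1)).await; // simulated work
    };
    let _ = tokio::time::timeout(Duration::from_secs(10), initial_sizes).await;

    // Release background jobs after completion or timeout, whichever came first.
    release_tx.send(true).expect("a receiver is still held above");
    tokio::time::sleep(Duration::from_millis(100)).await; // let the tasks print
}
```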
awestover pushed a commit that referenced this issue Jun 14, 2023
@koivunej
Member

koivunej commented Dec 7, 2023

I'll just close this, because after these changes the remaining slowness has different causes. Originally this work helped.

@koivunej koivunej closed this as completed Dec 7, 2023