run Layer::get_value_reconstruct_data in spawn_blocking #4498
Conversation
not ideal but an easy fix
I would prefer to merge and observe the performance difference in prod/staging at some point.
1016 tests run: 974 passed, 0 failed, 42 skipped (full report). The comment gets automatically updated with the latest test results. 9afb589 at 2023-06-23T15:00:04.219Z
Looks good to me. Minor nits. Note there is a run_benchmarks label, so you can run benchmarks before merging; I haven't used it, but some people have. Two comments point to possible optimizations, feel free to close them (we can get back to them later if needed).
…ne-get/get-value-reconstruct-data-spawn-blocking
This PR concludes the "async Layer::get_value_reconstruct_data" project.

The problem we're solving is that, before this patch, we'd execute Layer::get_value_reconstruct_data on the tokio executor threads. This function is IO- and/or CPU-intensive. The IO goes through VirtualFile / std::fs; hence it's blocking. This results in unfairness towards other tokio tasks, especially under (disk) load.

Some context can be found at #4154, where I suspect (but can't prove) that load spikes from logical size calculation cause heavy eviction skew. Sadly, we don't have tokio runtime/scheduler metrics to quantify the unfairness. But generally, we know that blocking the executor threads on std::fs IO is bad. So, let's make this change and watch out for severe perf regressions in staging & during rollout.
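For illustration only, here is a hypothetical snippet (not the pageserver's actual code) of the kind of pattern the patch removes: a synchronous std::fs read issued from an async fn holds the tokio worker thread for the full duration of the IO whenever the future is polled on the runtime.

```rust
use std::io::Read;

// Hypothetical illustration of the anti-pattern this PR removes: when this
// future is polled on the tokio runtime, the std::fs read occupies the worker
// thread until the filesystem call returns, stalling other tasks queued there.
async fn read_on_executor(path: std::path::PathBuf) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    std::fs::File::open(path)?.read_to_end(&mut buf)?; // blocks the executor thread
    Ok(buf)
}
```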
Changes
- Rename Layer::get_value_reconstruct_data to Layer::get_value_reconstruct_data_blocking.
- Layer::get_value_reconstruct_data is now an async_trait method that runs get_value_reconstruct_data_blocking inside spawn_blocking (a sketch of this wrapper follows the list).
- spawn_blocking requires a 'static lifetime for the captured variables; hence I had to change the data flow to move the ValueReconstructState into and back out of get_value_reconstruct_data instead of passing a reference. It's a small struct, so I don't expect a big performance penalty.
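The following is a minimal sketch of that wrapper pattern, not the actual pageserver code: SomeLayer, the struct fields, and the dummy record are invented for illustration, the real method lives on an async_trait trait but is simplified here to an inherent async fn, and tokio (rt-multi-thread, macros) plus anyhow are assumed as dependencies. What it shows is the ownership dance described in the last bullet: the state is moved into the spawn_blocking closure and handed back out through the return value.

```rust
use std::sync::Arc;
use tokio::task::spawn_blocking;

// Hypothetical stand-in; the real ValueReconstructState carries WAL records etc.
#[derive(Default, Debug)]
struct ValueReconstructState {
    records: Vec<Vec<u8>>,
}

struct SomeLayer;

impl SomeLayer {
    // The pre-existing synchronous implementation: does blocking VirtualFile / std::fs IO.
    fn get_value_reconstruct_data_blocking(
        &self,
        mut state: ValueReconstructState,
    ) -> anyhow::Result<ValueReconstructState> {
        // ...blocking reads would happen here; a dummy record stands in for them...
        state.records.push(b"demo".to_vec());
        Ok(state)
    }

    // The async wrapper: spawn_blocking needs a 'static closure, so both the
    // layer handle (Arc) and the state are moved in, and the state is returned
    // back out instead of being mutated through a reference.
    async fn get_value_reconstruct_data(
        self: Arc<Self>,
        state: ValueReconstructState,
    ) -> anyhow::Result<ValueReconstructState> {
        spawn_blocking(move || self.get_value_reconstruct_data_blocking(state)).await?
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let layer = Arc::new(SomeLayer);
    let state = layer
        .get_value_reconstruct_data(ValueReconstructState::default())
        .await?;
    println!("reconstructed {} record(s)", state.records.len());
    Ok(())
}
```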
Performance
Fundamentally, the code changes cause the following performance-relevant changes:
- Each get_value_reconstruct_data call now makes a short-lived allocation, because async_trait is just sugar for boxed futures under the hood.
- spawn_blocking adds some latency because it needs to move the work to a thread pool.
- spawn_blocking plus the existing synchronous code inside is probably more efficient than switching all of the synchronous code to tokio::fs, because each tokio::fs call does spawn_blocking under the hood (see the sketch after this list).
- The spawn_blocking thread pool is much larger than the async executor thread pool. Hence, as long as the disks can keep up, which they should according to AWS specs, we will be able to deliver higher get_value_reconstruct_data throughput.

Slightly higher latency under regular load is acceptable given the throughput gains and the expected better fairness during disk load peaks, such as the logical size calculation peaks uncovered in #4154.
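To make the tokio::fs point concrete, here is a hedged, hypothetical comparison (not pageserver code; the function names and the batching are invented for illustration): one spawn_blocking hop around a batch of synchronous std::fs reads versus per-file tokio::fs calls, each of which is internally dispatched to the blocking pool.

```rust
use std::io::Read;
use std::path::PathBuf;

// Approach A: a single spawn_blocking call wrapping a batch of synchronous
// std::fs reads -- one executor-to-blocking-pool hop for the whole batch.
async fn read_batch_one_hop(paths: Vec<PathBuf>) -> std::io::Result<Vec<Vec<u8>>> {
    tokio::task::spawn_blocking(move || {
        paths
            .iter()
            .map(|p| {
                let mut buf = Vec::new();
                std::fs::File::open(p)?.read_to_end(&mut buf)?;
                Ok(buf)
            })
            .collect::<std::io::Result<Vec<Vec<u8>>>>()
    })
    .await
    .expect("blocking task panicked")
}

// Approach B: tokio::fs per file. Each tokio::fs operation is itself backed by
// spawn_blocking, so this pays the hop (and its latency) once per call.
async fn read_batch_tokio_fs(paths: Vec<PathBuf>) -> std::io::Result<Vec<Vec<u8>>> {
    let mut out = Vec::with_capacity(paths.len());
    for p in paths {
        out.push(tokio::fs::read(p).await?);
    }
    Ok(out)
}
```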
Full Stack Of Preliminary PRs
This PR builds on top of the following preliminary PRs:
- Tenant::state.send_modify … #4291, which I thought we'd need in my original plan, where we would have needed to convert Tenant::timelines into an async locking primitive (make Tenant::timelines a tokio::sync::RwLock, #4333). In reviews, we walked away from that, but these cleanups were still quite useful.
- … Send by using tokio::sync::Mutex internally, #4477
- … try_write, #4485