fast_import: Make CPU & memory size configurable #10709

Open · wants to merge 1 commit into main

Conversation

@hlinnaka (Contributor) commented Feb 7, 2025

The old values assumed that you have at least about 18 GB of RAM available (shared_buffers=10GB and maintenance_work_mem=8GB). That's a lot when testing locally. Make it configurable, and make the default assumption much smaller: 256 MB.

This is nice for local testing, but it's also in preparation for starting to use VMs to run these jobs. When launched in a VM, the control plane can set these env variables according to the max size of the VM.
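
As a rough illustration of what that could look like, here is a minimal Rust sketch that reads the available RAM from an env variable, falling back to the 256 MB default described above. The variable name `NEON_IMPORTER_MEM_MB` is hypothetical, not the one actually used in this PR:

```rust
// Hypothetical sketch: read available RAM (in MB) from an env variable,
// falling back to the conservative 256 MB default described above.
// NEON_IMPORTER_MEM_MB is an illustrative name, not the PR's actual one.
fn available_ram_mb() -> u64 {
    std::env::var("NEON_IMPORTER_MEM_MB")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(256)
}
```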

Also change the formula for how RAM is distributed: use 10% of RAM for shared_buffers, and 70% for maintenance_work_mem. That leaves a good amount for misc. other stuff and the OS. A very large shared_buffers setting won't typically help with bulk loading. It won't help with the network and I/O of processing all the tables, except perhaps when the whole database fits in shared buffers, and even then it's not much faster than using local disk. Bulk loading is all sequential I/O. It also won't help much with index creation, which is also sequential I/O. A large maintenance_work_mem can be quite useful, however, so that's where we put most of the RAM.
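
To make the split concrete, here is a small sketch of the 10%/70% formula described above; the function name and integer-MB arithmetic are illustrative, not lifted from the PR:

```rust
// Sketch of the RAM distribution: 10% of available RAM for shared_buffers,
// 70% for maintenance_work_mem, leaving ~20% for the OS and everything else.
fn postgres_memory_split(ram_mb: u64) -> (u64, u64) {
    let shared_buffers_mb = ram_mb / 10;           // 10% -> shared_buffers
    let maintenance_work_mem_mb = ram_mb * 7 / 10; // 70% -> maintenance_work_mem
    (shared_buffers_mb, maintenance_work_mem_mb)
}
```

With the 256 MB default this works out to shared_buffers=25MB and maintenance_work_mem=179MB; on an 18 GB VM, roughly 1.8 GB and 12.6 GB.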

@hlinnaka hlinnaka requested a review from a team as a code owner February 7, 2025 00:31
@hlinnaka hlinnaka requested review from MMeent, lubennikovaav and NanoBjorn and removed request for MMeent and lubennikovaav February 7, 2025 00:31

github-actions bot commented Feb 7, 2025

7425 tests run: 7073 passed, 0 failed, 352 skipped (full report)


Flaky tests (1): Postgres 15

Code coverage* (full report)

  • functions: 33.3% (8588 of 25824 functions)
  • lines: 49.1% (72309 of 147268 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
daf43a1 at 2025-02-07T02:07:52.185Z ♻️

@MMeent (Contributor) commented Feb 7, 2025

> It also won't help much with index creation, which is also sequential I/O.

Unless we're talking about GIN indexes, or (SP-)GiST indexes without opclasses that have sortsupport.
