fast_import: Make CPU & memory size configurable #10709

Open · wants to merge 1 commit into main

Conversation

@hlinnaka (Contributor) commented Feb 7, 2025

The old values assumed that you have at least about 18 GB of RAM available (shared_buffers=10GB and maintenance_work_mem=8GB). That's a lot when testing locally. Make it configurable, and make the default assumption much smaller: 256 MB.

This is nice for local testing, but it's also in preparation for starting to use VMs to run these jobs. When launched in a VM, the control plane can set these env variables according to the max size of the VM.
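
As a rough illustration of what that could look like, here is a minimal Rust sketch that reads the available RAM from an env variable, falling back to the 256 MB default described above. The variable name `NEON_IMPORTER_MEM_MB` is hypothetical, not the one actually used in this PR:

```rust
// Hypothetical sketch: read available RAM (in MB) from an env variable,
// falling back to the conservative 256 MB default described above.
// NEON_IMPORTER_MEM_MB is an illustrative name, not the PR's actual one.
fn available_ram_mb() -> u64 {
    std::env::var("NEON_IMPORTER_MEM_MB")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(256)
}
```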

Also change the formula for how RAM is distributed: use 10% of RAM for shared_buffers, and 70% for maintenance_work_mem. That leaves a good amount for misc. other stuff and the OS. A very large shared_buffers setting won't typically help with bulk loading. It won't help with the network and I/O of processing all the tables, except perhaps when the whole database fits in shared buffers, and even then it's not much faster than using local disk. Bulk loading is all sequential I/O. It also won't help much with index creation, which is also sequential I/O. A large maintenance_work_mem can be quite useful, however, so that's where we put most of the RAM.
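
To make the split concrete, here is a small sketch of the 10%/70% formula described above; the function name and integer-MB arithmetic are illustrative, not lifted from the PR:

```rust
// Sketch of the RAM distribution: 10% of available RAM for shared_buffers,
// 70% for maintenance_work_mem, leaving ~20% for the OS and everything else.
fn postgres_memory_split(ram_mb: u64) -> (u64, u64) {
    let shared_buffers_mb = ram_mb / 10;           // 10% -> shared_buffers
    let maintenance_work_mem_mb = ram_mb * 7 / 10; // 70% -> maintenance_work_mem
    (shared_buffers_mb, maintenance_work_mem_mb)
}
```

With the 256 MB default this works out to shared_buffers=25MB and maintenance_work_mem=179MB; on an 18 GB VM, roughly 1.8 GB and 12.6 GB.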

@hlinnaka hlinnaka requested a review from a team as a code owner February 7, 2025 00:31
@hlinnaka hlinnaka requested review from MMeent, lubennikovaav and NanoBjorn and removed request for MMeent and lubennikovaav February 7, 2025 00:31

github-actions bot commented Feb 7, 2025

7425 tests run: 7073 passed, 0 failed, 352 skipped (full report)


Flaky tests (1): Postgres 15

Code coverage* (full report)

  • functions: 33.3% (8588 of 25824 functions)
  • lines: 49.1% (72309 of 147268 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
daf43a1 at 2025-02-07T02:07:52.185Z ♻️

@MMeent (Contributor) commented Feb 7, 2025

> It also won't help much with index creation, which is also sequential I/O.

Unless we're talking about GIN indexes, or (SP-)GiST indexes without opclasses that have sortsupport.
