From b3e1b8e8b03399dbd0bdae46d84e3f8babefd8a2 Mon Sep 17 00:00:00 2001
From: Ryan Lambert
Date: Sat, 25 Jun 2022 09:13:31 -0600
Subject: [PATCH 1/3] Updating performance notes for current version of PgOSM Flex, completed minimal and started default layerset.
---
 docs/PERFORMANCE.md | 166 +++++++++++++++++++++++---------------------
 1 file changed, 86 insertions(+), 80 deletions(-)
diff --git a/docs/PERFORMANCE.md b/docs/PERFORMANCE.md
index b17d3c9..0a18246 100644
--- a/docs/PERFORMANCE.md
+++ b/docs/PERFORMANCE.md
@@ -1,127 +1,133 @@
 # PgOSM-Flex Performance
 This page provides timings for how long PgOSM-Flex runs for various region sizes.
-The server used for these tests has 8 vCPU and 64 GB RAM to match the target
+The server used to host these tests has 8 vCPU and 64 GB RAM to match the target
 server size [outlined in the osm2pgsql manual](https://osm2pgsql.org/doc/manual.html#preparing-the-database).
-> Note: The Flex output of osm2pgsql is currently **Experimental**
-and performance characteristics are likely to shift.
-
 ## Versions Tested
-Versions used for testing:
+Versions used for testing: PgOSM Flex 0.4.7 Docker image, based on the offical
+PostGIS image with Postgres 14 / PostGIS 3.2.
-* Ubuntu 20.04
-* osm2pgsql 1.4.2
-* PostgreSQL 13.2
-* PostGIS 3.1
-* PgOSM-Flex 0.1.4
+## Layerset: Minimal
-## Road / Place
+The `minimal` layer set only loads major roads, places, and POIs.
-The `run-road-place` layer set is a minimal set only loads roads and places,
-7 tables and 3 views.
+Timings with nested admin polygons and dumping the processed data to a `.sql`
+file.
+| Sub-region | PBF Size | PostGIS Size | `.sql` Size | Import Time |
+| :--- | :-: | :-: | :-: | :-: |
+| District of Columbia | 18 MB | 36 MB | 14 MB | 15.3 sec |
+| Colorado | 226 MB | 181 MB | 129 MB | 1 min 23 sec |
+| Norway | 1.1 GB | 618 MB | 489 MB | 5 min 36 sec |
+| North America | 12 GB | 9.5 GB | 7.7 GB | 3.03 hours |
-| Sub-region | PBF Size | PostGIS Size | Import (s) | Post-import (s) | Nested Places (s) |
-| :--- | :-: | :-: | :-: | :-: | :-: |
-| District of Columbia | 17 MB | 60 MB | 10 | 0.3 | 0.08 |
-| Colorado | 208 MB | 398 MB | 111 | 4.3 | 2.5 |
-| Norway | 909 MB | 797 MB | 402 | 34 | 20 |
-| North America | 11 GB | 17 GB | 4884 | 281 | 4174 |
+Timings skipping nested admin polygons the dump to `.sql`. This adds
+`--skip-dump --skip-nested` to the `docker exec process`.
-## No Tags
-The `run-no-tags` layer set loads nearly all of the data, excluding the unstructured
-`tags` data. 35 tables and 6 views.
+| Sub-region | Import Time |
+| :--- | :-: |
+| District of Columbia | 15.0 sec |
+| Colorado | 1 min 21 sec |
+| Norway | 5 min 12 sec |
+| North America | 1.25 hours |
+## Layerset: Default
-| Sub-region | PBF Size | PostGIS Size | Import (s) | Post-import (s) |
-| :--- | :-: | :-: | :-: | :-: |
-| District of Columbia | 17 MB | 182 MB | 42 | 2.3 |
-| Colorado | 208 MB | 1449 MB | 391 | 19 |
-| Norway | 909 MB | 3.8 GB | 1403 | 57 |
-| North America | 11 GB | 65 GB | 18809 | 1076 |
+The `default` layer set....
+Timings with nested admin polygons and dumping the processed data to a `.sql`
+file.
-## Methodology
-Timings are an average of multiple recorded test runs over more than one day.
-For example, the North America `run-road-place.lua` had two times: 4,845 seconds and 4,922 seconds for an average of 4,884 s
-(1 hour 21 minutes).
-The difference of these two runs was only 1 minute 17 seconds, a rather small
-amount of variation.
+| Sub-region | PBF Size | PostGIS Size | `.sql` Size | Import Time |
+| :--- | :-: | :-: | :-: | :-: |
+| District of Columbia | 18 MB | ZZ MB | ZZ MB | ZZZZ sec |
+| Colorado | 226 MB | ZZZ MB | 1.9 GB | 8 min 20 sec |
+| Norway | 1.1 GB | ZZZ MB | ZZZ GB | Z min ZZ sec |
+| North America | ZZ GB | ZZ GB | ZZ GB | ZZZ |
-Time for the import step is reported directly from osm2gpsql while the psql commands use the Linux `time` command as shown in the commands above.
-`PostGIS Size` reported is according to the meta-data in Postgres exposed through
-the [PgDD extension](https://github.com/rustprooflabs/pgdd) using this query.
+Timings skipping nested admin polygons the dump to `.sql`. This adds
+`--skip-dump --skip-nested` to the `docker exec process`.
-```sql
-SELECT size_plus_indexes
-    FROM dd.schemas
-    WHERE s_name = 'osm'
-;
-```
+| Sub-region | Import Time |
+| :--- | :-: |
+| District of Columbia | ZZZZ sec |
+| Colorado | Z min Z sec |
+| Norway | Z min Z sec |
+| North America | ZZZ |
-### Commands
+## Methodology
-D.C., Colorado, and Norway imports used this command format.
+The timing for the first `docker exec` for each region was discarded as
+it included the timing for downloading the PBF file.
+Timings are an average of multiple recorded test runs over more than one day.
+For example, the Norway region for the `minimal` layerset had two times: 5 min 35 seconds
+and 5 minutes 37 seconds for an average of 5 minutes 36 seconds.
-```bash
-osm2pgsql --slim --drop \
-    --cache=30000 \
-    --output=flex --style=./run-.lua \
-    -d $PGOSM_CONN \
-    ~/pgosm-data/-latest.osm.pbf
-```
+Time for the import step is reported using the Linux `time` command on the `docker exec`
+step as shown in the following commands.
-North America loaded using `--flat-nodes` and sets `--cache=0`.
-```bash
-osm2pgsql --slim --drop \
-    --cache=0 \
-    --flat-nodes=/tmp/nodes \
-    --output=flex --style=./run-lua \
-    -d $PGOSM_CONN \
-    ~/pgosm-data/-latest.osm.pbf
+`PostGIS Size` reported is according to the meta-data in Postgres exposed through
+the [PgDD extension](https://github.com/rustprooflabs/pgdd) using this query.
+
+```sql
+SELECT db_size
+    FROM dd.database
+;
 ```
-All regions use the same post-processing command and build nested polygons.
+
+### Commands
 ```bash
-time psql -d $PGOSM_CONN -f run-.sql
-time psql -d $PGOSM_CONN -c "CALL osm.build_nested_admin_polygons();"
+export POSTGRES_USER=postgres
+export POSTGRES_PASSWORD=mysecretpassword
+
+docker run --name pgosm -d --rm \
+    -v ~/pgosm-data:/app/output \
+    -v /etc/localtime:/etc/localtime:ro \
+    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
+    -p 5433:5432 -d rustprooflabs/pgosm-flex \
+    -c shared_buffers=1GB \
+    -c work_mem=50MB \
+    -c maintenance_work_mem=10GB \
+    -c autovacuum_work_mem=2GB \
+    -c checkpoint_timeout=300min \
+    -c max_wal_senders=0 -c wal_level=minimal \
+    -c max_wal_size=10GB \
+    -c checkpoint_completion_target=0.9 \
+    -c random_page_cost=1.0 \
+    -c full_page_writes=off \
+    -c fsync=off
+
+
+time docker exec -it \
+    pgosm python3 docker/pgosm_flex.py \
+    --ram=64 \
+    --region=north-america/us \
+    --subregion=colorado \
+    --layerset=minimal
 ```
-## Postgres Config
+> WARNING: Setting `full_page_writes=off` and `fsync=off` is part of the [expert tuning](https://osm2pgsql.org/doc/manual.html#expert-tuning) for the best possible performance. This is deemed acceptable in this Docker container running `--rm`, obviously this container will be discarded immediately after processing. **DO NOT** use these configurations unless you understand and accept the risks of corruption.
-Postgres is configured per the [suggestions in the osm2pgsql manual](https://osm2pgsql.org/doc/manual.html#preparing-the-database).
-```bash
-shared_buffers = 1GB
-work_mem = 50MB
-maintenance_work_mem = 10GB
-autovacuum_work_mem = 2GB
-wal_level = minimal
-checkpoint_timeout = 60min
-max_wal_size = 10GB
-checkpoint_completion_target = 0.9
-max_wal_senders = 0
-random_page_cost = 1.0
-```
-
 ## Other testing
@@ -142,6 +148,6 @@ legacy three-table load from osm2pgsql. Due to this fundamental switch, data lo
 via PgOSM-Flex is analysis-ready as soon as the load is done! The legacy data model
 required substantial post-processing to achieve analysis-quality data.
-The limited comparsions done showed that loading a region using the
-full PgOSM-Flex (`run-all.lua`) will take a few times longer than using the legacy method.
+The limited comparsions show that loading a region using the
+default PgOSM Flex layerset will take a few times longer than using the legacy method.
From a4ee540897540fd16dcc10ff89a5e2f7627f4a66 Mon Sep 17 00:00:00 2001
From: Ryan Lambert
Date: Sat, 9 Jul 2022 12:00:34 -0600
Subject: [PATCH 2/3] Adjusting tables, filling in more timings and sizes
---
 docs/PERFORMANCE.md | 91 ++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 50 deletions(-)
diff --git a/docs/PERFORMANCE.md b/docs/PERFORMANCE.md
index 0a18246..4f4812f 100644
--- a/docs/PERFORMANCE.md
+++ b/docs/PERFORMANCE.md
@@ -29,15 +29,17 @@ file.
 Timings skipping nested admin polygons the dump to `.sql`. This adds
-`--skip-dump --skip-nested` to the `docker exec process`.
+`--skip-dump --skip-nested` to the `docker exec process`. The following
+table compares the import time using these skips against the full times reported
+above.
-| Sub-region | Import Time |
-| :--- | :-: |
-| District of Columbia | 15.0 sec |
-| Colorado | 1 min 21 sec |
-| Norway | 5 min 12 sec |
-| North America | 1.25 hours |
+| Sub-region | Import Time (full) | Import Time (skips) |
+| :--- | :-: | :-: |
+| District of Columbia | 15.3 sec | 15.0 sec |
+| Colorado | 1 min 23 sec | 1 min 21 sec |
+| Norway | 5 min 36 sec | 5 min 12 sec |
+| North America | 3.03 hours | 1.25 hours |
 ## Layerset: Default
@@ -50,23 +52,25 @@ file.
 | Sub-region | PBF Size | PostGIS Size | `.sql` Size | Import Time |
 | :--- | :-: | :-: | :-: | :-: |
-| District of Columbia | 18 MB | ZZ MB | ZZ MB | ZZZZ sec |
-| Colorado | 226 MB | ZZZ MB | 1.9 GB | 8 min 20 sec |
-| Norway | 1.1 GB | ZZZ MB | ZZZ GB | Z min ZZ sec |
-| North America | ZZ GB | ZZ GB | ZZ GB | ZZZ |
+| District of Columbia | 18 MB | 212 MB | 160 MB | 53 sec |
+| Colorado | 226 MB | 2.1 GB | 1.9 GB | 8 min 20 sec |
+| Norway | 1.1 GB | ZZZ MB | 6.5 GB | 33 min 44 sec |
+| North America | 12 GB | ZZ GB | ZZ GB | ZZZ |
 Timings skipping nested admin polygons the dump to `.sql`. This adds
-`--skip-dump --skip-nested` to the `docker exec process`.
+`--skip-dump --skip-nested` to the `docker exec process`. The following
+table compares the import time using these skips against the full times reported
+above.
-| Sub-region | Import Time |
-| :--- | :-: |
-| District of Columbia | ZZZZ sec |
-| Colorado | Z min Z sec |
-| Norway | Z min Z sec |
-| North America | ZZZ |
+| Sub-region | Import Time (full) | Import Time (skips) |
+| :--- | :-: | :-: |
+| District of Columbia | 53 sec | 51 sec |
+| Colorado | 8 min 20 sec | 7 min 55 sec |
+| Norway | 33 min 44 sec | 32 min 18 sec |
+| North America | ZZZ | ZZZ |
 ## Methodology
@@ -82,18 +86,22 @@ Time for the import step is reported using the Linux `time` command on the `dock
 step as shown in the following commands.
-`PostGIS Size` reported is according to the meta-data in Postgres exposed through
-the [PgDD extension](https://github.com/rustprooflabs/pgdd) using this query.
+`PostGIS Size` reported is according to the meta-data in Postgres using this query.
 ```sql
-SELECT db_size
-    FROM dd.database
-;
+SELECT d.oid, d.datname AS db_name,
+        pg_size_pretty(pg_database_size(d.datname)) AS db_size
+    FROM pg_catalog.pg_database d
+    WHERE d.datname = current_database()
 ```
 ### Commands
+Set environment variables and start `pgosm` Docker container with configurations
+set per the [osm2pgsql tuning guidelines](https://osm2pgsql.org/doc/manual.html#tuning-the-postgresql-server).
+
+
 ```bash
 export POSTGRES_USER=postgres
 export POSTGRES_PASSWORD=mysecretpassword
@@ -114,8 +122,18 @@ docker run --name pgosm -d --rm \
     -c random_page_cost=1.0 \
     -c full_page_writes=off \
     -c fsync=off
+```
+
+> WARNING: Setting `full_page_writes=off` and `fsync=off` is part of the [expert tuning](https://osm2pgsql.org/doc/manual.html#expert-tuning) for the best possible performance. This is deemed acceptable in this Docker container running `--rm`, obviously this container will be discarded immediately after processing. **DO NOT** use these configurations unless you understand and accept the risks of corruption.
+
+Run PgOSM Flex within Docker. The first run time is discarded because the first
+run time includes time downloading the PBF file. Subsequent runs only include the
+time running the processing.
+
+```bash
+
 time docker exec -it \
     pgosm python3 docker/pgosm_flex.py \
     --ram=64 \
@@ -124,30 +142,3 @@ time docker exec -it \
     --layerset=minimal
 ```
-> WARNING: Setting `full_page_writes=off` and `fsync=off` is part of the [expert tuning](https://osm2pgsql.org/doc/manual.html#expert-tuning) for the best possible performance. This is deemed acceptable in this Docker container running `--rm`, obviously this container will be discarded immediately after processing. **DO NOT** use these configurations unless you understand and accept the risks of corruption.
-
-
-
-
-## Other testing
-
-Initial results on larger scale tests (both data and hardware) are available
-in [issue #12](https://github.com/rustprooflabs/pgosm-flex/issues/12). As this project
-matures additional performance testing results will become available.
-
-### Legacy benchmarks
-
-See the blog post
-[Scaling osm2pgsql: Process and costs](https://blog.rustprooflabs.com/2019/10/osm2pgsql-scaling)
-for a deeper look at how performance scales using various sizes of regions and hardware.
-
-### Comparisons to osm2pgsql legacy output
-
-The data loaded via PgOSM-Flex is of much higher quality than the
-legacy three-table load from osm2pgsql. Due to this fundamental switch, data loaded
-via PgOSM-Flex is analysis-ready as soon as the load is done! The legacy data model
-required substantial post-processing to achieve analysis-quality data.
-
-The limited comparsions show that loading a region using the
-default PgOSM Flex layerset will take a few times longer than using the legacy method.
-
From 7ff8d0d89d1112f1d36fd43720cfd6cbf4160b76 Mon Sep 17 00:00:00 2001
From: Ryan Lambert
Date: Sun, 10 Jul 2022 06:17:39 -0600
Subject: [PATCH 3/3] Add remaining timing and sizes
---
 docs/PERFORMANCE.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/docs/PERFORMANCE.md b/docs/PERFORMANCE.md
index 4f4812f..f569b25 100644
--- a/docs/PERFORMANCE.md
+++ b/docs/PERFORMANCE.md
@@ -54,8 +54,8 @@ file.
 | :--- | :-: | :-: | :-: | :-: |
 | District of Columbia | 18 MB | 212 MB | 160 MB | 53 sec |
 | Colorado | 226 MB | 2.1 GB | 1.9 GB | 8 min 20 sec |
-| Norway | 1.1 GB | ZZZ MB | 6.5 GB | 33 min 44 sec |
-| North America | 12 GB | ZZ GB | ZZ GB | ZZZ |
+| Norway | 1.1 GB | 7.2 GB | 6.5 GB | 33 min 44 sec |
+| North America | 12 GB | 98 GB | 55 GB | 8.78 hours |
@@ -70,7 +70,7 @@ above.
 | District of Columbia | 53 sec | 51 sec |
 | Colorado | 8 min 20 sec | 7 min 55 sec |
 | Norway | 33 min 44 sec | 32 min 18 sec |
-| North America | ZZZ | ZZZ |
+| North America | 8.78 hours | 6.58 hours |
 ## Methodology