Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to update with osm2pgsql-replication? #211

Closed
MTRNord opened this issue Nov 30, 2021 · 6 comments
Closed

How to update with osm2pgsql-replication? #211

MTRNord opened this issue Nov 30, 2021 · 6 comments
Labels
enhancement New feature or request
Milestone

Comments

@MTRNord
Copy link

MTRNord commented Nov 30, 2021

Hi I set up the database using https://github.com/rustprooflabs/pgosm-flex/blob/main/docs/MANUAL-STEPS-RUN.md and wanted to run the update of data using the replication approach using the command: osm2pgsql-replication update -d pgosm -U osmuser -- -O flex -S ./run.lua.

However doing that I get ERROR: DB copy thread failed: Database error: ERROR: update or delete on table "place_polygon" violates foreign key constraint "fk_place_polygon_nested" on table "place_polygon_nested". So I wonder what is the correct way of updating the data regularly?

@rustprooflabs
Copy link
Owner

Hi @MTRNord - It is good to see interest in the update process! Data updates do not work quite yet, but I have been looking at how to support that functionality. There is a discussion open (#167) related to this. A roadblock was recently removed with #182 but that was only part of the issue as you have found.

I have not used osm2pgsql-replication yet, it looks that's a wrapper around using osm2pgsql with --append? I will study that script and the official docs: https://osm2pgsql.org/doc/manual.html#keeping-the-database-up-to-date-with-osm2pgsql-replication

My plan is to work through each of the post-processing SQL scripts and create a process to drop and recreate the necessary objects. For an immediate hack, if you are so motivated, here's what I think is involved but proceed with caution! You can drop the offending FK and see if that's all it needs for the refresh to work. I have not tested that yet, but see that's the only FK added. You may need to drop and/or recreate the PKs and indexes as well. I don't know enough about what osm2pgsql does under the hood for updates to know if those will cause problems or be auto-dropped somehow. You would also need to refresh the two materialized views (osm.vplace_polygon and osm.vpoi_all).

If you happen to hack away on this, feel free to submit a PR to the dev branch. I'll be looking at this closer myself in the next few days. Thanks!

@MTRNord
Copy link
Author

MTRNord commented Nov 30, 2021

Hi :)

Yes I am very interested. I would love to use this to layer this, combined with the routing doc example, and gtfs and some other transportation data to get some postgres and rust based way to do efficient routing without having to do full graph rebuilds like using OpenTripPlanner which is very expensive to do.

The script I am referencing is this one https://github.com/openstreetmap/osm2pgsql/blob/master/scripts/osm2pgsql-replication
The nice thing about it is it seems to use the osm patches instead of the full osm files.

If i find some time I may try that but I am guessing I wont get to it currently. :)

@rustprooflabs rustprooflabs added the enhancement New feature or request label Dec 1, 2021
@rustprooflabs
Copy link
Owner

Initial test shows the FK on osm.place_polygon_nested is the only block issue, at least detected so far. Had a DC file from Nov 12 2021 handy, using the manual steps I loaded the initial file with --slim. Followed by the post-processing sql.

osm2pgsql --slim --output=flex --style=./run.lua -d $PGOSM_CONN ~/pgosm-data/district-of-columbia-2021-11-12.osm.pbf
lua ./run-sql.lua

Dropping the constraint mentioned in the error (fk_place_polygon_nested) would resolve the immediate error on the FK but does not cleanly solve the problem. The data in osm.place_polygon_nested is pre-populated as part of flex-config/sql/place.sql (see https://github.com/rustprooflabs/pgosm-flex/blob/0.4.1/flex-config/sql/place.sql#L149-L158). In the future a more eloquent solution can be found but this works for now.

TRUNCATE TABLE osm.place_polygon_nested;

I created the dcchanges.osc.gz following the pyosmium instructions and fed that into osm2pgsql --append mode.

osm2pgsql --slim --append --output=flex --style=./run.lua -d $PGOSM_CONN ~/pgosm-data/tmp/dcchanges.osc.gz

Truncated output.

...
2021-12-01 15:42:16  Reading input files done in 2s.                                      
2021-12-01 15:42:16    Processed 1458 nodes in 1s - 1k/s
2021-12-01 15:42:16    Processed 441 ways in 0s - 441/s
2021-12-01 15:42:16    Processed 88 relations in 1s - 88/s
2021-12-01 15:42:17  Going over 315 pending ways (using 4 threads)
Left to process: 0.....
2021-12-01 15:42:19  Processing 315 pending ways took 2s at a rate of 157.50/s
2021-12-01 15:42:19  Going over 164 pending relations (using 4 threads)
Left to process: 0.....
2021-12-01 15:42:22  Processing 164 pending relations took 3s at a rate of 54.67/s
2021-12-01 15:42:22  Skipping stage 1c (no marked ways).
2021-12-01 15:42:22  No marked ways (Skipping stage 2).
...
2021-12-01 15:42:22  osm2pgsql took 8s overall.

Refresh materialized views and populate data back into osm.place_polygon_nested.

REFRESH MATERIALIZED VIEW osm.vplace_polygon;
REFRESH MATERIALIZED VIEW osm.vplace_polygon_subdivide;
REFRESH MATERIALIZED VIEW osm.vpoi_all;

INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
        COALESCE(p.admin_level::INT, 99) AS admin_level,
        geom
    FROM osm.vplace_polygon p
    WHERE (p.boundary = 'administrative'
            OR p.osm_type IN   ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')
       )
        AND p.name IS NOT NULL
;

Then optionally running the function to populate the nested data.

CALL osm.build_nested_admin_polygons();

In a nutshell, this manual approach seems to work OK. Will need documentation for this manual process, then work out how to build into the Docker process. Resolving #194 will make this functionality in Docker far more attractive.

@rustprooflabs
Copy link
Owner

Experimental support for using osm2pgsql-replication is in place for both Manual and Docker modes. Feedback welcome, I'm certain there are rough edges still. Further details (enhancements, bugs, etc) should open new tickets to address.

Manual mode has procedures for where the process differs: https://github.com/rustprooflabs/pgosm-flex/blob/main/docs/APPEND-MODE.md

Docker mode has a section on using --append: https://github.com/rustprooflabs/pgosm-flex/blob/main/docs/DOCKER-RUN.md#use---append-for-updates

Version 0.4.6. will be tagged and released soon.

@james-newtracs
Copy link

Hi - I've been experimenting with this recent feature, the append process you describe doesn't appear to work when using an external PG connection.

Im running:

DO $do$ BEGIN
   IF EXISTS (SELECT FROM pg_catalog.pg_roles WHERE  rolname = 'pgosm_flex') THEN
      RAISE NOTICE 'Role pgosm_flex already exists. Skipping.'; ELSE
      CREATE ROLE pgosm_flex LOGIN PASSWORD 'password';
   END IF;
END $do$;

CREATE SCHEMA IF NOT EXISTS osm AUTHORIZATION pgosm_flex;
GRANT CREATE ON DATABASE database TO pgosm_flex;

If I run without adding the below function to postgres I get an error that its missing, so:

CREATE OR REPLACE PROCEDURE osm.append_data_start()
 LANGUAGE plpgsql
 AS $$

 BEGIN

    RAISE NOTICE 'Truncating table osm.place_polygon_nested;';
    
END $$;

CREATE OR REPLACE PROCEDURE osm.append_data_finish(skip_nested BOOLEAN = False)
 LANGUAGE plpgsql
 AS $$
 BEGIN

    REFRESH MATERIALIZED VIEW osm.vplace_polygon;
    REFRESH MATERIALIZED VIEW osm.vplace_polygon_subdivide;
    REFRESH MATERIALIZED VIEW osm.vpoi_all;

    IF $1 = False THEN
        RAISE NOTICE 'Populating nested place table';
        CALL osm.populate_place_polygon_nested();
        RAISE NOTICE 'Calculating nesting of place polygons';
        CALL osm.build_nested_admin_polygons();

    END IF;


END $$;

then:

export POSTGRES_USER=pgosm_flex
export POSTGRES_PASSWORD=password
export POSTGRES_HOST=host.docker.internal
export POSTGRES_DB=database
export POSTGRES_PORT=5432

create container

docker run --name pgosm -d --rm \
    -v ~/pgosm-data:/app/output \
    -v /etc/localtime:/etc/localtime:ro \
    -e POSTGRES_USER=$POSTGRES_USER \
    -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
    -e POSTGRES_HOST=$POSTGRES_HOST \
    -e POSTGRES_DB=$POSTGRES_DB \
    -e POSTGRES_PORT=$POSTGRES_PORT \
    -p 5433:5432 \
    -d rustprooflabs/pgosm-flex \
    -c max_connections=300

then start data import

docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--skip-dump \
--skip-nested \
--srid="4326" \
--debug \
--pgosm-date="2022-06-15" \
--append

The above command produces the following error:

ERROR: Database error: ERROR:  relation "osm.amenity_point" does not exist
LINE 1: PREPARE get_wkb(bigint) AS SELECT "geom" FROM "osm"."amenity...
                                                      ^

Traceback (most recent call last):
  File "/usr/local/bin/osm2pgsql-replication", line 535, in <module>
    sys.exit(main())
  File "/usr/local/bin/osm2pgsql-replication", line 528, in main
    ret = args.handler(conn, args)
  File "/usr/local/bin/osm2pgsql-replication", line 399, in update
    subprocess.run(osm2pgsql, check=True)
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['osm2pgsql', '--append', '--slim', '--prefix', 'planet_osm', '--output=flex', '--style=./run.lua', '--slim', '-d', 'postgresql://pgosm_flex:[email protected]/database?application_name=pgosm-flex', '-d', 'postgresql://pgosm_flex:[email protected]/database?application_name=pgosm-flex', '/tmp/tmpn4kqzshn/osm2pgsql_diff.osc.gz']' returned non-zero exit status 1.

If I leave out the --append flag it runs fine (except for an already called out Error in post-processing layerset: pgosm-meta). If I then try to run an update:

pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--data-only \
--region=north-america/us \
--subregion=district-of-columbia \
--skip-dump \
--skip-nested \
--srid="4326" \
--debug \
--append

I get:

ERROR: DB copy thread failed: Database error: ERROR: update or delete on table "place_polygon" violates foreign key constraint "fk_place_polygon_nested" on table "place_polygon_nested"

Any suggestions?

@rustprooflabs
Copy link
Owner

@newtracs Please open a new issue to report this as a bug. Make sure to include version number of PgOSM Flex used along with all other steps to reproduce. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants