Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgr_chinesePostman crash on test data #1300

Closed
jmarca opened this issue Oct 9, 2019 · 7 comments · Fixed by #1638
Closed

pgr_chinesePostman crash on test data #1300

jmarca opened this issue Oct 9, 2019 · 7 comments · Fixed by #1638
Assignees
Milestone

Comments

@jmarca
Copy link

jmarca commented Oct 9, 2019

Expected behavior and actual behavior

Using the test data described here: http://docs.pgrouting.org/latest/en/sampledata.html, and running the query described here: http://docs.pgrouting.org/latest/en/pgr_chinesePostman.html,

SELECT * FROM pgr_chinesePostman(
    'SELECT id,
     source, target,
     cost, reverse_cost FROM edge_table where id < 17'
);

The algorithm code works two times, but crashes on the third try, consistently (I repeated multiple times)

Steps to reproduce the problem

Load the data; run the query three times; third time is a crash.

Output of logging is unhelpful to me (how can I log more? setting debug5 log level didn't help) but I do see invalid record length at 0/3A15448: wanted 24, got 0 each time. Also this same query works twice no problem before the crash. Crashes on the third time. Smells like not freeing up memory or something like that.

2019-10-09 10:10:43.021 PDT [67] FATAL:  the database system is in recovery mode
2019-10-09 10:10:43.022 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:10:43.058 PDT [68] LOG:  database system was interrupted; last known up at 2019-10-09 10:10:24 PDT
2019-10-09 10:10:43.132 PDT [68] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:10:43.135 PDT [68] LOG:  redo starts at 0/39DF250
2019-10-09 10:10:43.135 PDT [68] LOG:  invalid record length at 0/39DF288: wanted 24, got 0
2019-10-09 10:10:43.135 PDT [68] LOG:  redo done at 0/39DF250
2019-10-09 10:10:43.166 PDT [1] LOG:  database system is ready to accept connections
2019-10-09 10:15:21.257 PDT [1] LOG:  server process (PID 80) was terminated by signal 11: Segmentation fault
2019-10-09 10:15:21.257 PDT [1] DETAIL:  Failed process was running: select * from pgr_chinesePostman(
	    'select id,source,target,cost,reverse_cost 
	     from edge_table where id <17');
2019-10-09 10:15:21.257 PDT [1] LOG:  terminating any other active server processes
2019-10-09 10:15:21.257 PDT [72] WARNING:  terminating connection because of crash of another server process
2019-10-09 10:15:21.257 PDT [72] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-10-09 10:15:21.257 PDT [72] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-10-09 10:15:21.258 PDT [84] FATAL:  the database system is in recovery mode
2019-10-09 10:15:21.286 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:15:21.295 PDT [85] LOG:  database system was interrupted; last known up at 2019-10-09 10:10:43 PDT
2019-10-09 10:15:21.480 PDT [85] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:15:21.483 PDT [85] LOG:  redo starts at 0/39DF300
2019-10-09 10:15:21.484 PDT [85] LOG:  invalid record length at 0/3A151F8: wanted 24, got 0
2019-10-09 10:15:21.484 PDT [85] LOG:  redo done at 0/3A151C0
2019-10-09 10:15:21.615 PDT [1] LOG:  database system is ready to accept connections
2019-10-09 10:16:51.173 PDT [1] LOG:  server process (PID 97) was terminated by signal 11: Segmentation fault
2019-10-09 10:16:51.173 PDT [1] DETAIL:  Failed process was running: select * from pgr_chinesePostman(
	    'select id,source,target,cost,reverse_cost 
	     from edge_table where id <17');
2019-10-09 10:16:51.173 PDT [1] LOG:  terminating any other active server processes
2019-10-09 10:16:51.173 PDT [89] WARNING:  terminating connection because of crash of another server process
2019-10-09 10:16:51.173 PDT [89] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-10-09 10:16:51.173 PDT [89] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-10-09 10:16:51.173 PDT [99] FATAL:  the database system is in recovery mode
2019-10-09 10:16:51.201 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:16:51.211 PDT [100] LOG:  database system was interrupted; last known up at 2019-10-09 10:15:21 PDT
2019-10-09 10:16:51.319 PDT [100] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:16:51.322 PDT [100] LOG:  redo starts at 0/3A15270
2019-10-09 10:16:51.322 PDT [100] LOG:  invalid record length at 0/3A152A8: wanted 24, got 0
2019-10-09 10:16:51.322 PDT [100] LOG:  redo done at 0/3A15270
2019-10-09 10:16:51.354 PDT [1] LOG:  database system is ready to accept connections
2019-10-09 10:17:20.363 PDT [1] LOG:  server process (PID 112) was terminated by signal 11: Segmentation fault
2019-10-09 10:17:20.363 PDT [1] DETAIL:  Failed process was running: select * from pgr_chinesePostman(
	    'select id,source,target,cost,reverse_cost 
	     from edge_table where id <17');
2019-10-09 10:17:20.363 PDT [1] LOG:  terminating any other active server processes
2019-10-09 10:17:20.363 PDT [104] WARNING:  terminating connection because of crash of another server process
2019-10-09 10:17:20.363 PDT [104] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-10-09 10:17:20.363 PDT [104] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-10-09 10:17:20.363 PDT [113] FATAL:  the database system is in recovery mode
2019-10-09 10:17:20.391 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:17:20.401 PDT [114] LOG:  database system was interrupted; last known up at 2019-10-09 10:16:51 PDT
2019-10-09 10:17:20.501 PDT [114] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:17:20.504 PDT [114] LOG:  redo starts at 0/3A15320
2019-10-09 10:17:20.504 PDT [114] LOG:  invalid record length at 0/3A15358: wanted 24, got 0
2019-10-09 10:17:20.504 PDT [114] LOG:  redo done at 0/3A15320
2019-10-09 10:17:20.541 PDT [1] LOG:  database system is ready to accept connections
2019-10-09 10:17:30.842 PDT [1] LOG:  server process (PID 126) was terminated by signal 11: Segmentation fault
2019-10-09 10:17:30.842 PDT [1] DETAIL:  Failed process was running: select * from pgr_chinesePostman(
	    'select id,source,target,cost,reverse_cost 
	     from edge_table where id <17');
2019-10-09 10:17:30.842 PDT [1] LOG:  terminating any other active server processes
2019-10-09 10:17:30.842 PDT [118] WARNING:  terminating connection because of crash of another server process
2019-10-09 10:17:30.842 PDT [118] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-10-09 10:17:30.842 PDT [118] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-10-09 10:17:30.842 PDT [127] FATAL:  the database system is in recovery mode
2019-10-09 10:17:30.870 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:17:30.880 PDT [128] LOG:  database system was interrupted; last known up at 2019-10-09 10:17:20 PDT
2019-10-09 10:17:30.935 PDT [128] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:17:30.937 PDT [128] LOG:  invalid record length at 0/3A153D0: wanted 24, got 0
2019-10-09 10:17:30.937 PDT [128] LOG:  redo is not required
2019-10-09 10:17:30.962 PDT [1] LOG:  database system is ready to accept connections
2019-10-09 10:17:39.188 PDT [1] LOG:  server process (PID 140) was terminated by signal 11: Segmentation fault
2019-10-09 10:17:39.188 PDT [1] DETAIL:  Failed process was running: select * from pgr_chinesePostman(
	    'select id,source,target,cost,reverse_cost 
	     from edge_table where id <17');
2019-10-09 10:17:39.188 PDT [1] LOG:  terminating any other active server processes
2019-10-09 10:17:39.188 PDT [132] WARNING:  terminating connection because of crash of another server process
2019-10-09 10:17:39.188 PDT [132] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-10-09 10:17:39.188 PDT [132] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-10-09 10:17:39.217 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:17:39.225 PDT [142] FATAL:  the database system is in recovery mode
2019-10-09 10:17:39.225 PDT [141] LOG:  database system was interrupted; last known up at 2019-10-09 10:17:30 PDT
2019-10-09 10:17:39.294 PDT [141] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:17:39.296 PDT [141] LOG:  invalid record length at 0/3A15448: wanted 24, got 0
2019-10-09 10:17:39.296 PDT [141] LOG:  redo is not required
2019-10-09 10:17:39.326 PDT [1] LOG:  database system is ready to accept connections
2019-10-09 10:17:53.180 PDT [1] LOG:  server process (PID 154) was terminated by signal 11: Segmentation fault
2019-10-09 10:17:53.180 PDT [1] DETAIL:  Failed process was running: select * from pgr_chinesePostman(
	    'select id,source,target,cost,reverse_cost 
	     from edge_table where id <17');
2019-10-09 10:17:53.180 PDT [1] LOG:  terminating any other active server processes
2019-10-09 10:17:53.180 PDT [146] WARNING:  terminating connection because of crash of another server process
2019-10-09 10:17:53.180 PDT [146] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-10-09 10:17:53.180 PDT [146] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-10-09 10:17:53.180 PDT [155] FATAL:  the database system is in recovery mode
2019-10-09 10:17:53.208 PDT [1] LOG:  all server processes terminated; reinitializing
2019-10-09 10:17:53.216 PDT [156] LOG:  database system was interrupted; last known up at 2019-10-09 10:17:39 PDT
2019-10-09 10:17:53.279 PDT [156] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-09 10:17:53.281 PDT [156] LOG:  invalid record length at 0/3A154C0: wanted 24, got 0
2019-10-09 10:17:53.281 PDT [156] LOG:  redo is not required
2019-10-09 10:17:53.312 PDT [1] LOG:  database system is ready to accept connections

Specifications like the version of pgRouting/PostGIS and PostgreSQL as well as Operating System

Use the commands:

psql (12.0)
Type "help" for help.

glendale=> SELECT version();
                                        version                                        
---------------------------------------------------------------------------------------
 PostgreSQL 12.0 on x86_64-pc-linux-musl, compiled by gcc (Alpine 8.3.0) 8.3.0, 64-bit
(1 row)

glendale=> SELECT postgis_full_version();
                                                   postgis_full_version                                                    
---------------------------------------------------------------------------------------------------------------------------
 POSTGIS="3.0.0rc1" [EXTENSION] PGSQL="120" GEOS="3.7.2-CAPI-1.11.2 b55d2125" PROJ="7.0.0" LIBXML="2.9.9" LIBJSON="0.13.1"
(1 row)

glendale=> SELECT pgr_version();
 pgr_version 
-------------
 3.0.0-alpha
(1 row)

@cvvergara
Copy link
Member

@jmarca
Thanks for reporting

This function is currently an experimental function.
Will make a note that it might crash the server, but the fix might be done for next mayor release.

@jmarca
Copy link
Author

jmarca commented Oct 23, 2019 via email

@cvvergara cvvergara self-assigned this Oct 13, 2020
@cvvergara
Copy link
Member

@cvvergara
Copy link
Member

cvvergara commented Oct 15, 2020

Its no crashing the server in the newer releases, but nothing has been done to the code, so the issue is still there, only way to make sure that the server does not crash is working on 3.0.0-alpha
In this execution using 3.0.0-alpha, the server does not crash.
https://github.com/cvvergara/pgrouting/runs/1256391910?check_suite_focus=true#step:8:74
Todo:

  • clean up the fix,
  • port the fix to
    • 3.0.3 on release-3.0 branch
    • 3.1.1 on master branch
    • 3.2.0-dev on develop branch

@krashish8
Copy link
Member

Chinese postman fails, it crashes the server

@krashish8 krashish8 reopened this Oct 24, 2020
@cvvergara
Copy link
Member

Thanks @krashish8 for noticing will review the code again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants