change priority query: the node with the highest write_lsn / write_lo… #198

Merged (1 commit) on Sep 17, 2021

Mixton commented Sep 15, 2021

…ation (the closest to the master node) will also have the highest priority

Mixton commented Sep 15, 2021

Hello,

Here is our cluster configuration:
PAF 2.3.0
PostgreSQL 13
(same result on another cluster using PAF 2.2.0 and PostgreSQL 9.6)
maxlag = 0

srv11 => slave sync
srv21 => master
srv31 => slave async (reported as "potential" because it is in the slave list and we requested at least 1 sync node)

Here is the output of the original priority SQL query:

postgres=# select application_name, (1000 - ( row_number() OVER ( PARTITION BY state IN ('startup', 'backup') ORDER BY location ASC, application_name ASC ) - 1 ) * 10 ) * 1 AS priority, location, state, current_lag, sync_state FROM (select application_name, write_lsn AS location, state, pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS current_lag, sync_state FROM pg_stat_replication) AS s1 ORDER BY priority DESC;
-[ RECORD 1 ]----+-------------------------------
application_name | srv31
priority         | 1000
location         | 74C9/C5702CF8
state            | streaming
current_lag      | 129368
sync_state       | potential
-[ RECORD 2 ]----+-------------------------------
application_name | srv11
priority         | 990
location         | 74C9/C571A700
state            | streaming
current_lag      | 21488
sync_state       | sync

Here srv11 gets the lowest score even though it is the sync node and has the highest write_lsn.
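To make the comparison explicit, here is a minimal Python sketch (not part of PAF) that converts the two write_lsn values from the output above into integers. The helper name `parse_lsn` is hypothetical; it relies only on the fact that an LSN is printed as two hexadecimal numbers, the high and low 32 bits of a 64-bit position.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '74C9/C571A700' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# write_lsn values copied from the query output above.
write_lsn = {"srv31": "74C9/C5702CF8", "srv11": "74C9/C571A700"}

# The standby with the largest write_lsn is the closest to the master.
most_advanced = max(write_lsn, key=lambda name: parse_lsn(write_lsn[name]))

# Distance in bytes between the two standbys' write positions.
gap = parse_lsn(write_lsn["srv11"]) - parse_lsn(write_lsn["srv31"])

print(most_advanced, gap)
```

With these values srv11 is the most advanced standby, yet the original query ranks it second.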

Pacemaker resource status:

> pcs status --full
Cluster name: pgcluster
Stack: corosync
Current DC: srv31 (3) (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Wed Sep 15 14:33:23 2021
Last change: Wed Sep 15 14:11:04 2021 by root via crm_attribute on srv21

3 nodes configured
3 resources configured

Online: [ srv11 (1) srv21 (2) srv31 (3) ]

Full list of resources:

 Master/Slave Set: pgsql-ha [pgsqld]
     pgsqld	(ocf::heartbeat:pgsqlms):	Slave srv11
     pgsqld	(ocf::heartbeat:pgsqlms):	Master srv21
     pgsqld	(ocf::heartbeat:pgsqlms):	Slave srv31
     Masters: [ srv21 ]
     Slaves: [ srv11 srv31 ]

Node Attributes:
* Node srv11 (1):
    + master-pgsqld                   	: 990      
* Node srv21 (2):
    + master-pgsqld                   	: 1001      
* Node srv31 (3):
    + master-pgsqld                   	: 1000     

Migration Summary:
* Node srv11 (1):
* Node srv21 (2):
* Node srv31 (3):

PCSD Status:
  srv21: Online
  srv11: Online
  srv31: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled

What this commit does:

The row_number is now ordered by location DESC, so the smallest row number is assigned to the node with the highest write_lsn (or write_location), which is the one closest to the master. (Here that node is a sync node using the "synchronous_commit = remote_apply" conf, so it should be the same....) Because the query subtracts (row_number - 1) * 10 from the max score (1000), the node with the highest write_lsn now gets the highest score.

Here is the result of the query:

postgres=# select application_name, (1000 - ( row_number() OVER ( PARTITION BY state IN ('startup', 'backup') ORDER BY location DESC, application_name ASC ) - 1 ) * 10 ) * 1 AS priority, location, state, current_lag, sync_state FROM (select application_name, write_lsn AS location, state, pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS current_lag, sync_state FROM pg_stat_replication) AS s1 ORDER BY priority DESC;
-[ RECORD 1 ]----+-------------------------------
application_name | srv11
priority         | **1000**
location         | 74C9/B7962B20
state            | streaming
current_lag      | 14568
sync_state       | sync
-[ RECORD 2 ]----+-------------------------------
application_name | srv31
priority         | **990**
location         | 74C9/B7948B48
state            | streaming
current_lag      | 121024
sync_state       | potential
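The effect of the ordering change can be sketched outside the database. The Python below is only an illustration, not the PAF implementation: `parse_lsn` and `priorities` are hypothetical names, the input is the pair of write_lsn values from the first query output, and it assumes every standby is in the same partition (both are "streaming" here), so the PARTITION BY clause is ignored.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '74C9/C571A700' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def priorities(standbys, descending):
    """Mimic 1000 - (row_number - 1) * 10, ordered by LSN then name ASC."""
    # Sort by name first; the second sort is stable (even with reverse=True),
    # so application_name ASC remains the tie-breaker for equal LSNs.
    ordered = sorted(standbys, key=lambda s: s[0])
    ordered = sorted(ordered, key=lambda s: parse_lsn(s[1]), reverse=descending)
    # enumerate() starts at 0, which matches (row_number - 1).
    return {name: 1000 - row * 10 for row, (name, _) in enumerate(ordered)}


# (application_name, write_lsn) pairs from the original query output.
standbys = [("srv31", "74C9/C5702CF8"), ("srv11", "74C9/C571A700")]

before = priorities(standbys, descending=False)  # ORDER BY location ASC
after = priorities(standbys, descending=True)    # ORDER BY location DESC

print(before)
print(after)
```

On this data the ascending order gives srv31 the score 1000 and srv11 990, as in the original output, while the descending order gives srv11 the score 1000.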

Could you confirm that the behavior of this commit is ok?

Regards,


ioguix commented Sep 17, 2021

> Could you confirm that the behavior of this commit is ok?

I can confirm this commit is ok. I've been able to reproduce the issue on my side and the fix is correct.

Good catch, thank you for the report and pull request!

ioguix merged commit 7ebb25b into ClusterLabs:master on Sep 17, 2021

Mixton commented Sep 18, 2021

Hello @ioguix, thanks for your time ;)
