Fixed mysql maintainer mode for debug build (bug1408232) #14
Merged
laurynas-biveinis added a commit that referenced this pull request on Feb 25, 2015:
Fixed mysql maintainer mode for debug build (bug1408232)
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request on May 7, 2016
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request on May 7, 2016:
Summary: SQL table records are stored as (pk_keyval -> record_val) key-value pairs. When the PK column's datatype allows its value to be decoded from the mem-comparable form in pk_keyval, the column does not need to be stored in record_val.
Test Plan: ./mtr t/rocks*test && ./mtr --suite=rocksdb && ./mtr --suite=rocksdb_rpl
Reviewers: hermanlee4, jonahcohen, maykov, yoshinorim
Reviewed By: yoshinorim
Differential Revision: https://reviews.facebook.net/D33117
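To illustrate the idea in this summary, the Python sketch below assumes a signed 32-bit integer PK and a hypothetical mem-comparable encoding (big-endian with the sign bit flipped, so bytewise order matches numeric order). Because that encoding is reversible, the toy record_val keeps only the non-PK columns and the row is rebuilt by decoding the key. The helpers `encode_pk`, `decode_pk`, `make_kv` and `read_row` are illustrative names, not functions from the RocksDB storage engine code.

```python
import struct

SIGN_FLIP = 0x80000000  # flipping the sign bit makes signed ints sort as unsigned bytes


def encode_pk(value: int) -> bytes:
    """Encode a signed 32-bit int so that bytewise order equals numeric order."""
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ SIGN_FLIP)


def decode_pk(key: bytes) -> int:
    """Invert encode_pk: recover the original signed value from the key bytes."""
    raw = struct.unpack(">I", key)[0] ^ SIGN_FLIP
    return raw - (1 << 32) if raw & 0x80000000 else raw


def make_kv(row: dict) -> tuple:
    """Build (pk_keyval, record_val); the PK column is NOT copied into record_val."""
    key = encode_pk(row["id"])
    value = {k: v for k, v in row.items() if k != "id"}
    return key, value


def read_row(key: bytes, value: dict) -> dict:
    """Rebuild the full row by decoding the PK back out of the key."""
    return {"id": decode_pk(key), **value}
```

The saving only holds for datatypes whose mem-comparable form is lossless; a type whose key form cannot be decoded would still need its value stored in record_val.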
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request on May 7, 2016:
…riable names
Test Plan: N/A
Reviewers: maykov, hermanlee4, yoshinorim, jonahcohen
Reviewed By: jonahcohen
Differential Revision: https://reviews.facebook.net/D33453
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request on May 9, 2016
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request on May 9, 2016:
Summary: SQL table records are stored as (pk_keyval -> record_val) key-value pairs. When the PK column's datatype allows its value to be decoded from the mem-comparable form in pk_keyval, the column does not need to be stored in record_val.
Test Plan: ./mtr t/rocks*test && ./mtr --suite=rocksdb && ./mtr --suite=rocksdb_rpl
Reviewers: hermanlee4, jonahcohen, maykov, yoshinorim
Reviewed By: yoshinorim
Differential Revision: https://reviews.facebook.net/D33117
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request on May 9, 2016:
…riable names
Test Plan: N/A
Reviewers: maykov, hermanlee4, yoshinorim, jonahcohen
Reviewed By: jonahcohen
Differential Revision: https://reviews.facebook.net/D33453
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server on Jul 21, 2017
BohuTANG referenced this pull request in xelabs/tokudb on Dec 24, 2017:
Summary: In the 'XA END' phase of an XA transaction (thd_sql_command is SQLCOM_END), the TokuDB slave creates transactions for both trx->sp_level and trx->stmt. This causes an abort in toku_xids_can_create_child, because trx->sp_level->xids is 0x00.
How to reproduce: with tokudb_debug=32, run the following queries on the master:
create table t1(a int)engine=tokudb;
xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';
xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';
Slave debug info:
xa start 'x1'; insert into t1 values(1); xa end 'x1'; xa prepare 'x1'; xa commit 'x1';
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6533 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7ff2d44a3000 67108864 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6426 ha_tokudb::create_txn created master 0x7ff2d44a3000
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn 0x7ff2d44a3000 0x7ff2d44a3100 1 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6468 ha_tokudb::create_txn created stmt 0x7ff2d44a3000 sp_level 0x7ff2d44a3100
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7ff2d44a3100
2123 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7ff2d44a3100 syncflag 512
xa start 'x2'; insert into t1 values(2); xa end 'x2'; xa prepare 'x2'; xa commit 'x2';
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6533 ha_tokudb::external_lock trx 0x7ff2d44a3000 (nil) 0x7ff2d44a3000 (nil) 0 0
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn 0x7ff2d44a3000 0x7ff2d44a3000 1 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6468 ha_tokudb::create_txn created stmt 0x7ff2d44a3000 sp_level 0x7ff2d44a3000
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7ff2d44a3000
2017-12-24T08:36:45.347405Z 11 [ERROR] TokuDB: toku_db_put: Transaction cannot do work when child exists
2017-12-24T08:36:45.347444Z 11 [Warning] Slave: Got error 22 from storage engine Error_code: 1030
2017-12-24T08:36:45.347448Z 11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with 'SLAVE START'. We stopped at log 'mysql-bin.000001' position 1007
2123 /u01/tokudb/storage/tokudb/hatoku_hton.cc:972 tokudb_rollback rollback 0 txn 0x7ff2d44a3000
Segmentation fault (core dumped)
This crash is caused by parent->xid being 0x00. The core stack info:
(gdb) bt
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1 0x0000000000f6b647 in my_write_core (sig=sig@entry=11) at /u01/tokudb/mysys/stacktrace.c:249
#2 0x000000000086b945 in handle_fatal_signal (sig=11) at /u01/tokudb/sql/signal_handler.cc:223
#3 <signal handler called>
#4 toku_xids_can_create_child (xids=0x0) at /u01/tokudb/storage/tokudb/PerconaFT/ft/txn/xids.cc:93
#5 0x000000000080531f in toku_txn_begin_with_xid (parent=0x7f0bf501c280, txnp=0x7f0bf50a3490, logger=0x7f0c415e66c0, xid=..., snapshot_type=TXN_SNAPSHOT_CHILD, container_db_txn=0x7f0bf50a3400, for_recovery=false, read_only=false) at /u01/tokudb/storage/tokudb/PerconaFT/ft/txn/txn.cc:137
#6 0x00000000007aa6a2 in toku_txn_begin (env=0x7f0c819fde00, stxn=0x7f0bf50a3300, txn=0x7f0bf500dca8, flags=<optimized out>) at /u01/tokudb/storage/tokudb/PerconaFT/src/ydb_txn.cc:579
#7 0x0000000000f99323 in txn_begin (thd=0x7f0bf504bfc0, flags=1, txn=0x7f0bf500dca8, parent=0x7f0bf50a3300, env=<optimized out>) at /u01/tokudb/storage/tokudb/tokudb_txn.h:116
#8 ha_tokudb::create_txn (this=0x7f0bf50c8830, thd=0x7f0bf504bfc0, trx=0x7f0bf500dca0) at /u01/tokudb/storage/tokudb/ha_tokudb.cc:6458
#9 0x0000000000fa48f9 in ha_tokudb::external_lock (this=0x7f0bf50c8830, thd=0x7f0bf504bfc0, lock_type=1) at /u01/tokudb/storage/tokudb/ha_tokudb.cc:6544
#10 0x00000000008d46eb in handler::ha_external_lock (this=0x7f0bf50c8830, thd=thd@entry=0x7f0bf504bfc0, lock_type=lock_type@entry=1) at /u01/tokudb/sql/handler.cc:8352
#11 0x0000000000e4f3b4 in lock_external (count=1, tables=0x7f0bf5050688, thd=0x7f0bf504bfc0) at /u01/tokudb/sql/lock.cc:389
#12 mysql_lock_tables (thd=thd@entry=0x7f0bf504bfc0, tables=<optimized out>, count=<optimized out>, flags=0) at /u01/tokudb/sql/lock.cc:325
#13 0x0000000000cd0b6d in lock_tables (thd=thd@entry=0x7f0bf504bfc0, tables=0x7f0bf4d11020, count=<optimized out>, flags=flags@entry=0) at /u01/tokudb/sql/sql_base.cc:6705
#14 0x0000000000cd61f2 in open_and_lock_tables (thd=0x7f0bf504bfc0, tables=0x7f0bf4d11020, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f0c89629680) at /u01/tokudb/sql/sql_base.cc:6523
#15 0x0000000000ee09eb in open_and_lock_tables (flags=0, tables=<optimized out>, thd=<optimized out>) at /u01/tokudb/sql/sql_base.h:484
#16 Rows_log_event::do_apply_event (this=0x7f0bf50ab4a0, rli=0x7f0c87762800) at /u01/tokudb/sql/log_event.cc:10911
#17 0x0000000000ed71c0 in Log_event::apply_event (this=this@entry=0x7f0bf50ab4a0, rli=rli@entry=0x7f0c87762800) at /u01/tokudb/sql/log_event.cc:3329
#18 0x0000000000f1d233 in apply_event_and_update_pos (ptr_ev=ptr_ev@entry=0x7f0c89629940, thd=thd@entry=0x7f0bf504bfc0, rli=rli@entry=0x7f0c87762800) at /u01/tokudb/sql/rpl_slave.cc:4761
#19 0x0000000000f280a8 in exec_relay_log_event (rli=0x7f0c87762800, thd=0x7f0bf504bfc0) at /u01/tokudb/sql/rpl_slave.cc:5276
#20 handle_slave_sql (arg=<optimized out>) at /u01/tokudb/sql/rpl_slave.cc:7491
#21 0x00000000013c6184 in pfs_spawn_thread (arg=0x7f0bf5bea820) at /u01/tokudb/storage/perfschema/pfs.cc:2185
#22 0x00007f0c885126ba in start_thread (arg=0x7f0c8962a700) at pthread_create.c:333
#23 0x00007f0c87d293dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) f 10
#10 0x00000000008d46eb in handler::ha_external_lock (this=0x7f0bf50c8830, thd=thd@entry=0x7f0bf504bfc0, lock_type=lock_type@entry=1) at /u01/tokudb/sql/handler.cc:8352
8352 /u01/tokudb/sql/handler.cc: No such file or directory.
(gdb) p thd->lex->sql_command = SQLCOM_END
With the fix applied, the debug output is:
xa start 'x1'; insert into t1 values(1); xa end 'x1'; xa prepare 'x1'; xa commit 'x1';
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6534 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
24111 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7f4aba689000 67108864 r=0
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6469 ha_tokudb::create_txn created stmt (nil) sp_level 0x7f4aba689000
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7f4aba689000
24111 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7f4aba689000 syncflag 512
xa start 'x2'; insert into t1 values(2); xa end 'x2'; xa prepare 'x2'; xa commit 'x2';
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6534 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
24111 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7f4aba689000 67108864 r=0
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6469 ha_tokudb::create_txn created stmt (nil) sp_level 0x7f4aba689000
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7f4aba689000
24111 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7f4aba689000 syncflag 512
Test: mtr --suite=tokudb xa
Reviewed by: Rik
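The error `Transaction cannot do work when child exists` in the log above reflects a nested-transaction rule: while a child transaction is open, its parent may not perform writes. The toy Python model below illustrates only that rule; `ToyTxn`, `TxnError` and their methods are hypothetical and are not the PerconaFT API. It does not reproduce the actual patch, which, judging by the fixed debug output, creates a single transaction per XA statement (`created stmt (nil) sp_level ...`).

```python
class TxnError(Exception):
    pass


class ToyTxn:
    """Toy nested transaction: a parent may not do work while a child is open."""

    def __init__(self, parent=None):
        self.parent = parent
        self.child = None
        self.writes = []
        if parent is not None:
            if parent.child is not None:
                raise TxnError("parent already has an open child")
            parent.child = self  # register as the parent's open child

    def put(self, row):
        # Mirrors the rule behind "Transaction cannot do work when child exists".
        if self.child is not None:
            raise TxnError("Transaction cannot do work when child exists")
        self.writes.append(row)

    def commit(self):
        if self.child is not None:
            raise TxnError("cannot commit with an open child")
        if self.parent is not None:
            self.parent.writes.extend(self.writes)  # child work folds into the parent
            self.parent.child = None
```

In this model, the first XA transaction in the log behaves like a parent with a statement child that writes and commits cleanly, while the second one behaves like a write issued directly through a transaction that still has a child registered.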
laurynas-biveinis added a commit that referenced this pull request on Aug 27, 2018:
A subset of binlog encryption tests was crashing with:
* thread #39, stop reason = signal SIGSTOP
  frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10
  frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333
  frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278
  frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254
  frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26
  frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1
  frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93
  frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70
  frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180
  frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267
* frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113
  frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128
  frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756
  frame #13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382
  frame #14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836
  frame #15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340
  frame #16: 0x00007fff5622b50d libsystem_pthread.dylib`_pthread_start + 377
  frame #17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13
This was caused by my_b_append_tell trying to lock a nullptr IO_CACHE::append_buffer_lock. The lock was nullptr because it is only initialized for SEQ_READ_APPEND IO_CACHEs, whereas here we have a WRITE_CACHE. This mismatch was introduced by WL#8599 [1], which changed the IO_CACHE type from the former to the latter. Fix by using the correct API for the new IO_CACHE type: my_b_tell instead of my_b_append_tell.
[1]: commit dbd2ca2
Author: Joao Gramacho <[email protected]>
Date: Tue Nov 1 06:45:39 2016 +0000
WL#8599: Reduce contention in IO and SQL threads
(...)
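A toy Python model of the mismatch described above: the append-buffer lock exists only for a SEQ_READ_APPEND cache, so a tell function that takes that lock fails on a WRITE_CACHE, while a plain tell works for both. The names echo the MySQL identifiers for readability, but `ToyIoCache`, `toy_b_tell` and `toy_b_append_tell` are hypothetical stand-ins, not the mysys API, and a `RuntimeError` stands in for the null-mutex crash.

```python
import threading
from enum import Enum


class CacheType(Enum):
    WRITE_CACHE = 1
    SEQ_READ_APPEND = 2


class ToyIoCache:
    def __init__(self, cache_type):
        self.type = cache_type
        self.pos_in_file = 0       # bytes already flushed to the file
        self.buffer = bytearray()  # bytes still buffered in memory
        # Like IO_CACHE::append_buffer_lock: only created for SEQ_READ_APPEND.
        self.append_buffer_lock = (
            threading.Lock() if cache_type is CacheType.SEQ_READ_APPEND else None
        )

    def write(self, data: bytes):
        self.buffer.extend(data)


def toy_b_tell(cache):
    """Logical position = flushed bytes + buffered bytes; takes no append lock."""
    return cache.pos_in_file + len(cache.buffer)


def toy_b_append_tell(cache):
    """Takes the append lock first; a missing lock models the nullptr crash."""
    if cache.append_buffer_lock is None:
        raise RuntimeError("append_buffer_lock is null for this cache type")
    with cache.append_buffer_lock:
        return cache.pos_in_file + len(cache.buffer)
```

With a WRITE_CACHE, `toy_b_tell` returns the position after a 47-byte write (the event length seen in the trace), while `toy_b_append_tell` fails, which is why the fix switches the call site to the plain tell.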
laurynas-biveinis added a commit that referenced this pull request on Sep 6, 2018:
create_table_info_t::create_table_def leaked memory when the enable_encryption(table) call failed:
worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
binlog_encryption.binlog_encryption_without_keyring
group_replication.gr_change_master_hidden
group_replication.gr_server_uuid_matches_group_name
group_replication.gr_perfschema_connect_status
group_replication.gr_single_primary_and_leader_election_on_error
group_replication.gr_without_perfschema
rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1136 byte(s) in 1 object(s) allocated from:
#0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
#2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
#3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
#4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
#5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
#6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
#7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
#8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
#9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
#10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
#11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
#12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
#13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
#14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
#15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
#16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
#17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
#18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
#19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
#20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
#21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
#22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
Fix by adding the missing mem_heap_free(heap) call.
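The fix follows a general pattern: every exit path of a function that creates a heap must free it, including early error returns. The toy Python sketch below tracks live heaps in a set so a test can detect a leak, much as LeakSanitizer does. `heap_create`, `heap_free`, `create_table_def` and the `encryption_ok` flag are hypothetical stand-ins for mem_heap_create, mem_heap_free, the InnoDB function and the outcome of enable_encryption(table).

```python
live_heaps = set()  # stands in for the allocations LeakSanitizer would report


def heap_create():
    heap = object()
    live_heaps.add(heap)
    return heap


def heap_free(heap):
    live_heaps.remove(heap)


def create_table_def(encryption_ok: bool) -> int:
    """Return 0 on success, nonzero on error; the heap is freed on both paths."""
    heap = heap_create()
    if not encryption_ok:
        heap_free(heap)  # the previously missing call on the error path
        return 1
    # ... build the table definition using `heap` ...
    heap_free(heap)
    return 0
```

Freeing explicitly before each return keeps the sketch close to the one-line fix described in the commit; in C++ code an RAII guard would be an alternative way to cover every path.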
percona-ysorokin pushed a commit to percona-ysorokin/percona-server that referenced this pull request on May 6, 2019:
…E TO A SERVER
Problem
========================================================================
Running the GCS tests with ASAN occasionally reports a use-after-free of the server reference that the acceptor_learner_task uses. Here is an excerpt of ASAN's output:
==43936==ERROR: AddressSanitizer: heap-use-after-free on address 0x63100021c840 at pc 0x000000530ff8 bp 0x7fc0427e8530 sp 0x7fc0427e8520
WRITE of size 8 at 0x63100021c840 thread T3
#0 0x530ff7 in server_detected /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:962
#1 0x533814 in buffered_read_bytes /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1249
#2 0x5481af in buffered_read_msg /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1399
#3 0x51e171 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4690
#4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
#5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
#6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
#7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
#8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)
#9 0x7fc047ff2bfc in __clone (/lib64/libc.so.6+0xfebfc)
0x63100021c840 is located 64 bytes inside of 65688-byte region [0x63100021c800,0x63100022c898)
freed by thread T3 here:
#0 0x7fc04a5d7508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
#1 0x52cf86 in freesrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:836
#2 0x52ea78 in srv_unref /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:868
#3 0x524c30 in reply_handler_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4914
#4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
#5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
#6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
#7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
#8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)
previously allocated by thread T3 here:
#0 0x7fc04a5d7a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
#1 0x543604 in mksrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:721
#2 0x543b4c in addsrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:755
#3 0x54af61 in update_servers /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1747
#4 0x501082 in site_install_action /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1572
#5 0x55447c in import_config /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/site_def.c:486
#6 0x506dfc in handle_x_snapshot /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5257
#7 0x50c444 in xcom_fsm /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5325
#8 0x516c36 in dispatch_op /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4510
#9 0x521997 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4772
#10 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
#11 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
#12 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
#13 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
#14 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)
Analysis
========================================================================
The server structure is reference-counted by the associated sender_task and reply_handler_task. When they finish, they unreference the server, which leads to its memory being freed. However, the acceptor_learner_task keeps a "naked" reference to the server structure. Under the right ordering of operations, i.e. the sender_task and reply_handler_task terminating after the acceptor_learner_task acquires, but before it uses, the reference to the server structure, the acceptor_learner_task accesses the server structure after it has been freed.
Solution
========================================================================
Let the acceptor_learner_task also reference-count the server structure so it is not freed while still in use.
Reviewed-by: André Negrão <[email protected]>
Reviewed-by: Venkatesh Venugopal <[email protected]>
RB: 21209
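A toy Python sketch of the solution: every task that holds a pointer to the server takes its own reference, so the object is released only after the last holder drops it. `ToyServer`, `srv_ref` and `srv_unref` are simplified, hypothetical counterparts of the xcom structures; the `freed` flag stands in for freesrv() releasing the memory, and `touch()` raising models ASAN's use-after-free report.

```python
class ToyServer:
    def __init__(self):
        self.refcnt = 0
        self.freed = False

    def touch(self):
        # Any access after the object was released models a heap-use-after-free.
        if self.freed:
            raise RuntimeError("heap-use-after-free")
        return "ok"


def srv_ref(srv):
    srv.refcnt += 1
    return srv


def srv_unref(srv):
    srv.refcnt -= 1
    if srv.refcnt == 0:
        srv.freed = True  # freesrv() in the real code


srv = ToyServer()
sender = srv_ref(srv)      # sender_task's reference
reply = srv_ref(srv)       # reply_handler_task's reference
acceptor = srv_ref(srv)    # the fix: acceptor_learner_task takes a reference too
srv_unref(sender)          # sender_task finishes
srv_unref(reply)           # reply_handler_task finishes
assert acceptor.touch() == "ok"  # still valid: the acceptor's reference keeps it alive
srv_unref(acceptor)
assert srv.freed
```

Without the extra `srv_ref` call, the acceptor's pointer is "naked": once the other two tasks unreference the server, the next `touch()` raises, which is the toy equivalent of the ASAN report above.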
inikep pushed a commit to inikep/percona-server that referenced this pull request on Apr 23, 2020:
fbshipit-source-id: e972e55c1d1
inikep pushed a commit to inikep/percona-server that referenced this pull request on Apr 23, 2020:
Summary: SQL table records are stored as (pk_keyval -> record_val) key-value pairs. When the PK column's datatype allows its value to be decoded from the mem-comparable form in pk_keyval, the column does not need to be stored in record_val.
Differential Revision: https://reviews.facebook.net/D33117
fbshipit-source-id: af2ac6da4e3
inikep pushed a commit to inikep/percona-server that referenced this pull request on Apr 23, 2020:
…riable names
Differential Revision: https://reviews.facebook.net/D33453
fbshipit-source-id: 69b3f017f8d
inikep pushed a commit to inikep/percona-server that referenced this pull request on Feb 24, 2021:
fbshipit-source-id: e972e55c1d1
inikep pushed a commit to inikep/percona-server that referenced this pull request on Feb 24, 2021:
Summary: SQL table records are stored as (pk_keyval -> record_val) key-value pairs. When the PK column's datatype allows its value to be decoded from the mem-comparable form in pk_keyval, the column does not need to be stored in record_val.
Differential Revision: https://reviews.facebook.net/D33117
fbshipit-source-id: af2ac6da4e3
inikep pushed a commit to inikep/percona-server that referenced this pull request on Feb 24, 2021:
…riable names
Differential Revision: https://reviews.facebook.net/D33453
fbshipit-source-id: 69b3f017f8d
ldonoso pushed a commit to ldonoso/percona-server that referenced this pull request on Nov 4, 2021:
Since the table is open when update_stats() is called, the m_share pointer is always assigned. Remove the conditional logic for m_share.
Change-Id: I771a21ed40e95af8782f075b247f22f8635974e4
inikep pushed a commit to inikep/percona-server that referenced this pull request on Nov 15, 2021:
fbshipit-source-id: 0018e7703f7
inikep pushed a commit to inikep/percona-server that referenced this pull request on Nov 15, 2021:
Summary: SQL table records are stored as (pk_keyval -> record_val) key-value pairs. When the PK column's datatype allows its value to be decoded from the mem-comparable form in pk_keyval, the column does not need to be stored in record_val.
Differential Revision: https://reviews.facebook.net/D33117
fbshipit-source-id: 4d1d9ddf59e
inikep pushed a commit to inikep/percona-server that referenced this pull request on Nov 15, 2021:
…riable names
Differential Revision: https://reviews.facebook.net/D33453
fbshipit-source-id: a82ac4a6677
ldonoso pushed a commit to ldonoso/percona-server that referenced this pull request on Mar 15, 2022
ldonoso pushed a commit to ldonoso/percona-server that referenced this pull request on Mar 15, 2022:
Summary: SQL table records are stored as (pk_keyval -> record_val) key-value pairs. When the PK column's datatype allows its value to be decoded from the mem-comparable form in pk_keyval, the column does not need to be stored in record_val.
Differential Revision: https://reviews.facebook.net/D33117
ldonoso pushed a commit to ldonoso/percona-server that referenced this pull request on Mar 15, 2022:
…riable names
Differential Revision: https://reviews.facebook.net/D33453
dlenev pushed a commit to dlenev/percona-server that referenced this pull request on Jul 25, 2024:
…s=0 and a local DDL executed
https://perconadev.atlassian.net/browse/PS-9018
Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit.
Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads
How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the commit order queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: The SE commit for the applier thread is already done by the time it reaches here.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2:
   5.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the commit order queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it waits until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in the queue), the client thread enters the conditional wait `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals the cond_var `m_stage_cond_leader`, and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. It has now become the leader, and it is responsible for updating the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTID of threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we have
   - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
   - applier thread 1 performing the GTID update for itself (from step 8).
   Due to the lack of proper synchronization between these two threads, there is a time window in which both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
This concurrent access to `update_commit_group()` can cause a lock leak, resulting in one thread acquiring the sidno lock and never releasing it.
-----------------------------------------------------------------------------------------------------------
Client thread                                        | Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();  | update_commit_group() => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()                | calls update_gtids_impl_lock_sidnos()
set commit_group_sidno[2] = true                     | set commit_group_sidno[2] = true
lock_sidno(2) -> successful                          | lock_sidno(2) -> waits
update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) {
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Applier thread continues..
lock_sidno(2) -> successful
update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------
12. Since the above lock leak can also happen the other way, i.e. the applier thread fails to unlock, the consequences from here on can differ.
13. If the client thread continues without releasing the lock, then at a later stage it can deadlock with the applier thread performing a GTID update, as shown by these stack traces:
Client_thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command
Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker
14. If the applier thread continues without releasing the lock, then at a later stage it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds, this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already held by the replica applier thread when it tries to re-acquire it.
Solution
--------
In the problematic example above, when each thread is viewed individually, the order of lock acquisition is correct, so there is no need to change the lock order.
However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This requirement was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we ensure that the client thread (binlog flush leader) acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
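The debug-build symptom in step 14 can be illustrated with a hypothetical owner-tracking mutex wrapper in the spirit of the `safe_mutex_assert_not_owner()` check. `OwnerCheckedMutex` and `detects_relock_after_leak()` are illustrative names only; this is not MySQL's safe_mutex implementation, and it throws instead of asserting so the example can report the outcome.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical wrapper that remembers which thread owns the lock.
class OwnerCheckedMutex {
 public:
  void lock() {
    // Re-locking a non-recursive mutex the thread already owns would
    // self-deadlock; detect it up front instead.
    if (owned_by_me()) throw std::logic_error("recursive lock attempt");
    mutex_.lock();
    std::lock_guard<std::mutex> g(meta_);
    owner_ = std::this_thread::get_id();
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> g(meta_);
      owner_ = std::thread::id();  // default id means "no owner"
    }
    mutex_.unlock();
  }

  bool owned_by_me() {
    std::lock_guard<std::mutex> g(meta_);
    return owner_ == std::this_thread::get_id();
  }

 private:
  std::mutex mutex_;  // the lock being protected
  std::mutex meta_;   // guards owner_
  std::thread::id owner_;
};

// Models step 14: the applier leaked the sidno lock and, for its next
// transaction, tries to take it again. Returns true if this is detected.
bool detects_relock_after_leak() {
  OwnerCheckedMutex sidno_lock;
  sidno_lock.lock();  // leaked: the matching unlock never happened
  try {
    sidno_lock.lock();  // next transaction wants the same sidno
  } catch (const std::logic_error&) {
    sidno_lock.unlock();  // clean up the leaked lock for the demo
    return true;
  }
  sidno_lock.unlock();
  return false;
}
```

`detects_relock_after_leak()` returns `true`: the second `lock()` on a lock the thread already owns is reported instead of blocking forever on a non-recursive mutex.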
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Jul 30, 2024
…s=0 and a local DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between a replica applier thread and a client thread performing a binlog group commit.

Analysis
--------
At least three threads are needed for this deadlock to happen:

1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.

1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.

3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the "Commit Order" queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The "Commit Order" queue is emptied, but MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: The SE commit for the applier thread is already done by the time it reaches this point.

4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2:
   5.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the "Commit Order" queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it waits until the lock is released.

6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.

7. Since the "Commit Order" queue is not empty (applier thread 2 is in it), the client thread enters a conditional wait on `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases MYSQL_BIN_LOG::LOCK_log, signals the cond_var `m_stage_cond_leader`, and enters a conditional wait until its `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`. It has now become the leader and is responsible for updating the GTID of applier thread 2.
    10.1. It acquires MYSQL_BIN_LOG::LOCK_log.
    10.2. It returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
    10.3. It fetches the "Commit Order" and "Binlog Flush" queues.
    10.4. It performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
    10.5. It proceeds to update the GTIDs of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we have:
    - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
    - applier thread 1 performing the GTID update for itself (from step 8).
    Due to the lack of proper synchronization between these two threads, there is a time window in which both can call gtid_state->update_commit_group() concurrently. In the subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
This concurrent access to `update_commit_group()` can cause a lock leak, in which one thread acquires the sidno lock and never releases it.

-----------------------------------------------------------------------------------------------------------
Client thread                                          Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();    update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()

set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true

                                                       lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                       update_gtids_impl_own_gtid()
                                                         -> Add the thd->owned_gtid in `executed_gtids()`

                                                       if (commit_group_sidnos[2]) {
                                                         unlock_sidno(2);
                                                         commit_group_sidnos[2] = false;
                                                       }

                                                       Applier thread continues...

lock_sidno(2) -> successful

update_gtids_impl_own_gtid()
  -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) {  <=== this check fails and the lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. Because the lock leak above can also happen the other way round (i.e., the applier thread fails to unlock), the consequences that follow differ depending on which thread leaks the lock.

13. If the client thread continues without releasing the lock, it can later deadlock with the applier thread performing a GTID update, with the following stack traces:
Client thread
-------------
#1  __GI___lll_lock_wait
#2  ___pthread_mutex_lock
#3  native_mutex_lock <= waits for commit lock while holding sidno lock
#4  Commit_stage_manager::enroll_for
#5  MYSQL_BIN_LOG::change_stage
#6  MYSQL_BIN_LOG::ordered_commit
#7  MYSQL_BIN_LOG::commit
#8  ha_commit_trans
#9  trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command

Applier thread
--------------
#1  ___pthread_mutex_lock
#2  native_mutex_lock
#3  safe_mutex_lock
#4  Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5  Gtid_state::update_commit_group
#6  Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7  Commit_order_manager::finish
#8  Commit_order_manager::wait_and_finish
#9  ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later try to acquire the same lock again (recursive locking) while setting the GTID for its next transaction (in set_gtid_next()). In debug builds this hits the assertion `safe_mutex_assert_not_owner()`, meaning the replica applier thread already owns the lock when it tries to re-acquire it.

Solution
--------
In the problematic example above, the lock acquisition order of each thread, seen individually, is correct, so there is no need to change the lock order.
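The per-thread lock order is only safe as long as no lock leaks. The sketch below, which assumes C++14 or later and uses `std::timed_mutex` so that it reports instead of hanging, models step 13: one thread keeps a leaked "sidno" lock and then asks for the "commit" lock, while the other holds the "commit" lock and asks for the "sidno" lock. The mutex names, `arrive_and_wait()`, and `leaked_lock_causes_cycle()` are hypothetical; the comments map each side to frames in the stack traces above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Tiny one-shot barrier: blocks until `n` threads have arrived.
void arrive_and_wait(std::atomic<int>& arrived, int n) {
  arrived.fetch_add(1);
  while (arrived.load() < n) std::this_thread::yield();
}

// Returns true when each thread times out waiting for the lock held by the
// other, i.e. the hold-and-wait cycle from step 13 has formed.
bool leaked_lock_causes_cycle() {
  using namespace std::chrono_literals;
  std::timed_mutex sidno_lock;   // hypothetical stand-in for the sidno lock
  std::timed_mutex commit_lock;  // hypothetical stand-in for the commit lock
  std::atomic<int> holding{0}, tried{0};
  std::atomic<bool> client_stuck{false}, applier_stuck{false};

  std::thread client([&] {
    // Models the sidno lock leaked in step 12 (held for the whole run).
    std::lock_guard<std::timed_mutex> leaked(sidno_lock);
    arrive_and_wait(holding, 2);
    // Commit_stage_manager::enroll_for(): wants the commit lock.
    if (commit_lock.try_lock_for(100ms)) commit_lock.unlock();
    else client_stuck = true;
    arrive_and_wait(tried, 2);  // keep holding until both have tried
  });

  std::thread applier([&] {
    // Commit_order_manager::flush_engine_and_signal_threads() holds it.
    std::lock_guard<std::timed_mutex> held(commit_lock);
    arrive_and_wait(holding, 2);
    // Gtid_state::update_gtids_impl_lock_sidnos(): wants the sidno lock.
    if (sidno_lock.try_lock_for(100ms)) sidno_lock.unlock();
    else applier_stuck = true;
    arrive_and_wait(tried, 2);
  });

  client.join();
  applier.join();
  return client_stuck && applier_stuck;
}
```

Both `try_lock_for()` calls time out, so the function returns `true`. In the server there is no timeout, so the same cycle is a hang.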
However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This requirement was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we ensure that the client thread (binlog flush leader) acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
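The fix can be sketched as follows, assuming a simplified model whose shape follows the pseudocode in the table (mark the sidno, lock the marked sidnos, record the GTID, unlock and clear). `GtidStateModel` and `guarded_updates_do_not_leak()` are hypothetical, and a local `std::mutex` stands in for `MYSQL_BIN_LOG::LOCK_commit`. Every caller, client leader or applier, holds that mutex around `update_commit_group()`, so `commit_group_sidnos` is never touched by two threads at once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical model of the shared bookkeeping; not the real Gtid_state.
struct GtidStateModel {
  std::array<bool, 4> commit_group_sidnos{};  // shared "lock this sidno" flags
  std::array<std::mutex, 4> sidno_locks;      // one lock per sidno
  int sidno_locks_held = 0;                   // how many sidno locks are held
  long executed = 0;                          // stands in for executed_gtids

  // Precondition: the caller holds the commit lock (the invariant the
  // patch restores), so only one thread at a time runs this function.
  void update_commit_group(int owned_sidno) {
    commit_group_sidnos[owned_sidno] = true;  // mark the sidno to lock
    for (std::size_t i = 1; i < commit_group_sidnos.size(); ++i)
      if (commit_group_sidnos[i]) {           // lock marked sidnos in order
        sidno_locks[i].lock();
        ++sidno_locks_held;
      }
    ++executed;                               // "add the owned GTID"
    for (std::size_t i = 1; i < commit_group_sidnos.size(); ++i)
      if (commit_group_sidnos[i]) {           // release and clear the flag
        --sidno_locks_held;
        sidno_locks[i].unlock();
        commit_group_sidnos[i] = false;
      }
  }
};

// Runs a "client leader" and an "applier" caller concurrently, each taking
// `commit_lock` around every GTID update. Returns true if no sidno lock is
// left held and every update was recorded.
bool guarded_updates_do_not_leak(int iterations) {
  std::mutex commit_lock;  // stands in for MYSQL_BIN_LOG::LOCK_commit
  GtidStateModel state;
  auto caller = [&] {
    for (int i = 0; i < iterations; ++i) {
      std::lock_guard<std::mutex> guard(commit_lock);
      state.update_commit_group(2);
    }
  };
  std::thread client(caller);
  std::thread applier(caller);
  client.join();
  applier.join();
  return state.sidno_locks_held == 0 && state.executed == 2L * iterations;
}
```

With the guard in place, every flag set by a caller is cleared by the same caller, and `guarded_updates_do_not_leak(10000)` returns `true`. Removing the `lock_guard` would reintroduce the unsynchronized access from step 11; in this model that is a data race (undefined behavior), so it is not shown as a runnable variant.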
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Jul 30, 2024
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Aug 21, 2024
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Aug 28, 2024
…s=0 and a local DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. 
This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. 
Client_thread ------------- #1 __GI___lll_lock_wait percona#2 ___pthread_mutex_lock percona#3 native_mutex_lock <= waits for commit lock while holding sidno lock percona#4 Commit_stage_manager::enroll_for percona#5 MYSQL_BIN_LOG::change_stage percona#6 MYSQL_BIN_LOG::ordered_commit percona#7 MYSQL_BIN_LOG::commit percona#8 ha_commit_trans percona#9 trans_commit_implicit percona#10 mysql_create_like_table percona#11 Sql_cmd_create_table::execute percona#12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock percona#2 native_mutex_lock percona#3 safe_mutex_lock percona#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock percona#5 Gtid_state::update_commit_group percona#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here percona#7 Commit_order_manager::finish percona#8 Commit_order_manager::wait_and_finish percona#9 ha_commit_low percona#10 trx_coordinator::commit_in_engines percona#11 MYSQL_BIN_LOG::commit percona#12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. 
However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, the client thread (the binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
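The lock leak in the table of step 11 can be replayed deterministically. The sketch below is a hypothetical, single-threaded C++ model, not MySQL code: `GtidStateModel`, `sidno_locked`, `replay_unsynchronized()` and `replay_serialized()` are illustrative names, and a `bool` stands in for each per-sidno mutex. `replay_unsynchronized()` executes the steps in the table's order; `replay_serialized()` lets each thread finish its whole update before the other starts, which is the ordering that one common lock (such as `MYSQL_BIN_LOG::LOCK_commit`) enforces.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model, not MySQL code. commit_group_sidnos mirrors the shared
// flag array; sidno_locked[s] stands in for the per-sidno mutex.
struct GtidStateModel {
  std::array<bool, 4> commit_group_sidnos{};  // shared by all threads, no common lock
  std::array<bool, 4> sidno_locked{};         // true while some thread holds the sidno lock
};

// Replays the table's interleaving for sidno 2 on one thread, so the outcome
// is deterministic. Returns true if the sidno lock is left held.
bool replay_unsynchronized(GtidStateModel& st) {
  const std::size_t s = 2;
  st.commit_group_sidnos[s] = true;  // client: marks sidno 2
  st.commit_group_sidnos[s] = true;  // applier 1: marks the same slot
  st.sidno_locked[s] = true;         // applier 1: lock_sidno(2) succeeds
                                     // client: lock_sidno(2) blocks here
  if (st.commit_group_sidnos[s]) {   // applier 1: unlocks what it locked...
    st.sidno_locked[s] = false;
    st.commit_group_sidnos[s] = false;  // ...and clears the client's flag too
  }
  st.sidno_locked[s] = true;         // client: lock_sidno(2) now succeeds
  if (st.commit_group_sidnos[s]) {   // client: flag is already false,
    st.sidno_locked[s] = false;      // so this unlock never runs
    st.commit_group_sidnos[s] = false;
  }
  return st.sidno_locked[s];  // true means the lock leaked
}

// Same steps, but each thread's complete update runs without interruption,
// as it would if every update held one common lock.
bool replay_serialized(GtidStateModel& st) {
  const std::size_t s = 2;
  for (int thread = 0; thread < 2; ++thread) {  // applier 1, then client
    st.commit_group_sidnos[s] = true;           // mark sidno 2
    st.sidno_locked[s] = true;                  // lock_sidno(2)
    if (st.commit_group_sidnos[s]) {            // release what was locked
      st.sidno_locked[s] = false;
      st.commit_group_sidnos[s] = false;
    }
  }
  return st.sidno_locked[s];  // false: nothing is left held
}
```

Under these assumptions, `replay_unsynchronized()` returns `true` (sidno 2 is still locked after both threads have finished), while `replay_serialized()` returns `false`. The leak comes purely from sharing the flag array: applier thread 1 clears a flag that the client thread set and still relies on.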
dlenev pushed a commit to dlenev/percona-server that referenced this pull request on Aug 30, 2024
…s=0 and a local DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between the replica applier thread and a client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.

1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.

3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the "Commit Order" queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The "Commit Order" queue is emptied, but MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: The storage engine (SE) commit for the applier thread is already done by the time it reaches this point.

4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in step 3.4), applier thread 2:
   5.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the "Commit Order" queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since applier thread 1 holds it, applier thread 2 waits until it is released.

6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.

7. Since the "Commit Order" queue is not empty (applier thread 2 is in it), the client thread enters a conditional wait on `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases MYSQL_BIN_LOG::LOCK_log, signals the condition variable `m_stage_cond_leader`, and enters a conditional wait until its `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the condition variable `m_stage_cond_leader`. It has now become the leader and is responsible for updating the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTIDs of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we have:
   - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
   - applier thread 1 performing the GTID update for itself (from step 8).
   Due to the lack of proper synchronization between these two threads, there is a time window in which both can call gtid_state->update_commit_group() concurrently. In the subsequent steps, both threads simultaneously modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
This concurrent access to `update_commit_group()` can cause a lock leak, in which one thread acquires the sidno lock and never releases it.

-----------------------------------------------------------------------------------------------------------
Client thread                                  Applier thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group()                          update_commit_group()
  => global_sid_lock->rdlock();                  => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()          calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true              set commit_group_sidnos[2] = true
                                               lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                               update_gtids_impl_own_gtid()
                                                 -> adds thd->owned_gtid to `executed_gtids()`
                                               if (commit_group_sidnos[2]) {
                                                 unlock_sidno(2);
                                                 commit_group_sidnos[2] = false;
                                               }
                                               Applier thread continues...
lock_sidno(2) -> successful
update_gtids_impl_own_gtid()
  -> adds thd->owned_gtid to `executed_gtids()`
if (commit_group_sidnos[2]) {  <=== this check fails, so the lock is not released
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. Since the above lock leak can also happen the other way round (i.e., the applier thread fails to unlock), the consequences from here on can differ.

13. If the client thread continues without releasing the lock, it can later deadlock with the applier thread performing a GTID update, as shown by these stack traces:
Client thread
-------------
#1  __GI___lll_lock_wait
#2  ___pthread_mutex_lock
#3  native_mutex_lock  <= waits for commit lock while holding sidno lock
#4  Commit_stage_manager::enroll_for
#5  MYSQL_BIN_LOG::change_stage
#6  MYSQL_BIN_LOG::ordered_commit
#7  MYSQL_BIN_LOG::commit
#8  ha_commit_trans
#9  trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command

Applier thread
--------------
#1  ___pthread_mutex_lock
#2  native_mutex_lock
#3  safe_mutex_lock
#4  Gtid_state::update_gtids_impl_lock_sidnos  <= waits for sidno lock
#5  Gtid_state::update_commit_group
#6  Commit_order_manager::flush_engine_and_signal_threads  <= acquires commit lock here
#7  Commit_order_manager::finish
#8  Commit_order_manager::wait_and_finish
#9  ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later lock recursively while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this hits the assertion `safe_mutex_assert_not_owner()`, meaning the replica applier thread already holds the lock when it tries to acquire it again.

Solution
--------
In the problematic example above, the order of lock acquisition is correct when each thread is considered individually, so there is no need to change the lock order.
However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. This was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, the client thread (the binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
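The idea behind the fix can be sketched with standard C++ threads. In this hypothetical model (not the patch itself), `commit_lock_` plays the role of `MYSQL_BIN_LOG::LOCK_commit`, one `std::mutex` per sidno stands in for the sidno locks, and `GtidStateSketch`, `update_commit_group()`, `held_sidno_locks()` and `run_two_committers()` are illustrative names. The `global_sid_lock` read lock and the real GTID bookkeeping are omitted. Because every read and write of `commit_group_sidnos_` happens while `commit_lock_` is held, one thread can no longer clear a flag that another thread set, so each sidno lock is released by the thread that acquired it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch, not MySQL code: commit_lock_ stands in for
// MYSQL_BIN_LOG::LOCK_commit and serializes all access to the flag array.
class GtidStateSketch {
 public:
  static constexpr std::size_t kSidnos = 4;

  // Lock the needed sidnos, record the update, then release exactly what was
  // locked. Assumes every entry of `sidnos` is < kSidnos.
  void update_commit_group(const std::vector<std::size_t>& sidnos) {
    std::lock_guard<std::mutex> commit_guard(commit_lock_);  // the fix
    for (std::size_t s : sidnos) {  // like update_gtids_impl_lock_sidnos()
      if (!commit_group_sidnos_[s]) {
        commit_group_sidnos_[s] = true;
        sidno_locks_[s].lock();
        ++held_;
      }
    }
    ++executed_;  // stand-in for adding the owned GTIDs
    for (std::size_t s = 0; s < kSidnos; ++s) {
      if (commit_group_sidnos_[s]) {  // flag can only have been set by this call
        sidno_locks_[s].unlock();
        --held_;
        commit_group_sidnos_[s] = false;
      }
    }
  }

  // Read these only after all worker threads have been joined.
  int executed() const { return executed_; }
  int held_sidno_locks() const { return held_; }

 private:
  std::mutex commit_lock_;
  std::array<std::mutex, kSidnos> sidno_locks_;
  std::array<bool, kSidnos> commit_group_sidnos_{};
  int executed_ = 0;
  int held_ = 0;  // number of sidno locks currently held
};

// Runs a "client" and an "applier" thread that both update sidno 2.
void run_two_committers(GtidStateSketch& st, int rounds) {
  auto worker = [&st, rounds] {
    for (int i = 0; i < rounds; ++i) st.update_commit_group({2});
  };
  std::thread client(worker);
  std::thread applier(worker);
  client.join();
  applier.join();
}
```

For example, after `run_two_committers(st, 1000)` the two threads have performed 2000 updates in total and `held_sidno_locks()` is 0. The design mirrors the Solution section: the lock order is left unchanged, and only access to the shared flag array is serialized.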
inikep pushed a commit to inikep/percona-server that referenced this pull request on Sep 11, 2024
inikep pushed a commit to inikep/percona-server that referenced this pull request on Sep 12, 2024
inikep pushed a commit to inikep/percona-server that referenced this pull request on Sep 17, 2024
…s=0 and a local DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. 
This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. 
Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans #9 trans_commit_implicit #10 mysql_create_like_table #11 Sql_cmd_create_table::execute #12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish #9 ha_commit_low #10 trx_coordinator::commit_in_engines #11 MYSQL_BIN_LOG::commit #12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause for this problem is that multiple threads can concurrently access to the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. 
This requirement was overlooked when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, the client thread (binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
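The interleaving in the table above can be replayed deterministically to show why the shared flag array leaks a lock. The following C++ sketch is a simplified model, not server code. `SidnoModel`, `release_if_flagged()` and the numeric thread ids are hypothetical, and a sidno lock is modeled as an owner id (0 = free) so the leak can be observed without real threads. The second function assumes, as the Solution describes, that each GTID update runs entirely under `MYSQL_BIN_LOG::LOCK_commit`. Here that lock is represented by a plain `std::mutex`, which forces the two updates to happen one after the other.

```cpp
#include <array>
#include <cassert>
#include <initializer_list>
#include <mutex>

// Hypothetical model of the shared state touched by update_commit_group().
// A sidno lock is modeled by its owner id (0 = free) so that a leaked lock
// can be observed without running real threads.
struct SidnoModel {
  std::array<bool, 4> commit_group_sidnos{};  // shared "locked by the group" flags
  std::array<int, 4> owner{};                 // current owner of each sidno lock
};

constexpr int kSidno = 2;

// One thread's release step: unlock only if the shared flag is still set.
void release_if_flagged(SidnoModel &m) {
  if (m.commit_group_sidnos[kSidno]) {
    m.owner[kSidno] = 0;                    // unlock_sidno(2)
    m.commit_group_sidnos[kSidno] = false;
  }
}

// Replays the unsynchronized interleaving step by step.
// Returns the id of the thread still owning sidno 2 afterwards (0 = none).
int replay_without_lock_commit() {
  SidnoModel m;
  const int first = 1, second = 2;
  m.commit_group_sidnos[kSidno] = true;  // first:  set flag
  m.commit_group_sidnos[kSidno] = true;  // second: set the same flag slot
  m.owner[kSidno] = first;               // first:  lock_sidno(2) succeeds
                                         // second: lock_sidno(2) waits
  release_if_flagged(m);                 // first:  unlocks and clears the flag
  m.owner[kSidno] = second;              // second: lock_sidno(2) now succeeds
  release_if_flagged(m);                 // second: flag already false, no unlock
  return m.owner[kSidno];                // == second: the lock leaked
}

// Same two updates, each run entirely under a stand-in for LOCK_commit, so
// the second updater sets its flag only after the first one has finished.
int replay_with_lock_commit() {
  SidnoModel m;
  std::mutex lock_commit;  // hypothetical stand-in for MYSQL_BIN_LOG::LOCK_commit
  for (int thread_id : {1, 2}) {
    std::lock_guard<std::mutex> guard(lock_commit);
    m.commit_group_sidnos[kSidno] = true;  // set flag
    m.owner[kSidno] = thread_id;           // lock_sidno(2)
    release_if_flagged(m);                 // unlock_sidno(2), clear flag
  }
  return m.owner[kSidno];                  // == 0: nothing leaked
}
```

In this model, `replay_without_lock_commit()` returns 2 because the thread that locked second still owns sidno 2 after finishing. `replay_with_lock_commit()` returns 0. The model deliberately ignores `global_sid_lock` and `executed_gtids`. It only shows that a single flag slot shared by two concurrent updaters cannot record two lock owners.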
inikep
pushed a commit
that referenced
this pull request
Sep 23, 2024
… for connection xxx'. The new iterator-based EXPLAINs are not impacted.

The issue here is a race condition. More than one thread is using the query term iterator at the same time, although it is neither thread-safe nor reentrant. Part of its state is kept in the query terms being visited, which leads to interference and race conditions.

a) The explain thread uses an iterator here. Sql_cmd_explain_other_thread::execute inspects the Query_expression of the running query by calling master_query_expression()->find_blocks_query_term, which uses an iterator over the query terms in the query expression:

    for (auto qt : query_terms<>()) {
      if (qt->query_block() == qb) {
        return qt;
      }
    }

The above search fails to find qb due to interference from thread b) (see below), and then tries to access a null pointer:

* thread #36, name = 'connection', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
    frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
    frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
    frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
    frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
    frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
    frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
    frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
    frame #8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
    frame #9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
    frame #10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
    frame #11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
    frame #12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
  * frame #13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
    frame #14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
    frame #15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
    frame #16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
    frame #17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
    frame #18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
    frame #19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
    frame #20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
    frame #21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) The query thread being explained is itself performing LEX::cleanup and, as part of that, iterates over the query terms. It still allows EXPLAIN of the query plan, because thd->query_plan.set_query_plan(SQLCOM_END, ...) has not been called yet.

20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:
This patch solves the issue by removing the iterator state from Query_term, which makes the query_term iterators thread-safe. Every child query_term is labeled with its index in its parent's m_children vector, so the iterator can easily compute the next child to visit based on Query_term::m_sibling_idx.

A unit test case is added to check reentrancy. You can also manually verify that no race condition remains by running two client connection files (with \. <file>). One connection runs a large number of copies of the repro query. The other runs a large number of EXPLAIN FORMAT=json FOR CONNECTION <connection> statements, e.g. EXPLAIN FORMAT=json FOR CONNECTION 8\G. The actual connection number must of course be verified in the first connection.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
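Here is a minimal C++ sketch of the approach described in the Solution, under simplifying assumptions. `QueryTerm`, `PreOrderIterator` and `QueryTerms` are hypothetical stand-ins rather than the server's `Query_term` classes, and only a pre-order visit is shown. Each child records its position in its parent's `m_children` vector as `m_sibling_idx`. The iterator's whole state is therefore a pointer to the current node, and nothing in the tree is written while iterating.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified query term: it knows its parent and its own index
// in the parent's m_children vector, and stores no traversal state.
struct QueryTerm {
  std::string name;
  QueryTerm *m_parent = nullptr;
  std::size_t m_sibling_idx = 0;
  std::vector<std::unique_ptr<QueryTerm>> m_children;

  QueryTerm *add_child(std::string child_name) {
    auto child = std::make_unique<QueryTerm>();
    child->name = std::move(child_name);
    child->m_parent = this;
    child->m_sibling_idx = m_children.size();  // label with index in parent
    m_children.push_back(std::move(child));
    return m_children.back().get();
  }
};

// Pre-order iterator whose only state is the current node, so several
// iterators can walk the same (unchanging) tree at the same time.
class PreOrderIterator {
 public:
  PreOrderIterator(const QueryTerm *root, const QueryTerm *current)
      : m_root(root), m_current(current) {}

  const QueryTerm *operator*() const { return m_current; }
  bool operator!=(const PreOrderIterator &other) const {
    return m_current != other.m_current;
  }

  PreOrderIterator &operator++() {
    if (!m_current->m_children.empty()) {  // descend to the first child
      m_current = m_current->m_children.front().get();
      return *this;
    }
    // Climb until some ancestor has a next sibling; m_sibling_idx says where
    // the node sits in its parent's children, so no per-node cursor is needed.
    const QueryTerm *node = m_current;
    while (node != m_root && node->m_parent != nullptr) {
      const QueryTerm *parent = node->m_parent;
      const std::size_t next = node->m_sibling_idx + 1;
      if (next < parent->m_children.size()) {
        m_current = parent->m_children[next].get();
        return *this;
      }
      node = parent;
    }
    m_current = nullptr;  // traversal finished
    return *this;
  }

 private:
  const QueryTerm *m_root;
  const QueryTerm *m_current;
};

// Range helper so a (sub)tree can be walked with a range-based for loop.
struct QueryTerms {
  const QueryTerm *root;
  PreOrderIterator begin() const { return PreOrderIterator(root, root); }
  PreOrderIterator end() const { return PreOrderIterator(root, nullptr); }
};
```

Two iterators over the same tree can therefore be advanced in any interleaving without affecting each other's results. This only removes shared traversal state. It does not make it safe to modify the tree while another thread walks it.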
inikep
pushed a commit
that referenced
this pull request
Sep 23, 2024
…s=0 and a local DDL executed https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between the replica applier thread and a client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the "Commit Order" queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The "Commit Order" queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: The SE commit for the applier thread is already done by the time it reaches this point.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2:
   5.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the "Commit Order" queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it waits until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in it), the client thread enters the conditional wait `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals the cond_var `m_stage_cond_leader`, and enters a conditional wait until its `tx_commit_pending` is set to false by the client thread. This is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), which the client thread calls from fetch_and_process_flush_stage_queue().
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. It has now become the leader and is responsible for updating the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTID of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we have:
   - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
   - applier thread 1 performing the GTID update for itself (from step 8).
   Due to the lack of proper synchronization between these two threads, there is a time window in which both can call gtid_state->update_commit_group() concurrently. In the subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
This concurrent access to `update_commit_group()` can cause a lock leak, resulting in one thread acquiring the sidno lock and never releasing it.

-----------------------------------------------------------------------------------------------------
Client thread                                    Applier Thread 1
-----------------------------------------------------------------------------------------------------
update_commit_group()                            update_commit_group()
  => global_sid_lock->rdlock();                    => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()            calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                set commit_group_sidnos[2] = true
lock_sidno(2) -> successful                      lock_sidno(2) -> waits
update_gtids_impl_own_gtid()
  -> Add the thd->owned_gtid in
     `executed_gtids()`
if (commit_group_sidnos[2]) {
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
                                                 Applier thread continues...
                                                 lock_sidno(2) -> successful
                                                 update_gtids_impl_own_gtid()
                                                   -> Add the thd->owned_gtid in
                                                      `executed_gtids()`
                                                 if (commit_group_sidnos[2]) {  <=== this check fails and the lock is not released.
                                                   unlock_sidno(2);
                                                   commit_group_sidnos[2] = false;
                                                 }
                                                 Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------

12. Since the above lock leak can also happen the other way around (i.e., the applier thread fails to unlock), there are different possible consequences from here on.
13. If the client thread continues without releasing the lock, then at a later stage it can deadlock with the applier thread performing a GTID update, with the following stack traces:
Client thread
-------------
#1 __GI___lll_lock_wait
#2 ___pthread_mutex_lock
#3 native_mutex_lock <= waits for commit lock while holding sidno lock
#4 Commit_stage_manager::enroll_for
#5 MYSQL_BIN_LOG::change_stage
#6 MYSQL_BIN_LOG::ordered_commit
#7 MYSQL_BIN_LOG::commit
#8 ha_commit_trans
#9 trans_commit_implicit
#10 mysql_create_like_table
#11 Sql_cmd_create_table::execute
#12 mysql_execute_command
#13 dispatch_sql_command

Applier thread
--------------
#1 ___pthread_mutex_lock
#2 native_mutex_lock
#3 safe_mutex_lock
#4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock
#5 Gtid_state::update_commit_group
#6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here
#7 Commit_order_manager::finish
#8 Commit_order_manager::wait_and_finish
#9 ha_commit_low
#10 trx_coordinator::commit_in_engines
#11 MYSQL_BIN_LOG::commit
#12 ha_commit_trans
#13 trans_commit
#14 Xid_log_event::do_commit
#15 Xid_apply_log_event::do_apply_event_worker
#16 Slave_worker::slave_worker_exec_event
#17 slave_worker_exec_job_group
#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds, this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the replica applier thread already holds the lock when it tries to re-acquire it.

Solution
--------
In the above problematic example, when each thread is viewed individually, there is no problem in the order of lock acquisition, so there is no need to change the lock order. The root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents.
This requirement was overlooked when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, the client thread (binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
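Step 14 depends on a debug-build check that catches a thread re-locking a mutex it already owns. The following C++ sketch shows the general idea of such an owner-tracking wrapper. `CheckedMutex` is hypothetical and is not MySQL's `safe_mutex`. According to the text above, MySQL fires the assertion `safe_mutex_assert_not_owner()` rather than throwing an exception.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical debug wrapper in the spirit of a "safe mutex": it remembers
// which thread owns the mutex and reports a recursive lock by the owner
// instead of letting the thread deadlock on itself.
class CheckedMutex {
 public:
  void lock() {
    // Only the owning thread can have stored its own id, so this check is
    // reliable even though other threads may update owner_ concurrently.
    if (owner_.load() == std::this_thread::get_id())
      throw std::logic_error("recursive lock: mutex already owned by this thread");
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }

  void unlock() {
    if (owner_.load() != std::this_thread::get_id())
      throw std::logic_error("unlock by a thread that does not own the mutex");
    owner_.store(std::thread::id());  // "no owner"
    m_.unlock();
  }

  bool owned_by_current_thread() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

With a plain non-recursive `std::mutex`, a second `lock()` by the same thread is undefined behavior, and in practice it usually self-deadlocks. Recording the owner turns that case into an immediate, diagnosable error.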
inikep
pushed a commit
that referenced
this pull request
Sep 25, 2024
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Sep 25, 2024
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Oct 1, 2024
…s=0 and a local DDL executed https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between the replica applier thread and a client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the "Commit Order" queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The "Commit Order" queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: The SE commit for the applier thread is already done by the time it reaches this point.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2:
   5.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the "Commit Order" queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it waits until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in it), the client thread enters the conditional wait `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to
   update the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to
        become the leader, and finds the client thread waiting.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals the cond_var
        `m_stage_cond_leader`, and enters a conditional wait until its
        `tx_commit_pending` is set to false by the client thread (this is done
        in Commit_stage_manager::process_final_stage_for_ordered_commit_group(),
        called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. It is
    now the leader and is responsible for updating the GTID of applier
    thread 2.
    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
    10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order"
          and "Binlog Flush" queues.
    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().
    10.5. Proceeds to update the GTID of the threads in the "Commit Order"
          queue by calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we have:
    - the client thread performing the GTID update on behalf of applier
      thread 2 (from step 10.5), and
    - applier thread 1 performing the GTID update for itself (from step 8).
    Because these two threads are not properly synchronized, there is a time
    window in which both can call gtid_state->update_commit_group()
    concurrently. In the subsequent steps, both threads simultaneously modify
    the contents of the array `commit_group_sidnos`, which is used to track
    the lock status of sidnos.
    This concurrent access to `update_commit_group()` can cause a lock leak, in
    which one thread acquires the sidno lock and never releases it:

-----------------------------------------------------------------------------------------------------------
Client thread                                         Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group()                                 update_commit_group()
  => global_sid_lock->rdlock();                         => global_sid_lock->rdlock();
calls update_gtids_impl_lock_sidnos()                 calls update_gtids_impl_lock_sidnos()
set commit_group_sidnos[2] = true                     set commit_group_sidnos[2] = true
                                                      lock_sidno(2) -> successful
lock_sidno(2) -> waits
                                                      update_gtids_impl_own_gtid()
                                                        -> Add the thd->owned_gtid in `executed_gtids()`
                                                      if (commit_group_sidnos[2]) {
                                                        unlock_sidno(2);
                                                        commit_group_sidnos[2] = false;
                                                      }
                                                      Applier thread continues...
lock_sidno(2) -> successful
update_gtids_impl_own_gtid()
  -> Add the thd->owned_gtid in `executed_gtids()`
if (commit_group_sidnos[2]) {  <=== this check fails and the lock is not released
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}
Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. Since the lock leak above can also happen the other way around (i.e., the
    applier thread fails to unlock), there are two possible consequences.
13. If the client thread continues without releasing the lock, it can later
    deadlock with the applier thread performing a GTID update, with the
    following stack traces:
    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock  <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos  <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads  <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later
    attempt to take the lock recursively while setting the GTID for the next
    transaction (in set_gtid_next()). In debug builds, this hits the assertion
    `safe_mutex_assert_not_owner()`, meaning that the replica applier thread
    already holds the lock when it tries to re-acquire it.

Solution
--------
In the example above, each thread, viewed individually, acquires its locks in
a valid order, so there is no need to change the lock order.
The root cause, however, is that multiple threads can access the array
`Gtid_state::commit_group_sidnos` concurrently. The original implementation
expected threads to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its
contents, but this requirement was not taken into account when upstream
implemented WL#7846 (MTS: slave-preserve-commit-order when
log-slave-updates/binlog is disabled).

With this patch, the client thread (the binlog flush leader) acquires
`MYSQL_BIN_LOG::LOCK_commit` before performing the GTID update on behalf of
the threads waiting in the "Commit Order" queue. This guarantees that the
`Gtid_state::commit_group_sidnos` array is never accessed without the
protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Oct 17, 2024
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Oct 17, 2024
dlenev
pushed a commit
to dlenev/percona-server
that referenced
this pull request
Oct 22, 2024
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Oct 28, 2024
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock due to a race condition between the replica applier threads and a client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the "Commit Order" queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. Empties the "Commit Order" queue, but does not yet release MYSQL_BIN_LOG::LOCK_log.
   NOTE: The SE commit for the applier thread is already done by the time it reaches this point.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in step 3.4), applier thread 2:
   5.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the "Commit Order" queue.
   5.3. Tries to acquire MYSQL_BIN_LOG::LOCK_log. Since the lock is held by applier thread 1, it waits until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in it), the client thread enters the conditional wait on `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader, and finds the client thread waiting.
   9.2. It releases MYSQL_BIN_LOG::LOCK_log, signals the cond_var `m_stage_cond_leader`, and enters a conditional wait until its own `tx_commit_pending` is set to false by the client thread (this happens in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. It is now the leader and is responsible for updating the GTID of applier thread 2.
   10.1. It acquires MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Updates the GTID of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we have
    - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
    - applier thread 1 performing the GTID update for itself (from step 8).
    Due to the lack of proper synchronization between these two threads, there is a time window in which both can call gtid_state->update_commit_group() concurrently. In the subsequent steps, both threads simultaneously modify the contents of the array `commit_group_sidnos`, which tracks the lock status of sidnos.
    This concurrent access to `update_commit_group()` can cause a lock leak, in which one thread acquires the sidno lock and never releases it.

   ----------------------------------------------------------------------------------------------------
   Client thread                                     Applier Thread 1
   ----------------------------------------------------------------------------------------------------
   update_commit_group()                             update_commit_group()
     => global_sid_lock->rdlock();                     => global_sid_lock->rdlock();
   calls update_gtids_impl_lock_sidnos()             calls update_gtids_impl_lock_sidnos()
   set commit_group_sidnos[2] = true                 set commit_group_sidnos[2] = true
                                                     lock_sidno(2) -> successful
   lock_sidno(2) -> waits
                                                     update_gtids_impl_own_gtid()
                                                       -> Add the thd->owned_gtid in `executed_gtids()`
                                                     if (commit_group_sidnos[2]) {
                                                       unlock_sidno(2);
                                                       commit_group_sidnos[2] = false;
                                                     }
                                                     Applier thread continues...
   lock_sidno(2) -> successful
   update_gtids_impl_own_gtid()
     -> Add the thd->owned_gtid in `executed_gtids()`
   if (commit_group_sidnos[2]) {  <=== this check fails, so the lock is not released
     unlock_sidno(2);
     commit_group_sidnos[2] = false;
   }
   Client thread continues without releasing the lock
   ----------------------------------------------------------------------------------------------------

12. Because the lock leak can also happen the other way around (i.e., the applier thread fails to unlock), the consequences from here on can differ.
13. If the client thread continues without releasing the lock, it can later deadlock with the applier thread performing a GTID update, with the following stack traces:
   Client thread
   -------------
   #1  __GI___lll_lock_wait
   #2  ___pthread_mutex_lock
   #3  native_mutex_lock  <= waits for commit lock while holding sidno lock
   #4  Commit_stage_manager::enroll_for
   #5  MYSQL_BIN_LOG::change_stage
   #6  MYSQL_BIN_LOG::ordered_commit
   #7  MYSQL_BIN_LOG::commit
   #8  ha_commit_trans
   #9  trans_commit_implicit
   #10 mysql_create_like_table
   #11 Sql_cmd_create_table::execute
   #12 mysql_execute_command
   #13 dispatch_sql_command

   Applier thread
   --------------
   #1  ___pthread_mutex_lock
   #2  native_mutex_lock
   #3  safe_mutex_lock
   #4  Gtid_state::update_gtids_impl_lock_sidnos  <= waits for sidno lock
   #5  Gtid_state::update_commit_group
   #6  Commit_order_manager::flush_engine_and_signal_threads  <= acquires commit lock here
   #7  Commit_order_manager::finish
   #8  Commit_order_manager::wait_and_finish
   #9  ha_commit_low
   #10 trx_coordinator::commit_in_engines
   #11 MYSQL_BIN_LOG::commit
   #12 ha_commit_trans
   #13 trans_commit
   #14 Xid_log_event::do_commit
   #15 Xid_apply_log_event::do_apply_event_worker
   #16 Slave_worker::slave_worker_exec_event
   #17 slave_worker_exec_job_group
   #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds, this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the replica applier thread already holds the lock when it tries to re-acquire it.

Solution
--------
In the problematic example above, each thread, viewed individually, acquires its locks in a correct order, so there is no need to change the lock order. The root cause is rather that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. The initial implementation expected threads to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents.
However, this requirement was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, the client thread (the binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` before performing the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep
pushed a commit
that referenced
this pull request
Oct 30, 2024
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Nov 11, 2024
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Nov 14, 2024
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Nov 14, 2024
…s=0 and a local DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 
8. Applier thread 1 releases MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader, and finds the client thread waiting.
   9.2. It releases MYSQL_BIN_LOG::LOCK_log, signals the condition variable `m_stage_cond_leader`, and enters a conditional wait until its `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the condition variable `m_stage_cond_leader`. It is now the leader and is responsible for updating the GTID of applier thread 2.
   10.1. It acquires MYSQL_BIN_LOG::LOCK_log.
   10.2. It returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. It fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. It performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. It updates the GTIDs of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point we have:
   - the client thread performing the GTID update on behalf of applier thread 2 (step 10.5), and
   - applier thread 1 performing the GTID update for itself (step 8).
   Because these two threads are not properly synchronized, there is a time window in which both can call gtid_state->update_commit_group() concurrently. Both threads then modify, at the same time, the array `commit_group_sidnos`, which tracks the lock status of sidnos.
   This concurrent access in `update_commit_group()` can cause a lock leak: one thread acquires the sidno lock and never releases it.

   -----------------------------------------------------------------------------------------------------------
   Client thread                                      Applier Thread 1
   -----------------------------------------------------------------------------------------------------------
   update_commit_group()                              update_commit_group()
     => global_sid_lock->rdlock();                      => global_sid_lock->rdlock();
   calls update_gtids_impl_lock_sidnos()              calls update_gtids_impl_lock_sidnos()
   set commit_group_sidnos[2] = true                  set commit_group_sidnos[2] = true
   lock_sidno(2) -> successful                        lock_sidno(2) -> waits
   update_gtids_impl_own_gtid()
     -> add thd->owned_gtid to `executed_gtids()`
   if (commit_group_sidnos[2]) {
     unlock_sidno(2);
     commit_group_sidnos[2] = false;
   }
                                                      Applier thread continues...
                                                      lock_sidno(2) -> successful
                                                      update_gtids_impl_own_gtid()
                                                        -> add thd->owned_gtid to `executed_gtids()`
                                                      if (commit_group_sidnos[2]) { <=== this check fails and the lock is not released
                                                        unlock_sidno(2);
                                                        commit_group_sidnos[2] = false;
                                                      }
   Client thread continues without releasing the lock
   -----------------------------------------------------------------------------------------------------------

12. Since the lock leak above can also happen the other way round (i.e., the applier thread fails to unlock), there are two possible consequences, described in steps 13 and 14.
13. If the client thread continues without releasing the lock, it can later deadlock with an applier thread performing a GTID update, with the following stack traces:
   Client thread
   -------------
   #1  __GI___lll_lock_wait
   #2  ___pthread_mutex_lock
   #3  native_mutex_lock                <= waits for commit lock while holding sidno lock
   #4  Commit_stage_manager::enroll_for
   #5  MYSQL_BIN_LOG::change_stage
   #6  MYSQL_BIN_LOG::ordered_commit
   #7  MYSQL_BIN_LOG::commit
   #8  ha_commit_trans
   #9  trans_commit_implicit
   #10 mysql_create_like_table
   #11 Sql_cmd_create_table::execute
   #12 mysql_execute_command
   #13 dispatch_sql_command

   Applier thread
   --------------
   #1  ___pthread_mutex_lock
   #2  native_mutex_lock
   #3  safe_mutex_lock
   #4  Gtid_state::update_gtids_impl_lock_sidnos                 <= waits for sidno lock
   #5  Gtid_state::update_commit_group
   #6  Commit_order_manager::flush_engine_and_signal_threads     <= acquires commit lock here
   #7  Commit_order_manager::finish
   #8  Commit_order_manager::wait_and_finish
   #9  ha_commit_low
   #10 trx_coordinator::commit_in_engines
   #11 MYSQL_BIN_LOG::commit
   #12 ha_commit_trans
   #13 trans_commit
   #14 Xid_log_event::do_commit
   #15 Xid_apply_log_event::do_apply_event_worker
   #16 Slave_worker::slave_worker_exec_event
   #17 slave_worker_exec_job_group
   #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later lock the same mutex recursively while setting the GTID for its next transaction (in set_gtid_next()). In debug builds this hits the assertion `safe_mutex_assert_not_owner()`, meaning the replica applier thread already holds the lock when it tries to re-acquire it.

Solution
--------
Seen from each thread individually, the order of lock acquisition in the example above is correct, so there is no need to change the lock order. The root cause is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its original design, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents.
That expectation was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). This patch ensures that `MYSQL_BIN_LOG::LOCK_commit` is held by the client thread (the binlog flush leader) when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
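The lock leak and the fix can be illustrated with a small, simplified C++ model. This is a sketch under stated assumptions, not the Percona Server code: `SidnoLockModel`, `GtidStateModel`, `update_commit_group_model`, `replay_unsynchronized` and `run_with_lock_commit` are hypothetical names. Each sidno lock is modeled as an owner id instead of a real mutex, so the interleaving from the timeline can be replayed deterministically (a failed `try_lock()` marks the point where the real `lock_sidno()` would block), and a `std::mutex` stands in for `MYSQL_BIN_LOG::LOCK_commit`. Threads A and B are generic, because the leak can hit either the client thread or the applier thread.

```cpp
// Simplified, HYPOTHETICAL model of the Gtid_state::commit_group_sidnos
// bookkeeping from PS-9018. Not MySQL source code: sidno locks are modeled
// as owner ids so the interleaving can be replayed without blocking.
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Stand-in for a per-sidno mutex; a failed try_lock() marks the point
// where the real lock_sidno() would block.
struct SidnoLockModel {
  std::optional<int> owner;  // id of the thread holding the lock, if any
  bool try_lock(int tid) {
    if (owner) return false;
    owner = tid;
    return true;
  }
  void unlock() { owner.reset(); }
};

struct GtidStateModel {
  std::vector<bool> commit_group_sidnos;  // SHARED "sidno is locked" flags
  std::vector<SidnoLockModel> sidno_locks;
  explicit GtidStateModel(std::size_t n)
      : commit_group_sidnos(n, false), sidno_locks(n) {}
};

// Without LOCK_commit: replays the interleaving from the timeline.
// Returns the id of the thread still holding the sidno lock, if any.
std::optional<int> replay_unsynchronized(int sidno) {
  GtidStateModel s(static_cast<std::size_t>(sidno) + 1);
  const int kThreadA = 1, kThreadB = 2;
  s.commit_group_sidnos[sidno] = true;  // A marks the sidno
  s.commit_group_sidnos[sidno] = true;  // B marks the same sidno
  bool a_locked = s.sidno_locks[sidno].try_lock(kThreadA);  // A succeeds
  bool b_locked = s.sidno_locks[sidno].try_lock(kThreadB);  // B would wait
  assert(a_locked && !b_locked);
  if (s.commit_group_sidnos[sidno]) {  // A unlocks and clears the shared flag
    s.sidno_locks[sidno].unlock();
    s.commit_group_sidnos[sidno] = false;
  }
  b_locked = s.sidno_locks[sidno].try_lock(kThreadB);  // B wakes, acquires
  assert(b_locked);
  (void)a_locked;
  (void)b_locked;
  if (s.commit_group_sidnos[sidno]) {  // B: flag already cleared, so this
    s.sidno_locks[sidno].unlock();     // block is skipped and the lock leaks
    s.commit_group_sidnos[sidno] = false;
  }
  return s.sidno_locks[sidno].owner;
}

// One complete update_commit_group() pass for a single thread.
void update_commit_group_model(GtidStateModel& s, int tid, int sidno) {
  s.commit_group_sidnos[sidno] = true;
  bool locked = s.sidno_locks[sidno].try_lock(tid);
  assert(locked);  // no other pass can be in progress under LOCK_commit
  (void)locked;
  // ... the thread's owned GTID would be added to executed_gtids here ...
  if (s.commit_group_sidnos[sidno]) {
    s.sidno_locks[sidno].unlock();
    s.commit_group_sidnos[sidno] = false;
  }
}

// With the fix: every pass runs while holding a mutex standing in for
// MYSQL_BIN_LOG::LOCK_commit, so passes are serialized, never interleaved.
std::optional<int> run_with_lock_commit(int sidno, int n_threads) {
  GtidStateModel s(static_cast<std::size_t>(sidno) + 1);
  std::mutex lock_commit;
  std::vector<std::thread> threads;
  for (int tid = 1; tid <= n_threads; ++tid) {
    threads.emplace_back([&s, &lock_commit, tid, sidno] {
      std::lock_guard<std::mutex> guard(lock_commit);
      update_commit_group_model(s, tid, sidno);
    });
  }
  for (auto& t : threads) t.join();
  return s.sidno_locks[sidno].owner;  // empty: no lock is left behind
}
```

For example, `replay_unsynchronized(2)` returns thread B's id (2), meaning the sidno lock is leaked, while `run_with_lock_commit(2, 8)` returns an empty optional. The model leaves out `global_sid_lock`: as the timeline shows, both threads take it only as a read lock (`rdlock()`), which does not stop them from modifying the flag array at the same time.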
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Nov 18, 2024
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Jan 23, 2025
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Jan 23, 2025
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Jan 27, 2025
…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can deadlock because of a race condition between a replica applier thread and a client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1:
   3.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the Commit Order queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. Empties the Commit Order queue, but does not yet release MYSQL_BIN_LOG::LOCK_log.
   NOTE: The SE commit for the applier thread is already done by the time it reaches this point.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (it was emptied by applier thread 1 in step 3.4), applier thread 2:
   5.1. Becomes the leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the Commit Order queue.
   5.3. Tries to acquire MYSQL_BIN_LOG::LOCK_log. Since the lock is held by applier thread 1, it waits until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in it), the client thread enters a conditional wait on `m_stage_cond_leader`, intending to become the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader, and finds the client thread waiting.
   9.2. It releases MYSQL_BIN_LOG::LOCK_log, signals `m_stage_cond_leader`, and enters a conditional wait until its `tx_commit_pending` is set to false by the client thread (this is done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), which the client thread calls from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from `m_stage_cond_leader`. It is now the leader and is responsible for updating the GTID of applier thread 2.
   10.1. It acquires MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Updates the GTID of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point we have:
   - the client thread performing the GTID update on behalf of applier thread 2 (step 10.5), and
   - applier thread 1 performing the GTID update for itself (step 8).
   Due to the lack of proper synchronization between these two threads, there is a time window in which both can call gtid_state->update_commit_group() concurrently. In the subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos.
   This concurrent access to `update_commit_group()` can cause a lock leak, in which one thread acquires the sidno lock and never releases it.

   -----------------------------------------------------------------------------------------------------------
   Client thread                                          Applier Thread 1
   -----------------------------------------------------------------------------------------------------------
   update_commit_group() => global_sid_lock->rdlock();    update_commit_group() => global_sid_lock->rdlock();
   calls update_gtids_impl_lock_sidnos()                  calls update_gtids_impl_lock_sidnos()
   set commit_group_sidnos[2] = true                      set commit_group_sidnos[2] = true
                                                          lock_sidno(2) -> successful
   lock_sidno(2) -> waits
                                                          update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
                                                          if (commit_group_sidnos[2]) {
                                                            unlock_sidno(2);
                                                            commit_group_sidnos[2] = false;
                                                          }
                                                          Applier thread continues...
   lock_sidno(2) -> successful
   update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`
   if (commit_group_sidnos[2]) {   <=== this check fails and the lock is not released.
     unlock_sidno(2);
     commit_group_sidnos[2] = false;
   }
   Client thread continues without releasing the lock
   -----------------------------------------------------------------------------------------------------------

12. The lock leak above can also happen the other way around (i.e., the applier thread fails to unlock), so the consequences from here on differ depending on which thread leaks the lock.
13. If the client thread continues without releasing the lock, it can later deadlock with the applier thread performing a GTID update, with the following stack traces:
   Client thread
   -------------
   #1  __GI___lll_lock_wait
   #2  ___pthread_mutex_lock
   #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
   #4  Commit_stage_manager::enroll_for
   #5  MYSQL_BIN_LOG::change_stage
   #6  MYSQL_BIN_LOG::ordered_commit
   #7  MYSQL_BIN_LOG::commit
   #8  ha_commit_trans
   #9  trans_commit_implicit
   #10 mysql_create_like_table
   #11 Sql_cmd_create_table::execute
   #12 mysql_execute_command
   #13 dispatch_sql_command

   Applier thread
   --------------
   #1  ___pthread_mutex_lock
   #2  native_mutex_lock
   #3  safe_mutex_lock
   #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
   #5  Gtid_state::update_commit_group
   #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
   #7  Commit_order_manager::finish
   #8  Commit_order_manager::wait_and_finish
   #9  ha_commit_low
   #10 trx_coordinator::commit_in_engines
   #11 MYSQL_BIN_LOG::commit
   #12 ha_commit_trans
   #13 trans_commit
   #14 Xid_log_event::do_commit
   #15 Xid_apply_log_event::do_apply_event_worker
   #16 Slave_worker::slave_worker_exec_event
   #17 slave_worker_exec_job_group
   #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later lock recursively while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this hits the assertion `safe_mutex_assert_not_owner()`, meaning the replica applier thread already holds the lock when it tries to re-acquire it.

Solution
--------
In the problematic example above, each thread individually acquires its locks in a correct order, so there is no need to change the lock order. The root cause is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. However, this requirement was not taken into account when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, the client thread (binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
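The hand-off in steps 9.2 and 10 (a follower waits until the leader clears its `tx_commit_pending`, after the leader has done the follower's final-stage work) follows the usual condition-variable pattern. The C++ sketch below is a hypothetical, self-contained model of that pattern only; `FollowerModel`, `leader_finish()`, `follower_wait()` and `run_handoff()` are illustrative names, not MySQL server code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the follower/leader hand-off (not MySQL code).
struct FollowerModel {
  std::mutex m;
  std::condition_variable cv;
  bool tx_commit_pending = true;  // cleared by the leader
  bool gtid_updated = false;      // work the leader does on the follower's behalf
};

// Leader side: finish the follower's work, then release the follower.
void leader_finish(FollowerModel &f) {
  {
    std::lock_guard<std::mutex> lk(f.m);
    f.gtid_updated = true;        // e.g. GTID update done by the leader
    f.tx_commit_pending = false;  // follower may now return
  }
  f.cv.notify_one();
}

// Follower side: block until the leader clears tx_commit_pending.
// The predicate form of wait() handles spurious wake-ups and the case
// where the leader finishes before the follower starts waiting.
bool follower_wait(FollowerModel &f) {
  std::unique_lock<std::mutex> lk(f.m);
  f.cv.wait(lk, [&] { return !f.tx_commit_pending; });
  return f.gtid_updated;
}

// Runs one hand-off between two threads; returns what the follower observed.
bool run_handoff() {
  FollowerModel f;
  bool result = false;
  std::thread follower([&] { result = follower_wait(f); });
  std::thread leader([&] { leader_finish(f); });
  leader.join();
  follower.join();
  return result;
}
```

Because the follower only returns after the leader has published its work under the same mutex, the follower always observes `gtid_updated == true`; the race described in step 11 is not in this hand-off but in the shared `commit_group_sidnos` array.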
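The lock leak in the step 11 timeline can be replayed deterministically. The following is a minimal single-threaded C++ model, assuming a lock can be represented as an owner id (0 = free) so the interleaving runs step by step without real blocking. `SidnoModel` and `replay_leak_interleaving()` are hypothetical names, and the model deliberately ignores `global_sid_lock` and the rest of `Gtid_state`.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of the shared state touched by update_commit_group().
struct SidnoModel {
  std::array<bool, 4> commit_group_sidnos{};  // shared flags, unprotected
  std::array<int, 4> sidno_owner{};           // 0 means "unlocked"

  void set_flag(int sidno) { commit_group_sidnos[sidno] = true; }

  // Returns true if the lock was taken, false if the caller would block.
  bool try_lock_sidno(int sidno, int tid) {
    if (sidno_owner[sidno] != 0) return false;
    sidno_owner[sidno] = tid;
    return true;
  }

  // Mirrors: if (commit_group_sidnos[sidno]) { unlock_sidno(); flag = false; }
  void release_if_flagged(int sidno) {
    if (commit_group_sidnos[sidno]) {
      sidno_owner[sidno] = 0;
      commit_group_sidnos[sidno] = false;
    }
  }
};

// Replays the table's interleaving; returns the id of the thread left
// holding sidno 2 (0 if nobody holds it).
int replay_leak_interleaving() {
  constexpr int kClient = 1, kApplier = 2, kSidno = 2;
  SidnoModel m;
  m.set_flag(kSidno);  // client: set commit_group_sidnos[2] = true
  m.set_flag(kSidno);  // applier: sets the same shared flag
  bool applier_got = m.try_lock_sidno(kSidno, kApplier);  // successful
  bool client_got = m.try_lock_sidno(kSidno, kClient);    // would wait
  assert(applier_got && !client_got);
  m.release_if_flagged(kSidno);  // applier unlocks AND clears the shared flag
  client_got = m.try_lock_sidno(kSidno, kClient);  // client now succeeds
  assert(client_got);
  m.release_if_flagged(kSidno);  // flag already false: nothing is released
  return m.sidno_owner[kSidno];
}
```

`replay_leak_interleaving()` returns 1, the client thread's id: sidno 2 stays locked because the applier thread cleared the shared flag before the client thread checked it.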
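The idea behind the fix can be sketched with standard C++ primitives, assuming `commit_lock` stands in for `MYSQL_BIN_LOG::LOCK_commit` and `sidno_locks` for the per-sidno locks. This is a simplified, hypothetical model (`FixedGtidStateModel` and `run_fixed_model()` are illustrative names), not the patch itself: both callers run the whole flag lifecycle of `update_commit_group()` under the same mutex, so one thread can no longer clear the flag between another thread's lock and unlock.

```cpp
#include <array>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model, NOT MySQL code.
struct FixedGtidStateModel {
  std::mutex commit_lock;                     // stands in for LOCK_commit
  std::array<std::mutex, 4> sidno_locks;      // per-sidno locks
  std::array<bool, 4> commit_group_sidnos{};  // now only touched under commit_lock
  int executed = 0;  // stands in for executed_gtids (guarded by commit_lock)
  int held = 0;      // sidno locks currently held (guarded by commit_lock)

  void update_commit_group(int sidno) {
    // Every caller (client leader or applier) takes commit_lock first.
    std::lock_guard<std::mutex> guard(commit_lock);
    commit_group_sidnos[sidno] = true;  // update_gtids_impl_lock_sidnos()
    sidno_locks[sidno].lock();
    ++held;
    ++executed;  // update_gtids_impl_own_gtid()
    if (commit_group_sidnos[sidno]) {  // can no longer be cleared by another thread
      --held;
      sidno_locks[sidno].unlock();
      commit_group_sidnos[sidno] = false;
    }
  }
};

// Two threads repeatedly update the same sidno; returns total updates.
int run_fixed_model(int iterations) {
  FixedGtidStateModel state;
  auto worker = [&] {
    for (int i = 0; i < iterations; ++i) state.update_commit_group(2);
  };
  std::thread client(worker), applier(worker);
  client.join();
  applier.join();
  assert(state.held == 0);  // no sidno lock leaked
  return state.executed;
}
```

Serializing on one coarse lock trades some concurrency for correctness; in the model every update completes and no sidno lock is left held after both threads finish.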
lukin-oleksiy
pushed a commit
to lukin-oleksiy/percona-server
that referenced
this pull request
Jan 27, 2025
inikep
pushed a commit
to inikep/percona-server
that referenced
this pull request
Jan 28, 2025
lukin-oleksiy
pushed a commit
to lukin-oleksiy/percona-server
that referenced
this pull request
Jan 29, 2025
inikep
pushed a commit
to lukin-oleksiy/percona-server
that referenced
this pull request
Feb 4, 2025
…s=0 and a local DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update its GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks whether any thread is waiting in the "Binlog Flush" queue to become the leader, and finds the client thread waiting.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals the condition variable `m_stage_cond_leader`. It then enters a conditional wait until the client thread sets its `tx_commit_pending` to false. The client thread does this in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), which it calls from fetch_and_process_flush_stage_queue().

10. The client thread wakes up from the condition variable `m_stage_cond_leader`. It is now the leader and is responsible for updating the GTID of applier thread 2.
   10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
   10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
   10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
   10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
   10.5. Proceeds to update the GTIDs of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point we have:
   - the client thread performing the GTID update on behalf of applier thread 2 (step 10.5), and
   - applier thread 1 performing the GTID update for itself (step 8).

   The two threads are not properly synchronized, so there is a time window in which both can call gtid_state->update_commit_group() concurrently. In the subsequent steps, both threads try to modify the contents of the array `commit_group_sidnos` at the same time. This array tracks the lock status of sidnos.
   This concurrent access to `update_commit_group()` can cause a lock leak, where one thread acquires the sidno lock and never releases it. In time order:

   -----------------------------------------------------------------------------------------------------------
   [Client]    update_commit_group() => global_sid_lock->rdlock();
   [Applier 1] update_commit_group() => global_sid_lock->rdlock();
   [Client]    calls update_gtids_impl_lock_sidnos()
   [Applier 1] calls update_gtids_impl_lock_sidnos()
   [Client]    set commit_group_sidnos[2] = true
   [Applier 1] set commit_group_sidnos[2] = true
   [Applier 1] lock_sidno(2) -> successful
   [Client]    lock_sidno(2) -> waits
   [Applier 1] update_gtids_impl_own_gtid() -> add thd->owned_gtid to `executed_gtids()`
   [Applier 1] if (commit_group_sidnos[2]) {
                 unlock_sidno(2);
                 commit_group_sidnos[2] = false;
               }
   [Applier 1] applier thread continues...
   [Client]    lock_sidno(2) -> successful
   [Client]    update_gtids_impl_own_gtid() -> add thd->owned_gtid to `executed_gtids()`
   [Client]    if (commit_group_sidnos[2]) {   <=== this check fails and the lock is not released
                 unlock_sidno(2);
                 commit_group_sidnos[2] = false;
               }
   [Client]    client thread continues without releasing the lock
   -----------------------------------------------------------------------------------------------------------

12. The same lock leak can also happen the other way round, i.e. the applier thread fails to unlock. The consequences from here on differ depending on which thread leaks the lock.

13. If the client thread continues without releasing the lock, it can later deadlock with the applier thread performing a GTID update. The stack traces are:
   Client thread
   -------------
   #1  __GI___lll_lock_wait
   #2  ___pthread_mutex_lock
   #3  native_mutex_lock          <= waits for commit lock while holding sidno lock
   #4  Commit_stage_manager::enroll_for
   #5  MYSQL_BIN_LOG::change_stage
   #6  MYSQL_BIN_LOG::ordered_commit
   #7  MYSQL_BIN_LOG::commit
   #8  ha_commit_trans
   #9  trans_commit_implicit
   #10 mysql_create_like_table
   #11 Sql_cmd_create_table::execute
   #12 mysql_execute_command
   #13 dispatch_sql_command

   Applier thread
   --------------
   #1  ___pthread_mutex_lock
   #2  native_mutex_lock
   #3  safe_mutex_lock
   #4  Gtid_state::update_gtids_impl_lock_sidnos            <= waits for sidno lock
   #5  Gtid_state::update_commit_group
   #6  Commit_order_manager::flush_engine_and_signal_threads  <= acquires commit lock here
   #7  Commit_order_manager::finish
   #8  Commit_order_manager::wait_and_finish
   #9  ha_commit_low
   #10 trx_coordinator::commit_in_engines
   #11 MYSQL_BIN_LOG::commit
   #12 ha_commit_trans
   #13 trans_commit
   #14 Xid_log_event::do_commit
   #15 Xid_apply_log_event::do_apply_event_worker
   #16 Slave_worker::slave_worker_exec_event
   #17 slave_worker_exec_job_group
   #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, it can later lock recursively while setting the GTID for the next transaction (in set_gtid_next()). In debug builds this hits the assertion `safe_mutex_assert_not_owner()`. That means the replica applier thread already holds the lock when it tries to re-acquire it.

Solution
--------
In the problematic example above, each thread on its own acquires locks in a valid order, so there is no need to change the lock order. The root cause of the problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`.
In its initial implementation, threads were expected to hold `MYSQL_BIN_LOG::LOCK_commit` before modifying the array's contents. This requirement was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, the client thread (the binlog flush leader) now acquires `MYSQL_BIN_LOG::LOCK_commit` when it performs the GTID update on behalf of the threads waiting in the "Commit Order" queue. This guarantees that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
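To make the race and the fix concrete, here is a deterministic replay of the lock-leak interleaving described above. It is written as a minimal C++ model under stated assumptions:

- The per-sidno mutex is modelled as an owner id.
- Only sidno 2 is used.
- `GtidStateModel`, `mark`, `acquire`, `release_if_marked` and `replay` are hypothetical names, not the server's `Gtid_state` API.
- Running the two updates back to back ("serialized") stands in for both callers holding a shared `MYSQL_BIN_LOG::LOCK_commit`.

```cpp
#include <array>
#include <cassert>

// Hypothetical model (not server code) of the race on commit_group_sidnos.
// The sidno lock is an owner id (FREE = unlocked); the flag array is shared
// by every caller of the GTID update, as in the interleaving above.
enum Owner { FREE = 0, CLIENT = 1, APPLIER = 2 };

struct GtidStateModel {
  std::array<bool, 4> commit_group_sidnos{};  // shared flags, all false
  std::array<int, 4> sidno_owner{};           // all FREE initially
};

// Mirrors "set commit_group_sidnos[sidno] = true".
void mark(GtidStateModel &s, int sidno) { s.commit_group_sidnos[sidno] = true; }

// Mirrors a successful lock_sidno(). The replay only calls this once the
// lock is free, just as a real lock_sidno() would block until then.
void acquire(GtidStateModel &s, int sidno, Owner who) {
  assert(s.sidno_owner[sidno] == FREE);
  s.sidno_owner[sidno] = who;
}

// Mirrors the check-then-unlock at the end of the GTID update.
void release_if_marked(GtidStateModel &s, int sidno) {
  if (s.commit_group_sidnos[sidno]) {
    s.sidno_owner[sidno] = FREE;
    s.commit_group_sidnos[sidno] = false;
  }
}

// Replays two GTID updates on sidno 2 and returns who still owns its lock.
//   serialized == false: the interleaving from the table (no LOCK_commit).
//   serialized == true:  each update finishes before the next one starts,
//                        which is what a shared LOCK_commit guarantees.
int replay(bool serialized) {
  GtidStateModel s;
  const int sidno = 2;
  if (serialized) {
    mark(s, sidno); acquire(s, sidno, APPLIER); release_if_marked(s, sidno);
    mark(s, sidno); acquire(s, sidno, CLIENT);  release_if_marked(s, sidno);
  } else {
    mark(s, sidno);                // client sets the flag
    mark(s, sidno);                // applier 1 sets the same flag
    acquire(s, sidno, APPLIER);    // applier wins; the client waits
    release_if_marked(s, sidno);   // applier unlocks and clears the flag
    acquire(s, sidno, CLIENT);     // the client's wait ends
    release_if_marked(s, sidno);   // flag already cleared: no unlock, leak
  }
  return s.sidno_owner[sidno];
}
```

`replay(false)` ends with the client still owning the sidno lock, as in step 13. `replay(true)` ends with the lock free. The check-then-unlock on a shared flag is only correct when no other caller can touch the array in between. That is why serializing access, rather than changing the lock order, removes the leak.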
An earlier upstream change enabled MySQL maintainer mode by default for Linux debug builds. That conflicted with our current build flags, and this PR fixes it.
build_binary and the RPM spec file both build with %optflags. build_binary now removes the optimization flags for a Debug build. The RPM spec already did this, but it did not also remove "-Wp,-D_FORTIFY_SOURCE=2". That flag must be removed together with the optimization flags, otherwise the build fails.
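The debug-build rule above can be modelled as a token filter over the compiler flags. The C++ below is only an illustrative sketch: the real build_binary and spec file use shell and RPM macros, and `strip_debug_unsafe_flags` is a hypothetical name. It assumes that flags are whitespace-separated and that any token starting with `-O` is an optimization flag. The fortify flag is dropped too, presumably because glibc warns that `_FORTIFY_SOURCE` requires compiling with optimization, and maintainer mode builds with `-Werror`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Illustrative model of the Debug-build flag rule (not the actual
// build_binary or spec-file code). It drops every optimization flag
// (-O, -O2, -Os, ...) and the fortify flag that depends on optimization,
// keeping all other flags in their original order.
std::string strip_debug_unsafe_flags(const std::string &optflags) {
  std::istringstream in(optflags);
  std::string token, out;
  while (in >> token) {
    const bool is_opt = token.rfind("-O", 0) == 0;  // starts with "-O"
    const bool is_fortify = token == "-Wp,-D_FORTIFY_SOURCE=2";
    if (is_opt || is_fortify) continue;
    if (!out.empty()) out += ' ';
    out += token;
  }
  return out;
}
```

For example, `"-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions"` becomes `"-g -pipe -Wall -fexceptions"`. Removing both kinds of flag in one pass keeps them from getting out of sync, which is the gap in the RPM spec that this PR closes.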
For the Debian packaging of the debug build, only the optimization flag was removed.
In both the Debian and RPM packaging, we also remove the "-DMYSQL_MAINTAINER_MODE=OFF" option, which was only ever meant to be temporary.
Here are some test builds. I've included both the git and bzr builds because the git build fails on CentOS 7 due to the CentOS 7 patch related to the bzr-to-git migration. From what I can see, that should be resolved with the next upstream merge:
5.5 git build:
http://jenkins.percona.com/view/Percona-RELEASES/job/percona-server-5.5-RELEASE/132/
5.5 bzr build:
http://jenkins.percona.com/view/Percona-RELEASES/job/percona-server-5.5-RELEASE-bzr/1/