Skip to content

Commit

Permalink
Bug#36189820 Server crash happened while running 'explain format=json…
Browse files Browse the repository at this point in the history
… for connection xxx'.

The new iterator based explains are not impacted.

The issue here is a race condition. More than one thread is using the
query term iterator at the same time (whoch is neithe threas safe nor
reantrant), and part of its state is in the query terms being visited
which leads to interference/race conditions.

a) the explain thread

uses an iterator here:

   Sql_cmd_explain_other_thread::execute

is inspecting the Query_expression of the running query
calling master_query_expression()->find_blocks_query_term which uses
an iterator over the query terms in the query expression:

   for (auto qt : query_terms<>()) {
       if (qt->query_block() == qb) {
           return qt;
       }
   }

the above search fails to find qb due to the interference of the
thread b), see below, and then tries to access a nullpointer:

    * thread percona#36, name = ‘connection’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
  frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
  frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
  frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
  frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
  frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
  frame percona#6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
  frame percona#7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
  frame percona#8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
  frame percona#9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
  frame percona#10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
  frame percona#11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
  frame percona#12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
 * frame percona#13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
  frame percona#14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
  frame percona#15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
  frame percona#16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
  frame percona#17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
  frame percona#18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
  frame percona#19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
  frame percona#20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
  frame percona#21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) the query thread being explained is itself performing LEX::cleanup
and as part of the iterates over the query terms, but still allows
EXPLAIN of the query plan since

   thd->query_plan.set_query_plan(SQLCOM_END, ...)

hasn't been called yet.

     20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
     21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
     22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
     30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
     31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
     32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
     33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
     34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
     57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
     58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
     65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
     66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:

This patch solves the issue by removing iterator state from
Query_term, making the query_term iterators thread safe. This solution
labels every child query_term with its index in its parent's
m_children vector.  The iterator can therefore easily compute the next
child to visit based on Query_term::m_sibling_idx.

A unit test case is added to check reentrancy.

One can also manually verify that we have no remaining race condition
by running two client connections files (with \. <file>) with a big
number of copies of the repro query in one connection and a big number
of EXPLAIN format=json FOR <connection>, e.g.

    EXPLAIN FORMAT=json FOR CONNECTION 8\G

in the other. The actual connection number would need to verified
in connection one, of course.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
  • Loading branch information
Dag Wanvik committed Apr 6, 2024
1 parent c6c3296 commit 01a8081
Show file tree
Hide file tree
Showing 6 changed files with 109 additions and 35 deletions.
1 change: 1 addition & 0 deletions sql/parse_tree_nodes.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1773,6 +1773,7 @@ bool PT_set_operation::contextualize_setop(Parse_context *pc,
if (setop == nullptr) return true;

merge_descendants(pc, setop, ql);
setop->label_children();

Query_expression *qe = pc->select->master_query_expression();
if (setop->set_block(qe->create_post_processing_block(setop))) return true;
Expand Down
3 changes: 3 additions & 0 deletions sql/query_term.cc
Original file line number Diff line number Diff line change
Expand Up @@ -79,8 +79,11 @@ Query_term *Query_term::pushdown_limit_order_by(Query_term_set_op *parent) {
case QT_EXCEPT: {
auto setop = down_cast<Query_term_set_op *>(this);
for (Query_term *&child : setop->m_children) {
const uint sibling_idx = child->sibling_idx();
child = child->pushdown_limit_order_by(
down_cast<Query_term_set_op *>(this));
// Make sure the new child inherits the old child's sibling index
child->set_sibling_idx(sibling_idx);
}
} break;
case QT_UNARY: {
Expand Down
101 changes: 69 additions & 32 deletions sql/query_term.h
Original file line number Diff line number Diff line change
Expand Up @@ -260,16 +260,28 @@ class Query_term {
*/
virtual size_t child_count() const { return 0; }

/// Set the correct value of \c Query_term::m_sibling_idx recursively for
/// set operations. For \c Query_term_unary, this is done in its constructor.
/// A no-op for \c Query_block. See also \c set_sibling_idx.
virtual void label_children() = 0;

protected:
/**
Back pointer to the node whose child we are, or nullptr (root term).
*/
Query_term_set_op *m_parent{nullptr};

/// If parent is non-null, this holds the index of the current sibling.
/// Used for efficient iterator traversal up and down the tree.
uint m_sibling_idx{0};

public:
/// Getter for m_parent, q.v.
Query_term_set_op *parent() const { return m_parent; }

/// Setter for \c m_sibling_idx, q.v.
void set_sibling_idx(uint idx) { m_sibling_idx = idx; }
/// Getter for \c m_sibling_idx, q.v.
uint sibling_idx() { return m_sibling_idx; }
/**
Reset resources used.
@param full do full cleanup. Same semantics as for Query_expression's
Expand Down Expand Up @@ -384,13 +396,6 @@ class Query_term {
void set_fields(mem_root_deque<Item *> *fields) { m_fields = fields; }
// Getter for m_fields, q.v.
mem_root_deque<Item *> *fields() { return m_fields; }

private:
/// class Query_terms iterator traversal state variable, hence friend
/// declaration immediately below
uint m_curr_id;
template <Visit_order order, Visit_leaves visit_leaves>
friend class Query_terms;
};

/// Common base class for n-ary set operations, including unary.
Expand Down Expand Up @@ -441,6 +446,14 @@ class Query_term_set_op : public Query_term {
return false;
}

void label_children() override {
uint idx = 0;
for (auto child : m_children) {
child->set_sibling_idx(idx++);
child->label_children();
}
}

size_t child_count() const override { return m_children.size(); }
bool open_result_tables(THD *thd, int level) override;
void cleanup(bool full) override;
Expand Down Expand Up @@ -549,6 +562,7 @@ class Query_term_unary : public Query_term_set_op {
: Query_term_set_op(mem_root) {
m_last_distinct = 0;
m_children.push_back(t);
t->set_sibling_idx(0);
}
Query_term_type term_type() const override { return QT_UNARY; }
const char *operator_string() const override { return "result"; }
Expand All @@ -559,7 +573,7 @@ class Query_term_unary : public Query_term_set_op {
Containing class for iterator over the query term tree. The structure is
in part dictated by C++ conventions for iterators.
@tparam visit_order indicates whether pre or post order visiting is requested
@tparam visit_leaves indicated whether to visit the leaf nodes (query blocks)
@tparam visit_leaves indicates whether to visit the leaf nodes (query blocks)
*/
template <Visit_order visit_order, Visit_leaves visit_leaves>
class Query_terms {
Expand All @@ -580,50 +594,38 @@ class Query_terms {
*/
Query_term_iterator(Query_term *root) {
m_current = root;
if (m_current != nullptr) m_current->m_curr_id = 0;
m_child_idx = 0;
if constexpr (visit_order == QTC_POST_ORDER) {
// start at left-most leaf node
while (m_current != nullptr &&
m_current->term_type() != QT_QUERY_BLOCK) {
m_current = down_cast<Query_term_set_op *>(m_current)
->m_children[m_current->m_curr_id];
m_current->m_curr_id = 0;
}
dive_to_leftmost_leaf_of_child();
if constexpr (visit_leaves == VL_SKIP_LEAVES) operator++();
}
}

Query_term_iterator() = default;

Query_term_iterator &operator++() {
if (m_current == nullptr) {
return *this;
}
assert(m_current != nullptr);
while (m_current != nullptr) {
uint children = m_current->child_count();
if constexpr (visit_order == QTC_PRE_ORDER) {
while (m_current != nullptr && (m_current->m_curr_id >= children)) {
while (m_current != nullptr && m_child_idx >= children) {
// no more at this level, go back up
m_current = m_current->m_parent;
prepare_for_next_sibling();
if (m_current != nullptr) {
children = m_current->child_count();
}
}

if (m_current == nullptr) return *this;
m_current = down_cast<Query_term_set_op *>(m_current)
->m_children[m_current->m_curr_id++];
m_current->m_curr_id = 0;
->m_children[m_child_idx];
m_child_idx = 0;
} else {
m_current = m_current->m_parent;
prepare_for_next_sibling();
if (m_current == nullptr) return *this;
if (++m_current->m_curr_id < m_current->child_count()) {
while (m_current != nullptr &&
m_current->term_type() != QT_QUERY_BLOCK) {
m_current = down_cast<Query_term_set_op *>(m_current)
->m_children[m_current->m_curr_id];
m_current->m_curr_id = 0;
}
if (m_child_idx < m_current->child_count()) {
dive_to_leftmost_leaf_of_child();
} else {
// return non-leaf
}
Expand All @@ -645,8 +647,43 @@ class Query_terms {
bool operator!=(const Query_term_iterator &other) const {
return !((*this) == other);
}
/// Iterator state consists of this and Query_term::m_curr_id

private:
/// Iterator state consists of the next two member variables
Query_term *m_current{nullptr};

/// Used to find next child node to dive into, see \c set_next_child_idx
uint m_child_idx{0};

/// Starting at \c m_current->m_children[m_child_index], dive down through
/// any left-most (index == 0) further children till we find left-most leaf
/// term (a \c Query_block) under the child pointed to by
/// \c m_child_index. After the dive, \c m_child_index will be zero, and
/// \c m_current will be set to the leaf term.
void dive_to_leftmost_leaf_of_child() {
while (m_current != nullptr && m_current->term_type() != QT_QUERY_BLOCK) {
m_current =
down_cast<Query_term_set_op *>(m_current)->m_children[m_child_idx];
m_child_idx = 0;
}
}

/// Find the index of the next sibling, if any, of \c m_current qua child
/// of its parent, so we can visit it: assign it to \c m_child_idx.
/// Setting \c m_child_idx to a value of \c Query_term::child_count means we
/// signal that we are done. Also, \c m_current is set to its parent.
void prepare_for_next_sibling() {
assert(m_current != nullptr);
if (m_current->parent() == nullptr) {
m_current = m_current->parent();
return;
}
assert(m_current->parent()->m_children[m_current->sibling_idx()] ==
m_current);
m_child_idx = m_current->sibling_idx() + 1;

m_current = m_current->parent();
}
};

public:
Expand Down
1 change: 1 addition & 0 deletions sql/sql_lex.h
Original file line number Diff line number Diff line change
Expand Up @@ -1183,6 +1183,7 @@ class Query_block : public Query_term {
Query_block *query_block() const override {
return const_cast<Query_block *>(this);
}
void label_children() override {}
void destroy_tree() override { m_parent = nullptr; }

bool open_result_tables(THD *, int) override;
Expand Down
6 changes: 3 additions & 3 deletions sql/sql_union.cc
Original file line number Diff line number Diff line change
Expand Up @@ -689,9 +689,9 @@ bool Query_expression::can_materialize_directly_into_result() const {
}

/**
Prepares all query blocks of a query expression, including
fake_query_block. If a recursive query expression, this also creates the
materialized temporary table.
Prepares all query blocks of a query expression.
If a recursive query expression, this also creates the materialized temporary
table.
@param thd Thread handler
@param sel_result Result object where the unit's output should go.
Expand Down
32 changes: 32 additions & 0 deletions unittest/gunit/union_syntax-t.cc
Original file line number Diff line number Diff line change
Expand Up @@ -342,4 +342,36 @@ TEST_F(UnionSyntaxTest, InnerVsOuterOrder) {
// EXPECT_EQ(2, get_limit(query_expression->fake_query_block));
}

TEST_F(UnionSyntaxTest, QueryTermIteratorReentrancy) {
Query_block *query_block = parse(
"(SELECT * FROM r UNION ALL SELECT * FROM s ORDER BY a LIMIT 10)"
" UNION ALL "
" (SELECT * FROM r UNION DISTINCT SELECT * FROM s) LIMIT 7");

Query_expression *qe = query_block->master_query_expression();
Query_terms<QTC_POST_ORDER, VL_VISIT_LEAVES> terms(qe->query_term());
// set of nodes collected without any "interference"
std::vector<Query_term *> nodes_a_priori;
std::vector<Query_term *> nodes_outer;
std::vector<Query_term *> nodes_inner;

for (Query_term *term1 : terms) {
nodes_a_priori.push_back(term1);
}

EXPECT_EQ(7, nodes_a_priori.size());

for (Query_term *term1 : terms) {
// run a second iterator over the same query terms and verify that it
// doesn't interfere with the outer iterator's job
for (Query_term *term2 : terms) {
nodes_inner.push_back(term2);
}
EXPECT_EQ(nodes_a_priori, nodes_inner);
nodes_inner.clear();
nodes_outer.push_back(term1);
}
EXPECT_EQ(nodes_a_priori, nodes_outer);
}

} // namespace union_syntax_unittest

0 comments on commit 01a8081

Please sign in to comment.