[DocDB] GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState flaky test #24317
Closed
1 task done
Labels
2.20 Backport Required
2024.1 Backport Required
2024.1.3_blocker
2024.1.3.1_blocker
2024.2 Backport Required
area/docdb: YugabyteDB core features
kind/bug: This issue is a bug
priority/high: High Priority
Comments
es1024 added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels on Oct 8, 2024
yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels on Oct 8, 2024
yugabyte-ci added the priority/high (High Priority) label and removed the status/awaiting-triage (Issue awaiting triage) and priority/medium (Medium priority issue) labels on Oct 8, 2024
es1024 added a commit that referenced this issue on Oct 9, 2024

…anup

Summary:
There exists a race condition between the commit/abort path and old transaction cleanup for promoted transactions: the commit/abort path observes that old transaction cleanup is still ongoing and goes on to set `cleanup_waiter_`, but old transaction cleanup finishes before `cleanup_waiter_` is actually set, so the waiter is never called.

This is the cause of occasional failures of GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState and GeoTransactionsPromotionRF1Test.TestTwoTabletPromotionFailure with the following stack:
```
F20241007 19:23:50 ../../src/yb/rpc/rpc.cc:339] Check failed: calls_.empty() Calls: [0x000035bd35b49d60 -> AbortTransaction: tablet_id: "36df9b80658448848075dd10894f489c" transaction_id: "`\2051\'S&K\333\267[<\276\373P\302\207" propagated_hybrid_time: 7079338864676978688, retrier: { task_id: -1 state: kFinished deadline: 314126.220s }]
*** Check failure stack trace: ***
    @     0x7fa6097965c0  google::LogMessage::SendToLog()
    @     0x7fa609796c00  google::LogMessage::Flush()
    @     0x7fa609799979  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fa60990b545  yb::rpc::Rpcs::Shutdown()
    @     0x7fa60b40216d  yb::client::TransactionManager::Impl::~Impl()
    @     0x7fa60b3fdfad  yb::client::TransactionManager::~TransactionManager()
    @     0x7fa60cfe6f6e  yb::tserver::DbServerBase::~DbServerBase()
    @     0x7fa60d0b702e  yb::tserver::TabletServer::~TabletServer()
    @     0x7fa60e3e3393  yb::tserver::MiniTabletServer::Shutdown()
    @     0x7fa60e561a05  yb::MiniCluster::Shutdown()
    @     0x7fa60e595aca  yb::YBMiniClusterTestBase<>::DoTearDown()
    @     0x7fa60de00c1d  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x7fa60dde78c7  testing::TestInfo::Run()
    @     0x7fa60dde8575  testing::TestSuite::Run()
    @     0x7fa60ddf7e4e  testing::internal::UnitTestImpl::RunAllTests()
    @     0x7fa60de0190d  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x7fa60ddf795f  testing::UnitTest::Run()
    @     0x7fa60de81177  main
    @     0x7fa607a29d90  (unknown)
    @     0x7fa607a29e40  __libc_start_main
    @     0x55f94fc7e325  _start
```
Jira: DB-13207

Test Plan: Jenkins. Also ran GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState 500x and ensured the above stack did not appear.

Reviewers: sergei
Reviewed By: sergei
Subscribers: rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38789
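The lost-wakeup pattern described in the summary can be illustrated with a minimal sketch. The actual change in D38789 is not shown in this thread, so this is only one common way to close such a race, not the fix itself. `CleanupTracker`, `WaitForCleanup`, `CleanupFinished`, and `CountWaiterCalls` are hypothetical names; only `cleanup_waiter_` comes from the commit message. The idea is to make "check whether cleanup is done" and "store the waiter" a single step under one lock, so cleanup cannot finish in between.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the state shared between old transaction cleanup
// and the commit/abort path. This is NOT YugabyteDB code.
class CleanupTracker {
 public:
  using Waiter = std::function<void()>;

  // Commit/abort path: run `waiter` once cleanup has finished.
  // The done-check and the store of cleanup_waiter_ happen under the same
  // lock, so cleanup cannot complete between the two.
  void WaitForCleanup(Waiter waiter) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!done_) {
        cleanup_waiter_ = std::move(waiter);
        return;
      }
    }
    // Cleanup already finished: invoke directly, outside the lock.
    waiter();
  }

  // Cleanup path: mark cleanup as done and wake the stored waiter, if any.
  void CleanupFinished() {
    Waiter waiter;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
      waiter.swap(cleanup_waiter_);
    }
    if (waiter) {
      waiter();
    }
  }

 private:
  std::mutex mutex_;
  bool done_ = false;
  Waiter cleanup_waiter_;
};

// Returns how many times the waiter ran for a given ordering of the two paths.
int CountWaiterCalls(bool cleanup_first) {
  CleanupTracker tracker;
  int calls = 0;
  auto waiter = [&calls] { ++calls; };
  if (cleanup_first) {
    tracker.CleanupFinished();
    tracker.WaitForCleanup(waiter);
  } else {
    tracker.WaitForCleanup(waiter);
    tracker.CleanupFinished();
  }
  return calls;
}

// Both orderings must invoke the waiter exactly once.
void CheckBothOrders() {
  assert(CountWaiterCalls(true) == 1);
  assert(CountWaiterCalls(false) == 1);
}
```

In the racy variant the commit message describes, the done-check and the store of the waiter are separate steps, so cleanup can complete in the gap and the stored waiter is never invoked. That would be consistent with the `AbortTransaction` call still being registered when `Rpcs::Shutdown()` checks `calls_.empty()`.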
es1024 added a commit that referenced this issue on Oct 11, 2024

…d transaction cleanup

Summary:
Original commit: f51e54d / D38789

Same summary and `Check failed: calls_.empty()` stack trace as the original commit.

Jira: DB-13207

Test Plan: Jenkins. Also ran GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState 500x and ensured the stack did not appear.

Reviewers: sergei, rthallam
Reviewed By: rthallam
Subscribers: ybase, rthallam
Differential Revision: https://phorge.dev.yugabyte.com/D38892
es1024 added a commit that referenced this issue on Oct 11, 2024

…d transaction cleanup

Summary:
Original commit: f51e54d / D38789

Same summary and `Check failed: calls_.empty()` stack trace as the original commit.

Jira: DB-13207

Test Plan: Jenkins. Also ran GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState 500x and ensured the stack did not appear.

Reviewers: sergei, rthallam
Reviewed By: rthallam
Subscribers: rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38891
es1024 added a commit that referenced this issue on Oct 12, 2024

…transaction cleanup

Summary:
Original commit: f51e54d / D38789

Same summary and `Check failed: calls_.empty()` stack trace as the original commit.

Jira: DB-13207

Test Plan: Jenkins. Also ran GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState 500x and ensured the stack did not appear.

Reviewers: sergei, rthallam
Reviewed By: rthallam
Subscribers: ybase, rthallam
Differential Revision: https://phorge.dev.yugabyte.com/D38890
es1024 added a commit that referenced this issue on Nov 8, 2024

…old transaction cleanup

Summary:
Same summary and `Check failed: calls_.empty()` stack trace as the original commit.

Jira: DB-13207

Original commit: f51e54d / D38789

Test Plan: Jenkins. Also ran GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState 500x and ensured the stack did not appear.

Reviewers: sergei
Reviewed By: sergei
Subscribers: ybase, rthallam
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D39830
Jira Link: DB-13207
Description
GeoTransactionsPromotionTest.TestPromotionReturningToAbortedState rarely fails with the following stack:
Issue Type
kind/bug