[#2978] Various TSAN fixes for QLTransactionTest.RemoteBootstrap
Summary:
Started digging into some TSAN failures in QLTransactionTest.RemoteBootstrap.
Found and fixed a number of issues:

- [#3002] Data race on `yb::log::Log::active_segment_sequence_number()`
It seems this field was protected by a read lock for reads, but not protected on writes. Turned it
into an atomic (sketched after this list).

- [#3007] Race condition between TabletPeer `Init` and `Shutdown`
> std::__1::shared_ptr<yb::tablet::enterprise::Tablet>::get()
> std::__1::shared_ptr<yb::consensus::RaftConsensus>::get()
> src/yb/tablet/tablet_peer.cc:385:17 in yb::tablet::TabletPeer::StartShutdown()
Seems like `Shutdown` did not take the appropriate locks to access either `tablet_` or `consensus_`
(sketched after this list).

- [#3008] Race condition in thread pool `Worker` Shutdown path:
> #12 yb::rpc::ThreadPool::Impl::Shutdown() /n/users/bogdan/code/yugabyte-db/build/tsan-clang-dynamic-ninja/../../src/yb/rpc/thread_pool.cc:224 (libyrpc.so+0x20c73f)
> #3 yb::rpc::(anonymous namespace)::Worker::Notify() /n/users/bogdan/code/yugabyte-db/build/tsan-clang-dynamic-ninja/../../src/yb/rpc/thread_pool.cc:75 (libyrpc.so+0x20ad6e)
Essentially, we're destroying the vector of workers, but it's possible we still end up trying to notify
them afterwards. Moved some of the code around and exposed an explicit `Join`. The logic should stay
basically the same. Also moved to shared_ptr instead of raw pointers (sketched after this list).

- [#3009] Race condition in Master async RPC task vs CatalogManager reading the task description
> #0 yb::master::PickLeaderReplica::PickReplica(yb::master::TSDescriptor**) /n/users/bogdan/code/yugabyte-db/build/tsan-clang-dynamic-ninja/../../src/yb/master/async_rpc_tasks.cc:95:12 (libmaster.so+0x2450b8)
> #2 yb::master::CatalogManager::SendAddServerRequest(scoped_refptr<yb::master::TabletInfo> const&, yb::consensus::RaftPeerPB_MemberType, yb::consensus::ConsensusStatePB const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /n/users/bogdan/code/yugabyte-db/build/tsan-clang-dynamic-ninja/../../src/yb/master/catalog_manager.cc:5046:54
Just removed the log line: `description()` depends on the replica picked during `Run()`, so reading it
afterwards races with the running task.
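
A minimal sketch of the #3002 fix, mirroring the log.cc/log.h hunks below (the surrounding class is a
stub): a field read under a shared lock but written with no lock at all is still a data race, whereas an
atomic makes both sides well-defined.

```cpp
#include <atomic>
#include <cstdint>

class LogSketch {
 public:
  // Reads no longer need the state lock: atomic loads are race-free.
  uint64_t active_segment_sequence_number() const {
    return active_segment_sequence_number_.load();
  }

  // Hypothetical writer; before the fix, this store was unsynchronized.
  void RollSegment(uint64_t sequence_number) {
    active_segment_sequence_number_.store(sequence_number);
  }

 private:
  std::atomic<uint64_t> active_segment_sequence_number_{0};
};
```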
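
For #3007, a minimal sketch of the locking pattern from the tablet_peer.cc hunks below (`TabletStub` is
a stand-in type): `StartShutdown` now reads `tablet_` under the same lock that `Init` holds while
assigning it.

```cpp
#include <memory>
#include <mutex>

struct TabletStub {
  void SetShutdownRequestedFlag() {}
};

class TabletPeerSketch {
 public:
  void Init(std::shared_ptr<TabletStub> tablet) {
    std::lock_guard<std::mutex> lock(lock_);
    tablet_ = std::move(tablet);
  }

  bool StartShutdown() {
    // Before the fix, tablet_ was read without the lock, racing with Init().
    std::lock_guard<std::mutex> lock(lock_);
    if (tablet_) {
      tablet_->SetShutdownRequestedFlag();
    }
    return true;
  }

 private:
  std::mutex lock_;
  std::shared_ptr<TabletStub> tablet_;
};
```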
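
For #3008, a simplified sketch of the shutdown ordering (the real code uses lock-free queues; this stub
does not): holding workers as shared_ptr and clearing the vector only after the busy-wait on `adding_`
means a straggling `Enqueue` can no longer notify a destroyed `Worker`.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>
#include <vector>

struct WorkerStub {
  void Stop() {}
};

class ThreadPoolSketch {
 public:
  void Shutdown() {
    for (auto& worker : workers_) {
      worker->Stop();
    }
    // Busy-wait for concurrent Enqueue() calls (which bump adding_) to drain
    // BEFORE freeing the workers they might still try to notify.
    while (adding_.load() != 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    workers_.clear();  // Moved after the wait, as in the thread_pool.cc hunk below.
  }

 private:
  std::atomic<int> adding_{0};  // Incremented for the duration of each Enqueue().
  std::vector<std::shared_ptr<WorkerStub>> workers_;
};
```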

There are a couple more issues I am still seeing:
- [#3010] Another `Long wait for safe op id`, but this one seems like a bootstrap bug
Currently, doing a remote bootstrap triggers an inline OpenTablet in TSTabletManager, unlike the
normal ones, which are scheduled through a thread pool. That causes issues because, on shutdown,
we only wait for the threadpool tasks to finish / get aborted. When done inline, this exposes
race conditions between the Init and Shutdown paths for TabletPeer, RaftConsensus, Log, etc.
(sketched after this list).

- [#3011] SEGV in `DisableFailureDetector` during Raft shutdown
Caused by the same race between Start (which creates the timer) and shutdown (which aborts it);
sketched after this list.

- [#3012] Log Close failures not flipping the state to closed
> F20191106 04:56:48 ../../src/yb/consensus/log_util.cc:874] Check failed: !IsFooterWritten()
Caused by the same race above. If TSTabletManager starts a remote bootstrap, we open a log. If we
shut down the tablet manager before finishing the bootstrap, we wipe the data, and then when we
close the log, we error out because the files no longer exist (sketched after this list).

- [#3013] Race condition in Master async RPC tasks state transitions
> F20191106 05:10:05 ../../src/yb/master/async_rpc_tasks.cc:126] Check failed: task_state == MonitoredTaskState::kWaiting State: kScheduling
Seems like there is a race: the task is scheduled to run on the reactor thread first, and the state
is flipped out of kScheduling only AFTER that, so the task can start running while still in
kScheduling. This can be a standalone investigation (sketched after this list).
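
For #3010, a sketch of the structural asymmetry with hypothetical names: opens submitted through a
pool are fenced by shutdown's wait, while the inline open done for remote bootstrap is not.

```cpp
#include <future>
#include <vector>

class TabletManagerSketch {
 public:
  void CreateNewTablet() {
    // Normal path: scheduled, so Shutdown() has something to wait on.
    pending_.push_back(std::async(std::launch::async, [this] { OpenTablet(); }));
  }

  void StartRemoteBootstrap() {
    OpenTablet();  // Inline path: invisible to Shutdown()'s wait below.
  }

  void Shutdown() {
    for (auto& open : pending_) {
      open.wait();  // Fences only the scheduled opens.
    }
  }

 private:
  void OpenTablet() { /* initializes TabletPeer, RaftConsensus, Log, ... */ }
  std::vector<std::future<void>> pending_;
};
```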
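
For #3011, a sketch of one possible guard, not a fix made in this commit: serialize timer creation in
Start and abortion in shutdown under one mutex, so the detector is never aborted mid-construction.

```cpp
#include <memory>
#include <mutex>

struct TimerStub {
  void Abort() {}
};

class FailureDetectorOwnerSketch {
 public:
  void Start() {
    std::lock_guard<std::mutex> lock(lock_);
    timer_ = std::make_shared<TimerStub>();
  }

  void DisableFailureDetector() {
    std::lock_guard<std::mutex> lock(lock_);
    if (timer_) {
      timer_->Abort();  // Never sees a half-constructed timer.
    }
  }

 private:
  std::mutex lock_;
  std::shared_ptr<TimerStub> timer_;
};
```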
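
For #3012, a sketch of the shape a fix could take (an assumption, not a change in this commit): treat
Close as terminal even when closing the active segment fails, so later code sees a closed log instead
of CHECK-failing.

```cpp
class LogCloseSketch {
 public:
  bool Close() {
    bool ok = CloseCurrentSegment();  // Can fail: shutdown already wiped the files.
    state_ = State::kLogClosed;       // Record the terminal state regardless.
    return ok;
  }

 private:
  enum class State { kLogWriting, kLogClosed };
  bool CloseCurrentSegment() { return false; /* simulate missing files */ }
  State state_ = State::kLogWriting;
};
```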
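
For #3013, a sketch of the suspected window and one way it could be closed (hypothetical, pending the
standalone investigation): flip the state out of kScheduling before handing the task to the reactor,
and have Run use a CAS instead of a DCHECK.

```cpp
#include <atomic>

enum class MonitoredTaskState { kScheduling, kWaiting, kRunning };

class TaskSketch {
 public:
  void Schedule() {
    state_.store(MonitoredTaskState::kWaiting);  // Flip first...
    // ...then hand off to the reactor thread, closing the window in which
    // Run() could still observe kScheduling.
  }

  bool Run() {
    auto expected = MonitoredTaskState::kWaiting;
    // Tolerate, rather than crash on, an unexpected state.
    return state_.compare_exchange_strong(expected, MonitoredTaskState::kRunning);
  }

 private:
  std::atomic<MonitoredTaskState> state_{MonitoredTaskState::kScheduling};
};
```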

Test Plan: `ybd tsan --cxx-test client_ql-transaction-test --gtest_filter QLTransactionTest.RemoteBootstrap -n 100 --tp 4`

Reviewers: mikhail, sergei

Reviewed By: sergei

Subscribers: hector, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7529
bmatican committed Nov 27, 2019
1 parent 0b59f8d commit f9e0091
Showing 7 changed files with 25 additions and 15 deletions.
5 changes: 3 additions & 2 deletions src/yb/consensus/log.cc
@@ -367,7 +367,6 @@ Log::Log(LogOptions options, string log_path,
  tablet_wal_path_(std::move(tablet_wal_path)),
  schema_(schema),
  schema_version_(schema_version),
- active_segment_sequence_number_(0),
  log_state_(kLogInitialized),
  max_segment_size_(options_.segment_size_bytes),
  appender_(new Appender(this, append_thread_pool)),
@@ -409,6 +408,7 @@ Status Log::Init() {
  vector<scoped_refptr<ReadableLogSegment> > segments;
  RETURN_NOT_OK(reader_->GetSegmentsSnapshot(&segments));
  active_segment_sequence_number_ = segments.back()->header().sequence_number();
+ LOG_WITH_PREFIX(INFO) << "Opened existing logs. Last segment is " << segments.back()->path();
  }

  if (durable_wal_write_) {
@@ -837,6 +837,8 @@ yb::OpId Log::WaitForSafeOpIdToApply(const yb::OpId& min_allowed, MonoDelta dura
  if (duration) {
  return yb::OpId();
  }
+ // TODO(bogdan): If the log is closed at this point, consider refactoring to return status
+ // and fail cleanly.
  LOG_WITH_PREFIX(DFATAL) << "Long wait for safe op id: " << min_allowed
  << ", passed: " << (CoarseMonoClock::Now() - start);
  }
@@ -1156,7 +1158,6 @@ Status Log::CreatePlaceholderSegment(const WritableFileOptions& opts,
  }

  uint64_t Log::active_segment_sequence_number() const {
- SharedLock<rw_spinlock> read_lock(state_lock_.get_lock());
  return active_segment_sequence_number_;
  }

2 changes: 1 addition & 1 deletion src/yb/consensus/log.h
@@ -381,7 +381,7 @@ class Log : public RefCountedThreadSafe<Log> {
  gscoped_ptr<WritableLogSegment> active_segment_;

  // The current (active) segment sequence number.
- uint64_t active_segment_sequence_number_;
+ std::atomic<uint64_t> active_segment_sequence_number_ = {0};

  // The writable file for the next allocated segment
  std::shared_ptr<WritableFile> next_segment_file_;
3 changes: 3 additions & 0 deletions src/yb/master/async_rpc_tasks.cc
@@ -123,6 +123,9 @@ Status RetryingTSRpcTask::Run() {
  UnregisterAsyncTask(); // May delete this.
  return STATUS(IllegalState, "Unable to run task because it has been aborted");
  }
+ // TODO(bogdan): There is a race between scheduling and running and can cause this to fail.
+ // Should look into removing the kScheduling state, if not needed, and simplifying the state
+ // transitions!
  DCHECK(task_state == MonitoredTaskState::kWaiting) << "State: " << ToString(task_state);

  const Status s = ResetTSProxy();
5 changes: 0 additions & 5 deletions src/yb/master/catalog_manager.cc
@@ -5145,11 +5145,6 @@ void CatalogManager::SendAddServerRequest(
  Status status = task->Run();
  WARN_NOT_OK(status, Substitute("Failed to send AddServer of tserver $0 to tablet $1",
  change_config_ts_uuid, tablet.get()->ToString()));
-
- // Need to print this after Run() because that's where it picks the TS which description()
- // needs.
- if (status.ok())
- LOG(INFO) << "Started AddServer task: " << task->description();
  }

  void CatalogManager::GetPendingServerTasksUnlocked(const TableId &table_uuid,
7 changes: 4 additions & 3 deletions src/yb/rpc/thread_pool.cc
@@ -181,13 +181,14 @@ class ThreadPool::Impl {
  }
  bool added = share_.task_queue.push(task);
  DCHECK(added); // BasketQueue always succeed.
- --adding_;
  Worker* worker = nullptr;
  while (share_.waiting_workers.pop(worker)) {
  if (worker->Notify()) {
+ --adding_;
  return true;
  }
  }
+ --adding_;

  // We increment created_workers_ every time, the first max_worker increments would produce
  // a new worker. And after that, we will just increment it doing nothing after that.
@@ -221,13 +222,13 @@ class ThreadPool::Impl {
  worker->Stop();
  }
  }
- workers_.clear();
  // Shutdown is quite rare situation otherwise enqueue is quite frequent.
  // Because of this we use "atomic lock" in enqueue and busy wait in shutdown.
  // So we could process enqueue quickly, and stuck in shutdown for sometime.
- while(adding_ != 0) {
+ while (adding_ != 0) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
+ workers_.clear();
  ThreadPoolTask* task = nullptr;
  while (share_.task_queue.pop(task)) {
  task->Done(shutdown_status_);
14 changes: 10 additions & 4 deletions src/yb/tablet/tablet_peer.cc
@@ -356,8 +356,11 @@ const consensus::RaftConfigPB TabletPeer::RaftConfig() const {
  bool TabletPeer::StartShutdown() {
  LOG_WITH_PREFIX(INFO) << "Initiating TabletPeer shutdown";

- if (tablet_) {
- tablet_->SetShutdownRequestedFlag();
+ {
+ std::lock_guard<decltype(lock_)> lock(lock_);
+ if (tablet_) {
+ tablet_->SetShutdownRequestedFlag();
+ }
  }

  {
@@ -381,8 +384,11 @@ bool TabletPeer::StartShutdown() {
  // indirectly end up calling into the log, which we are about to shut down.
  UnregisterMaintenanceOps();

- if (consensus_) {
- consensus_->Shutdown();
+ {
+ std::lock_guard<decltype(lock_)> lock(lock_);
+ if (consensus_) {
+ consensus_->Shutdown();
+ }
  }

  return true;
4 changes: 4 additions & 0 deletions src/yb/tserver/ts_tablet_manager.cc
@@ -894,6 +894,10 @@ Status TSTabletManager::StartRemoteBootstrap(const StartRemoteBootstrapRequestPB
  // to check what happens when this server receives raft consensus requests since at this point,
  // this tablet server could be a voter (if the ChangeRole request in Finish succeeded and its
  // initial role was PRE_VOTER).
+ //
+ // TODO(bogdan): This leads to a bunch of startup / shutdown race conditions, as this call is not
+ // executed on the thread pool, like the other OpenTablet calls, so on shutdown, it could be
+ // running initialization code for TabletPeer, Raft, Log, etc, which exposes some of these races.
  OpenTablet(meta, nullptr);

  // If OpenTablet fails, tablet_peer->error() will be set.
