Creating thread local cluster with LoadBalancerType::OriginalDst can't find its ClusterData in active_clusters_ and crash #7500

Closed
l8huang opened this issue Jul 8, 2019 · 10 comments · Fixed by #7820

l8huang (Contributor) commented Jul 8, 2019

Title: Creating thread local cluster with LoadBalancerType::OriginalDst can't find its ClusterData in active_clusters_ and crash

Description:
In our k8s+Istio environment, Envoy sometimes crashes while constructing a ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry whose load balancer type is LoadBalancerType::OriginalDst. The constructor looks up its ClusterData in active_clusters_ via std::map::at(), and std::out_of_range is thrown because the element does not exist:

    case LoadBalancerType::OriginalDst: {
      ASSERT(lb_factory_ == nullptr);
      lb_ = std::make_unique<OriginalDstCluster::LoadBalancer>(
          priority_set_, parent.parent_.active_clusters_.at(cluster->name())->cluster_,
          cluster->lbOriginalDstConfig());
      break;
    }

In the normal case it looks like this shouldn't happen:

  • CdsApiImpl::onConfigUpdate() ensures to_add_repeated and to_remove_repeated are mutually exclusive.
  • ClusterManagerImpl::addOrUpdateCluster() makes sure active_clusters_ contains the needed ClusterData.

So this might be a concurrency issue. I suspect there were two config updates: the first created a cluster, and the second removed it before the ThreadLocalClusterManagerImpl::ClusterEntry was constructed on the worker threads.

The Envoy version is:

version: 4f5b5e101a081e05924990b1903d9d46553558d4/1.11.0-dev/Clean/RELEASE/BoringSSL

Considering Envoy guarantees eventual consistency, I guess the solution could be adding defensive code that handles the missing entry gracefully instead of crashing (a rough sketch is below). But from a design point of view, is there a rule that worker threads should not access active_clusters_ directly? I just want to figure out the suitable way to resolve this kind of issue ;)
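
For illustration only, this is roughly what such defensive code could look like, assuming the same structure as the snippet quoted above. Note it only avoids the throw; it does not answer whether a worker thread should read active_clusters_ at all:

    case LoadBalancerType::OriginalDst: {
      ASSERT(lb_factory_ == nullptr);
      // Use find() instead of at(): a cluster removed between the CDS update and
      // this worker callback simply results in no load balancer being created,
      // rather than an uncaught std::out_of_range.
      auto it = parent.parent_.active_clusters_.find(cluster->name());
      if (it != parent.parent_.active_clusters_.end()) {
        lb_ = std::make_unique<OriginalDstCluster::LoadBalancer>(
            priority_set_, it->second->cluster_, cluster->lbOriginalDstConfig());
      }
      break;
    }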

If a similar issue was fixed before, could you please let me know the PR or issue number?

Repro steps:
We don't have a reliable way to reproduce this issue right now; some cluster info related to the crash is:

(gdb) p ((Envoy::Upstream::ClusterInfoImpl*)cluster_info_._M_ptr)->type_
$6 = envoy::api::v2::Cluster_DiscoveryType_ORIGINAL_DST   

(gdb) p ((Envoy::Upstream::ClusterInfoImpl*)cluster_info_._M_ptr)->added_via_api_
$7 = true  

(gdb) p ((Envoy::Upstream::ClusterInfoImpl*)cluster_info_._M_ptr)->lb_type_   
$1 = Envoy::Upstream::LoadBalancerType::OriginalDst

(gdb) p cluster_manager.parent_.init_helper_.state_
$12 = Envoy::Upstream::ClusterManagerInitHelper::State::AllClustersInitialized

(gdb) p cluster_manager.parent_.warming_clusters_ 
$21 = std::map with 0 elements

Call Stack:

(gdb) bt
#0  0x00007f004e85c428 in __GI_raise (sig=sig@entry=6) 
     at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f004e85e02a in __GI_abort () 
     at abort.c:89
#2  0x0000000001063375 in Envoy::TerminateHandler::logOnTerminate() const::$_0::operator()() const (this=<optimized out>) 
     at source/exe/terminate_handler.cc:15
#3  0x0000000001063279 in Envoy::TerminateHandler::logOnTerminate() const::$_0::__invoke() () 
     at source/exe/terminate_handler.cc:12
#4  0x00000000011d55a6 in __cxxabiv1::__terminate(void (*)()) ()
#5  0x00000000011d55f1 in std::terminate() ()
#6  0x00000000011e1794 in __cxa_throw ()
#7  0x00000000011eb9ef in std::__throw_out_of_range(char const*) ()
#8  0x0000000000b54f27 in std::map<
            std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
            std::unique_ptr<Envoy::Upstream::ClusterManagerImpl::ClusterData, std::default_delete<Envoy::Upstream::ClusterManagerImpl::ClusterData> >, 
            std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, 
            std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
            const, std::unique_ptr<Envoy::Upstream::ClusterManagerImpl::ClusterData, std::default_delete<Envoy::Upstream::ClusterManagerImpl::ClusterData> > > > >
            ::at (this=<optimized out>, __k=...) 
     at /usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/stl_map.h:533
#9  0x0000000000b5c8db in Envoy::Upstream::ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::ClusterEntry (this=0x438b4000, parent=..., cluster=..., lb_factory=...) 
     at source/common/upstream/cluster_manager_impl.cc:1099
#10 0x0000000000b60c8c in Envoy::Upstream::ClusterManagerImpl::createOrUpdateThreadLocalCluster(Envoy::Upstream::ClusterManagerImpl::ClusterData&)::$_10::operator()() const (this=0x1ecabd10) 
     at source/common/upstream/cluster_manager_impl.cc:530
#11 std::_Function_handler<void (), Envoy::Upstream::ClusterManagerImpl::createOrUpdateThreadLocalCluster(Envoy::Upstream::ClusterManagerImpl::ClusterData&)::$_10>::_M_invoke(std::_Any_data const&) (__functor=...) 
     at /usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:316
#12 0x0000000000b1f10d in std::function<void ()>::operator()() const (this=<optimized out>) 
     at /usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706
#13 Envoy::Event::DispatcherImpl::runPostCallbacks (this=0x2bf1200) 
     at source/common/event/dispatcher_impl.cc:198
#14 0x0000000000b1efbe in Envoy::Event::DispatcherImpl::run (this=0x2bf1200, type=Envoy::Event::Dispatcher::RunType::Block) 
     at source/common/event/dispatcher_impl.cc:177
#15 0x0000000000b19442 in Envoy::Server::WorkerImpl::threadRoutine (this=0x2c3f2c0, guard_dog=...) 
     at source/server/worker_impl.cc:104
#16 0x00000000010639e5 in std::function<void ()>::operator()() const (this=0x18) 
     at /usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706
#17 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::function<void ()>)::$_0::operator()(void*) const (this=<optimized out>, arg=0x0) 
     at source/common/common/posix/thread_impl.cc:39
#18 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::function<void ()>)::$_0::__invoke(void*) (arg=0x0) 
     at source/common/common/posix/thread_impl.cc:38
#19 0x00007f004ef016ba in start_thread (arg=0x7f0030feb700) 
     at pthread_create.c:333
#20 0x00007f004e92e41d in clone () 
     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
alyssawilk added the bug label Jul 9, 2019
alyssawilk (Contributor) commented:

I'm not 100% sure who could best look into this.
@jrajahalme or @ramaraochavali, would either of you be able to check this out?

l8huang (Contributor, Author) commented Jul 9, 2019

I added a simple test case to test/integration_test.cc to reproduce this issue (it's CdsClusterCreateDelete in commit l8huang@279ae5f):

# bazel  test -c dbg  //test/integration:cds_integration_test      --test_env=ENVOY_IP_TEST_VERSIONS=v4only     --test_output=streamed --cache_test_results=no     --test_arg="-l info" 
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/develop/envoy/tools/bazel.rc
WARNING: Streamed test output requested. All tests will be run locally, without sharding, one at a time
INFO: Build options --run_under, --test_arg, --test_env, and 1 more have changed, discarding analysis cache.
INFO: Analyzed target //test/integration:cds_integration_test (0 packages loaded, 13781 targets configured).
INFO: Found 1 test target...
WARNING: Perftools heap leak checker is active -- Performance may suffer
[==========] Running 8 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 8 tests from IpVersionsClientTypeDelta/CdsIntegrationTest
[ RUN      ] IpVersionsClientTypeDelta/CdsIntegrationTest.CdsClusterUpDownUp/0
[       OK ] IpVersionsClientTypeDelta/CdsIntegrationTest.CdsClusterUpDownUp/0 (2250 ms)
[ RUN      ] IpVersionsClientTypeDelta/CdsIntegrationTest.CdsClusterUpDownUp/1
[       OK ] IpVersionsClientTypeDelta/CdsIntegrationTest.CdsClusterUpDownUp/1 (3167 ms)
[ RUN      ] IpVersionsClientTypeDelta/CdsIntegrationTest.CdsClusterCreateDelete/0
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
[2019-07-09 17:47:23.029][42][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:81] Caught Aborted, suspect faulting address 0xe
[2019-07-09 17:47:23.029][42][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:69] Backtrace (use tools/stack_decode.py to get line numbers):
[2019-07-09 17:47:23.050][42][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #0: Envoy::SignalAction::sigHandler() [0x1a5d36e]
[2019-07-09 17:47:23.050][42][critical][backtrace] [bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:73] #1: __restore_rt [0x7fd5524dc390]
external/bazel_tools/tools/test/test-setup.sh: line 310:    14 Aborted                 "${TEST_PATH}" "$@" 2>&1
FAIL: //test/integration:cds_integration_test (see /root/.cache/bazel/_bazel_root/8a829d5918f76371d36df361a9235db6/execroot/envoy/bazel-out/k8-dbg/testlogs/test/integration/cds_integration_test/test.log)
Target //test/integration:cds_integration_test up-to-date:
  bazel-bin/test/integration/cds_integration_test
INFO: Elapsed time: 14.577s, Critical Path: 12.96s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed, 1 test FAILED, 2 total actions

I'm trying to make a patch that fixes this issue in a reasonable way (I'll try defensive code first); if you have any suggestions, please let me know ;)

mattklein123 added this to the 1.12.0 milestone Jul 9, 2019
mattklein123 added the help wanted (Needs help!) label Jul 9, 2019
l8huang (Contributor, Author) commented Jul 10, 2019

I'm working on fixing this issue. The basic idea is to pass a ClusterSharedPtr instead of a ClusterInfoConstSharedPtr when constructing ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry:

 ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::ClusterEntry(
-    ThreadLocalClusterManagerImpl& parent, ClusterInfoConstSharedPtr cluster,
+    ThreadLocalClusterManagerImpl& parent, ClusterSharedPtr cluster,
     const LoadBalancerFactorySharedPtr& lb_factory)

This way, the worker thread doesn't need to access active_clusters_ at all to get the ClusterSharedPtr (rough sketch below).
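
As an illustration only (this is not the actual patch, and the accessor call below is an assumption), the OriginalDst branch could then use the cluster passed in by the main thread instead of looking it up:

    case LoadBalancerType::OriginalDst: {
      ASSERT(lb_factory_ == nullptr);
      // 'cluster' is the ClusterSharedPtr handed over by the main thread, so no
      // lookup in active_clusters_ happens on the worker thread.
      lb_ = std::make_unique<OriginalDstCluster::LoadBalancer>(
          priority_set_, cluster, cluster->info()->lbOriginalDstConfig());
      break;
    }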

mattklein123 (Member) commented:

I don't think ^ is the correct fix, as the cluster is not generally thread safe, only the info. Do you have a summary of what the issue is?

jrajahalme (Contributor) commented:

I think the analysis above is correct: accessing active_clusters_ from a worker thread is wrong, as active_clusters_ is not thread safe. This originates from my initial commit that added support for OriginalDstCluster.

I checked for other cases of improper access and found that the constructor of ThreadLocalClusterManagerImpl, which is called from ClusterManagerImpl via SlotImpl::set(), also accesses active_clusters_. This seems intentional, as active_clusters_ is protected rather than private. However, the initializeCb given to SlotImpl::set() is called on each thread:

  /**
   * Set thread local data on all threads previously registered via registerThread().
   * @param initializeCb supplies the functor that will be called *on each thread*. The functor
   *                     returns the thread local object which is then stored. The storage is via
   *                     a shared_ptr. Thus, this is a flexible mechanism that can be used to share
   *                     the same data across all threads or to share different data on each thread.
   */

So the problem of improper access to active_clusters_ is a bit larger. The simplest fix would be to protect all access to active_clusters_ with an absl::Mutex, as done in source/common/upstream/thread_aware_lb_impl.cc for the members annotated with GUARDED_BY(mutex_) (see the sketch below). Maybe there is a more elegant fix out there as well?
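
A minimal sketch of that pattern follows; the class and member names are made up for illustration and are not the actual ClusterManagerImpl members:

    #include <map>
    #include <memory>
    #include <string>

    #include "absl/base/thread_annotations.h"
    #include "absl/synchronization/mutex.h"

    struct ClusterData {};  // stand-in for ClusterManagerImpl::ClusterData

    class GuardedClusterMap {
    public:
      // Main thread: add or replace a cluster entry.
      void addOrUpdate(const std::string& name, std::unique_ptr<ClusterData> data) {
        absl::MutexLock lock(&mutex_);
        clusters_[name] = std::move(data);
      }

      // Main thread: remove a cluster entry.
      void remove(const std::string& name) {
        absl::MutexLock lock(&mutex_);
        clusters_.erase(name);
      }

      // Any thread: nullptr means the cluster has already been removed. The
      // returned pointer's lifetime is still the caller's problem, which is the
      // lifecycle concern raised in the next comment.
      ClusterData* find(const std::string& name) {
        absl::MutexLock lock(&mutex_);
        auto it = clusters_.find(name);
        return it == clusters_.end() ? nullptr : it->second.get();
      }

    private:
      absl::Mutex mutex_;
      std::map<std::string, std::unique_ptr<ClusterData>> clusters_ ABSL_GUARDED_BY(mutex_);
    };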

l8huang (Contributor, Author) commented Jul 10, 2019

Thanks @jrajahalme for the summary.

It looks like protecting active_clusters_ with a mutex wouldn't help: a ClusterSharedPtr is added to and removed from active_clusters_ sequentially by the main thread, so no mutex is needed for that. The problem is that when a worker thread callback accesses active_clusters_ to get the ClusterSharedPtr, it may already have been removed by the main thread. This is more of a lifecycle problem than a mutex protection issue.

Considering ClusterSharedPtr is a shared pointer (ClusterDataPtr is a unique pointer whose owner is active_clusters_), I think it's fine to let the callback capture it and use it to create the OriginalDstCluster's load balancer, which holds the cluster via a weak pointer (sketch below). If there are any concurrency issues when using the ClusterSharedPtr in a worker thread (e.g. locking the weak pointer, modifying the cluster), the cluster implementation should take care of that itself.
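
A minimal standalone sketch of that idea (all names here are hypothetical stand-ins, not Envoy's actual types):

    #include <functional>
    #include <memory>

    struct Cluster {};  // stand-in for Envoy::Upstream::Cluster

    // Stand-in for the load balancer: it holds only a weak_ptr, so it never
    // extends the cluster's lifetime after the main thread removes it.
    struct OriginalDstLb {
      explicit OriginalDstLb(const std::shared_ptr<Cluster>& cluster) : cluster_(cluster) {}

      bool pickHost() {
        if (auto c = cluster_.lock()) {
          return true;  // cluster is still alive; use it
        }
        return false;   // cluster was removed; fail gracefully instead of crashing
      }

      std::weak_ptr<Cluster> cluster_;
    };

    // Built on the main thread: the shared_ptr is copied into the closure, so the
    // worker-side construction never reads active_clusters_ at all.
    std::function<void()> makeWorkerInitCb(std::shared_ptr<Cluster> cluster) {
      return [cluster]() {
        OriginalDstLb lb(cluster);
        lb.pickHost();
      };
    }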

jrajahalme (Contributor) commented Jul 10, 2019

STL map operations are not thread safe for concurrent reads and writes, so this is a more general concurrent-access problem. For all we know, the worker thread may crash just by reading from the map!

In more detail, a worker thread may read from a map that is in an inconsistent state, e.g. partially updated by the main thread in the middle of an insert operation, perhaps while it is reallocating to make more space.

l8huang (Contributor, Author) commented Jul 10, 2019

Sorry for the confusion. I meant the callback can capture the ClusterSharedPtr, so the worker thread doesn't need to access active_clusters_ at all.

Here is a patch for fixing this issue: l8huang@08f9c01

Let me know if this fix is ok.

l8huang pushed a commit to l8huang/envoy that referenced this issue Jul 11, 2019
If an original_dst cluster is deleted right after it is created, its creation
callback on a worker thread may not find its ClusterData in
ClusterManagerImpl::active_clusters_ and crashes with a std::out_of_range
exception, because it looks the ClusterData up via std::map::at().

This change makes the thread local cluster creation callback capture the
ClusterSharedPtr directly and use it to create the OriginalDstCluster's load
balancer, which holds the cluster via a weak pointer. This way, the worker
thread no longer needs to access ClusterManagerImpl::active_clusters_.

Testing: Add cds integration test, create/delete original_dst cluster repeatedly.
Fixes issue envoyproxy#7500

Signed-off-by: lhuang8 <[email protected]>
mattklein123 (Member) commented:

I will take a look at this in more detail tomorrow and advise on the fix.

jrajahalme (Contributor) commented:

@l8huang Your fix looks good to me, and resolves the bug you found. I left a couple of comments to remind you to clean up some testing/debugging related changes you still have in there.
However, the issue of invalid access to active_clusters_ in the constructor of ThreadLocalClusterManagerImpl still remains.

jrajahalme added a commit to jrajahalme/envoy that referenced this issue Aug 3, 2019
Use ThreadAwareLoadBalancer in OriginalDstCluster to pass
OriginalDstClusterSharedPtr to LoadBalancer.

Fixes: envoyproxy#7500
Signed-off-by: Jarno Rajahalme <[email protected]>