Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7412

var-const · 2021-01-29T23:29:35Z

If a user calls terminate and immediately disposes their reference to Firestore, it's possible that while termination is in process, the last remaining reference (more precisely, shared pointer) to Firestore is in a listener. When that listener is destroyed as part of the termination, it leads to api::Firestore being destroyed. Importantly, termination happens on the worker queue. The destructor of api::Firestore calls Dispose which presumes it's not called from the worker queue and tries to enqueue work, leading to a failing assertion and a crash.

The general problem is that FirestoreClient::Dispose may be invoked on the worker queue from TerminateInternal and needs a way to handle it gracefully. This PR largely makes Dispose a no-op in that case.

Fixes #6909.

var-const · 2021-01-29T23:32:37Z

@wilhuff Any suggestions on different approaches/potential pitfalls/etc. are very welcome.

var-const · 2021-01-30T00:10:39Z

@wilhuff Looks like extending the lifetime of api::Firestore isn't really necessary -- Dispose works fine when invoked from TerminateInternal as long as I use is_terminated to check whether enqueuing TerminateInternal is necessary. This approach seems simpler -- sorry if you have already started with the review.

var-const · 2021-01-30T03:26:42Z

Hmm, I'm getting an Asan error in CI. I'll investigate and let you know when this PR is ready for another look.

var-const · 2021-01-30T03:52:47Z

Ok, it's ready for review again. The simpler approach doesn't work, and the previous approach needed a fix.

var-const · 2021-01-30T03:53:50Z

Firestore/core/src/util/async_queue.cc

@@ -86,6 +86,7 @@ void AsyncQueue::ExecuteBlocking(const Operation& operation) {
              "ExecuteBlocking may not be called "
              "before the previous operation finishes executing");

+  auto self = shared_from_this();


Is there a reason we haven't done this before? is_operation_in_progress_ = false; may crash when api::Firestore (and, consequently, the worker queue) is destroyed on the queue (in FirestoreClient::TerminateAsync).

This seems like an oversight.

I think the idea at the time was that FirestoreClient would keep the AsyncQueue alive as long as it needed to, but this doesn't account for the case where the Firestore destructor is running on the AsyncQueue.

wilhuff · 2021-02-01T17:34:29Z

Firestore/core/src/core/firestore_client.cc

+  worker_queue_->EnqueueEvenWhileRestricted([&, this, callback] {
+    // Make sure this `FirestoreClient` is not destroyed during
+    // `TerminateInternal`, so that `user_executor_` stays valid.
+    auto self = shared_from_this();


If we're depending on the shared_ptr keeping the instance alive, we should create it outside the callback. Otherwise, this can be destroyed while this lambda is awaiting execution.

It seems like both callbacks that call TerminateInternal should capture a shared_ptr<FirestoreClient> to ensure the instance lives past that point.

wilhuff · 2021-02-01T17:40:23Z

Firestore/core/src/core/firestore_client.h

@@ -223,6 +225,9 @@ class FirestoreClient : public std::enable_shared_from_this<FirestoreClient> {
  bool credentials_initialized_ = false;
  local::LruDelegate* _Nullable lru_delegate_;
  util::DelayedOperation lru_callback_;
+
+  // Used during shutdown to guarantee lifetimes.
+  std::weak_ptr<api::Firestore> weak_firestore_;


The type here doesn't actually matter since we don't need to manipulate this instance. Could this be std::weak_ptr<void>?

wilhuff · 2021-02-01T17:46:59Z

Firestore/core/src/core/firestore_client.cc

-    TerminateInternal();
-  });
+  bool enqueued = false;
+  // If `remote_store_` is null, it means termination has finished already. In


There's no reference to remote_store_ in the code below, and remote_store_ isn't actually a good signal anyway, since it is reset asynchronously. I think you're referring to is_terminated(), right?

Second, phrasing this in terms of "the sequential order invariant" obscures what's going on. I think the issue is that EnqueueEvenWhileRestricted can't be called while already on the async queue. Restated that way it seems like we should just fix that rather than trying to introduce yet another guard on top of that.

wilhuff · 2021-02-01T17:59:02Z

Firestore/core/src/core/firestore_client.cc

@@ -263,7 +274,13 @@ void FirestoreClient::Dispose() {

 void FirestoreClient::TerminateAsync(StatusCallback callback) {
  worker_queue_->EnterRestrictedMode();
-  worker_queue_->EnqueueEvenWhileRestricted([this, callback] {
+  worker_queue_->EnqueueEvenWhileRestricted([&, this, callback] {


Default capture by reference in an asynchronously executing block is a recipe for disaster. It doesn't even seem like it's necessary. Remove?

wilhuff · 2021-02-01T17:59:45Z

Firestore/core/src/core/firestore_client.cc

+  // dangerous in the case when `Dispose` is invoked immediately after the
+  // termination, still on the worker queue -- it would break the sequential
+  // order invariant of the queue.
+  if (!is_terminated()) {


This is racy. Two threads could observe !is_terminated() simultaneously and then both attempt the block below.

A better solution might be to have EnqueueEvenWhileRestricted do the right thing. This already has logic to deal with the race inherent in two callers attempting to dispose the instance. Perhaps we just need to make it EnqueueRelaxedEvenWhileRestricted?

wilhuff · 2021-02-01T18:06:19Z

Firestore/core/src/util/async_queue.cc

@@ -86,6 +86,7 @@ void AsyncQueue::ExecuteBlocking(const Operation& operation) {
              "ExecuteBlocking may not be called "
              "before the previous operation finishes executing");

+  auto self = shared_from_this();


This seems like an oversight.

I think the idea at the time was that FirestoreClient would keep the AsyncQueue alive as long as it needed to, but this doesn't account for the case where the Firestore destructor is running on the AsyncQueue.

var-const · 2021-02-02T00:55:04Z

Closing in favor of #7421.

…last shared pointer to it is in a listener (#7421) If a user calls terminate and immediately disposes their reference to Firestore, it's possible that while termination is in process, the last remaining reference (more precisely, shared pointer) to Firestore is in a listener. When that listener is destroyed as part of the termination, it leads to `api::Firestore` being destroyed. Importantly, termination happens on the worker queue. The destructor of `api::Firestore` calls `Dispose` which presumes it's not called from the worker queue and tries to enqueue work, leading to a failing assertion and a crash. The solution is simply to remove the sequential order checks when the queue enters/is in restricted mode. There is a legitimate case where the Firestore destructor can run on the worker queue, as the original issue shows. The complexity of #7412 also indicates that a simpler solution is preferable. Fixes #6909.

var-const added 8 commits January 27, 2021 20:05

Sketch a failing test

64883d9

Wip

338d0ba

wip

3281b16

working

99f0708

cleanup

b128d35

More cleanup

f752eee

More cleanup

1da99bc

style.sh

2bc17dd

var-const added the api: firestore label Jan 29, 2021

google-cla bot added the cla: yes label Jan 29, 2021

var-const requested a review from wilhuff January 29, 2021 23:31

var-const assigned wilhuff Jan 29, 2021

var-const added 5 commits January 29, 2021 18:39

changelog

cfec3fd

A simpler approach

e7b095a

style.sh

bfe3b34

Remove redundant using declaration

9e99c2a

Fix comment

8a3676b

Remove unused parameters

d600f1d

var-const unassigned wilhuff Jan 30, 2021

var-const added 4 commits January 29, 2021 22:40

Revert

a908828

Undo accidental

9b7093a

Fix compilation

80d35e3

Fix

d5e230d

var-const assigned wilhuff Jan 30, 2021

var-const commented Jan 30, 2021

View reviewed changes

wilhuff reviewed Feb 1, 2021

View reviewed changes

wilhuff assigned var-const and unassigned wilhuff Feb 1, 2021

var-const mentioned this pull request Feb 2, 2021

Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7421

Merged

var-const closed this Feb 2, 2021

firebase locked and limited conversation to collaborators Mar 4, 2021

paulb777 deleted the varconst/terminate-fix-2 branch January 21, 2022 00:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7412

Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7412

var-const commented Jan 29, 2021 •

edited

Loading

var-const commented Jan 29, 2021

var-const commented Jan 30, 2021

var-const commented Jan 30, 2021

var-const commented Jan 30, 2021

var-const Jan 30, 2021

wilhuff Feb 1, 2021

wilhuff Feb 1, 2021

wilhuff Feb 1, 2021

wilhuff Feb 1, 2021

wilhuff Feb 1, 2021

wilhuff Feb 1, 2021

wilhuff Feb 1, 2021

var-const commented Feb 2, 2021

Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7412

Fix a crash that happens when Firestore is being terminated when the last shared pointer to it is in a listener #7412

Conversation

var-const commented Jan 29, 2021 • edited Loading

var-const commented Jan 29, 2021

var-const commented Jan 30, 2021

var-const commented Jan 30, 2021

var-const commented Jan 30, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

var-const commented Feb 2, 2021

var-const commented Jan 29, 2021 •

edited

Loading