Deadlock in Foundation (pre-macOS 11) triggered by GDT code. #7171
Comments
I found a few problems with this issue:
|
@bjhomer Thank you for reporting it and for your detailed investigation! The fix looks reasonable to me. Would you like to create a PR with the fix? Your contribution is appreciated!
If an upload coordinator gets created during `+load` calls, it can end up running its timer immediately, while `+load` calls are still running. It's generally a bad idea to be splitting off threads while `+load` calls are still running. The upload coordinator checks every 30 seconds, and doesn't really need to run right away. Since it's easy to transitively create an upload coordinator from other calls that happen during `+load`, we'll just push off the initial run of the timer slightly. That makes it less likely to interfere (or deadlock) with other `+load` calls. See firebase#7171 for an example of a problem solved by this change.
I've added a PR to make the proposed changes:
If this is merged soon, is there any chance we could get a new version of the Firebase pod cut? We're hoping to put out a new release by tomorrow, and I'd rather not have to point to a fork of the whole SDK.
Thank you for the PR! As for the release: we don't have any regular releases scheduled until the beginning of 2021 due to the holidays. We will have to evaluate whether it is safe to ship the change as a patch release. If you are going to release soon, I would recommend being ready to use the version from the fork until the next release. We will post the expected release date later.
Just to make sure we are on the same page here: if by "the whole SDK" you mean the entire Firebase SDK, then you can actually avoid that. E.g. for CocoaPods you can add something like the following to install only GoogleDataTransport from the fork:
|
Thanks. I tried this:
pod 'Firebase/Crashlytics'
pod 'GoogleDataTransport', :git => 'https://github.com/bjhomer/firebase-ios-sdk.git', :commit => 'd5e421c'
But I get this error on pod install:
It seems like there's a dependency conflict of some kind, but I'm not familiar enough with these particular dependencies to know how to resolve it.
@bjhomer It seems like |
Yep, our next minor release is doing a coordinated update of the nanopb dependency, so parts of the master branch are not separably compatible with the current latest. The PR would need to be backported/cherry-picked to the |
Okay, I can do that. It looks like my PR has some failures in |
[REQUIRED] Step 1: Describe your environment
CocoaPods
[REQUIRED] Step 2: Describe the problem
We have users reporting that our macOS app often hangs on launch on pre-macOS 11 systems. We've traced this to a deadlock happening when two threads try to do an initial access of NSUserDefaults at the same time during `+load` calls. See the attached backtrace for details. We're able to reproduce this on ~50% of launches on these older systems. This is probably dependent on the particular ordering and timing of `+load` calls, and so may be very sensitive to the presence of otherwise-unrelated files.

This makes it ultimately an Apple issue, and they seem to have fixed it in macOS 11 (Big Sur). However, our users are still running into it, and it's easy to fix by making one change to GoogleDataTransport.
Steps to reproduce:
`pod Firebase/Crashlytics` in a macOS app that runs on something older than macOS 11. (We've seen it on both 10.13 and 10.15.)

What happened? How can we make the problem occur?
The main thread is doing this:
A second thread is doing this:
A full stack trace is attached at the end of this report.
Note that the second thread is calling back into `dlopen()` down in Foundation, which seems to be causing a deadlock with the main thread. This secondary thread call is happening because `GDTCORUploadCoordinator` is being initialized from some `+load` call, which is triggering a Dispatch timer to check for uploads.

As far as I can tell, `GDTCORUploadCoordinator` is checking every 30 seconds whether it has information to report. That 30-second interval suggests that it is probably not critical that this check happen right at launch, while we're still loading in frameworks. Thus, I'm proposing a simple workaround: defer `GDTCORUploadCoordinator`'s upload check for 1s after launch.

Relevant Code:
`+[GDTCORFlatFileStorage load]` calls `+[GDTCORUploadCoordinator sharedInstance]`, which calls `-[GDTCORUploadCoordinator startTimer]`. That creates a timer with an initial deadline of `DISPATCH_TIME_NOW`, which makes it run during the processing of `+load` calls, which means it can race with other `+load` calls.

I have verified that I can fix it by changing the following code in `-startTimer`:

Stack trace of deadlock attached here:
deadlock_sample.txt
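
As a minimal sketch of the workaround described in this report (not the actual GoogleDataTransport source), the following Objective-C helper creates a Dispatch timer whose first fire is pushed back by a configurable delay instead of starting at `DISPATCH_TIME_NOW`. The function name `StartDeferredTimer`, its parameters, and the 100 ms leeway are assumptions made for illustration only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of GoogleDataTransport.
// Creates and resumes a repeating timer on `queue` whose first fire happens
// `initialDelayNanos` after this call, then every `intervalNanos` afterwards.
static dispatch_source_t StartDeferredTimer(dispatch_queue_t queue,
                                            uint64_t initialDelayNanos,
                                            uint64_t intervalNanos,
                                            dispatch_block_t handler) {
  dispatch_source_t timer =
      dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);

  // Using DISPATCH_TIME_NOW directly as the start time would let the handler
  // run right away, possibly while +load calls are still in progress.
  // Offsetting the start gives image loading time to finish first.
  dispatch_time_t start =
      dispatch_time(DISPATCH_TIME_NOW, (int64_t)initialDelayNanos);

  // Illustrative leeway; the value is an assumption, not taken from the SDK.
  uint64_t leeway = 100 * NSEC_PER_MSEC;
  dispatch_source_set_timer(timer, start, intervalNanos, leeway);
  dispatch_source_set_event_handler(timer, handler);
  dispatch_resume(timer);
  return timer;
}
```

With values matching the report, a caller would pass `1 * NSEC_PER_SEC` as the initial delay and `30 * NSEC_PER_SEC` as the interval. The delay only makes the race less likely; it does not guarantee that every `+load` call has finished by the time the handler first runs.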