ANRs causing SIGABRT in 5.5.2 on some devices #1100

Closed
RicoYao opened this issue Jan 28, 2021 · 5 comments

RicoYao commented Jan 28, 2021

Describe the bug

In v5.5.2 I notice that ANRs cause the app to terminate, and the Google ANR dialog is not shown.
This appears to be dependent on the device. I can reproduce on my OnePlus 7T but NOT on a Pixel 2XL.
The bug does not seem to occur on v5.5.0 and v5.5.1. It seems new in v5.5.2.

Steps to reproduce

  1. Trigger an ANR
  2. A SIGABRT results and is reported to the BugSnag console:
SIGABRT Abort program 
    /apex/com.android.runtime/lib64/bionic/libc.so:472112 abort
    /apex/com.android.runtime/lib64/libart.so:4953592 art::Runtime::Abort(char const*)
    /system/lib64/libbase.so:46172 android::base::LogMessage::~LogMessage()
    /apex/com.android.runtime/lib64/libart.so:4482236 art::OatHeader::GetCompilerFilter() const
    /apex/com.android.runtime/lib64/libart.so:4513252 art::OatFile::GetCompilerFilter() const
    /apex/com.android.runtime/lib64/libart.so:4560212 art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)
    /apex/com.android.runtime/lib64/libart.so:5007812 art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)
    /apex/com.android.runtime/lib64/libart.so:5090908 art::SignalCatcher::HandleSigQuit()
    /apex/com.android.runtime/lib64/libart.so:5086984 art::SignalCatcher::Run(void*)
    /apex/com.android.runtime/lib64/bionic/libc.so:879476 0x7b078e2b74
    /apex/com.android.runtime/lib64/bionic/libc.so:478896 0x7b07880eb0
    unknown 0x0
  3. The Android ANR dialog is NOT shown.

Environment

  • Android version: Android Q
  • Bugsnag version: 5.5.2
  • Emulator or physical device: Physical device. Reproducible on a OnePlus 7T, but not on a Pixel 2XL.

xljones commented Jan 29, 2021

Hi @RicoYao, thanks for raising this. We're looking to reproduce this issue and investigate further now.

xljones added the "bug" (Confirmed bug) and "needs discussion" (Requires internal analysis/discussion) labels Jan 29, 2021
bugsnagbot added the "scheduled" (Work is starting on this feature/bug) label Jan 29, 2021
xljones removed the "needs discussion" label Jan 29, 2021

xljones commented Jan 29, 2021

We are not able to reproduce the behaviour described using a OnePlus 7 Pro/7T using the example app in our repo. In our testing, the dialog is shown and the app is automatically terminated by the OS after ~10 seconds with no SIGABRT being raised/reported.

Can you please try the example app on the device that was causing this issue, and let us know if this issue still occurs on this app?

Additionally:

  • does the issue still occur in your app without ANR detection enabled?
  • does the issue occur 100% of the time in your app with bugsnag-android 5.5.2?
  • which manufacturer does Bugsnag show for the event that gets reported?

RicoYao commented Jan 29, 2021

Hi @xander-jones thanks for looking into it.

  • If I do not enable BugSnag ANR detection, then this issue does not occur. I get the ANR dialog as expected.
  • This issue does not seem to occur in the sample app. With that sample app, I get the ANR dialog as expected, and it gets reported to BugSnag.
  • In our app, it seems to occur only in our production (signed, shrunk/obfuscated) builds. It does not happen in debug builds. It happens 100% of the time on my OnePlus 7T, and we were also able to reproduce it once on a Pixel (though the majority of the time, the Pixel does not experience it).
  • The manufacturer is reported as "OnePlus" in the BugSnag console (or "Google" for the one time we reproduced it on a Pixel).

@mattdyoung

Hi @RicoYao

Thanks for the above and for the other information you've been able to share with us via email.

Our analysis so far:

The failing code is here:

CHECK(key_value != nullptr) << "compiler-filter not found in oat header";

This is just a sanity check in the runtime library. User code should have no means to cause this.

From the Android documentation:
"One core ART option to configure these two categories is compiler filters. Compiler filters drive how ART compiles DEX code and is an option passed to the dex2oat tool."

There are very few places where compiler-filter (or its constant kCompilerFilter) exists in the ART runtime library:

$ grep -r kCompilerFilter *|grep -v test
dex2oat/dex2oat.cc:    key_value_store_->Put(OatHeader::kCompilerFilter,
runtime/oat.h:  static constexpr const char* kCompilerFilter = "compiler-filter";
runtime/oat.cc:  const char* key_value = GetStoreValueByKey(kCompilerFilter);
runtime/oat_file.cc:    store.Put(OatHeader::kCompilerFilter, CompilerFilter::NameOfFilter(CompilerFilter::kVerify));

Looking at the dex2oat source, the dex2oat tool sets the compiler-filter value in the finished binary here.

The runtime gets this value and puts it into an internal store here.

The runtime later calls OatHeader::GetCompilerFilter() as part of its ANR handling process, and that's where the sanity check occurs. In this case it's failing because for whatever reason the compiler-filter value didn't get set earlier by the runtime.
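To make the failure mode concrete, here is a minimal sketch of the pattern described above: a writer stores a value under the "compiler-filter" key, and a reader later treats a missing key as fatal. This is a simplified model and not the ART source. The class name HeaderStoreSketch, the method GetCompilerFilterOrAbort, and the constant kCompilerFilterKey are hypothetical names chosen for illustration, and the fatal check is modelled with a plain std::abort() rather than ART's CHECK macro.

```cpp
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>

// Same key string that appears in the grep output; the constant name is hypothetical.
constexpr const char* kCompilerFilterKey = "compiler-filter";

// Hypothetical stand-in for a header key-value store. Not the ART implementation.
class HeaderStoreSketch {
 public:
  // Writer side: models a tool recording a value under a key.
  void Put(const std::string& key, const std::string& value) {
    store_[key] = value;
  }

  // Returns a pointer to the stored value, or nullptr if the key was never written.
  const std::string* GetStoreValueByKey(const std::string& key) const {
    auto it = store_.find(key);
    return it == store_.end() ? nullptr : &it->second;
  }

  // Reader side: a missing key is treated as fatal, so the process aborts.
  const std::string& GetCompilerFilterOrAbort() const {
    const std::string* key_value = GetStoreValueByKey(kCompilerFilterKey);
    if (key_value == nullptr) {
      std::cerr << "compiler-filter not found in oat header" << std::endl;
      std::abort();  // std::abort() raises SIGABRT
    }
    return *key_value;
  }

 private:
  std::map<std::string, std::string> store_;
};
```

In this sketch, calling GetCompilerFilterOrAbort() on a store where Put(kCompilerFilterKey, ...) was never called ends the process with SIGABRT, which matches the shape of the reported crash. After a Put with a value such as "verify", the same call returns normally.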

The only reference we found to this bug is here, but there wasn’t any follow up.

It seems probable that this is an OS/runtime bug in Huawei/OnePlus devices that is triggered more reliably by Bugsnag's ANR handling in conjunction with something specific to this app.

Our current thinking is that some sort of tooling or compiler options are causing the assertion to fail.

mattdyoung removed the "bug" (Confirmed bug) and "scheduled" (Work is starting on this feature/bug) labels Apr 9, 2021
@mattdyoung

Hi @RicoYao

To add to our discussion via email for the benefit of anyone else following this issue, we've found evidence of this SIGABRT stacktrace occurring in similar volumes with or without Bugsnag ANR detection enabled. So it seems likely it is an existing OS/runtime bug on certain devices and that Bugsnag's ANR detection is just influencing the timing of whether it occurs in specific testing.

I think Bugsnag are unlikely to be able to progress this issue in the absence of a shareable reproduction case.

Closing for now but we can re-open to investigate further if more evidence comes to light or you can share a way to reproduce.
