Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

Closed
eric opened this issue Mar 12, 2023 · 15 comments · Fixed by #2786
Closed

Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

eric opened this issue Mar 12, 2023 · 15 comments · Fixed by #2786

Comments

@eric
Copy link

eric commented Mar 12, 2023

Platform

iOS

Installed

CocoaPods

Version

8.3.0

Steps to Reproduce

  1. Open app
  2. Tap around
  3. App Crashes

Expected Result

App shouldn't crash.

Actual Result

screenshot-2023-03-11-20 17 47@2x

Are you willing to submit a PR?

No response

@brustolin
Copy link
Contributor

Hello @eric
Can you share the options you are using to start SentrySDK? I will try to reproduce this.

@philipphofmann
Copy link
Member

philipphofmann commented Mar 13, 2023

@brustolin, to reproduce this, I think you need to run multiple transactions in parallel. Then it should pop up.

@armcknight, I think the profiler keeps adding to the samples array while slicing it. We could use locking on the data structures we iterate on or copy them. I vote for copying as locks will slow us down.

NSArray<SentrySample *> *samples = _gCurrentProfiler->_profileData[@"profile"][@"samples"];
// We need at least two samples to be able to draw a stack frame for any given function: one
// sample for the start of the frame and another for the end. Otherwise we would only have a
// stack frame with 0 duration, which wouldn't make sense.
if ([samples count] < 2) {
SENTRY_LOG_DEBUG(@"Not enough samples in profile");
return nil;
}
// slice the profile data to only include the samples/metrics within the transaction
const auto slicedSamples = [self slicedSamples:samples transaction:transaction];

We should also copy the data for the payload in places like the following, as the profiler could keep adding data before the payload gets stored on disk, leading to inaccurate data. The downside is, of course, increased memory usage.

payload[@"profile"] = @{
@"samples" : serializedSamplesWithRelativeTimestamps(slicedSamples, transaction),
@"stacks" : _gCurrentProfiler->_profileData[@"profile"][@"stacks"],
@"frames" : _gCurrentProfiler->_profileData[@"profile"][@"frames"],
@"thread_metadata" : _gCurrentProfiler->_profileData[@"profile"][@"thread_metadata"],
@"queue_metadata" : _gCurrentProfiler->_profileData[@"profile"][@"queue_metadata"],
};

@armcknight
Copy link
Member

Correct about the sampling profiler being able to still mutate the data structure by adding to it while these enumeration blocks are running. Easiest to do the copy for now, you're correct we don't want to have that additional lock in sampling profiler.

@armcknight
Copy link
Member

We talked about trimming out any frames or stacks added after slicing the samples array, but deprioritized it as a small optimization that probably won't save us much, as those data structures are relatively small and compactible in compression. We can have another look though.

@eric
Copy link
Author

eric commented Mar 14, 2023

Did this impact anyone else? Were there other reports?

Having my exception reporting library cause my app to crash is a very bad experience. To me I consider it on the same level as a serious outage. Was this release pulled? Were customers notified of the issue?

Is there anything in place in the Sentry platform to track and identify potential issues in the exception reporting libraries so they're caught before they impact many customers?

@kahest
Copy link
Member

kahest commented Mar 14, 2023

Hi @eric! We have one other report for this issue. Please note that the crash is related to the Profiling integration, which is currently in beta and can have bugs. Nonetheless we take this issue very seriously and already have a working fix with a hotfix release incoming.

Along with the fix we also added a test case reproducing the issue to the suite of automated tests that are executed before all releases.

@eric
Copy link
Author

eric commented Mar 14, 2023

Is that the SLA for the clients? Beta features may cause app crashes? If so, I would request a flag on the options when creating a SentrySDK instance to opt out of all beta features.

I understand that you’re working on a hot fix, but during the last four days since I reported this issue, the website has still had a banner telling me my SDK is out of date and I should upgrade. The system is actively encouraging folks to upgrade to a release that has a crash in certain configurations. I don’t think I even see a warning on the release notes for that release that using the profiling beta features will cause a crash.

This response hasn’t given me confidence that Sentry takes crashing issues in the SDK as seriously as I do.

@HazAT
Copy link
Member

HazAT commented Mar 15, 2023

@eric we don't have an SLA for clients (yet) but as @kahest already described, we take it very seriously. We have an internal Post Mortem process for such incidents to discuss what we have to do that something like this doesn't happen again.

WRT is a more sophisticated process of yanking releases and such - we don't have this automated in any SDK today but we'll at least update the release notes manually so that no one should directly install 8.3.0 (with Profiling).

@philipphofmann
Copy link
Member

PR for updating the release notes #2800.

@eric
Copy link
Author

eric commented Mar 15, 2023

Thanks for the details.

@armcknight
Copy link
Member

@eric My apologies for the ongoing issues around this incident. One of your asks stood out to me:

I would request a flag on the options when creating a SentrySDK instance to opt out of all beta features

That's a good suggestion for defensiveness we should consider separately. To my knowledge, our experimental and beta features are always opt in, but it could be helpful to have a centralized override/killswitch (or really, entirely separate builds of the SDK, where the production doesn't even compile them in).

I went looking and noticed that if you had just been reading the headerdocs for our profiling options in SentryOptions.h, we didn't call out that profiling is a beta feature there, so you wouldn't have even expected potentially buggy conditions. I have a fix for that: #2804

We do call attention to it in our docs, but I was curious if you had discovered it through any other than these two avenues, so I could take the appropriate action to make sure we're educating all of our customers as best we can.

@eric
Copy link
Author

eric commented Mar 16, 2023

I saw the "Profiling" tab was added to the Sentry UI and though it would be interesting to try out.

It wasn't clear to me in any of the documentation what beta meant or that by enabling beta features I was opening up myself to potential crashing issues in the Sentry SDK. I feel like it is important to put a big "Here be dragons" disclaimer anywhere that discusses beta features that the expectation is that the SDK may introduce crashes in the app.

Fortunately, in this case, I had an opt-in flag in the beta app that only a few people beyond myself had enabled, but it was surprising to me because I was not aware of the expectation around these features marked as beta in the UI.

@philipphofmann
Copy link
Member

by enabling beta features I was opening up myself to potential crashing issues in the Sentry SDK.

We have a header in our docs for profiling

Profiling is currently in beta. Beta features are still in-progress and may have bugs. We recognize the irony. If you have any questions or feedback, please email us at [email protected].

We don't clearly define what beta and alpha mean at Sentry. Obviously, alpha is more unstable than beta. Beta can still have bugs, but of course, a crash in beta is something we try to avoid, but it's more likely to happen than when a feature is officially released. I would not recommend using a beta feature in production if you are sensitive to potential crashes or bugs impacting your application. Using a feature flag in a beta app was a smart approach to minimize the damage, @eric.

@eric
Copy link
Author

eric commented Mar 17, 2023

We have a header in our docs for profiling

When I see "may have bugs", I take that more to mean, "sometimes we may not report the right data" or something like that, not "sometimes we may crash your entire app".

Beta can still have bugs, but of course, a crash in beta is something we try to avoid, but it's more likely to happen than when a feature is officially released. I would not recommend using a beta feature in production if you are sensitive to potential crashes or bugs impacting your application.

Now that I've been through this process, I have a better understanding of what the internal expectation at Sentry is around reliability. My problem is that before I had been through this, it was not obvious. Part of my confusion was that, to me, any crash in the client SDK was a "drop everything and mitigate this now" level of issue. I didn't understand the nuance of what "beta" meant (especially regarding an SLA for mitigating/announcing the issue).

The bottom line of what I'm trying to articulate is that the current level of messaging was not sufficient for me to have a realistic understanding of what I was signing for by enabling this feature and I believe stronger language could have helped me here.

@philipphofmann
Copy link
Member

@eric, thanks for the feedback. Would an official explanation of what alpha and beta stand for at Sentry in our docs help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants