Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

eric · 2023-03-12T04:18:42Z

Platform

iOS

Installed

CocoaPods

Version

8.3.0

Steps to Reproduce

Open app
Tap around
App Crashes

Expected Result

App shouldn't crash.

Actual Result

Are you willing to submit a PR?

No response

brustolin · 2023-03-13T10:32:07Z

Hello @eric
Can you share the options you are using to start SentrySDK? I will try to reproduce this.

philipphofmann · 2023-03-13T11:05:17Z

@brustolin, to reproduce this, I think you need to run multiple transactions in parallel. Then it should pop up.

@armcknight, I think the profiler keeps adding to the samples array while slicing it. We could use locking on the data structures we iterate on or copy them. I vote for copying as locks will slow us down.

sentry-cocoa/Sources/Sentry/SentryProfiler.mm

Lines 387 to 398 in f715499

    
           NSArray<SentrySample *> *samples = _gCurrentProfiler->_profileData[@"profile"][@"samples"]; 
        
           // We need at least two samples to be able to draw a stack frame for any given function: one 
        
           // sample for the start of the frame and another for the end. Otherwise we would only have a 
        
           // stack frame with 0 duration, which wouldn't make sense. 
        
           if ([samples count] < 2) { 
        
               SENTRY_LOG_DEBUG(@"Not enough samples in profile"); 
        
               return nil; 
        
           } 
        
           // slice the profile data to only include the samples/metrics within the transaction 
        
           const auto slicedSamples = [self slicedSamples:samples transaction:transaction];

We should also copy the data for the payload in places like the following, as the profiler could keep adding data before the payload gets stored on disk, leading to inaccurate data. The downside is, of course, increased memory usage.

sentry-cocoa/Sources/Sentry/SentryProfiler.mm

Lines 404 to 410 in f715499

    
           payload[@"profile"] = @{ 
        
               @"samples" : serializedSamplesWithRelativeTimestamps(slicedSamples, transaction), 
        
               @"stacks" : _gCurrentProfiler->_profileData[@"profile"][@"stacks"], 
        
               @"frames" : _gCurrentProfiler->_profileData[@"profile"][@"frames"], 
        
               @"thread_metadata" : _gCurrentProfiler->_profileData[@"profile"][@"thread_metadata"], 
        
               @"queue_metadata" : _gCurrentProfiler->_profileData[@"profile"][@"queue_metadata"], 
        
           };

armcknight · 2023-03-13T16:43:38Z

Correct about the sampling profiler being able to still mutate the data structure by adding to it while these enumeration blocks are running. Easiest to do the copy for now, you're correct we don't want to have that additional lock in sampling profiler.

armcknight · 2023-03-13T16:48:47Z

We talked about trimming out any frames or stacks added after slicing the samples array, but deprioritized it as a small optimization that probably won't save us much, as those data structures are relatively small and compactible in compression. We can have another look though.

eric · 2023-03-14T19:47:56Z

Did this impact anyone else? Were there other reports?

Having my exception reporting library cause my app to crash is a very bad experience. To me I consider it on the same level as a serious outage. Was this release pulled? Were customers notified of the issue?

Is there anything in place in the Sentry platform to track and identify potential issues in the exception reporting libraries so they're caught before they impact many customers?

kahest · 2023-03-14T20:51:54Z

Hi @eric! We have one other report for this issue. Please note that the crash is related to the Profiling integration, which is currently in beta and can have bugs. Nonetheless we take this issue very seriously and already have a working fix with a hotfix release incoming.

Along with the fix we also added a test case reproducing the issue to the suite of automated tests that are executed before all releases.

eric · 2023-03-14T21:23:03Z

Is that the SLA for the clients? Beta features may cause app crashes? If so, I would request a flag on the options when creating a SentrySDK instance to opt out of all beta features.

I understand that you’re working on a hot fix, but during the last four days since I reported this issue, the website has still had a banner telling me my SDK is out of date and I should upgrade. The system is actively encouraging folks to upgrade to a release that has a crash in certain configurations. I don’t think I even see a warning on the release notes for that release that using the profiling beta features will cause a crash.

This response hasn’t given me confidence that Sentry takes crashing issues in the SDK as seriously as I do.

HazAT · 2023-03-15T11:02:50Z

@eric we don't have an SLA for clients (yet) but as @kahest already described, we take it very seriously. We have an internal Post Mortem process for such incidents to discuss what we have to do that something like this doesn't happen again.

WRT is a more sophisticated process of yanking releases and such - we don't have this automated in any SDK today but we'll at least update the release notes manually so that no one should directly install 8.3.0 (with Profiling).

philipphofmann · 2023-03-15T15:40:54Z

PR for updating the release notes #2800.

eric · 2023-03-15T17:54:44Z

Thanks for the details.

armcknight · 2023-03-16T19:26:19Z

@eric My apologies for the ongoing issues around this incident. One of your asks stood out to me:

I would request a flag on the options when creating a SentrySDK instance to opt out of all beta features

That's a good suggestion for defensiveness we should consider separately. To my knowledge, our experimental and beta features are always opt in, but it could be helpful to have a centralized override/killswitch (or really, entirely separate builds of the SDK, where the production doesn't even compile them in).

I went looking and noticed that if you had just been reading the headerdocs for our profiling options in SentryOptions.h, we didn't call out that profiling is a beta feature there, so you wouldn't have even expected potentially buggy conditions. I have a fix for that: #2804

We do call attention to it in our docs, but I was curious if you had discovered it through any other than these two avenues, so I could take the appropriate action to make sure we're educating all of our customers as best we can.

eric · 2023-03-16T19:36:50Z

I saw the "Profiling" tab was added to the Sentry UI and though it would be interesting to try out.

It wasn't clear to me in any of the documentation what beta meant or that by enabling beta features I was opening up myself to potential crashing issues in the Sentry SDK. I feel like it is important to put a big "Here be dragons" disclaimer anywhere that discusses beta features that the expectation is that the SDK may introduce crashes in the app.

Fortunately, in this case, I had an opt-in flag in the beta app that only a few people beyond myself had enabled, but it was surprising to me because I was not aware of the expectation around these features marked as beta in the UI.

philipphofmann · 2023-03-17T09:32:48Z

by enabling beta features I was opening up myself to potential crashing issues in the Sentry SDK.

We have a header in our docs for profiling

Profiling is currently in beta. Beta features are still in-progress and may have bugs. We recognize the irony. If you have any questions or feedback, please email us at [email protected].

We don't clearly define what beta and alpha mean at Sentry. Obviously, alpha is more unstable than beta. Beta can still have bugs, but of course, a crash in beta is something we try to avoid, but it's more likely to happen than when a feature is officially released. I would not recommend using a beta feature in production if you are sensitive to potential crashes or bugs impacting your application. Using a feature flag in a beta app was a smart approach to minimize the damage, @eric.

eric · 2023-03-17T18:41:23Z

We have a header in our docs for profiling

When I see "may have bugs", I take that more to mean, "sometimes we may not report the right data" or something like that, not "sometimes we may crash your entire app".

Beta can still have bugs, but of course, a crash in beta is something we try to avoid, but it's more likely to happen than when a feature is officially released. I would not recommend using a beta feature in production if you are sensitive to potential crashes or bugs impacting your application.

Now that I've been through this process, I have a better understanding of what the internal expectation at Sentry is around reliability. My problem is that before I had been through this, it was not obvious. Part of my confusion was that, to me, any crash in the client SDK was a "drop everything and mitigate this now" level of issue. I didn't understand the nuance of what "beta" meant (especially regarding an SLA for mitigating/announcing the issue).

The bottom line of what I'm trying to articulate is that the current level of messaging was not sufficient for me to have a realistic understanding of what I was signing for by enabling this feature and I believe stronger language could have helped me here.

philipphofmann · 2023-03-20T08:19:01Z

@eric, thanks for the feedback. Would an official explanation of what alpha and beta stand for at Sentry in our docs help?

eric added Platform: Cocoa Type: Bug labels Mar 12, 2023

github-project-automation bot added this to Mobile & Cross Platform SDK Mar 12, 2023

getsantry bot added the Status: Untriaged label Mar 12, 2023

github-project-automation bot moved this to Needs Discussion in Mobile & Cross Platform SDK Mar 12, 2023

philipphofmann removed the Status: Untriaged label Mar 13, 2023

armcknight mentioned this issue Mar 13, 2023

fix: mutation while enumerating samples array #2786

Merged

philipphofmann linked a pull request Mar 13, 2023 that will close this issue

fix: mutation while enumerating samples array #2786

Merged

kahest mentioned this issue Mar 14, 2023

Crash in +[SentryProfiler slicedSamples:transaction:] with 8.3.0 #2797

Closed

armcknight closed this as completed in #2786 Mar 14, 2023

github-project-automation bot moved this from Needs Discussion to Done in Mobile & Cross Platform SDK Mar 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

eric commented Mar 12, 2023

brustolin commented Mar 13, 2023

philipphofmann commented Mar 13, 2023 •

edited

Loading

armcknight commented Mar 13, 2023

armcknight commented Mar 13, 2023

eric commented Mar 14, 2023

kahest commented Mar 14, 2023 •

edited

Loading

eric commented Mar 14, 2023 •

edited

Loading

HazAT commented Mar 15, 2023

philipphofmann commented Mar 15, 2023

eric commented Mar 15, 2023

armcknight commented Mar 16, 2023

eric commented Mar 16, 2023

philipphofmann commented Mar 17, 2023

eric commented Mar 17, 2023

philipphofmann commented Mar 20, 2023

Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779

Comments

eric commented Mar 12, 2023

Platform

Installed

Version

Steps to Reproduce

Expected Result

Actual Result

Are you willing to submit a PR?

brustolin commented Mar 13, 2023

philipphofmann commented Mar 13, 2023 • edited Loading

armcknight commented Mar 13, 2023

armcknight commented Mar 13, 2023

eric commented Mar 14, 2023

kahest commented Mar 14, 2023 • edited Loading

eric commented Mar 14, 2023 • edited Loading

HazAT commented Mar 15, 2023

philipphofmann commented Mar 15, 2023

eric commented Mar 15, 2023

armcknight commented Mar 16, 2023

eric commented Mar 16, 2023

philipphofmann commented Mar 17, 2023

eric commented Mar 17, 2023

philipphofmann commented Mar 20, 2023

philipphofmann commented Mar 13, 2023 •

edited

Loading

kahest commented Mar 14, 2023 •

edited

Loading

eric commented Mar 14, 2023 •

edited

Loading