Crash in 8.3.0 in +[SentryProfiler slicedSamples:transaction:] #2779
Comments
Hello @eric |
@brustolin, to reproduce this, I think you need to run multiple transactions in parallel. Then it should pop up. @armcknight, I think the profiler keeps adding to the samples array while slicing it. We could use locking on the data structures we iterate on or copy them. I vote for copying as locks will slow us down. sentry-cocoa/Sources/Sentry/SentryProfiler.mm Lines 387 to 398 in f715499
We should also copy the data for the payload in places like the following, as the profiler could keep adding data before the payload gets stored on disk, leading to inaccurate data. The downside is, of course, increased memory usage. sentry-cocoa/Sources/Sentry/SentryProfiler.mm Lines 404 to 410 in f715499
|
Correct about the sampling profiler being able to still mutate the data structure by adding to it while these enumeration blocks are running. Easiest to do the copy for now, you're correct we don't want to have that additional lock in sampling profiler. |
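To illustrate the copy-then-slice idea discussed above, here is a minimal C++ sketch. It is not the SDK's code: `Sample`, `SampleBuffer`, and `sliceSamples` are hypothetical names. It also assumes a short mutex held only while appending or copying, because in plain C++ the copy itself would otherwise race with the writer. That is a choice made for this sketch, not something decided in this thread. The slicing loop, which is the part that takes time, runs on the private copy with no lock held.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical sample record; the real profiler stores richer data.
struct Sample {
    std::uint64_t timestampNs;
    int stackIndex;
};

// Hypothetical buffer shared by the sampling thread and transaction code.
class SampleBuffer {
public:
    // Called by the sampling thread for every new sample.
    void append(const Sample &sample) {
        std::lock_guard<std::mutex> guard(mutex_);
        samples_.push_back(sample);
    }

    // Returns an independent copy. The lock is held only while copying,
    // so later appends cannot affect the returned vector.
    std::vector<Sample> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return samples_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<Sample> samples_;
};

// Keeps the samples whose timestamps fall within [startNs, endNs].
// Operates on a snapshot, so it never iterates the live buffer.
std::vector<Sample> sliceSamples(const std::vector<Sample> &snapshot,
                                 std::uint64_t startNs, std::uint64_t endNs) {
    assert(startNs <= endNs);
    std::vector<Sample> sliced;
    for (const Sample &sample : snapshot) {
        if (sample.timestampNs >= startNs && sample.timestampNs <= endNs) {
            sliced.push_back(sample);
        }
    }
    return sliced;
}
```

The same snapshot can be reused when building the payload, so samples appended before the payload reaches disk do not leak into it. The cost is the extra memory for the copy, as noted above.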
We talked about trimming out any frames or stacks added after slicing the samples array, but deprioritized it as a small optimization that probably won't save us much, as those data structures are relatively small and compress well. We can have another look though. |
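For readers wondering what that trimming could look like, here is a rough C++ sketch under simplified assumptions: each sample refers to a stack by index, and a stack is just a list of frame indices. `Stack`, `SlicedProfile`, and `trimUnreferencedStacks` are hypothetical names, not SDK types. The function keeps only the stacks that the sliced samples actually reference and renumbers the sample indices to match.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified shape: a stack is a list of frame indices.
using Stack = std::vector<int>;

struct SlicedProfile {
    std::vector<std::size_t> sampleStackIndices; // one entry per sliced sample
    std::vector<Stack> stacks;                   // only the referenced stacks
};

// Drops stacks that no sliced sample refers to and remaps the indices.
SlicedProfile trimUnreferencedStacks(const std::vector<std::size_t> &sampleStackIndices,
                                     const std::vector<Stack> &stacks) {
    SlicedProfile result;
    std::unordered_map<std::size_t, std::size_t> remapped; // old index -> new index
    for (std::size_t oldIndex : sampleStackIndices) {
        assert(oldIndex < stacks.size());
        auto found = remapped.find(oldIndex);
        if (found == remapped.end()) {
            // First time this stack is seen: it gets the next free slot.
            found = remapped.emplace(oldIndex, result.stacks.size()).first;
            result.stacks.push_back(stacks[oldIndex]);
        }
        result.sampleStackIndices.push_back(found->second);
    }
    return result;
}
```

Frames could be compacted the same way by treating the frame indices inside each kept stack as the references.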
Did this impact anyone else? Were there other reports? Having my exception reporting library cause my app to crash is a very bad experience. I consider it on the same level as a serious outage. Was this release pulled? Were customers notified of the issue? Is there anything in place in the Sentry platform to track and identify potential issues in the exception reporting libraries so they're caught before they impact many customers? |
Hi @eric! We have one other report for this issue. Please note that the crash is related to the Profiling integration, which is currently in beta and can have bugs. Nonetheless we take this issue very seriously and already have a working fix with a hotfix release incoming. Along with the fix we also added a test case reproducing the issue to the suite of automated tests that are executed before all releases. |
Is that the SLA for the clients? Beta features may cause app crashes? If so, I would request a flag on the options when creating a SentrySDK instance to opt out of all beta features. I understand that you’re working on a hot fix, but during the last four days since I reported this issue, the website has still had a banner telling me my SDK is out of date and I should upgrade. The system is actively encouraging folks to upgrade to a release that has a crash in certain configurations. I don’t think I even see a warning on the release notes for that release that using the profiling beta features will cause a crash. This response hasn’t given me confidence that Sentry takes crashing issues in the SDK as seriously as I do. |
@eric we don't have an SLA for clients (yet), but as @kahest already described, we take it very seriously. We have an internal Post Mortem process for such incidents to discuss what we have to do so that something like this doesn't happen again. With regard to a more sophisticated process of yanking releases and such: we don't have this automated in any SDK today, but we'll at least update the release notes manually so that no one directly installs 8.3.0 (with Profiling). |
PR for updating the release notes #2800. |
Thanks for the details. |
@eric My apologies for the ongoing issues around this incident. One of your asks stood out to me:
That's a good suggestion for defensiveness we should consider separately. To my knowledge, our experimental and beta features are always opt-in, but it could be helpful to have a centralized override/killswitch (or really, entirely separate builds of the SDK, where the production build doesn't even compile them in). I went looking and noticed that if you had just been reading the headerdocs for our profiling options in We do call attention to it in our docs, but I was curious if you had discovered it through any avenue other than these two, so I could take the appropriate action to make sure we're educating all of our customers as best we can. |
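As a rough illustration of the two ideas in that comment (a centralized killswitch and a build that compiles beta code out), here is a small C++ sketch. Everything in it is hypothetical: `Options`, `disableAllBetaFeatures`, `betaProfilingSampleRate`, and the `HYPOTHETICAL_EXCLUDE_BETA` macro are made up for the example and are not part of the Sentry SDK.

```cpp
#include <cassert>

// Hypothetical options type; none of these names come from the Sentry SDK.
struct Options {
    bool disableAllBetaFeatures = false;  // centralized killswitch
    double betaProfilingSampleRate = 0.0; // per-feature opt-in, 0 means off
};

// Decides whether the beta profiler may run.
bool isProfilingEnabled(const Options &options) {
#ifdef HYPOTHETICAL_EXCLUDE_BETA
    // Compile-time variant: a production build defines the macro,
    // so the beta path can never be taken.
    (void)options;
    return false;
#else
    assert(options.betaProfilingSampleRate >= 0.0 &&
           options.betaProfilingSampleRate <= 1.0);
    // Runtime variant: the killswitch wins over any per-feature opt-in.
    return !options.disableAllBetaFeatures &&
           options.betaProfilingSampleRate > 0.0;
#endif
}
```

With a flag like this, an app could keep the per-feature setting in its configuration and still turn off every beta feature in one place.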
I saw the "Profiling" tab was added to the Sentry UI and thought it would be interesting to try out. It wasn't clear to me in any of the documentation what beta meant or that by enabling beta features I was opening myself up to potential crashing issues in the Sentry SDK. I feel like it is important to put a big "Here be dragons" disclaimer anywhere beta features are discussed, making clear that the SDK may introduce crashes in the app. Fortunately, in this case, I had an opt-in flag in the beta app that only a few people beyond myself had enabled, but it was surprising to me because I was not aware of the expectation around these features marked as beta in the UI. |
We have a header in our docs for profiling
We don't clearly define what beta and alpha mean at Sentry. Obviously, alpha is more unstable than beta. Beta can still have bugs. A crash in beta is of course something we try to avoid, but it's more likely to happen than when a feature is officially released. I would not recommend using a beta feature in production if you are sensitive to potential crashes or bugs impacting your application. Using a feature flag in a beta app was a smart approach to minimize the damage, @eric. |
When I see "may have bugs", I take that more to mean, "sometimes we may not report the right data" or something like that, not "sometimes we may crash your entire app".
Now that I've been through this process, I have a better understanding of what the internal expectation at Sentry is around reliability. My problem is that before I had been through this, it was not obvious. Part of my confusion was that, to me, any crash in the client SDK was a "drop everything and mitigate this now" level of issue. I didn't understand the nuance of what "beta" meant (especially regarding an SLA for mitigating/announcing the issue). The bottom line of what I'm trying to articulate is that the current level of messaging was not sufficient for me to have a realistic understanding of what I was signing up for by enabling this feature, and I believe stronger language could have helped me here. |
Platform
iOS
Installed
CocoaPods
Version
8.3.0
Steps to Reproduce
Expected Result
App shouldn't crash.
Actual Result
Are you willing to submit a PR?
No response