-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUM-3463 feat(watchdog-termination): track app state and detect watchdog terminations #1889
Conversation
Datadog ReportBranch report: ✅ 0 Failed, 3351 Passed, 0 Skipped, 3m 46.76s Total Time 🔻 Code Coverage Decreases vs Default Branch (10)
|
0821d6f
to
3e90059
Compare
ffc03c1
to
fb70ce7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I have only reviewed the ~2/3 for now, but I asked some open questions. Feel free to disregard the nits of course.
@@ -85,6 +82,9 @@ class ExampleAppDelegate: UIResponder, UIApplicationDelegate { | |||
) | |||
RUMMonitor.shared().debug = true | |||
|
|||
// Enable Crash Reporting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/question: What is the reason for moving this call later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed crash reporting enabled first in my testing but no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might revisit it laster when doing full end to end integration because we want RUM to be enabled first in order to get the launch message.
DatadogRUM/Sources/Instrumentation/WatchdogTerminations/VendorIdProvider.swift
Outdated
Show resolved
Hide resolved
|
||
guard let previous = previous else { | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit suggestion:
guard let previous else {
return false
}
// Watchdog Termination detection doesn't work on simulators. | ||
guard deviceInfo.isSimulator == false else { | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit suggestion:
guard !deviceInfo.isSimulator else {
return false
}
Same for all the other guard statement below, but I also totally understand if you prefer being explicit.
DatadogRUM/Sources/Instrumentation/WatchdogTerminations/WatchdogTerminationChecker.swift
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was fast! 🏃💨 Thank you for the changes and answers 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Massive effort 🏋️🏅. Very elegant code that reads smooth and is well integrated into the whole architecture.
I only reviewed the implementation part (no yet tests), leaving several questions and few change requests.
On the process level, I wonder if this PR shouldn't target a feature branch instead of develop
. It is missing the "send RUM error" part, which makes it not functional but it brings impact. Alternatively, we could disable it at the entry point and only enable with the final touch on the whole topic.
// We use baggage message to pass the launch report instead updating the global context | ||
// because under the hood, some integrations start certain features based on the launch report (e.g. `WatchdogTerminationMonitor`). | ||
// If we update the global context, the integrations will keep starting the features on every update which is not desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question/ It feels that having LaunchReport
as part of DatadogContext
will be more scalable:
- no requirement to enable
DatadogCrashReporting
after Logs or RUM, - every feature can access it when writing event or through
featureScope.context {}
explicitly;
What is the actual limitation behind following?
If we update the global context, the integrations will keep starting the features on every update which is not desired.
It sounds like architecture flaw that should be fixed rather than bypassed through point-to-point messaging over bus. Would we be up to improving it as part of this / next PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are mixing two concepts here and I would prefer not to re-architect whole flow here.
- We want something to start working when something happens in some other feature.
Why? because if start the monitor before launch report, it will update the state in the disk and we will lose the previous session date. This can't be achieved by global context sharing.
- Share data
This is easily doable with both global context and point to point messaging.
That said, in an ideal world, I want to subscribe to only launch report
(not context), so my receiver must only be invoked for launch report only and with additional context (extra data but not required).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying 👍. I'm okay with proposed way as long as there is no requirement on the order of RUM.enable()
vs CrashReporting.enable()
- isn't there 🤔💭?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping it for the next PR, my hunch says there will be because SDK sends the message as soon as crash reporting is enabled and if RUM is not enabled yet, it will be missed.
Which may lead to re-architecture of this.
...dogRUM/Sources/Instrumentation/WatchdogTerminations/WatchdogTerminationAppStateManager.swift
Outdated
Show resolved
Hide resolved
DatadogRUM/Sources/Instrumentation/WatchdogTerminations/VendorIdProvider.swift
Outdated
Show resolved
Hide resolved
DatadogRUM/Sources/Instrumentation/WatchdogTerminations/WatchdogTerminationChecker.swift
Show resolved
Hide resolved
255ff38
to
a2a51d1
Compare
I just disabled by default until we finish the reporting part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for changes, looks great. I gave it another look, this time focusing more on tests. Left few suggestions with the most important one on incorporating mocked core vs our mid-term efforts. Few remarks, no blockers 👍
Once again, well done 💪 🏋️♂️ 🏅
core = PassthroughCoreMock( | ||
context: .mockWith(applicationStateHistory: .mockAppInBackground()), | ||
messageReceiver: sut | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convention/ Ref.: #1744, we want to get rid of the pattern of testing message receivers (here sut
) through integration with mocked core. This convention proves to be error-prone and can increase flakiness due to leaking core reference. Most importantly, it doesn't test the critical aspect of message receiver which is the return type from receive(message:from:)
as this detail is hidden inside PassthroughCoreMock
implementation. In the long term, we plan to simplify receive(message:from:)
signature by removing the from core:
parameter at all (it is not required).
That said, the recommended convention is to create standalone DatadogContext
value and pass it to the receiver directly. For tests in this file that would mean:
// app state changes
var context: DatadogContext = .mockWith(applicationStateHistory: .mockAppInBackground())
context.applicationStateHistory.append(.init(state: .active, date: .init()))
XCTAssertFalse(sut.receive(message: .context(context), from: NOPDatadogCore()), "It should not consume the message")
// app state changes again
context.applicationStateHistory.append(.init(state: .background, date: .init()))
XCTAssertFalse(sut.receive(message: .context(context), from: NOPDatadogCore()), "It should not consume the message")
...RUM/Tests/Instrumentation/WatchdogTerminations/WatchdogTerminationAppStateManagerTests.swift
Outdated
Show resolved
Hide resolved
...RUM/Tests/Instrumentation/WatchdogTerminations/WatchdogTerminationAppStateManagerTests.swift
Outdated
Show resolved
Hide resolved
// swiftlint:enable implicitly_unwrapped_optional | ||
|
||
func testNoPreviousState_NoWatchdogTermination() throws { | ||
given(isSimulator: .mockRandom()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice/ ⭐ I'll borrow it
DatadogRUM/Tests/Instrumentation/WatchdogTerminations/WatchdogTerminationMonitorTests.swift
Show resolved
Hide resolved
20254d9
to
0358d1b
Compare
What and why?
We want to detect the watchdog terminations in the SDK and report them as a crash to the backend.
How?
This PR setups the building block of detecting a watchdog termination using the heuristic based approach. The heuristic works on checking all the possible ways an app can be terminated and if the termination is from an unknown method, it is considered as a watchdog termination.
Read more about the approach in the internal RFC.
The reporting part is not yet implemented in this PR.
Review checklist
Custom CI job configuration (optional)
tools/