Measurement caching and performance improvements #209

watt · 2021-04-02T02:10:04Z

This PR adds a couple of performance-related features. Happy to split this up into multiple PRs if folks prefer that.

Signpost Logging

Signpost intervals are generated for:

Layout
View Updates
Individual element measurements

The first two are nice to get a high level view of rendering, and the third one provides a pretty nice visualization of the way elements are recursively measured.

This fixes #173.

Layout pass measurement caching

This is a new caching mechanism that caches every measured size during a layout pass. The cache structure mirrors the element tree itself, so it is able to cache measurements for every element, without any requirements on the element itself.

This cache's lifetime is only a single layout pass. Because of the way Blueprint repeatedly measures each subtree as it lays out each element, this prevents a lot of duplicate work from re-measuring, especially in deep trees. It does not cache the layout results returned by performLayout, because that is only called once per element per layout pass.

The cache is also used during measurement outside of layout (such as from sizeThatFits or a direct call to element.content.measure()), but it's less useful in that scenario since most elements do not re-measure children.
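To make the shape of this concrete, here is a minimal sketch of a cache whose nodes mirror the element tree. It is not the code in this PR: `PassCache`, `Constraint`, and `Size` are hypothetical stand-ins for the real types, and children are keyed by index purely for illustration.

```swift
// Hypothetical stand-ins; not Blueprint's actual types.
struct Size: Equatable {
    var width: Double
    var height: Double
}

struct Constraint: Hashable {
    var maxWidth: Double
    var maxHeight: Double
}

/// One cache node per element; child nodes mirror the element's children.
final class PassCache {
    private var sizes: [Constraint: Size] = [:]
    private var children: [Int: PassCache] = [:]

    /// Returns the cached size for `constraint`, measuring only on a miss.
    func size(for constraint: Constraint, measure: () -> Size) -> Size {
        if let cached = sizes[constraint] {
            return cached
        }
        let measured = measure()
        sizes[constraint] = measured
        return measured
    }

    /// Returns the subcache for the child at `index`, creating it on first use.
    func subcache(at index: Int) -> PassCache {
        if let existing = children[index] {
            return existing
        }
        let created = PassCache()
        children[index] = created
        return created
    }
}

// Usage: the same child is measured three times with the same constraint,
// but the measure closure only runs on the first request.
var measureCount = 0
let root = PassCache()
let constraint = Constraint(maxWidth: 320, maxHeight: 480)
let child = root.subcache(at: 0)
for _ in 0..<3 {
    _ = child.size(for: constraint) {
        measureCount += 1
        return Size(width: 100, height: 20)
    }
}
```

Because the root cache is created per layout pass and discarded afterwards, nothing in this sketch needs invalidation logic.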

Shortcuts

When laying out stacks, if the alignment is fill and the constraint is exact, we can skip the entire second measurement for cross axis sizes.
We can skip measurement in ConstrainedSize if both axes are absolute constraints.
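The second shortcut can be sketched roughly as follows. This assumes a simplified, hypothetical `AxisConstraint` enum; the real ConstrainedSize has more cases and a different API.

```swift
struct Size: Equatable {
    var width: Double
    var height: Double
}

/// Hypothetical per-axis constraint, simplified for illustration.
enum AxisConstraint {
    case unconstrained
    case atMost(Double)
    case absolute(Double)
}

/// Applies one axis constraint to a measured length.
func resolve(_ constraint: AxisConstraint, _ measured: Double) -> Double {
    switch constraint {
    case .unconstrained: return measured
    case .atMost(let limit): return min(measured, limit)
    case .absolute(let value): return value
    }
}

func constrainedSize(
    width: AxisConstraint,
    height: AxisConstraint,
    measureContent: () -> Size
) -> Size {
    // Shortcut: both axes are fixed, so the content's size cannot affect the result.
    if case .absolute(let w) = width, case .absolute(let h) = height {
        return Size(width: w, height: h)
    }
    let content = measureContent()
    return Size(
        width: resolve(width, content.width),
        height: resolve(height, content.height)
    )
}

// Usage: the fixed case never calls the measure closure; the mixed case calls it once.
var calls = 0
let fixed = constrainedSize(width: .absolute(50), height: .absolute(30)) {
    calls += 1
    return Size(width: 200, height: 200)
}
let callsAfterFixed = calls
let mixed = constrainedSize(width: .atMost(100), height: .absolute(30)) {
    calls += 1
    return Size(width: 200, height: 200)
}
```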

Potential follow-up work

We can't persist this new cache if the element changes, but we might be able to persist it between layout passes on the same element. If we make that change, we should persist the values returned by performLayout too.
We should consider merging the previous MeasurementCache functionality into this one, or removing it.
Additional logging during view updates, to visualize view churn.

Todo

Tests
Update changelog
Move signpost logging into a helper
Document SignpostToken

watt · 2021-04-02T20:16:19Z

BlueprintUI/Sources/Internal/CacheTree.swift

+import os.log
+
+/// A size cache that also holds subcaches.
+protocol CacheTree: AnyObject {


Considered renaming this to something more generalized, like RenderContext

kyleve · 2021-04-02T21:25:30Z

BlueprintUI/Sources/Blueprint View/BlueprintView.swift

+extension BlueprintView {
+    private func logLayoutStart() {
+        if #available(iOS 12.0, *) {
+            os_signpost(


Fwiw in the past (and when I took a stab at signpost logging in a draft PR), I wrapped up the logging into a little helper so all the os_signpost stuff was in one place: https://github.com/square/Blueprint/pull/174/files#diff-34f7283a6db50334cae24989cc3fd7656859bae36d059ba5bb7a35206d3017e4R2

Sure, I can centralize these.

kyleve · 2021-04-02T21:27:54Z

BlueprintUI/Sources/Blueprint View/BlueprintView.swift


            setNeedsViewHierarchyUpdate()
            invalidateIntrinsicContentSize()
        }
    }
+
+    /// A name to help identify this view when profiling or debugging
+    public var name: String?


Nit: debugIdentifier?

Well, I see this potentially coming from a workflow screen, and I kind of want to avoid "identifier" since it's just a display thing and not used for identity checks.

True – I guess when I see "name" I think of it as something that's potentially user-facing without a "debug" prefix.

BlueprintUI/Sources/Internal/CacheFactory.swift

kyleve · 2021-04-02T22:21:31Z

BlueprintUI/Sources/Internal/CacheTree.swift

+import os.log
+
+/// A size cache that also holds subcaches.
+protocol CacheTree: AnyObject {


This seems good to me – I would've maybe gone with MeasurementCache or something

kyleve · 2021-04-02T22:22:18Z

BlueprintUI/Sources/Internal/CacheTree.swift

+
+struct SubcacheKey: RawRepresentable, Hashable {
+    /// A key indicating that this will be the only subcache
+    static let singleton = SubcacheKey(rawValue: -1)


Should we do this, or would something like this be clearer?

    enum SubcacheKey: Hashable {
        case singleton
        case indexed(Int)
    }

Maybe? I was torn on how much to optimize this one. The singleton case is only used to improve the element name emitted to signpost, so we could also get rid of this type entirely and just pass in a hint along with the name.

kyleve · 2021-04-02T22:23:50Z

BlueprintUI/Sources/Internal/CacheTree.swift

+            key: key,
+            name: key == .singleton
+                ? "\(self.name).\(type(of: element))"
+                : "\(self.name)[\(key.rawValue)].\(type(of: element))"


Curious what sort of cost these strings getting built have. Wonder if we could make them memoized to avoid needing to do it during every traversal.

Yeah, I had the same concern, and that's the reason for the autoclosure. It's only hit once per element per traversal, so I think it's pretty small compared to the amount of other work we do during layout. Dunno if we can do much better.

Ahh missed the autoclosure – and it's only used for signpost logging, meaning we won't hit it in release, right?

It will be hit in release, since profiling is typically done against a release build.
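For readers following along, this is a small sketch of how an autoclosure defers building a name string until something actually asks for it. `NameLogger`, `isEnabled`, and `makeName` are hypothetical names for illustration, not the types in this PR.

```swift
/// Hypothetical logger; `isEnabled` stands in for whatever gate the real code uses.
struct NameLogger {
    var isEnabled: Bool
    var emitted: [String] = []

    /// `name` is an autoclosure, so the string is only built when it is evaluated.
    mutating func log(name: @autoclosure () -> String) {
        guard isEnabled else { return }
        emitted.append(name())
    }
}

// Counts how many times a name string is actually constructed.
var buildCount = 0
func makeName(parent: String, index: Int, type: String) -> String {
    buildCount += 1
    return "\(parent)[\(index)].\(type)"
}

// The argument expression is wrapped in a closure, so nothing is built here.
var disabled = NameLogger(isEnabled: false)
disabled.log(name: makeName(parent: "Root", index: 0, type: "Label"))
let buildsWhileDisabled = buildCount

// With the gate open, the closure runs and the string is built exactly once.
var enabled = NameLogger(isEnabled: true)
enabled.log(name: makeName(parent: "Root", index: 0, type: "Label"))
```

The call sites look identical in both cases; only the callee decides whether the interpolation cost is paid.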

Thought / ask: Worth adding a way to easily short-circuit the signpost logging if we want to do a pure perf trace without it appearing? When I was testing, this was always visible in the traces. That's easy to ignore at the top level if you invert the call tree, but with something like top functions it can be harder to realize that the stuff you're trying to optimize is actually just the signpost logging.

Also, any idea if this stuff fires in device beta or app store builds? When I was testing w/ this in the checkout applet, of the ~15s or so of CPU time, the signpost logging took ~600ms, so it's definitely not free.

(I did this: https://github.com/kyleve/Listable/blob/main/ListableUI/Sources/Debugging%20and%20Logging/SignpostLogger.swift#L69 in listable for these cases)

Sure. The WWDC video on this talks about how they're meant to be super lightweight, but also gives a nice way to disable them.

An app store build is still just a release build, so I would guess yes? But I would also guess that in a release build it's much faster than that.

Config added! This is now off by default. To enable it, there is a global config at BlueprintLogging.enabled.
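A rough sketch of what an off-by-default gate around signpost intervals could look like. `LoggingConfig` and `withSignpost` are hypothetical names used only for this illustration (the actual switch is the `BlueprintLogging.enabled` flag mentioned above), and the subsystem string is made up.

```swift
#if canImport(os)
import os.signpost
#endif

/// Hypothetical global switch, off by default.
enum LoggingConfig {
    static var isEnabled = false
}

/// Runs `work`, wrapping it in a signpost interval only when logging is enabled.
/// Returns whether an interval was emitted.
@discardableResult
func withSignpost(_ name: StaticString, work: () -> Void) -> Bool {
    guard LoggingConfig.isEnabled else {
        // Disabled: do the work with no signpost calls at all.
        work()
        return false
    }
    #if canImport(os)
    if #available(macOS 10.14, iOS 12.0, tvOS 12.0, watchOS 5.0, *) {
        // A real helper would create the log once and reuse it.
        let log = OSLog(subsystem: "com.example.layout", category: "Layout")
        let id = OSSignpostID(log: log)
        os_signpost(.begin, log: log, name: name, signpostID: id)
        work()
        os_signpost(.end, log: log, name: name, signpostID: id)
        return true
    }
    #endif
    // Signposts are unavailable on this platform or OS version.
    work()
    return false
}

// Usage: with the default configuration the work still runs, but nothing is logged.
var ran = 0
let emitted = withSignpost("Layout") { ran += 1 }
```

Setting `LoggingConfig.isEnabled = true` before profiling would turn the intervals on without touching any call sites.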

BlueprintUI/Sources/Layout/Stack.swift

kyleve · 2021-04-02T22:29:01Z

This is really smart – well done!

kyleve · 2021-04-03T01:26:54Z

Ok dumb question – I am testing this out locally to see what effect removing the measurement caching key stuff has on perf, and switching between the fake cache and the real one isn't giving me much perf difference:

Iterations: 71, Average Time: 0.14124150679145062
Iterations: 68, Average Time: 0.14753383923979366

I'm getting a small boost, but not much – I would've expected more here, but maybe not?

kyleve · 2021-04-03T01:41:22Z

I removed the measurement cache key machinery to see what this gets us in a plain environment, it still seems slow given it should be caching at every node – maybe that's not happening? With cache:

Iterations: 12, Average Time: 0.9106179177761078

Fake cache:

Iterations: 9, Average Time: 1.1527716716130574

So it does help a bit – but not nearly as much as I'd expect w/ caching the layout at each node in the tree. Note that this is with the test_deep_element_hierarchy test.

BlueprintUI/Sources/Element/ElementContent.swift

watt · 2021-04-06T02:48:23Z

I think there are a couple of reasons why test_deep_element_hierarchy doesn't see much improvement from this.

test_deep_element_hierarchy is much wider than it is deep. It's only 4 elements deep, but has 1000 leaf labels. As each stack layer measures its children to divvy up the space, the leaves can be measured up to 8 times, each with a unique size constraint.

The caching doesn't help when there are so many unique measurements.

It looks like floating point rounding error in the stack layout can cause cache misses for values that are almost the same. Just by increasing the width constraint from 1000 to 10000 I saw a big improvement in caching for this test.

kyleve · 2021-04-06T02:50:52Z

Oh, the cache misses are interesting... huh. Wonder if we should round those measurements somehow? Though not sure how to do that super intelligently..
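One possible direction, sketched here only as an illustration and not something this PR does: round constraint values to a pixel grid before using them as cache keys. `roundedToPixel` and the default `scale` of 2 are assumptions for the example.

```swift
/// Rounds a length to the nearest 1/scale of a point (hypothetical helper).
func roundedToPixel(_ value: Double, scale: Double = 2) -> Double {
    (value * scale).rounded() / scale
}

// Two computations of "the same" value that differ in the last bits.
let a = 0.1 + 0.2
let b = 0.3

// Keyed by the raw value, the second lookup misses.
var rawCache: [Double: String] = [:]
rawCache[a] = "measured"
let rawMiss = rawCache[b] == nil

// Keyed by the rounded value, both computations land on the same entry.
var roundedCache: [Double: String] = [:]
roundedCache[roundedToPixel(a)] = "measured"
let roundedHit = roundedCache[roundedToPixel(b)] != nil
```

The trade-off is that rounding the key can return a size measured for a very slightly different constraint, so it would need care wherever sub-pixel differences matter.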

n8chur

Great work!

n8chur · 2021-04-06T16:02:12Z

BlueprintUI/Sources/Blueprint View/BlueprintView.swift


            setNeedsViewHierarchyUpdate()
            invalidateIntrinsicContentSize()
        }
    }
+
+    /// A name to help identify this view when profiling or debugging
+    public var name: String?


debugName?

BlueprintUI/Sources/Internal/CacheTree.swift

BlueprintUI/Tests/Element+Testing.swift

watt · 2021-04-08T23:31:26Z

I think I've addressed all feedback so far.

kyleve · 2021-04-09T00:13:59Z

BlueprintUI/Sources/Internal/OSLog+Blueprint.swift

+import Foundation
+import os.log
+
+extension OSLog {


Nit: Could this go into the Logger file now?

I wound up leaving it in a separate file along with the config to enable/disable.

kyleve · 2021-04-09T00:15:29Z

BlueprintUI/Tests/ElementContentTests.swift

@@ -70,7 +70,80 @@ class ElementContentTests: XCTestCase {
            container.measure(in: SizeConstraint(CGSize.zero), environment: .empty),
            CGSize(width: 600, height: 800))
    }
-
+
+    func test_cacheTree() {


Worth writing a test to ensure that, for a given size constraint during a full layout pass, the actual measure function is only called once? E.g., to guard against the original bug that caused re-measurements due to the singleton key not being used in all cases.

watt requested review from n8chur, tyten, kyleve and kylebshr April 2, 2021 19:16

watt marked this pull request as ready for review April 2, 2021 19:16

watt commented Apr 2, 2021

kyleve reviewed Apr 2, 2021

kyleve approved these changes Apr 3, 2021

kyleve reviewed Apr 6, 2021

n8chur approved these changes Apr 6, 2021

watt force-pushed the watt/perf branch from 40e4e1f to 65a9fe2 Compare April 8, 2021 23:28

kyleve reviewed Apr 9, 2021

kyleve approved these changes Apr 9, 2021

watt added 9 commits April 13, 2021 13:36

ac684cc Signpost logging
7583c4c Render pass cache
c6a1729 Remove special singleton cache key
59a9070 ElementContent cache test
022f8ff Centralize logging into a helper
ace05d1 Feedback tweaks
6377d28 Cache tests
9223a4c Update changelog
3733857 Validate measure counts in ElementContent tests
0093dc5 Config to enable or disable logging

watt force-pushed the watt/perf branch from d13eeff to 0093dc5 Compare April 13, 2021 20:37

watt enabled auto-merge (squash) April 13, 2021 20:44

watt merged commit 60fca36 into main Apr 13, 2021

watt deleted the watt/perf branch September 3, 2021 19:45

watt mentioned this pull request Nov 18, 2021

Change how we cache sizing across sizeThatFits and autolayout #277

Merged
