Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measurement caching and performance improvements #209

Merged
merged 10 commits into from
Apr 13, 2021
Merged

Measurement caching and performance improvements #209

merged 10 commits into from
Apr 13, 2021

Conversation

watt
Copy link
Collaborator

@watt watt commented Apr 2, 2021

This PR adds a couple of performance-related features. Happy to split this up into multiple PRs if folks prefer that.

Signpost Logging

Signpost intervals are generated for:

  1. Layout
  2. View Updates
  3. Individual element measurements

The first two are nice to get a high level view of rendering, and the third one provides a pretty nice visualization of the way elements are recursively measured.

Screen Shot 2021-04-01 at 6 45 40 PM

This fixes #173.

Layout pass measurement caching

This is a new caching mechanism that caches every measured size during a layout pass. The cache structure mirrors the element tree itself, so it is able to cache measurements for every element, without any requirements on the element itself.

This cache's lifetime is only a single layout pass. Because of the way Blueprint repeatedly measures each subtree as it lays out each element, this prevents a lot of duplicate work from re-measuring, especially in deep trees. It does not cache the layout results returned by performLayout, because that is only called once per element per layout pass.

The cache is also used during measurement outside of layout (such as from sizeThatFits or a direct call to element.content.measure()), but it's less useful in that scenario since most elements do not re-measure children.

Shortcuts

  1. When laying out stacks, if the alignment is fill and the constraint is exact, we can skip the entire second measurement for cross axis sizes.
  2. We can skip measurement in ConstrainedSize if both axes are absolute constraints.

Potential follow-up work

  1. We can't persist this new cache if the element changes, but we might be able to persist it between layout passes on the same element. If we make that change, we should persist the values returned by performLayout too.
  2. We should consider merging the previous MeasurementCache functionality into this one, or removing it.
  3. Additional logging during view updates, to visualize view churn.

Todo

  • Tests
  • Update changelog
  • Move signpost logging into a helper
  • doc SignpostToken

@watt watt requested review from n8chur, tyten, kyleve and kylebshr April 2, 2021 19:16
@watt watt marked this pull request as ready for review April 2, 2021 19:16
import os.log

/// A size cache that also holds subcaches.
protocol CacheTree: AnyObject {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considered renaming this to something more generalized, like RenderContext

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me – I would've maybe gone with MeasurementCache or something

extension BlueprintView {
private func logLayoutStart() {
if #available(iOS 12.0, *) {
os_signpost(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw in the past (and when I took a stab at signpost logging in a draft PR), I wrapped up the logging into a little helper so all the os_signpost stuff was in one place: https://github.com/square/Blueprint/pull/174/files#diff-34f7283a6db50334cae24989cc3fd7656859bae36d059ba5bb7a35206d3017e4R2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw in the past (and when I took a stab at signpost logging in a draft PR), I wrapped up the logging into a little helper so all the os_signpost stuff was in one place: https://github.com/square/Blueprint/pull/174/files#diff-34f7283a6db50334cae24989cc3fd7656859bae36d059ba5bb7a35206d3017e4R2

Sure, I can centralize these.


setNeedsViewHierarchyUpdate()
invalidateIntrinsicContentSize()
}
}

/// A name to help identify this view when profiling or debugging
public var name: String?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: debugIdentifier?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: debugIdentifier?

Well, I see this potentially coming from a workflow screen, and I kind of want to avoid "identifier" since it's just a display thing and not used for identity checks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True – I guess when I see "name" I think of it as something that's potentially user-facing without a "debug" prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugName?

import os.log

/// A size cache that also holds subcaches.
protocol CacheTree: AnyObject {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me – I would've maybe gone with MeasurementCache or something


struct SubcacheKey: RawRepresentable, Hashable {
/// A key indicating that this will be the only subcache
static let singleton = SubcacheKey(rawValue: -1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this, or would something like this be clearer?

enum SubcacheKey : Hashable {
   case singleton
   case indexed(Int)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this, or would something like this be clearer?

enum SubcacheKey : Hashable {
   case singleton
   case indexed(Int)
}

Maybe? I was torn on how much to optimize this one. The singleton case is only used to improve the element name emitted to signpost, so we could also get rid of this type entirely and just pass in a hint along with the name.

key: key,
name: key == .singleton
? "\(self.name).\(type(of: element))"
: "\(self.name)[\(key.rawValue)].\(type(of: element))"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what sort of cost these strings getting built have. Wonder if we could make them memoized to avoid needing to do it during every traversal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what sort of cost these strings getting built have. Wonder if we could make them memoized to avoid needing to do it during every traversal.

Yeah, I had the same concern, and that's the reason for the autoclosure. It's only hit once per element per traversal, so I think it's pretty small compared to the amount of other work we do during layout. Dunno if we can do much better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh missed the autoclosure – and it's only used for signpost logging, meaning we won't hit it in release, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh missed the autoclosure – and it's only used for signpost logging, meaning we won't hit it in release, right?

It will be hit in release, since profiling is typically done against a release build.

Copy link
Collaborator

@kyleve kyleve Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought / ask: Worth adding a way to easily short circuit the signpost logging if we want to do a pure perf trace without the signpost logging appearing? When I was testing, this was always visible in the traces, which while easy to ignore at a top level if inverting the call tree, using something like top functions means it can be a bit harder to realize that the stuff you're trying to optimize is actually just from the signpost logging.

Also, any idea if this stuff fires in device beta or app store builds? When I was testing w/ this in the checkout applet, of the ~15s of so of CPU time, the signpost logging took ~600ms, so its definitely not free.

(I did this: https://github.com/kyleve/Listable/blob/main/ListableUI/Sources/Debugging%20and%20Logging/SignpostLogger.swift#L69 in listable for these cases)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought / ask: Worth adding a way to easily short circuit the signpost logging if we want to do a pure perf trace without the signpost logging appearing? When I was testing, this was always visible in the traces, which while easy to ignore at a top level if inverting the call tree, using something like top functions means it can be a bit harder to realize that the stuff you're trying to optimize is actually just from the signpost logging.

Sure. The WWDC video on this talks about how they're meant to be super lightweight, but also gives a nice way to disable them.

Also, any idea if this stuff fires in device beta or app store builds? When I was testing w/ this in the checkout applet, of the ~15s of so of CPU time, the signpost logging took ~600ms, so its definitely not free.

An app store build is still just a release build, so I would guess yes? But I would also guess that in a release build it's much faster than that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config added! This is now off by default. To enable it, there is a global config at BlueprintLogging.enabled.

BlueprintUI/Sources/Layout/Stack.swift Show resolved Hide resolved
@kyleve
Copy link
Collaborator

kyleve commented Apr 2, 2021

This is really smart – well done!

@kyleve
Copy link
Collaborator

kyleve commented Apr 3, 2021

Ok dumb question – I am testing this out locally to see what effect removing the measurement caching key stuff has on perf, and switching between the fake cache and the real one isn't giving me much perf difference:

Iterations: 71, Average Time: 0.14124150679145062
Iterations: 68, Average Time: 0.14753383923979366

I'm getting a small boost, but not much – I would've expected more here, but maybe not?

@kyleve
Copy link
Collaborator

kyleve commented Apr 3, 2021

I removed the measurement cache key machinery to see what this gets us in a plain environment, it still seems slow given it should be caching at every node – maybe that's not happening? With cache:

Iterations: 12, Average Time: 0.9106179177761078

Fake cache:

Iterations: 9, Average Time: 1.1527716716130574

So it does help a bit – but not nearly as much as I'd expect w/ caching the layout at each node in the tree. Note that this is with the test_deep_element_hierarchy test.

@watt
Copy link
Collaborator Author

watt commented Apr 6, 2021

I removed the measurement cache key machinery to see what this gets us in a plain environment, it still seems slow given it should be caching at every node – maybe that's not happening? With cache:

Iterations: 12, Average Time: 0.9106179177761078

Fake cache:

Iterations: 9, Average Time: 1.1527716716130574

So it does help a bit – but not nearly as much as I'd expect w/ caching the layout at each node in the tree. Note that this is with the test_deep_element_hierarchy test.

I think there are a couple of reasons why test_deep_element_hierarchy doesn't see much improvement from this.

  1. test_deep_element_hierarchy is much wider than it is deep. It's only 4 elements deep, but has 1000 leaf labels. As each stack layer measures its children to divvy up the space, the leaves can be measured up to 8 times with a unique size constraint:

Screen Shot 2021-04-05 at 7 30 40 PM

The caching doesn't help when there are so many unique measurements.

  1. It looks like floating point rounding error in the stack layout that can cause cache misses with values that are almost the same. Just by increasing the width constraint from 1000 to 10000 I saw a big improvement in caching for this test.

@kyleve
Copy link
Collaborator

kyleve commented Apr 6, 2021

Oh, the cache misses are interesting... huh. Wonder if we should round those measurements somehow? Though not sure how to do that super intelligently..

Copy link
Contributor

@n8chur n8chur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!


setNeedsViewHierarchyUpdate()
invalidateIntrinsicContentSize()
}
}

/// A name to help identify this view when profiling or debugging
public var name: String?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugName?

BlueprintUI/Sources/Internal/CacheTree.swift Outdated Show resolved Hide resolved
BlueprintUI/Tests/Element+Testing.swift Outdated Show resolved Hide resolved
@watt
Copy link
Collaborator Author

watt commented Apr 8, 2021

I think I've addressed all feedback so far.

import Foundation
import os.log

extension OSLog {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could this go into the Logger file now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wound up leaving it in a separate file along with the config to enable/disable.

@@ -70,7 +70,80 @@ class ElementContentTests: XCTestCase {
container.measure(in: SizeConstraint(CGSize.zero), environment: .empty),
CGSize(width: 600, height: 800))
}


func test_cacheTree() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth writing a test to ensure that for a given size constraint during a full layout pass, the actual measure function is only called once? Eg, to verify the original bug that caused re-measurements due to the singleton key not being used in all cases.

@watt watt enabled auto-merge (squash) April 13, 2021 20:44
@watt watt merged commit 60fca36 into main Apr 13, 2021
@watt watt deleted the watt/perf branch September 3, 2021 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Profiling Signposts
3 participants