Few benchmarks for predicate evaluation #291

Merged
merged 37 commits into from
Nov 7, 2023


ser-0xff

@ser-0xff ser-0xff commented Oct 11, 2023

Motivation

Predicate matching is 10x slower than a custom implementation. This PR adds a benchmark plugin to establish a performance baseline and to help future Predicate development.

The benchmark is added in the same fashion as in swift-nio and swift-certificates.

There are five benchmarks:

  1. A very simple predicate that returns 'true' for everything: about 4 million evaluations per second on an M1 Pro.
  2. A predicate with a single KeyPath condition on a variable: ~2.5 million/sec.
  3. A predicate with a single KeyPath condition on a computed property: ~1.5 million/sec.
  4. A predicate with a single KeyPath condition on a nested computed property: ~1.3 million/sec.
  5. A predicate with three KeyPath conditions on nested computed properties: ~600 K/sec.
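For orientation, the five predicate shapes listed above can be sketched roughly as follows. The `Monster` and `Stats` types, their properties, and the constant names are hypothetical stand-ins rather than the types used in this PR, and the sketch assumes a toolchain and platform where `import Foundation` provides the `#Predicate` macro (Swift 5.9 or later).

```swift
import Foundation

// Hypothetical model types, for illustration only.
struct Stats {
    var hitPoints: Int
    var armor: Int
    // Computed properties, reached through a nested key path below.
    var effectiveHitPoints: Int { hitPoints + armor }
    var doubledArmor: Int { armor * 2 }
    var halfHitPoints: Int { hitPoints / 2 }
}

struct Monster {
    var level: Int
    var stats: Stats
    // Computed property directly on the predicate's input type.
    var doubledLevel: Int { level * 2 }
}

// 1. Trivial predicate: true for every input.
let alwaysTrue = #Predicate<Monster> { monster in true }

// 2. One key path condition on a stored property ("variable").
let storedProperty = #Predicate<Monster> { monster in monster.level > 10 }

// 3. One key path condition on a computed property.
let computedProperty = #Predicate<Monster> { monster in monster.doubledLevel > 20 }

// 4. One key path condition on a nested computed property.
let nestedComputed = #Predicate<Monster> { monster in
    monster.stats.effectiveHitPoints > 40
}

// 5. Three key path conditions on nested computed properties.
let threeConditions = #Predicate<Monster> { monster in
    monster.stats.effectiveHitPoints > 40
        && monster.stats.doubledArmor > 4
        && monster.stats.halfHitPoints < 50
}

// Evaluate every predicate against one sample value.
let sample = Monster(level: 12, stats: Stats(hitPoints: 40, armor: 3))
let results = try [alwaysTrue, storedProperty, computedProperty, nestedComputed, threeConditions]
    .map { try $0.evaluate(sample) }
print(results)
```

Each additional key path in the predicate body adds another key path application at evaluation time, which is why shapes 4 and 5 are the interesting ones for the numbers above.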

The most common case for our product is more than one condition, and we use computed properties widely, so with more than three conditions our throughput would fall below 600 K/sec, which is not much for us.

All benchmarks can be executed with swift package benchmark --scale in the project 'Benchmarks' directory.

We observe that swift_getAtKeyPath accounts for a significant share of the total execution time.
The issue can be reproduced with the nested computed properties in benchmarks (4) and (5). A single benchmark can be executed with the command swift package benchmark --scale --filter "Predicate #5 - 3 KeyPath nested computed property conditions"
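As a rough illustration of where that cost can come from: applying a `KeyPath` value through the `[keyPath:]` subscript is typically serviced by the runtime entry point `swift_getAtKeyPath` when the compiler cannot resolve the key path statically, whereas direct property access compiles to ordinary getter calls. The types and the `read` helper below are hypothetical; `@inline(never)` is used only to keep the key path opaque to the optimizer in this sketch.

```swift
// Hypothetical types with a nested computed property.
struct Inner {
    var base: Int
    var doubled: Int { base * 2 }
}

struct Outer {
    var inner: Inner
}

// Reads a value through a key path that is only known at run time
// from the callee's point of view.
@inline(never)
func read<Root, Value>(_ root: Root, _ keyPath: KeyPath<Root, Value>) -> Value {
    root[keyPath: keyPath]
}

let outer = Outer(inner: Inner(base: 20))

// Direct access: plain getter calls.
let direct = outer.inner.doubled

// Key path access: goes through the key path machinery.
let viaKeyPath = read(outer, \Outer.inner.doubled)

print(direct, viaKeyPath)
```

Both reads return the same value; only the access mechanism differs, which is what shows up in the profile.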

We also observe that predicate evaluation entails a few malloc calls, which may also contribute to the poor performance.

Each benchmark reports a set of metrics; the most interesting one for now is the 'Throughput' line, which shows how many evaluations can be executed per second.

To run with profiling in Xcode:

Running benchmarks in Xcode and using Instruments for profiling benchmarks
Profiling benchmarks or building the benchmarks in release mode in Xcode with jemalloc is currently not supported (as Xcode currently doesn’t support interposition of the malloc library) and requires disabling jemalloc.
Make sure Xcode is closed and then open it from the CLI with the BENCHMARK_DISABLE_JEMALLOC environment variable set e.g.:

open --env BENCHMARK_DISABLE_JEMALLOC=true Package.swift

This will disable the jemalloc dependency and you can simply build in Xcode for profiling and use Instruments as normal - including signpost information for the benchmark run.

ser-0xff and others added 17 commits October 3, 2023 18:02
fix(patch): [sc-3840] Refer branch instead of commit hash.
…ndation-functionality-instead

fix(patch): [sc-3840] Refer dependencies using branches to avoid SPM complaining.
…ndation-functionality-instead-2

fix(patch): Rename ordo/swift-foundation to ordo/package-swift-foundation.
…ndation-functionality-instead-3

Bump dependency version.
…ndation-functionality-instead-4

fix(patch): Bump dependency version
…ndation-functionality-instead-5

Bump dependency version.
@ser-0xff ser-0xff requested a review from FranzBusch October 11, 2023 14:57
@hassila

hassila commented Oct 11, 2023

@ser-0xff I think the test needs to use existentials as we do for transactions internally; the current benchmark does not show the same key path problem we see with a protocol type wrapper.

@ser-0xff ser-0xff marked this pull request as draft October 11, 2023 15:49
@ser-0xff

Alright, let me look into it.

@ser-0xff ser-0xff requested a review from hassila October 12, 2023 15:28

@hassila hassila left a comment


LGTM now, thanks for the rework!

Add standard project header
@iCharlesHu

@swift-ci please test

@jmschonfeld

Thanks again for posting this, this looks super exciting and will definitely help us expand on our performance testing of this package going forward. We're still looking into this and will get back to you on being able to merge this soon!


@jmschonfeld jmschonfeld left a comment


Overall this looks great, thanks for posting this! I have a few comments on various pieces, but I'm looking forward to landing this so we can start building up our benchmarking suite to measure the performance of this package.

@hassila

hassila commented Nov 6, 2023

Addressed PR feedback in 80a84c3. Also added ARC output by default. Some implicit retains are not necessarily counted currently because some hooks in the runtime are missing (see swiftlang/swift#64636), but it's still useful to see the relative ARC traffic between different test scenarios even if retains and releases don't square up exactly.

Current output (throughput is in K):

hassila@ice ~/g/s/Benchmarks (feature/predicate-benchmark)> swift package benchmark
Building for debugging...
Build complete! (0.57s)
Building for debugging...
Build complete! (1.42s)
Building benchmark targets in release mode for benchmark run...
Building PredicateBenchmarks
Build complete!

==================
Running Benchmarks
==================

100% [------------------------------------------------------------] ETA: 00:00:00 | PredicateBenchmarks:Predicate #1 - simple 'true' condition
100% [------------------------------------------------------------] ETA: 00:00:00 | PredicateBenchmarks:Predicate #2 - 1 KeyPath variable condition
100% [------------------------------------------------------------] ETA: 00:00:00 | PredicateBenchmarks:Predicate #3 - 1 KeyPath computed property condition
100% [------------------------------------------------------------] ETA: 00:00:00 | PredicateBenchmarks:Predicate #4 - 1 KeyPath nested computed property condition
100% [------------------------------------------------------------] ETA: 00:00:00 | PredicateBenchmarks:Predicate #5 - 3 KeyPath nested computed property conditions

=====================================================================================================
Baseline 'Current_run'
=====================================================================================================

Host 'ice.local' with 20 'arm64' processors with 128 GB memory, running:
Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000

===================
PredicateBenchmarks
===================

Predicate #1 - simple 'true' condition
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total)                   │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │    3000 │    3000 │    3000 │    3000 │    3000 │    3000 │    3000 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │       0 │       0 │       0 │       0 │       0 │       0 │       0 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │    3432 │    3299 │    3207 │    3187 │    3137 │    2883 │    2438 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │     312 │     324 │     334 │     335 │     337 │     363 │     414 │   14810 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs)           │     291 │     303 │     312 │     314 │     318 │     347 │     410 │   14810 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Predicate #2 - 1 KeyPath variable condition
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total)                   │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │    2000 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │    6000 │    6000 │    6000 │    6000 │    6000 │    6000 │    6000 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │    3000 │    3000 │    3000 │    3000 │    3000 │    3000 │    3000 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │    2109 │    2042 │    1995 │    1973 │    1672 │     533 │      31 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │     494 │     511 │     521 │     529 │     569 │     836 │    1304 │    8164 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs)           │     474 │     489 │     501 │     506 │     598 │    1875 │   31791 │    8164 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Predicate #3 - 1 KeyPath computed property condition
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total)                   │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │    9000 │    9000 │    9000 │    9000 │    9000 │    9000 │    9000 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │    1313 │    1273 │    1269 │    1234 │    1227 │    1146 │    1068 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │     782 │     807 │     809 │     829 │     836 │     877 │     954 │    6082 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs)           │     761 │     785 │     788 │     810 │     815 │     872 │     936 │    6082 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Predicate #4 - 1 KeyPath nested computed property condition
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total)                   │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases                         │    9000 │    9000 │    9000 │    9000 │    9000 │    9000 │    9000 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains                          │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    4000 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │    1118 │    1082 │    1062 │    1046 │    1035 │     972 │     447 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │     915 │     946 │     962 │     977 │     981 │    1025 │    1203 │    5154 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs)           │     894 │     924 │     942 │     955 │     966 │    1028 │    2236 │    5154 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Predicate #5 - 3 KeyPath nested computed property conditions
╒══════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric                           │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ (Alloc + Retain) - Release Δ     │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    1000 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total)                   │    8000 │    8000 │    8000 │    8000 │    8000 │    8000 │    8000 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Object allocs                    │    8000 │    8000 │    8000 │    8000 │    8000 │    8000 │    8000 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Releases (K)                     │      24 │      24 │      24 │      24 │      24 │      24 │      24 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Retains (K)                      │      15 │      15 │      15 │      15 │      15 │      15 │      15 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (# / s)               │     449 │     434 │     433 │     424 │     420 │     414 │     369 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs)            │    2250 │    2324 │    2330 │    2377 │    2402 │    2418 │    2734 │    2124 │
├──────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs)           │    2229 │    2303 │    2312 │    2359 │    2381 │    2414 │    2711 │    2124 │
╘══════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
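One way to read the tables above: if each sample covers 1000 predicate evaluations (an assumption inferred from the "throughput is in K" note and from the malloc counts being multiples of 1000), the wall-clock time per sample converts to evaluations per second as sketched below. The helper name is hypothetical; the input values are the p50 wall-clock times from the tables.

```swift
/// Converts wall-clock microseconds per sample into evaluations per second,
/// assuming each sample performs `evaluationsPerSample` evaluations.
func evaluationsPerSecond(
    wallClockMicroseconds: Double,
    evaluationsPerSample: Double = 1000
) -> Double {
    evaluationsPerSample / (wallClockMicroseconds / 1_000_000)
}

// p50 wall-clock times (in microseconds) taken from the tables above.
let p50WallClock: [(name: String, microseconds: Double)] = [
    ("Predicate #1", 312),
    ("Predicate #2", 501),
    ("Predicate #3", 788),
    ("Predicate #4", 942),
    ("Predicate #5", 2312),
]

for entry in p50WallClock {
    let perSecond = evaluationsPerSecond(wallClockMicroseconds: entry.microseconds)
    print(entry.name, Int(perSecond.rounded()), "evaluations/sec")
}
```

Under that assumption, Predicate #1 works out to roughly 3.2 million evaluations per second and Predicate #5 to roughly 0.43 million, in line with the p50 'Throughput' rows.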

@hassila

hassila commented Nov 7, 2023

Ok, simplified the naming of the test to make runs for specific tests easier, e.g.

swift package benchmark --filter predicateKeypathPropertyCondition

Also added a simple variadic test to see what the impact would be when going from 1 -> 2 at least (seems to be 30% or so). Definitely room for extending that later.
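Assuming the variadic test refers to predicates that take more than one input (the benchmark body itself is not shown in this conversation), the step from one input to two can be sketched as follows. The `Monster` type and the constant names are hypothetical, and the sketch assumes a toolchain where `import Foundation` provides the `#Predicate` macro.

```swift
import Foundation

// Hypothetical input type, for illustration only.
struct Monster {
    var level: Int
}

// One input: a single key path condition.
let oneInput = #Predicate<Monster> { monster in
    monster.level > 10
}

// Two inputs: Predicate is generic over a parameter pack,
// so the closure receives one argument per input type.
let twoInputs = #Predicate<Monster, Monster> { lhs, rhs in
    lhs.level > rhs.level
}

let strong = Monster(level: 12)
let weak = Monster(level: 3)

let single = try oneInput.evaluate(strong)
let pair = try twoInputs.evaluate(strong, weak)
print(single, pair)
```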

@jmschonfeld

@swift-ci please test


@jmschonfeld jmschonfeld left a comment


This looks great now, thanks for all the work on this! I'm excited to build upon this with more benchmarks in the future.

@jmschonfeld jmschonfeld merged commit f8c7846 into swiftlang:main Nov 7, 2023
@hassila

hassila commented Nov 8, 2023

> This looks great now, thanks for all the work on this! I'm excited to build upon this with more benchmarks in the future.

Sounds great, thanks - if you are interested in adding the benchmark suite into CI for regression checking in the future, I would suggest having a look at how swift-nio does it.

Happy to help out if you run into any questions on the usage of package-benchmark when looking at wider adoption (and we'll provide new PRs for any performance issues we run into; currently we are focused on adopting Predicates, though).

@hassila hassila deleted the feature/predicate-benchmark branch November 8, 2023 13:51
@jmschonfeld

> if you are interested in adding the benchmark suite into CI for regression checking in the future, I would suggest having a look at how swift-nio does it.

Good to know, thanks! @iCharlesHu and I were discussing this earlier, so Charles it looks like that'd be a good place to start.

> Happy to help out if you run into any questions on the usage of package-benchmark if looking at wider adoption (and we'll provide new PRs for any performance issues we run into, currently we are focused on adopting Predicates though).

Sounds great! Definitely let us know if you hit other performance bottlenecks and we can definitely take a look to see if we can come up with some solutions!
