Add ZDD-based N-queens solver benchmark #478

Open · wants to merge 1 commit into main
Conversation

@jberdine (Contributor) commented Feb 1, 2025

This PR adds a benchmark that solves the N-queens problem using ZDDs. The
ZDD data structure code is based on the existing ZDD benchmark. The
motivation for this benchmark is that its performance benefits greatly from
memoization of the ZDD operations, whereas the existing benchmark is
faster without memoization. In this sense, this benchmark is more
representative of actual use cases, where the sharing introduced and
enforced by hash-consing enables subcomputations to be shared via
memoization, yielding polynomial-time algorithms that would be
exponential-time without sharing. (Note, though, that while some ZDD
operations are polynomial-time with sharing, others still have worst-case
exponential time, even if they are often fast in practice.) Additionally,
this benchmark shows Ephemeron-based caching outperforming (simplistic)
Hashtbl-based caching.

Compile:

ocamlopt -O3 zdd_queens.ml -o zdd_queens.exe

Usage example:

MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=1 ./zdd_queens.exe 8

The argument (8 above) is the size of the chess board, i.e., the number of
queens to place. The correct numbers of solutions are given in
https://oeis.org/A000170.
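For quick validation on small boards, the first few terms of A000170 can be tabulated. This is an illustrative helper, not part of the PR; `solve` is a placeholder for whatever counting function the benchmark exposes:

```ocaml
(* Expected N-queens solution counts for n = 1..10, per OEIS A000170. *)
let a000170 = [| 1; 0; 0; 2; 10; 4; 40; 92; 352; 724 |]

(* Check a solver's count against the table (illustrative helper). *)
let check ~solve n =
  n >= 1 && n <= Array.length a000170 && solve n = a000170.(n - 1)
```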

In addition to the problem size argument, three environment variables are
read:

  • MEMO_HC: controls hash-consing of ZDD nodes:
    • 2: use Weak (default)
    • 1: use Hashtbl
    • 0: none
  • MEMO_OP: controls memoization of ZDD operations:
    • 2: use Ephemeron (default)
    • 1: use Hashtbl
    • 0: none
  • MEMO_SCALE: adjusts the initial sizes of the tables:
    • a multiplicative factor; defaults to 1
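A minimal sketch of how such knobs could be read; the PR's actual parsing may differ, and `env_int` and `initial_size` are illustrative names:

```ocaml
(* Read an integer environment variable, falling back to a default
   when the variable is unset or malformed. *)
let env_int name default =
  match Sys.getenv_opt name with
  | Some s -> (match int_of_string_opt s with Some n -> n | None -> default)
  | None -> default

let memo_hc = env_int "MEMO_HC" 2        (* 2 = Weak, 1 = Hashtbl, 0 = none *)
let memo_op = env_int "MEMO_OP" 2        (* 2 = Ephemeron, 1 = Hashtbl, 0 = none *)
let memo_scale = env_int "MEMO_SCALE" 1  (* multiplies initial table sizes *)

(* An initial table size scaled by MEMO_SCALE, clamped to at least 1. *)
let initial_size base = max 1 (base * memo_scale)
```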

Exploring this parameter space a bit, the first observation is that
disabling node hash-consing or operation memoization is much slower:

> hyperfine -L m 2,1,0 -L n 2,1,0 'MEMO_HC={m} MEMO_OP={n} ./zdd_queens.exe 5'
Benchmark 1: MEMO_HC=2 MEMO_OP=2 ./zdd_queens.exe 5
  Time (mean ± σ):       5.9 ms ±   0.4 ms    [User: 5.1 ms, System: 0.6 ms]
  Range (min … max):     5.4 ms …   8.6 ms    368 runs

Benchmark 2: MEMO_HC=1 MEMO_OP=2 ./zdd_queens.exe 5
  Time (mean ± σ):       5.1 ms ±   0.3 ms    [User: 4.4 ms, System: 0.6 ms]
  Range (min … max):     4.7 ms …   6.7 ms    359 runs

Benchmark 3: MEMO_HC=0 MEMO_OP=2 ./zdd_queens.exe 5
  Time (mean ± σ):      2.853 s ±  0.031 s    [User: 2.826 s, System: 0.025 s]
  Range (min … max):    2.819 s …  2.907 s    10 runs

Benchmark 4: MEMO_HC=2 MEMO_OP=1 ./zdd_queens.exe 5
  Time (mean ± σ):       7.4 ms ±   0.5 ms    [User: 6.7 ms, System: 0.6 ms]
  Range (min … max):     6.8 ms …  10.8 ms    228 runs

Benchmark 5: MEMO_HC=1 MEMO_OP=1 ./zdd_queens.exe 5
  Time (mean ± σ):       6.5 ms ±   0.4 ms    [User: 5.8 ms, System: 0.5 ms]
  Range (min … max):     6.0 ms …   8.1 ms    310 runs

Benchmark 6: MEMO_HC=0 MEMO_OP=1 ./zdd_queens.exe 5
  Time (mean ± σ):      2.850 s ±  0.040 s    [User: 2.835 s, System: 0.014 s]
  Range (min … max):    2.813 s …  2.926 s    10 runs

Benchmark 7: MEMO_HC=2 MEMO_OP=0 ./zdd_queens.exe 5
  Time (mean ± σ):      5.223 s ±  0.028 s    [User: 5.187 s, System: 0.034 s]
  Range (min … max):    5.179 s …  5.259 s    10 runs

Benchmark 8: MEMO_HC=1 MEMO_OP=0 ./zdd_queens.exe 5
  Time (mean ± σ):      3.851 s ±  0.032 s    [User: 3.823 s, System: 0.026 s]
  Range (min … max):    3.788 s …  3.899 s    10 runs

Benchmark 9: MEMO_HC=0 MEMO_OP=0 ./zdd_queens.exe 5
  Time (mean ± σ):      9.340 s ±  0.044 s    [User: 9.183 s, System: 0.151 s]
  Range (min … max):    9.288 s …  9.410 s    10 runs

Summary
  MEMO_HC=1 MEMO_OP=2 ./zdd_queens.exe 5 ran
    1.14 ± 0.09 times faster than MEMO_HC=2 MEMO_OP=2 ./zdd_queens.exe 5
    1.27 ± 0.10 times faster than MEMO_HC=1 MEMO_OP=1 ./zdd_queens.exe 5
    1.44 ± 0.12 times faster than MEMO_HC=2 MEMO_OP=1 ./zdd_queens.exe 5
  554.63 ± 29.51 times faster than MEMO_HC=0 MEMO_OP=1 ./zdd_queens.exe 5
  555.19 ± 29.11 times faster than MEMO_HC=0 MEMO_OP=2 ./zdd_queens.exe 5
  749.41 ± 38.92 times faster than MEMO_HC=1 MEMO_OP=0 ./zdd_queens.exe 5
 1016.48 ± 52.41 times faster than MEMO_HC=2 MEMO_OP=0 ./zdd_queens.exe 5
 1817.77 ± 93.62 times faster than MEMO_HC=0 MEMO_OP=0 ./zdd_queens.exe 5

Another observation is that a lot of time can be spent resizing tables, and
the initial sizes make a significant difference:

> hyperfine -L n 0,1,64,128,192 'MEMO_SCALE={n} ./zdd_queens.exe 8'
Benchmark 1: MEMO_SCALE=0 ./zdd_queens.exe 8
  Time (mean ± σ):     12.624 s ±  0.040 s    [User: 12.312 s, System: 0.307 s]
  Range (min … max):   12.563 s … 12.682 s    10 runs

Benchmark 2: MEMO_SCALE=1 ./zdd_queens.exe 8
  Time (mean ± σ):     12.158 s ±  0.069 s    [User: 11.834 s, System: 0.319 s]
  Range (min … max):   12.029 s … 12.246 s    10 runs

Benchmark 3: MEMO_SCALE=64 ./zdd_queens.exe 8
  Time (mean ± σ):     10.887 s ±  0.031 s    [User: 10.579 s, System: 0.303 s]
  Range (min … max):   10.835 s … 10.927 s    10 runs

Benchmark 4: MEMO_SCALE=128 ./zdd_queens.exe 8
  Time (mean ± σ):      8.003 s ±  0.046 s    [User: 7.787 s, System: 0.211 s]
  Range (min … max):    7.946 s …  8.070 s    10 runs

Benchmark 5: MEMO_SCALE=192 ./zdd_queens.exe 8
  Time (mean ± σ):      7.722 s ±  0.188 s    [User: 7.516 s, System: 0.196 s]
  Range (min … max):    7.564 s …  8.189 s    10 runs

Summary
  MEMO_SCALE=192 ./zdd_queens.exe 8 ran
    1.04 ± 0.03 times faster than MEMO_SCALE=128 ./zdd_queens.exe 8
    1.41 ± 0.03 times faster than MEMO_SCALE=64 ./zdd_queens.exe 8
    1.57 ± 0.04 times faster than MEMO_SCALE=1 ./zdd_queens.exe 8
    1.63 ± 0.04 times faster than MEMO_SCALE=0 ./zdd_queens.exe 8

Here MEMO_SCALE=0 sets the initial table sizes to the minimum allowed by
Weak or Hashtbl, 1 gives small tables, and 192 gives tables large enough
that no resizing is needed.

Finally, Ephemeron-based caching is more efficient than Hashtbl-based
caching, on both OCaml 4 and 5:

> hyperfine -L m 2,1 -L n 2,1 -L v 414,trunk 'MEMO_HC={m} MEMO_OP={n} MEMO_SCALE=128 ./zdd_queens_{v}.exe 8'
Benchmark 1: MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):      9.089 s ±  0.257 s    [User: 8.898 s, System: 0.186 s]
  Range (min … max):    8.836 s …  9.470 s    10 runs

Benchmark 2: MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):      8.661 s ±  0.305 s    [User: 8.478 s, System: 0.168 s]
  Range (min … max):    8.345 s …  9.367 s    10 runs

Benchmark 3: MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):     14.299 s ±  0.801 s    [User: 14.100 s, System: 0.180 s]
  Range (min … max):   13.318 s … 15.931 s    10 runs

Benchmark 4: MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):     14.495 s ±  0.376 s    [User: 14.323 s, System: 0.163 s]
  Range (min … max):   13.692 s … 14.974 s    10 runs

Benchmark 5: MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):      8.053 s ±  0.078 s    [User: 7.834 s, System: 0.214 s]
  Range (min … max):    7.983 s …  8.217 s    10 runs

Benchmark 6: MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):      8.564 s ±  0.197 s    [User: 8.320 s, System: 0.236 s]
  Range (min … max):    8.252 s …  8.828 s    10 runs

Benchmark 7: MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):     12.547 s ±  0.075 s    [User: 12.301 s, System: 0.241 s]
  Range (min … max):   12.429 s … 12.685 s    10 runs

Benchmark 8: MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):     13.378 s ±  0.295 s    [User: 13.080 s, System: 0.272 s]
  Range (min … max):   12.882 s … 13.796 s    10 runs

Summary
  MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8 ran
    1.06 ± 0.03 times faster than MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
    1.08 ± 0.04 times faster than MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
    1.13 ± 0.03 times faster than MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
    1.56 ± 0.02 times faster than MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
    1.66 ± 0.04 times faster than MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
    1.78 ± 0.10 times faster than MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8
    1.80 ± 0.05 times faster than MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8

For the record, "trunk" here is commit 137dd26adc of Sat Jan 18, and "414"
is the official 4.14.2 release; flambda is enabled for both.

Signed-off-by: Josh Berdine [email protected]

@jberdine (Contributor, Author) commented Feb 1, 2025

CI is unhappy, but I will wait to see if there is interest in including this benchmark before fighting with it.
