Add ZDD-based N-queens solver benchmark #478

Open · wants to merge 1 commit into main
Conversation

@jberdine (Contributor) commented Feb 1, 2025

This PR adds a benchmark that solves the N-queens problem using ZDDs. The
ZDD data structure code is based on the existing ZDD benchmark. The
motivation for this benchmark is that its performance benefits greatly from
memoization of the ZDD operations, whereas the existing benchmark is
faster without memoization. In this sense, this benchmark is more
representative of actual use cases, where the sharing introduced and
enforced by hash-consing enables subcomputations to be shared via
memoization, yielding polynomial-time algorithms that would be
exponential-time without sharing. (Note, though, that while some ZDD
operations are polynomial-time with sharing, others still have worst-case
exponential time, even if they are often fast in practice.) Additionally,
this benchmark shows Ephemeron-based caching outperforming (simplistic)
Hashtbl-based caching.

Compile:

ocamlopt -O3 zdd_queens.ml -o zdd_queens.exe

Usage example:

MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=1 ./zdd_queens.exe 8

The argument (8 above) is the size of the chess board, i.e., the number of
queens to place. The correct numbers of solutions are given in
https://oeis.org/A000170.
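For quick validation on small boards, the first few terms of A000170 can be tabulated. This is an illustrative helper, not part of the PR; `solve` is a placeholder for whatever counting function the benchmark exposes:

```ocaml
(* Expected N-queens solution counts for n = 1..10, per OEIS A000170. *)
let a000170 = [| 1; 0; 0; 2; 10; 4; 40; 92; 352; 724 |]

(* Check a solver's count against the table (illustrative helper). *)
let check ~solve n =
  n >= 1 && n <= Array.length a000170 && solve n = a000170.(n - 1)
```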

In addition to the problem size argument, three environment variables are
read:

  • MEMO_HC: controls hash-consing of ZDD nodes:
    • 2: use Weak (default)
    • 1: use Hashtbl
    • 0: none
  • MEMO_OP: controls memoization of ZDD operations:
    • 2: use Ephemeron (default)
    • 1: use Hashtbl
    • 0: none
  • MEMO_SCALE: adjusts the initial sizes of the tables:
    • a multiplicative factor; defaults to 1
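A minimal sketch of how such knobs could be read; the PR's actual parsing may differ, and `env_int` and `initial_size` are illustrative names:

```ocaml
(* Read an integer environment variable, falling back to a default
   when the variable is unset or malformed. *)
let env_int name default =
  match Sys.getenv_opt name with
  | Some s -> (match int_of_string_opt s with Some n -> n | None -> default)
  | None -> default

let memo_hc = env_int "MEMO_HC" 2        (* 2 = Weak, 1 = Hashtbl, 0 = none *)
let memo_op = env_int "MEMO_OP" 2        (* 2 = Ephemeron, 1 = Hashtbl, 0 = none *)
let memo_scale = env_int "MEMO_SCALE" 1  (* multiplies initial table sizes *)

(* An initial table size scaled by MEMO_SCALE, clamped to at least 1. *)
let initial_size base = max 1 (base * memo_scale)
```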

Exploring this parameter space a bit, the first observation is that
disabling node hash-consing or operation memoization is much slower:

> hyperfine -L m 2,1,0 -L n 2,1,0 'MEMO_HC={m} MEMO_OP={n} ./zdd_queens.exe 5'
Benchmark 1: MEMO_HC=2 MEMO_OP=2 ./zdd_queens.exe 5
  Time (mean ± σ):       5.9 ms ±   0.4 ms    [User: 5.1 ms, System: 0.6 ms]
  Range (min … max):     5.4 ms …   8.6 ms    368 runs

Benchmark 2: MEMO_HC=1 MEMO_OP=2 ./zdd_queens.exe 5
  Time (mean ± σ):       5.1 ms ±   0.3 ms    [User: 4.4 ms, System: 0.6 ms]
  Range (min … max):     4.7 ms …   6.7 ms    359 runs

Benchmark 3: MEMO_HC=0 MEMO_OP=2 ./zdd_queens.exe 5
  Time (mean ± σ):      2.853 s ±  0.031 s    [User: 2.826 s, System: 0.025 s]
  Range (min … max):    2.819 s …  2.907 s    10 runs

Benchmark 4: MEMO_HC=2 MEMO_OP=1 ./zdd_queens.exe 5
  Time (mean ± σ):       7.4 ms ±   0.5 ms    [User: 6.7 ms, System: 0.6 ms]
  Range (min … max):     6.8 ms …  10.8 ms    228 runs

Benchmark 5: MEMO_HC=1 MEMO_OP=1 ./zdd_queens.exe 5
  Time (mean ± σ):       6.5 ms ±   0.4 ms    [User: 5.8 ms, System: 0.5 ms]
  Range (min … max):     6.0 ms …   8.1 ms    310 runs

Benchmark 6: MEMO_HC=0 MEMO_OP=1 ./zdd_queens.exe 5
  Time (mean ± σ):      2.850 s ±  0.040 s    [User: 2.835 s, System: 0.014 s]
  Range (min … max):    2.813 s …  2.926 s    10 runs

Benchmark 7: MEMO_HC=2 MEMO_OP=0 ./zdd_queens.exe 5
  Time (mean ± σ):      5.223 s ±  0.028 s    [User: 5.187 s, System: 0.034 s]
  Range (min … max):    5.179 s …  5.259 s    10 runs

Benchmark 8: MEMO_HC=1 MEMO_OP=0 ./zdd_queens.exe 5
  Time (mean ± σ):      3.851 s ±  0.032 s    [User: 3.823 s, System: 0.026 s]
  Range (min … max):    3.788 s …  3.899 s    10 runs

Benchmark 9: MEMO_HC=0 MEMO_OP=0 ./zdd_queens.exe 5
  Time (mean ± σ):      9.340 s ±  0.044 s    [User: 9.183 s, System: 0.151 s]
  Range (min … max):    9.288 s …  9.410 s    10 runs

Summary
  MEMO_HC=1 MEMO_OP=2 ./zdd_queens.exe 5 ran
    1.14 ± 0.09 times faster than MEMO_HC=2 MEMO_OP=2 ./zdd_queens.exe 5
    1.27 ± 0.10 times faster than MEMO_HC=1 MEMO_OP=1 ./zdd_queens.exe 5
    1.44 ± 0.12 times faster than MEMO_HC=2 MEMO_OP=1 ./zdd_queens.exe 5
  554.63 ± 29.51 times faster than MEMO_HC=0 MEMO_OP=1 ./zdd_queens.exe 5
  555.19 ± 29.11 times faster than MEMO_HC=0 MEMO_OP=2 ./zdd_queens.exe 5
  749.41 ± 38.92 times faster than MEMO_HC=1 MEMO_OP=0 ./zdd_queens.exe 5
 1016.48 ± 52.41 times faster than MEMO_HC=2 MEMO_OP=0 ./zdd_queens.exe 5
 1817.77 ± 93.62 times faster than MEMO_HC=0 MEMO_OP=0 ./zdd_queens.exe 5

Another observation is that a lot of time can be spent resizing tables, and
the initial sizes make a significant difference:

> hyperfine -L n 0,1,64,128,192 'MEMO_SCALE={n} ./zdd_queens.exe 8'
Benchmark 1: MEMO_SCALE=0 ./zdd_queens.exe 8
  Time (mean ± σ):     12.624 s ±  0.040 s    [User: 12.312 s, System: 0.307 s]
  Range (min … max):   12.563 s … 12.682 s    10 runs

Benchmark 2: MEMO_SCALE=1 ./zdd_queens.exe 8
  Time (mean ± σ):     12.158 s ±  0.069 s    [User: 11.834 s, System: 0.319 s]
  Range (min … max):   12.029 s … 12.246 s    10 runs

Benchmark 3: MEMO_SCALE=64 ./zdd_queens.exe 8
  Time (mean ± σ):     10.887 s ±  0.031 s    [User: 10.579 s, System: 0.303 s]
  Range (min … max):   10.835 s … 10.927 s    10 runs

Benchmark 4: MEMO_SCALE=128 ./zdd_queens.exe 8
  Time (mean ± σ):      8.003 s ±  0.046 s    [User: 7.787 s, System: 0.211 s]
  Range (min … max):    7.946 s …  8.070 s    10 runs

Benchmark 5: MEMO_SCALE=192 ./zdd_queens.exe 8
  Time (mean ± σ):      7.722 s ±  0.188 s    [User: 7.516 s, System: 0.196 s]
  Range (min … max):    7.564 s …  8.189 s    10 runs

Summary
  MEMO_SCALE=192 ./zdd_queens.exe 8 ran
    1.04 ± 0.03 times faster than MEMO_SCALE=128 ./zdd_queens.exe 8
    1.41 ± 0.03 times faster than MEMO_SCALE=64 ./zdd_queens.exe 8
    1.57 ± 0.04 times faster than MEMO_SCALE=1 ./zdd_queens.exe 8
    1.63 ± 0.04 times faster than MEMO_SCALE=0 ./zdd_queens.exe 8

Here MEMO_SCALE=0 sets the initial table sizes to the minimum allowed by
Weak or Hashtbl, 1 gives small tables, and 192 gives tables large enough
that no resizing is needed.

Finally, Ephemeron-based caching is more efficient than Hashtbl-based
caching, on both OCaml 4 and 5:

> hyperfine -L m 2,1 -L n 2,1 -L v 414,trunk 'MEMO_HC={m} MEMO_OP={n} MEMO_SCALE=128 ./zdd_queens_{v}.exe 8'
Benchmark 1: MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):      9.089 s ±  0.257 s    [User: 8.898 s, System: 0.186 s]
  Range (min … max):    8.836 s …  9.470 s    10 runs

Benchmark 2: MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):      8.661 s ±  0.305 s    [User: 8.478 s, System: 0.168 s]
  Range (min … max):    8.345 s …  9.367 s    10 runs

Benchmark 3: MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):     14.299 s ±  0.801 s    [User: 14.100 s, System: 0.180 s]
  Range (min … max):   13.318 s … 15.931 s    10 runs

Benchmark 4: MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8
  Time (mean ± σ):     14.495 s ±  0.376 s    [User: 14.323 s, System: 0.163 s]
  Range (min … max):   13.692 s … 14.974 s    10 runs

Benchmark 5: MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):      8.053 s ±  0.078 s    [User: 7.834 s, System: 0.214 s]
  Range (min … max):    7.983 s …  8.217 s    10 runs

Benchmark 6: MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):      8.564 s ±  0.197 s    [User: 8.320 s, System: 0.236 s]
  Range (min … max):    8.252 s …  8.828 s    10 runs

Benchmark 7: MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):     12.547 s ±  0.075 s    [User: 12.301 s, System: 0.241 s]
  Range (min … max):   12.429 s … 12.685 s    10 runs

Benchmark 8: MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
  Time (mean ± σ):     13.378 s ±  0.295 s    [User: 13.080 s, System: 0.272 s]
  Range (min … max):   12.882 s … 13.796 s    10 runs

Summary
  MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8 ran
    1.06 ± 0.03 times faster than MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
    1.08 ± 0.04 times faster than MEMO_HC=1 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
    1.13 ± 0.03 times faster than MEMO_HC=2 MEMO_OP=2 MEMO_SCALE=128 ./zdd_queens_414.exe 8
    1.56 ± 0.02 times faster than MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
    1.66 ± 0.04 times faster than MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_trunk.exe 8
    1.78 ± 0.10 times faster than MEMO_HC=2 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8
    1.80 ± 0.05 times faster than MEMO_HC=1 MEMO_OP=1 MEMO_SCALE=128 ./zdd_queens_414.exe 8

For the record, "trunk" here is commit 137dd26adc of Sat Jan 18, and "414"
is the official 4.14.2 release; flambda is enabled for both.

Signed-off-by: Josh Berdine [email protected]

@jberdine (Contributor, Author) commented Feb 1, 2025

CI is unhappy, but I will wait to see if there is interest in including this benchmark before fighting with it.
