feat: teach ALPArray to store validity only in the encoded array & other minor changes #2053

Closed
wants to merge 15 commits into from

Member

@danking danking commented Jan 22, 2025

The original change in this PR trims invalid values from the patches and makes the patches' validity either AllValid (for nullable arrays) or NonNullable.
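As a rough illustration of the trimming described above, here is a minimal sketch in plain Rust. It does not use the Vortex API: `trim_invalid_exceptions`, the `Option<&[bool]>` validity mask, and the slice-based signature are all hypothetical stand-ins chosen for this example.

```rust
/// Hypothetical helper (not part of Vortex): drop exceptional positions that
/// point at invalid (null) slots, so the patches only ever cover valid values.
///
/// `validity` is `None` for a non-nullable array; otherwise `mask[i]` is true
/// when position `i` holds a valid value.
fn trim_invalid_exceptions(
    exceptional_positions: &[usize],
    validity: Option<&[bool]>,
) -> Vec<usize> {
    match validity {
        // Non-nullable input: every exceptional position needs a patch.
        None => exceptional_positions.to_vec(),
        // Nullable input: keep only the positions that are valid.
        Some(mask) => exceptional_positions
            .iter()
            .copied()
            .filter(|&i| mask[i])
            .collect(),
    }
}
```

With a mask of `[true, false, true, false]` and exceptional positions `[0, 1, 3]`, only position `0` survives, so every remaining patch is valid and the patches need no null information of their own.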

Benchmarks on latest commit:

Each parameter tuple is (number of elements, fraction patched, fraction valid).

Any ratio (PR median / develop median) greater than 1.1 or less than 0.9 is marked with ***.

alp_compress                    │ PR median     │ develop median │ ratio
├─ compress_alp                 │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 160.4 µs      │ 159.6 µs       │ 1.0050
│  │  ├─ (100000, 0.0, 0.95)    │ 145.9 µs      │ 143.8 µs       │ 1.0146
│  │  ├─ (100000, 0.0, 1.0)     │ 137.0 µs      │ 135.5 µs       │ 1.0110
│  │  ├─ (100000, 0.01, 0.25)   │ 227.7 µs      │ 230.7 µs       │ 0.9869
│  │  ├─ (100000, 0.01, 0.95)   │ 227.9 µs      │ 227.2 µs       │ 1.0030
│  │  ├─ (100000, 0.01, 1.0)    │ 226.6 µs      │ 227.5 µs       │ 0.9960
│  │  ├─ (100000, 0.1, 0.25)    │ 238.3 µs      │ 248.9 µs       │ 0.9574
│  │  ├─ (100000, 0.1, 0.95)    │ 238.2 µs      │ 269.8 µs       │ 0.8828  ***
│  │  ├─ (100000, 0.1, 1.0)     │ 230.6 µs      │ 231.9 µs       │ 0.9943
│  │  ├─ (10000000, 0.0, 0.25)  │ 14.17 ms      │ 13.77 ms       │ 1.0290
│  │  ├─ (10000000, 0.0, 0.95)  │ 14.16 ms      │ 13.8 ms        │ 1.0260
│  │  ├─ (10000000, 0.0, 1.0)   │ 14.0 ms       │ 12.47 ms       │ 1.1226  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 22.29 ms      │ 23.13 ms       │ 0.9636
│  │  ├─ (10000000, 0.01, 0.95) │ 22.26 ms      │ 23.78 ms       │ 0.9360
│  │  ├─ (10000000, 0.01, 1.0)  │ 22.19 ms      │ 21.79 ms       │ 1.0183
│  │  ├─ (10000000, 0.1, 0.25)  │ 23.31 ms      │ 27.72 ms       │ 0.8409  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 23.4 ms       │ 27.47 ms       │ 0.8518  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 22.99 ms      │ 22.31 ms       │ 1.0304
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 165.2 µs      │ 165.4 µs       │ 0.9987
│     ├─ (100000, 0.0, 0.95)    │ 166.1 µs      │ 163.4 µs       │ 1.0165
│     ├─ (100000, 0.0, 1.0)     │ 164.7 µs      │ 179.9 µs       │ 0.9155
│     ├─ (100000, 0.01, 0.25)   │ 269.7 µs      │ 259.1 µs       │ 1.0409
│     ├─ (100000, 0.01, 0.95)   │ 270.5 µs      │ 259.6 µs       │ 1.0419
│     ├─ (100000, 0.01, 1.0)    │ 268.9 µs      │ 270.6 µs       │ 0.9937
│     ├─ (100000, 0.1, 0.25)    │ 281.7 µs      │ 281.3 µs       │ 1.0014
│     ├─ (100000, 0.1, 0.95)    │ 279.1 µs      │ 315.3 µs       │ 0.8851  ***
│     ├─ (100000, 0.1, 1.0)     │ 273.0 µs      │ 275.7 µs       │ 0.9902
│     ├─ (10000000, 0.0, 0.25)  │ 16.16 ms      │ 15.86 ms       │ 1.0189
│     ├─ (10000000, 0.0, 0.95)  │ 16.19 ms      │ 15.75 ms       │ 1.0279
│     ├─ (10000000, 0.0, 1.0)   │ 16.2 ms       │ 15.83 ms       │ 1.0233
│     ├─ (10000000, 0.01, 0.25) │ 25.29 ms      │ 25.77 ms       │ 0.9813
│     ├─ (10000000, 0.01, 0.95) │ 25.74 ms      │ 25.94 ms       │ 0.9922
│     ├─ (10000000, 0.01, 1.0)  │ 25.54 ms      │ 25.32 ms       │ 1.0086
│     ├─ (10000000, 0.1, 0.25)  │ 26.89 ms      │ 30.73 ms       │ 0.8750  ***
│     ├─ (10000000, 0.1, 0.95)  │ 27.05 ms      │ 30.53 ms       │ 0.8860  ***
│     ╰─ (10000000, 0.1, 1.0)   │ 26.22 ms      │ 25.98 ms       │ 1.0092
├─ decompress_alp               │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 12.24 µs      │ 12.33 µs       │ 0.9927
│  │  ├─ (100000, 0.0, 0.95)    │ 12.24 µs      │ 12.16 µs       │ 1.0065
│  │  ├─ (100000, 0.0, 1.0)     │ 12.2 µs       │ 12.16 µs       │ 1.0032
│  │  ├─ (100000, 0.01, 0.25)   │ 15.12 µs      │ 14.04 µs       │ 1.0769
│  │  ├─ (100000, 0.01, 0.95)   │ 14.95 µs      │ 14.81 µs       │ 1.0094
│  │  ├─ (100000, 0.01, 1.0)    │ 13.43 µs      │ 13.24 µs       │ 1.0143
│  │  ├─ (100000, 0.1, 0.25)    │ 26.08 µs      │ 17.41 µs       │ 1.4979  ***
│  │  ├─ (100000, 0.1, 0.95)    │ 25.87 µs      │ 25.04 µs       │ 1.0331
│  │  ├─ (100000, 0.1, 1.0)     │ 19.33 µs      │ 21.08 µs       │ 0.9169
│  │  ├─ (10000000, 0.0, 0.25)  │ 2.067 ms      │ 2.057 ms       │ 1.0048
│  │  ├─ (10000000, 0.0, 0.95)  │ 2.068 ms      │ 2.055 ms       │ 1.0063
│  │  ├─ (10000000, 0.0, 1.0)   │ 2.07 ms       │ 1.261 ms       │ 1.6415  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 1.51 ms       │ 2.113 ms       │ 0.7146  ***
│  │  ├─ (10000000, 0.01, 0.95) │ 1.477 ms      │ 2.621 ms       │ 0.5635  ***
│  │  ├─ (10000000, 0.01, 1.0)  │ 1.35 ms       │ 1.346 ms       │ 1.0029
│  │  ├─ (10000000, 0.1, 0.25)  │ 3.765 ms      │ 2.58 ms        │ 1.4593  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 2.784 ms      │ 3.28 ms        │ 0.8487  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 1.764 ms      │ 1.754 ms       │ 1.0057
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 23.33 µs      │ 23.45 µs       │ 0.9948
│     ├─ (100000, 0.0, 0.95)    │ 23.41 µs      │ 23.33 µs       │ 1.0034
│     ├─ (100000, 0.0, 1.0)     │ 23.33 µs      │ 23.49 µs       │ 0.9931
│     ├─ (100000, 0.01, 0.25)   │ 25.58 µs      │ 24.66 µs       │ 1.0373
│     ├─ (100000, 0.01, 0.95)   │ 25.58 µs      │ 25.79 µs       │ 0.9918
│     ├─ (100000, 0.01, 1.0)    │ 24.2 µs       │ 24.62 µs       │ 0.9829
│     ├─ (100000, 0.1, 0.25)    │ 39.83 µs      │ 27.87 µs       │ 1.4291  ***
│     ├─ (100000, 0.1, 0.95)    │ 39.7 µs       │ 39.56 µs       │ 1.0035
│     ├─ (100000, 0.1, 1.0)     │ 34.43 µs      │ 31.66 µs       │ 1.0874
│     ├─ (10000000, 0.0, 0.25)  │ 4.246 ms      │ 4.239 ms       │ 1.0016
│     ├─ (10000000, 0.0, 0.95)  │ 4.227 ms      │ 4.292 ms       │ 0.9848
│     ├─ (10000000, 0.0, 1.0)   │ 4.227 ms      │ 4.246 ms       │ 0.9955
│     ├─ (10000000, 0.01, 0.25) │ 4.696 ms      │ 4.356 ms       │ 1.0780
│     ├─ (10000000, 0.01, 0.95) │ 4.933 ms      │ 4.637 ms       │ 1.0638
│     ├─ (10000000, 0.01, 1.0)  │ 4.538 ms      │ 4.545 ms       │ 0.9984
│     ├─ (10000000, 0.1, 0.25)  │ 7.23 ms       │ 5.304 ms       │ 1.3631  ***
│     ├─ (10000000, 0.1, 0.95)  │ 6.227 ms      │ 5.913 ms       │ 1.0531
│     ╰─ (10000000, 0.1, 1.0)   │ 5.207 ms      │ 5.29 ms        │ 0.9843
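
For readers who want to recompute the last column, the sketch below shows the arithmetic the table appears to use: the ratio is the PR median divided by the develop median, and a row is flagged when the ratio falls outside the 0.9 to 1.1 band. The function name `ratio_with_flag` is made up for this example.

```rust
/// Hypothetical helper: compute PR/develop and whether the row gets a `***`.
/// Both inputs must be in the same unit (for example, both in microseconds).
fn ratio_with_flag(pr_median: f64, develop_median: f64) -> (f64, bool) {
    let ratio = pr_median / develop_median;
    // Flag anything more than 10% slower or more than 10% faster.
    let flagged = ratio > 1.1 || ratio < 0.9;
    (ratio, flagged)
}
```

For the f32 compress row (100000, 0.1, 0.95), `ratio_with_flag(238.2, 269.8)` gives a ratio of about 0.8828 and flags the row, which matches the table.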

Benchmarks before reverting to develop's chunking code

[1] Seems like this PR is about the same except for compressing really large f64 arrays. The PR that introduced chunking, #924, reported substantially larger reductions (~5ms of 29ms) in time than this increase of ~1ms (of 17ms).

alp_compress               │ PR median     │ PR mean   │ develop median │ develop mean │
├─ compress_alp            │               │           │                │              │
│  ├─ f32                  │               │           │                │              │
│  │  ├─ (100000, 0.25)    │ 136.4 µs      │ 137.9 µs  │ 143 µs         │ 145.9 µs     │
│  │  ├─ (100000, 0.95)    │ 136.3 µs      │ 137.1 µs  │ 133.1 µs       │ 134.3 µs     │
│  │  ├─ (100000, 1.0)     │ 136 µs        │ 137.3 µs  │ 133.6 µs       │ 134.6 µs     │
│  │  ├─ (10000000, 0.25)  │ 13.54 ms      │ 13.67 ms  │ 13.74 ms       │ 13.84 ms     │
│  │  ├─ (10000000, 0.95)  │ 13.54 ms      │ 13.64 ms  │ 13.49 ms       │ 13.59 ms     │
│  │  ╰─ (10000000, 1.0)   │ 13.47 ms      │ 13.57 ms  │ 13.58 ms       │ 13.73 ms     │
│  ╰─ f64                  │               │           │                │              │
│     ├─ (100000, 0.25)    │ 152.5 µs      │ 153.9 µs  │ 166.1 µs       │ 167.2 µs     │
│     ├─ (100000, 0.95)    │ 152.5 µs      │ 154.3 µs  │ 166.4 µs       │ 167 µs       │
│     ├─ (100000, 1.0)     │ 151.5 µs      │ 153 µs    │ 166.2 µs       │ 166.9 µs     │
│     ├─ (10000000, 0.25)  │ 16.89 ms      │ 17 ms     │ 15.87 ms       │ 15.91 ms     │
│     ├─ (10000000, 0.95)  │ 16.96 ms      │ 17.19 ms  │ 16.14 ms       │ 16.12 ms     │
│     ╰─ (10000000, 1.0)   │ 16.93 ms      │ 16.99 ms  │ 16.15 ms       │ 16.18 ms     │
╰─ decompress_alp          │               │           │                │              │
   ├─ f32                  │               │           │                │              │
   │  ├─ (100000, 0.25)    │ 12.33 µs      │ 12.4 µs   │ 12.37 µs       │ 12.55 µs     │
   │  ├─ (100000, 0.95)    │ 11.99 µs      │ 12.01 µs  │ 12.45 µs       │ 12.58 µs     │
   │  ├─ (100000, 1.0)     │ 11.95 µs      │ 11.98 µs  │ 11.91 µs       │ 11.96 µs     │
   │  ├─ (10000000, 0.25)  │ 1.233 ms      │ 1.24 ms   │ 2.064 ms       │ 2.088 ms     │
   │  ├─ (10000000, 0.95)  │ 1.232 ms      │ 1.235 ms  │ 2.063 ms       │ 2.094 ms     │
   │  ╰─ (10000000, 1.0)   │ 1.233 ms      │ 1.236 ms  │ 2.061 ms       │ 2.088 ms     │
   ╰─ f64                  │               │           │                │              │
      ├─ (100000, 0.25)    │ 23.29 µs      │ 23.46 µs  │ 23.33 µs       │ 23.4 µs      │
      ├─ (100000, 0.95)    │ 22.87 µs      │ 22.92 µs  │ 22.99 µs       │ 23.06 µs     │
      ├─ (100000, 1.0)     │ 22.87 µs      │ 23 µs     │ 22.95 µs       │ 23 µs        │
      ├─ (10000000, 0.25)  │ 4.254 ms      │ 4.393 ms  │ 4.239 ms       │ 4.28 ms      │
      ├─ (10000000, 0.95)  │ 4.703 ms      │ 4.639 ms  │ 4.27 ms        │ 4.437 ms     │
      ╰─ (10000000, 1.0)   │ 4.479 ms      │ 4.58 ms   │ 4.684 ms       │ 4.618 ms     │

@danking danking added the benchmark Run benchmarks on this branch label Jan 22, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 22, 2025
Member Author

danking commented Jan 22, 2025

previously: #1951

Contributor

github-actions bot commented Jan 22, 2025

Benchmarks: random_access

Table of Results
name PR 65dcaee base a7876ca ratio (PR/base) unit
random-access/vortex-tokio-local-disk 2.78486e+06 2.74869e+06 1.01316 ns
random-access/vortex-local-fs 3.4561e+06 3.44152e+06 1.00424 ns
random-access/parquet-tokio-local-disk 2.36578e+08 2.32132e+08 1.01915 ns

Contributor

Benchmarks: datafusion

Table of Results
name PR 47ddb6a base ee7abec ratio (PR/base) unit
arrow/planning 941690 946100 0.995339 ns
arrow/exec 1.99437e+06 1.97353e+06 1.01056 ns
vortex-pushdown-compressed/planning 576362 577093 0.998733 ns
vortex-pushdown-compressed/exec 2.71224e+06 2.72758e+06 0.994379 ns
vortex-pushdown-uncompressed/planning 573850 585435 0.980211 ns
vortex-pushdown-uncompressed/exec 1.55374e+06 1.57173e+06 0.988554 ns
vortex-nopushdown-compressed/planning 949597 946758 1.003 ns
vortex-nopushdown-compressed/exec 3.10029e+06 3.24769e+06 0.954612 ns
vortex-nopushdown-uncompressed/planning 949927 955033 0.994653 ns
vortex-nopushdown-uncompressed/exec 5.18226e+06 5.1997e+06 0.996646 ns

Contributor

github-actions bot commented Jan 22, 2025

Benchmarks: Clickbench

Table of Results
name PR 65dcaee base a7876ca ratio (PR/base) unit
clickbench_q00/parquet 1893482 2.02036e+06 0.937202 ns
clickbench_q01/parquet 61635258 6.2968e+07 0.978835 ns
clickbench_q02/parquet 116972399 1.22199e+08 0.957231 ns
clickbench_q03/parquet 83343053 8.41092e+07 0.990891 ns
clickbench_q04/parquet 671233257 6.6348e+08 1.01169 ns
clickbench_q05/parquet 842944445 8.47761e+08 0.994319 ns
clickbench_q06/parquet 2029864 1.92344e+06 1.05533 ns
clickbench_q07/parquet 64372724 6.2556e+07 1.02904 ns
clickbench_q08/parquet 767268264 7.7805e+08 0.986142 ns
clickbench_q09/parquet 1080836837 1.06958e+09 1.01052 ns
clickbench_q10/parquet 260081344 2.55987e+08 1.01599 ns
clickbench_q11/parquet 325269610 3.05733e+08 1.0639 ns
clickbench_q12/parquet 844832034 8.59597e+08 0.982823 ns
clickbench_q13/parquet 1116267957 1.13418e+09 0.984207 ns
clickbench_q14/parquet 838917805 8.56438e+08 0.979543 ns
clickbench_q15/parquet 797719010 7.86701e+08 1.01401 ns
clickbench_q16/parquet 1731647769 1.67281e+09 1.03517 ns
clickbench_q17/parquet 1497899778 1.49342e+09 1.003 ns
clickbench_q18/parquet 3124291281 3.07459e+09 1.01616 ns
clickbench_q19/parquet 69889765 6.60167e+07 1.05867 ns
clickbench_q20/parquet 1237482438 1.19007e+09 1.03984 ns
clickbench_q21/parquet 1454923334 1.37927e+09 1.05485 ns
clickbench_q22/parquet 2446366209 2.47845e+09 0.987053 ns
clickbench_q23/parquet 8474265495 8.41857e+09 1.00662 ns
clickbench_q24/parquet 539427306 5.3128e+08 1.01533 ns
clickbench_q25/parquet 518074885 5.22446e+08 0.991633 ns
clickbench_q26/parquet 599564734 5.9736e+08 1.00369 ns
clickbench_q27/parquet 1709998673 1.6574e+09 1.03174 ns
clickbench_q28/parquet 11503328620 1.15089e+10 0.999517 ns
clickbench_q29/parquet 427489760 4.27969e+08 0.998881 ns
clickbench_q30/parquet 778196574 7.67879e+08 1.01344 ns
clickbench_q31/parquet 816064573 8.25005e+08 0.989164 ns
clickbench_q32/parquet 2886687289 2.77232e+09 1.04125 ns
clickbench_q33/parquet 2889379033 2.80511e+09 1.03004 ns
clickbench_q34/parquet 2901788885 2.80691e+09 1.0338 ns
clickbench_q35/parquet 889346868 8.71516e+08 1.02046 ns
clickbench_q36/parquet 182630054 1.71337e+08 1.06591 ns
clickbench_q37/parquet 88754603 8.61864e+07 1.0298 ns
clickbench_q38/parquet 117256703 1.12833e+08 1.03921 ns
clickbench_q39/parquet 333703252 3.30302e+08 1.0103 ns
clickbench_q40/parquet 51773874 4.86737e+07 1.06369 ns
clickbench_q41/parquet 50473650 4.89553e+07 1.03101 ns
clickbench_q42/parquet 68045125 6.81624e+07 0.99828 ns
clickbench_q00/vortex-file-compressed 2097306 2.01496e+06 1.04087 ns
clickbench_q01/vortex-file-compressed 28840862 2.71315e+07 1.063 ns
clickbench_q02/vortex-file-compressed 84040021 8.39559e+07 1.001 ns
clickbench_q03/vortex-file-compressed 78528988 7.75597e+07 1.0125 ns
clickbench_q04/vortex-file-compressed 628609289 6.2208e+08 1.0105 ns
clickbench_q05/vortex-file-compressed 656291225 6.51709e+08 1.00703 ns
clickbench_q06/vortex-file-compressed 2137519 2.14613e+06 0.99599 ns
clickbench_q07/vortex-file-compressed 42856314 4.1415e+07 1.0348 ns
clickbench_q08/vortex-file-compressed 752516320 7.81813e+08 0.962527 ns
clickbench_q09/vortex-file-compressed 954456371 9.43324e+08 1.0118 ns
clickbench_q10/vortex-file-compressed 216513096 2.2909e+08 0.945099 ns
clickbench_q11/vortex-file-compressed 238696549 2.5554e+08 0.934088 ns
clickbench_q12/vortex-file-compressed 601805579 5.86968e+08 1.02528 ns
clickbench_q13/vortex-file-compressed 918512303 8.9457e+08 1.02676 ns
clickbench_q14/vortex-file-compressed 597742009 5.91838e+08 1.00998 ns
clickbench_q15/vortex-file-compressed 772128484 7.56529e+08 1.02062 ns
clickbench_q16/vortex-file-compressed 1438349994 1.39471e+09 1.03129 ns
clickbench_q17/vortex-file-compressed 1316402539 1.31832e+09 0.998547 ns
clickbench_q18/vortex-file-compressed 2879090432 2.96451e+09 0.971187 ns
clickbench_q19/vortex-file-compressed 43780891 4.23308e+07 1.03426 ns
clickbench_q20/vortex-file-compressed 508906123 5.06504e+08 1.00474 ns
clickbench_q21/vortex-file-compressed 786976943 7.62595e+08 1.03197 ns
clickbench_q22/vortex-file-compressed 1900178234 1.8706e+09 1.01581 ns
clickbench_q23/vortex-file-compressed 3876166054 3.92467e+09 0.98764 ns
clickbench_q24/vortex-file-compressed 357982020 3.71763e+08 0.962931 ns
clickbench_q25/vortex-file-compressed 336874058 3.33457e+08 1.01025 ns
clickbench_q26/vortex-file-compressed 413242139 4.15802e+08 0.993843 ns
clickbench_q27/vortex-file-compressed 1364187010 1.34836e+09 1.01174 ns
clickbench_q28/vortex-file-compressed 10658023338 1.06431e+10 1.0014 ns
clickbench_q29/vortex-file-compressed 679952875 6.99629e+08 0.971877 ns
clickbench_q30/vortex-file-compressed 585587146 5.75904e+08 1.01681 ns
clickbench_q31/vortex-file-compressed 620989440 6.22303e+08 0.99789 ns
clickbench_q32/vortex-file-compressed 2766649620 2.81823e+09 0.981696 ns
clickbench_q33/vortex-file-compressed 2244492602 2.40779e+09 0.93218 ns
clickbench_q34/vortex-file-compressed 2241308423 2.3666e+09 0.947059 ns
clickbench_q35/vortex-file-compressed 964321209 1.00834e+09 0.956345 ns
clickbench_q36/vortex-file-compressed 52750564 6.30315e+07 0.836892 ns
clickbench_q37/vortex-file-compressed 49534466 5.36539e+07 0.923222 ns
clickbench_q38/vortex-file-compressed 42438641 4.93527e+07 0.859905 ns
clickbench_q39/vortex-file-compressed 87686080 1.0195e+08 0.860086 ns
clickbench_q40/vortex-file-compressed 30194421 3.15424e+07 0.957264 ns
clickbench_q41/vortex-file-compressed 29060659 3.08105e+07 0.943205 ns
clickbench_q42/vortex-file-compressed 37301045 4.09478e+07 0.910942 ns

let (encoded, exceptional_positions) = T::chunked_encode(values.as_slice::<T>(), exponents);

let encoded_array = PrimitiveArray::new(encoded, values.validity()).into_array();
let exceptional_positions = match values.logical_validity() {
Contributor


Could we add one or two comments in this section to make it clear what is going on?

Member Author


LMK what you think, I added a comment above this line.

@danking danking force-pushed the dk/alp-validity-in-encoded-only branch from b5b44e0 to ca17b75 Compare January 30, 2025 19:41
@danking danking added the benchmark Run benchmarks on this branch label Jan 30, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 30, 2025
@danking

This comment was marked as outdated.

@danking danking added the benchmark Run benchmarks on this branch label Jan 30, 2025
Contributor

github-actions bot commented Jan 30, 2025

Benchmarks: TPC-H

Table of Results
name PR 65dcaee base a7876ca ratio (PR/base) unit

@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 30, 2025
@robert3005
Member

@danking you should make sure you run with `RUSTFLAGS='-C target-cpu=native'`. We had to axe it from our repo's cargo config since it messed up cross-compilation.

@danking

This comment was marked as outdated.

Contributor

github-actions bot commented Jan 30, 2025

Benchmarks: compress

Table of Results
name PR 65dcaee base a7876ca ratio (PR/base) unit
compress time/wide table cols=10 chunks=1 rows=1000 4.11079e+06 4.22199e+06 0.973662 ns
compress time/wide table cols=10 chunks=1 rows=1000 throughput 0.029226 0.0284563 1.02705 bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000 738886 739146 0.999648 ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000 throughput 0.162599 0.162542 1.00035 bytes/ns
decompress time/wide table cols=10 chunks=1 rows=1000 413794 416725 0.992966 ns
decompress time/wide table cols=10 chunks=1 rows=1000 throughput 0.290343 0.2883 1.00708 bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000 271186 277187 0.978352 ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000 throughput 0.443024 0.433433 1.02213 bytes/ns
compress time/wide table cols=100 chunks=1 rows=1000 4.12314e+07 4.34149e+07 0.949707 ns
compress time/wide table cols=100 chunks=1 rows=1000 throughput 0.0291337 0.0276685 1.05296 bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000 7.70192e+06 8.5056e+06 0.905511 ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000 throughput 0.155964 0.141227 1.10435 bytes/ns
decompress time/wide table cols=100 chunks=1 rows=1000 4.01831e+06 4.10978e+06 0.977742 ns
decompress time/wide table cols=100 chunks=1 rows=1000 throughput 0.298937 0.292284 1.02277 bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000 2.74729e+06 2.89853e+06 0.947823 ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000 throughput 0.437239 0.414425 1.05505 bytes/ns
compress time/wide table cols=1000 chunks=1 rows=1000 4.25888e+08 4.30145e+08 0.990103 ns
compress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.0282046 0.0279255 1.01 bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000 9.2937e+07 9.51301e+07 0.976947 ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.129249 0.126269 1.0236 bytes/ns
decompress time/wide table cols=1000 chunks=1 rows=1000 7.54628e+07 7.96789e+07 0.947086 ns
decompress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.159178 0.150755 1.05587 bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000 3.17539e+07 3.36228e+07 0.944417 ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.378285 0.357258 1.05885 bytes/ns
compress time/wide table cols=10 chunks=50 rows=1000 8.33279e+06 8.5026e+06 0.980028 ns
compress time/wide table cols=10 chunks=50 rows=1000 throughput 0.0151766 0.0148734 1.02038 bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000 1.11033e+06 1.13984e+06 0.974109 ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000 throughput 0.113897 0.110948 1.02658 bytes/ns
decompress time/wide table cols=10 chunks=50 rows=1000 433285 435006 0.996044 ns
decompress time/wide table cols=10 chunks=50 rows=1000 throughput 0.29187 0.290716 1.00397 bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000 279558 280037 0.998291 ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000 throughput 0.452368 0.451595 1.00171 bytes/ns
compress time/wide table cols=100 chunks=50 rows=1000 8.94013e+07 9.1106e+07 0.981289 ns
compress time/wide table cols=100 chunks=50 rows=1000 throughput 0.0140989 0.0138351 1.01907 bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000 1.35341e+07 1.45264e+07 0.93169 ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000 throughput 0.0931322 0.0867703 1.07332 bytes/ns
decompress time/wide table cols=100 chunks=50 rows=1000 4.18102e+06 4.25089e+06 0.983564 ns
decompress time/wide table cols=100 chunks=50 rows=1000 throughput 0.301472 0.296517 1.01671 bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000 2.86783e+06 2.9278e+06 0.979518 ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000 throughput 0.439518 0.430516 1.02091 bytes/ns
compress time/wide table cols=1000 chunks=50 rows=1000 9.59031e+08 9.80859e+08 0.977746 ns
compress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.0131387 0.0128464 1.02276 bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000 1.75425e+08 1.91682e+08 0.91519 ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.071828 0.0657363 1.09267 bytes/ns
decompress time/wide table cols=1000 chunks=50 rows=1000 8.06546e+07 8.96356e+07 0.899805 ns
decompress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.156227 0.140574 1.11135 bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000 3.71423e+07 3.86265e+07 0.961575 ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.339249 0.326213 1.03996 bytes/ns
compress time/taxi 1.59083e+09 1.61659e+09 0.984068 ns
compress time/taxi throughput 0.295953 0.291238 1.01619 bytes/ns
parquet_rs-zstd compress time/taxi 1.8139e+09 1.84739e+09 0.981869 ns
parquet_rs-zstd compress time/taxi throughput 0.259557 0.254851 1.01847 bytes/ns
decompress time/taxi 3.8271e+08 3.79719e+08 1.00788 ns
decompress time/taxi throughput 1.2302 1.23989 0.992185 bytes/ns
parquet_rs-zstd decompress time/taxi 3.10568e+08 3.12995e+08 0.992244 ns
parquet_rs-zstd decompress time/taxi throughput 1.51597 1.50421 1.00782 bytes/ns
compress time/AirlineSentiment 285579 285481 1.00035 ns
compress time/AirlineSentiment throughput 0.00714337 0.00714584 0.999655 bytes/ns
parquet_rs-zstd compress time/AirlineSentiment 54749.9 56303.1 0.972414 ns
parquet_rs-zstd compress time/AirlineSentiment throughput 0.0372603 0.0362325 1.02837 bytes/ns
decompress time/AirlineSentiment 186888 189909 0.984093 ns
decompress time/AirlineSentiment throughput 0.0109156 0.010742 1.01616 bytes/ns
parquet_rs-zstd decompress time/AirlineSentiment 30898.9 31866.5 0.969634 ns
parquet_rs-zstd decompress time/AirlineSentiment throughput 0.0660219 0.064017 1.03132 bytes/ns
compress time/Arade 2.75493e+09 2.78523e+09 0.989121 ns
compress time/Arade throughput 0.285681 0.282573 1.011 bytes/ns
parquet_rs-zstd compress time/Arade 3.02583e+09 3.06865e+09 0.986047 ns
parquet_rs-zstd compress time/Arade throughput 0.260103 0.256474 1.01415 bytes/ns
decompress time/Arade 8.02333e+08 7.28471e+08 1.10139 ns
decompress time/Arade throughput 0.980926 1.08038 0.907942 bytes/ns
parquet_rs-zstd decompress time/Arade 6.93619e+08 6.80535e+08 1.01923 ns
parquet_rs-zstd decompress time/Arade throughput 1.13467 1.15649 0.981135 bytes/ns
compress time/Bimbo 1.2004e+10 1.20271e+10 0.998079 ns
compress time/Bimbo throughput 0.59325 0.59211 1.00192 bytes/ns
parquet_rs-zstd compress time/Bimbo 2.21634e+10 2.27723e+10 0.973258 ns
parquet_rs-zstd compress time/Bimbo throughput 0.321312 0.312719 1.02748 bytes/ns
decompress time/Bimbo 4.64528e+09 4.78511e+09 0.970778 ns
decompress time/Bimbo throughput 1.53303 1.48823 1.0301 bytes/ns
parquet_rs-zstd decompress time/Bimbo 3.26095e+09 4.02577e+09 0.810019 ns
parquet_rs-zstd decompress time/Bimbo throughput 2.18383 1.76894 1.23454 bytes/ns
compress time/CMSprovider 1.23465e+10 1.28623e+10 0.959898 ns
compress time/CMSprovider throughput 0.417053 0.400328 1.04178 bytes/ns
parquet_rs-zstd compress time/CMSprovider 1.82751e+10 1.89505e+10 0.964361 ns
parquet_rs-zstd compress time/CMSprovider throughput 0.281758 0.271716 1.03696 bytes/ns
decompress time/CMSprovider 4.63231e+09 4.67752e+09 0.990334 ns
decompress time/CMSprovider throughput 1.11158 1.10083 1.00976 bytes/ns
parquet_rs-zstd decompress time/CMSprovider 4.98212e+09 5.27342e+09 0.94476 ns
parquet_rs-zstd decompress time/CMSprovider throughput 1.03353 0.976436 1.05847 bytes/ns
compress time/Euro2016 2.05673e+09 2.14945e+09 0.956864 ns
compress time/Euro2016 throughput 0.191204 0.182956 1.04508 bytes/ns
parquet_rs-zstd compress time/Euro2016 1.56379e+09 1.59307e+09 0.981617 ns
parquet_rs-zstd compress time/Euro2016 throughput 0.251476 0.246853 1.01873 bytes/ns
decompress time/Euro2016 2.76666e+08 2.87648e+08 0.961822 ns
decompress time/Euro2016 throughput 1.4214 1.36714 1.03969 bytes/ns
parquet_rs-zstd decompress time/Euro2016 4.81894e+08 5.06167e+08 0.952045 ns
parquet_rs-zstd decompress time/Euro2016 throughput 0.81606 0.776926 1.05037 bytes/ns
compress time/Food 1.03987e+09 1.07218e+09 0.969866 ns
compress time/Food throughput 0.319963 0.310321 1.03107 bytes/ns
parquet_rs-zstd compress time/Food 1.05489e+09 1.10911e+09 0.951115 ns
parquet_rs-zstd compress time/Food throughput 0.315407 0.299989 1.0514 bytes/ns
decompress time/Food 1.81973e+08 1.92055e+08 0.947502 ns
decompress time/Food throughput 1.82841 1.73242 1.05541 bytes/ns
parquet_rs-zstd decompress time/Food 2.18871e+08 2.265e+08 0.966316 ns
parquet_rs-zstd decompress time/Food throughput 1.52016 1.46896 1.03486 bytes/ns
compress time/HashTags 2.48933e+09 2.57127e+09 0.968133 ns
compress time/HashTags throughput 0.323178 0.31288 1.03292 bytes/ns
parquet_rs-zstd compress time/HashTags 2.45418e+09 2.51587e+09 0.97548 ns
parquet_rs-zstd compress time/HashTags throughput 0.327807 0.319769 1.02514 bytes/ns
decompress time/HashTags 4.42347e+08 4.60031e+08 0.96156 ns
decompress time/HashTags throughput 1.81871 1.74879 1.03998 bytes/ns
parquet_rs-zstd decompress time/HashTags 7.64234e+08 8.39928e+08 0.90988 ns
parquet_rs-zstd decompress time/HashTags throughput 1.05269 0.957819 1.09905 bytes/ns
compress time/TPC-H l_comment chunked without fsst 2.97876e+09 3.43041e+09 0.868338 ns
compress time/TPC-H l_comment chunked without fsst throughput 0.0836586 0.0726439 1.15163 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst 9.08086e+08 9.37191e+08 0.968944 ns
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst throughput 0.274422 0.265899 1.03205 bytes/ns
decompress time/TPC-H l_comment chunked without fsst 5.43717e+07 5.70223e+07 0.953516 ns
decompress time/TPC-H l_comment chunked without fsst throughput 4.58324 4.37019 1.04875 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst 2.48868e+08 2.52631e+08 0.985106 ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst throughput 1.00133 0.986414 1.01512 bytes/ns
compress time/TPC-H l_comment chunked 9.98087e+08 1.04242e+09 0.957468 ns
compress time/TPC-H l_comment chunked throughput 0.249676 0.239057 1.04442 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked 9.0321e+08 9.36567e+08 0.964384 ns
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput 0.275903 0.266077 1.03693 bytes/ns
decompress time/TPC-H l_comment chunked 1.02961e+08 1.0375e+08 0.992402 ns
decompress time/TPC-H l_comment chunked throughput 2.42031 2.40192 1.00766 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked 2.48578e+08 2.52852e+08 0.983098 ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput 1.0025 0.985553 1.01719 bytes/ns
compress time/TPC-H l_comment canonical 9.9922e+08 1.03267e+09 0.967608 ns
compress time/TPC-H l_comment canonical throughput 0.249392 0.241314 1.03348 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment canonical 9.11616e+08 9.22616e+08 0.988078 ns
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput 0.273358 0.270099 1.01207 bytes/ns
decompress time/TPC-H l_comment canonical 1.02431e+08 1.02982e+08 0.994644 ns
decompress time/TPC-H l_comment canonical throughput 2.43284 2.41981 1.00539 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical 2.51196e+08 2.54699e+08 0.986246 ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput 0.992046 0.978402 1.01395 bytes/ns

The patches are now always non-nullable.

This required PrimitiveArray::patch to gracefully handle non-nullable patches when the array is
nullable.

I modified the benchmarks to include patch manipulation time, but note that the test data has no
patches, so the benchmarks measure only the overhead of `is_valid`. If we had test data where the
invalid positions contained exceptional values, I would expect a modest improvement in both
decompression and compression time.
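
To make the point about patching a nullable array with non-nullable patches concrete, here is a minimal sketch on a toy array type. `NullableArray` and `patch` below are hypothetical and are not the Vortex `PrimitiveArray::patch` implementation. The sketch assumes that a non-nullable patch always carries a valid value, so the patched slot is marked valid.

```rust
/// Hypothetical toy array: `validity` is `None` when the array is
/// non-nullable, otherwise `validity[i]` says whether slot `i` is valid.
struct NullableArray {
    values: Vec<i64>,
    validity: Option<Vec<bool>>,
}

/// Apply non-nullable patches. Assumption for this sketch: because every
/// patch value is valid, a patched slot becomes valid in a nullable array.
fn patch(array: &mut NullableArray, indices: &[usize], patch_values: &[i64]) {
    assert_eq!(indices.len(), patch_values.len());
    for (&i, &v) in indices.iter().zip(patch_values) {
        array.values[i] = v;
        if let Some(mask) = array.validity.as_mut() {
            mask[i] = true;
        }
    }
}
```

When the patches have already been trimmed to valid positions, as this PR does, the validity update is a no-op and only the values change.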
@danking danking force-pushed the dk/alp-validity-in-encoded-only branch from 0351f19 to e5709ef Compare January 30, 2025 21:16
@danking danking changed the title Dk/alp validity in encoded only feat: teach ALPArray to store validity only in the encoded array Jan 30, 2025
@danking danking changed the title feat: teach ALPArray to store validity only in the encoded array feat: teach ALPArray to store validity only in the encoded array & other minor changes Jan 30, 2025
@danking danking requested review from a10y and lwwmanning January 30, 2025 23:08
@danking danking added the benchmark Run benchmarks on this branch label Jan 30, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Jan 30, 2025
Comment on lines 167 to 168
if let Some(fill_value) =
    Self::first_non_patched_encoded_value(&encoded, &patch_indices)
Contributor


Just use `unwrap_or_default()` to fall back to zero.
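
A minimal sketch of that suggestion follows. The body and signature of `first_non_patched_encoded_value` here are guesses for illustration only (the real function in the PR is not shown in this thread), and `fill_value` is a hypothetical wrapper. The only library call relied on is the standard `Option::unwrap_or_default`, which returns `0` for an `Option<i64>` that is `None`.

```rust
/// Hypothetical reimplementation: return the first encoded value whose
/// position is not patched. Assumes `patch_indices` is sorted ascending.
fn first_non_patched_encoded_value(encoded: &[i64], patch_indices: &[usize]) -> Option<i64> {
    let mut patches = patch_indices.iter().copied().peekable();
    for (i, &v) in encoded.iter().enumerate() {
        if patches.peek() == Some(&i) {
            // Position `i` is patched; skip it and advance to the next patch.
            patches.next();
        } else {
            return Some(v);
        }
    }
    None
}

/// Pick a fill value, falling back to zero when every position is patched
/// (or the input is empty).
fn fill_value(encoded: &[i64], patch_indices: &[usize]) -> i64 {
    first_non_patched_encoded_value(encoded, patch_indices).unwrap_or_default()
}
```

This replaces the `if let Some(fill_value) = ...` branch with a single expression, at the cost of always producing a value even when there is nothing to fill.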

Comment on lines -214 to -220
encoded_output.extend(chunk.iter().map(|v| {
    let encoded = unsafe { T::encode_single_unchecked(*v, exp) };
    let decoded = T::decode_single(encoded, exp);
    let neq = (decoded != *v) as usize;
    chunk_patch_count += neq;
    encoded
}));
Contributor


This loop structure is actually pretty important for performance.

You can refer to the original code where this is actually separated out into two loops:

Loop 1: calculate and materialize encoded + decoded vectors

Loop 2: find the exceptions by looping through the zipped vectors to find mismatches

https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L338-L346

https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L370-L376

I'm not sure I personally find the new version on the right more readable, but I want to make sure we keep a similar structure that the branch predictor likes.
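
The two-loop structure described in this comment can be sketched as follows. This is a simplified illustration, not the Vortex or cwida/ALP code: `encode_single` and `decode_single` are hypothetical stand-ins that scale by a single power of ten, whereas the real ALP encoder uses a pair of exponents and a faster rounding trick.

```rust
/// Hypothetical simplified encoder: scale by 10^exp and round to an integer.
fn encode_single(v: f64, exp: i32) -> i64 {
    (v * 10f64.powi(exp)).round() as i64
}

/// Hypothetical simplified decoder: undo the scaling.
fn decode_single(encoded: i64, exp: i32) -> f64 {
    encoded as f64 / 10f64.powi(exp)
}

/// Encode one chunk and return (encoded values, exceptional positions).
fn encode_chunk(chunk: &[f64], exp: i32) -> (Vec<i64>, Vec<usize>) {
    // Loop 1: branch-free; materialize the encoded and decoded vectors.
    let mut encoded = Vec::with_capacity(chunk.len());
    let mut decoded = Vec::with_capacity(chunk.len());
    for &v in chunk {
        let e = encode_single(v, exp);
        encoded.push(e);
        decoded.push(decode_single(e, exp));
    }

    // Loop 2: zip the decoded values with the originals and record every
    // position that did not round-trip exactly.
    let mut exceptions = Vec::new();
    for (i, (d, v)) in decoded.iter().zip(chunk).enumerate() {
        if d != v {
            exceptions.push(i);
        }
    }
    (encoded, exceptions)
}
```

For example, `encode_chunk(&[1.5, std::f64::consts::PI, 2.25], 2)` encodes to `[150, 314, 225]` and reports position `1` as the only exception, since π does not survive a two-decimal round trip. Keeping the data-dependent branch out of the first loop is the property the reviewer wants to preserve.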

@danking danking force-pushed the dk/alp-validity-in-encoded-only branch from e0e2528 to c2580f0 Compare February 3, 2025 16:38
@danking danking closed this Feb 3, 2025
3 participants