feat: teach ALPArray to store validity only in the encoded array & other minor changes #2053

danking · 2025-01-22T17:14:03Z

The original change in this PR trims invalid values from the patches and makes the patches' validity either AllValid (for nullable arrays) or NonNullable.
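A minimal sketch of what "trimming invalid values from the patches" could look like, assuming a simplified model where patches are parallel vectors of positions and values and validity is a plain `&[bool]`. The `Patches` struct and `trim_invalid_patches` function are hypothetical names for illustration, not the types used in the PR.

```rust
/// Hypothetical, simplified stand-in for a set of ALP patches:
/// `indices[i]` is the position of the i-th exceptional value.
#[derive(Debug, PartialEq)]
struct Patches {
    indices: Vec<usize>,
    values: Vec<f64>,
}

/// Keep only the patches whose position is valid (non-null).
/// `validity[i]` is true when position `i` of the array holds a real value.
fn trim_invalid_patches(patches: &Patches, validity: &[bool]) -> Patches {
    let mut indices = Vec::new();
    let mut values = Vec::new();
    for (&idx, &value) in patches.indices.iter().zip(&patches.values) {
        // A patch at a null position carries no information, so drop it.
        if validity[idx] {
            indices.push(idx);
            values.push(value);
        }
    }
    Patches { indices, values }
}
```

After trimming, every remaining patch sits at a valid position, so the patches themselves no longer need to carry nulls.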

Benchmarks on latest commit:

PR: 7fb595b
develop: 0a18498

Each parameter tuple is (number of elements, fraction patched, fraction valid).

The ratio is PR median / develop median. Any ratio greater than 1.1 or less than 0.9 is marked with ***.

alp_compress                    │ PR median     │ develop median │ ratio
├─ compress_alp                 │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 160.4 µs      │ 159.6 µs       │ 1.0050
│  │  ├─ (100000, 0.0, 0.95)    │ 145.9 µs      │ 143.8 µs       │ 1.0146
│  │  ├─ (100000, 0.0, 1.0)     │ 137.0 µs      │ 135.5 µs       │ 1.0110
│  │  ├─ (100000, 0.01, 0.25)   │ 227.7 µs      │ 230.7 µs       │ 0.9869
│  │  ├─ (100000, 0.01, 0.95)   │ 227.9 µs      │ 227.2 µs       │ 1.0030
│  │  ├─ (100000, 0.01, 1.0)    │ 226.6 µs      │ 227.5 µs       │ 0.9960
│  │  ├─ (100000, 0.1, 0.25)    │ 238.3 µs      │ 248.9 µs       │ 0.9574
│  │  ├─ (100000, 0.1, 0.95)    │ 238.2 µs      │ 269.8 µs       │ 0.8828  ***
│  │  ├─ (100000, 0.1, 1.0)     │ 230.6 µs      │ 231.9 µs       │ 0.9943
│  │  ├─ (10000000, 0.0, 0.25)  │ 14.17 ms      │ 13.77 ms       │ 1.0290
│  │  ├─ (10000000, 0.0, 0.95)  │ 14.16 ms      │ 13.8 ms        │ 1.0260
│  │  ├─ (10000000, 0.0, 1.0)   │ 14.0 ms       │ 12.47 ms       │ 1.1226  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 22.29 ms      │ 23.13 ms       │ 0.9636
│  │  ├─ (10000000, 0.01, 0.95) │ 22.26 ms      │ 23.78 ms       │ 0.9360
│  │  ├─ (10000000, 0.01, 1.0)  │ 22.19 ms      │ 21.79 ms       │ 1.0183
│  │  ├─ (10000000, 0.1, 0.25)  │ 23.31 ms      │ 27.72 ms       │ 0.8409  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 23.4 ms       │ 27.47 ms       │ 0.8518  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 22.99 ms      │ 22.31 ms       │ 1.0304
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 165.2 µs      │ 165.4 µs       │ 0.9987
│     ├─ (100000, 0.0, 0.95)    │ 166.1 µs      │ 163.4 µs       │ 1.0165
│     ├─ (100000, 0.0, 1.0)     │ 164.7 µs      │ 179.9 µs       │ 0.9155
│     ├─ (100000, 0.01, 0.25)   │ 269.7 µs      │ 259.1 µs       │ 1.0409
│     ├─ (100000, 0.01, 0.95)   │ 270.5 µs      │ 259.6 µs       │ 1.0419
│     ├─ (100000, 0.01, 1.0)    │ 268.9 µs      │ 270.6 µs       │ 0.9937
│     ├─ (100000, 0.1, 0.25)    │ 281.7 µs      │ 281.3 µs       │ 1.0014
│     ├─ (100000, 0.1, 0.95)    │ 279.1 µs      │ 315.3 µs       │ 0.8851  ***
│     ├─ (100000, 0.1, 1.0)     │ 273.0 µs      │ 275.7 µs       │ 0.9902
│     ├─ (10000000, 0.0, 0.25)  │ 16.16 ms      │ 15.86 ms       │ 1.0189
│     ├─ (10000000, 0.0, 0.95)  │ 16.19 ms      │ 15.75 ms       │ 1.0279
│     ├─ (10000000, 0.0, 1.0)   │ 16.2 ms       │ 15.83 ms       │ 1.0233
│     ├─ (10000000, 0.01, 0.25) │ 25.29 ms      │ 25.77 ms       │ 0.9813
│     ├─ (10000000, 0.01, 0.95) │ 25.74 ms      │ 25.94 ms       │ 0.9922
│     ├─ (10000000, 0.01, 1.0)  │ 25.54 ms      │ 25.32 ms       │ 1.0086
│     ├─ (10000000, 0.1, 0.25)  │ 26.89 ms      │ 30.73 ms       │ 0.8750  ***
│     ├─ (10000000, 0.1, 0.95)  │ 27.05 ms      │ 30.53 ms       │ 0.8860  ***
│     ╰─ (10000000, 0.1, 1.0)   │ 26.22 ms      │ 25.98 ms       │ 1.0092
├─ decompress_alp               │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 12.24 µs      │ 12.33 µs       │ 0.9927
│  │  ├─ (100000, 0.0, 0.95)    │ 12.24 µs      │ 12.16 µs       │ 1.0065
│  │  ├─ (100000, 0.0, 1.0)     │ 12.2 µs       │ 12.16 µs       │ 1.0032
│  │  ├─ (100000, 0.01, 0.25)   │ 15.12 µs      │ 14.04 µs       │ 1.0769
│  │  ├─ (100000, 0.01, 0.95)   │ 14.95 µs      │ 14.81 µs       │ 1.0094
│  │  ├─ (100000, 0.01, 1.0)    │ 13.43 µs      │ 13.24 µs       │ 1.0143
│  │  ├─ (100000, 0.1, 0.25)    │ 26.08 µs      │ 17.41 µs       │ 1.4979  ***
│  │  ├─ (100000, 0.1, 0.95)    │ 25.87 µs      │ 25.04 µs       │ 1.0331
│  │  ├─ (100000, 0.1, 1.0)     │ 19.33 µs      │ 21.08 µs       │ 0.9169
│  │  ├─ (10000000, 0.0, 0.25)  │ 2.067 ms      │ 2.057 ms       │ 1.0048
│  │  ├─ (10000000, 0.0, 0.95)  │ 2.068 ms      │ 2.055 ms       │ 1.0063
│  │  ├─ (10000000, 0.0, 1.0)   │ 2.07 ms       │ 1.261 ms       │ 1.6415  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 1.51 ms       │ 2.113 ms       │ 0.7146  ***
│  │  ├─ (10000000, 0.01, 0.95) │ 1.477 ms      │ 2.621 ms       │ 0.5635  ***
│  │  ├─ (10000000, 0.01, 1.0)  │ 1.35 ms       │ 1.346 ms       │ 1.0029
│  │  ├─ (10000000, 0.1, 0.25)  │ 3.765 ms      │ 2.58 ms        │ 1.4593  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 2.784 ms      │ 3.28 ms        │ 0.8487  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 1.764 ms      │ 1.754 ms       │ 1.0057
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 23.33 µs      │ 23.45 µs       │ 0.9948
│     ├─ (100000, 0.0, 0.95)    │ 23.41 µs      │ 23.33 µs       │ 1.0034
│     ├─ (100000, 0.0, 1.0)     │ 23.33 µs      │ 23.49 µs       │ 0.9931
│     ├─ (100000, 0.01, 0.25)   │ 25.58 µs      │ 24.66 µs       │ 1.0373
│     ├─ (100000, 0.01, 0.95)   │ 25.58 µs      │ 25.79 µs       │ 0.9918
│     ├─ (100000, 0.01, 1.0)    │ 24.2 µs       │ 24.62 µs       │ 0.9829
│     ├─ (100000, 0.1, 0.25)    │ 39.83 µs      │ 27.87 µs       │ 1.4291  ***
│     ├─ (100000, 0.1, 0.95)    │ 39.7 µs       │ 39.56 µs       │ 1.0035
│     ├─ (100000, 0.1, 1.0)     │ 34.43 µs      │ 31.66 µs       │ 1.0874
│     ├─ (10000000, 0.0, 0.25)  │ 4.246 ms      │ 4.239 ms       │ 1.0016
│     ├─ (10000000, 0.0, 0.95)  │ 4.227 ms      │ 4.292 ms       │ 0.9848
│     ├─ (10000000, 0.0, 1.0)   │ 4.227 ms      │ 4.246 ms       │ 0.9955
│     ├─ (10000000, 0.01, 0.25) │ 4.696 ms      │ 4.356 ms       │ 1.0780
│     ├─ (10000000, 0.01, 0.95) │ 4.933 ms      │ 4.637 ms       │ 1.0638
│     ├─ (10000000, 0.01, 1.0)  │ 4.538 ms      │ 4.545 ms       │ 0.9984
│     ├─ (10000000, 0.1, 0.25)  │ 7.23 ms       │ 5.304 ms       │ 1.3631  ***
│     ├─ (10000000, 0.1, 0.95)  │ 6.227 ms      │ 5.913 ms       │ 1.0531
│     ╰─ (10000000, 0.1, 1.0)   │ 5.207 ms      │ 5.29 ms        │ 0.9843
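For reference, the ratio column and the `***` marker can be reproduced with a few lines. This sketch assumes the ratio is the PR median divided by the develop median (which matches the rows above) and that both medians are given in the same unit; `ratio` and `is_flagged` are hypothetical helper names.

```rust
/// Ratio of the PR median to the develop median (same unit for both).
fn ratio(pr_median: f64, develop_median: f64) -> f64 {
    pr_median / develop_median
}

/// A row is marked with `***` when the ratio is above 1.1 or below 0.9.
fn is_flagged(ratio: f64) -> bool {
    ratio > 1.1 || ratio < 0.9
}
```

A ratio below 1.0 means the PR is faster than develop for that row; above 1.0 means it is slower.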

Benchmarks before reverting to develop's chunking code

[1] This PR seems to be about the same as develop, except when compressing really large f64 arrays, where it is ~1ms slower (of 17ms). For comparison, the PR that introduced chunking, #924, reported a substantially larger reduction in time (~5ms of 29ms).

alp_compress               │ PR median     │ PR mean   │ develop median │ develop mean │
├─ compress_alp            │               │           │                │              │
│  ├─ f32                  │               │           │                │              │
│  │  ├─ (100000, 0.25)    │ 136.4 µs      │ 137.9 µs  │ 143 µs         │ 145.9 µs     │
│  │  ├─ (100000, 0.95)    │ 136.3 µs      │ 137.1 µs  │ 133.1 µs       │ 134.3 µs     │
│  │  ├─ (100000, 1.0)     │ 136 µs        │ 137.3 µs  │ 133.6 µs       │ 134.6 µs     │
│  │  ├─ (10000000, 0.25)  │ 13.54 ms      │ 13.67 ms  │ 13.74 ms       │ 13.84 ms     │
│  │  ├─ (10000000, 0.95)  │ 13.54 ms      │ 13.64 ms  │ 13.49 ms       │ 13.59 ms     │
│  │  ╰─ (10000000, 1.0)   │ 13.47 ms      │ 13.57 ms  │ 13.58 ms       │ 13.73 ms     │
│  ╰─ f64                  │               │           │                │              │
│     ├─ (100000, 0.25)    │ 152.5 µs      │ 153.9 µs  │ 166.1 µs       │ 167.2 µs     │
│     ├─ (100000, 0.95)    │ 152.5 µs      │ 154.3 µs  │ 166.4 µs       │ 167 µs       │
│     ├─ (100000, 1.0)     │ 151.5 µs      │ 153 µs    │ 166.2 µs       │ 166.9 µs     │
│     ├─ (10000000, 0.25)  │ 16.89 ms      │ 17 ms     │ 15.87 ms       │ 15.91 ms     │
│     ├─ (10000000, 0.95)  │ 16.96 ms      │ 17.19 ms  │ 16.14 ms       │ 16.12 ms     │
│     ╰─ (10000000, 1.0)   │ 16.93 ms      │ 16.99 ms  │ 16.15 ms       │ 16.18 ms     │
╰─ decompress_alp          │               │           │                │              │
   ├─ f32                  │               │           │                │              │
   │  ├─ (100000, 0.25)    │ 12.33 µs      │ 12.4 µs   │ 12.37 µs       │ 12.55 µs     │
   │  ├─ (100000, 0.95)    │ 11.99 µs      │ 12.01 µs  │ 12.45 µs       │ 12.58 µs     │
   │  ├─ (100000, 1.0)     │ 11.95 µs      │ 11.98 µs  │ 11.91 µs       │ 11.96 µs     │
   │  ├─ (10000000, 0.25)  │ 1.233 ms      │ 1.24 ms   │ 2.064 ms       │ 2.088 ms     │
   │  ├─ (10000000, 0.95)  │ 1.232 ms      │ 1.235 ms  │ 2.063 ms       │ 2.094 ms     │
   │  ╰─ (10000000, 1.0)   │ 1.233 ms      │ 1.236 ms  │ 2.061 ms       │ 2.088 ms     │
   ╰─ f64                  │               │           │                │              │
      ├─ (100000, 0.25)    │ 23.29 µs      │ 23.46 µs  │ 23.33 µs       │ 23.4 µs      │
      ├─ (100000, 0.95)    │ 22.87 µs      │ 22.92 µs  │ 22.99 µs       │ 23.06 µs     │
      ├─ (100000, 1.0)     │ 22.87 µs      │ 23 µs     │ 22.95 µs       │ 23 µs        │
      ├─ (10000000, 0.25)  │ 4.254 ms      │ 4.393 ms  │ 4.239 ms       │ 4.28 ms      │
      ├─ (10000000, 0.95)  │ 4.703 ms      │ 4.639 ms  │ 4.27 ms        │ 4.437 ms     │
      ╰─ (10000000, 1.0)   │ 4.479 ms      │ 4.58 ms   │ 4.684 ms       │ 4.618 ms     │

danking · 2025-01-22T17:20:11Z

previously: #1951

github-actions · 2025-01-22T17:21:18Z

Benchmarks: random_access

Table of Results

name	PR `65dcaee`	base `a7876ca`	ratio (PR/base)	unit
random-access/vortex-tokio-local-disk	2.78486e+06	2.74869e+06	1.01316	ns
random-access/vortex-local-fs	3.4561e+06	3.44152e+06	1.00424	ns
random-access/parquet-tokio-local-disk	2.36578e+08	2.32132e+08	1.01915	ns

github-actions · 2025-01-22T17:22:47Z

Benchmarks: datafusion

Table of Results

name	PR `47ddb6a`	base `ee7abec`	ratio (PR/base)	unit
arrow/planning	941690	946100	0.995339	ns
arrow/exec	1.99437e+06	1.97353e+06	1.01056	ns
vortex-pushdown-compressed/planning	576362	577093	0.998733	ns
vortex-pushdown-compressed/exec	2.71224e+06	2.72758e+06	0.994379	ns
vortex-pushdown-uncompressed/planning	573850	585435	0.980211	ns
vortex-pushdown-uncompressed/exec	1.55374e+06	1.57173e+06	0.988554	ns
vortex-nopushdown-compressed/planning	949597	946758	1.003	ns
vortex-nopushdown-compressed/exec	3.10029e+06	3.24769e+06	0.954612	ns
vortex-nopushdown-uncompressed/planning	949927	955033	0.994653	ns
vortex-nopushdown-uncompressed/exec	5.18226e+06	5.1997e+06	0.996646	ns

github-actions · 2025-01-22T17:39:43Z

Benchmarks: Clickbench

Table of Results

name	PR `65dcaee`	base `a7876ca`	ratio (PR/base)	unit
clickbench_q00/parquet	1893482	2.02036e+06	0.937202	ns
clickbench_q01/parquet	61635258	6.2968e+07	0.978835	ns
clickbench_q02/parquet	116972399	1.22199e+08	0.957231	ns
clickbench_q03/parquet	83343053	8.41092e+07	0.990891	ns
clickbench_q04/parquet	671233257	6.6348e+08	1.01169	ns
clickbench_q05/parquet	842944445	8.47761e+08	0.994319	ns
clickbench_q06/parquet	2029864	1.92344e+06	1.05533	ns
clickbench_q07/parquet	64372724	6.2556e+07	1.02904	ns
clickbench_q08/parquet	767268264	7.7805e+08	0.986142	ns
clickbench_q09/parquet	1080836837	1.06958e+09	1.01052	ns
clickbench_q10/parquet	260081344	2.55987e+08	1.01599	ns
clickbench_q11/parquet	325269610	3.05733e+08	1.0639	ns
clickbench_q12/parquet	844832034	8.59597e+08	0.982823	ns
clickbench_q13/parquet	1116267957	1.13418e+09	0.984207	ns
clickbench_q14/parquet	838917805	8.56438e+08	0.979543	ns
clickbench_q15/parquet	797719010	7.86701e+08	1.01401	ns
clickbench_q16/parquet	1731647769	1.67281e+09	1.03517	ns
clickbench_q17/parquet	1497899778	1.49342e+09	1.003	ns
clickbench_q18/parquet	3124291281	3.07459e+09	1.01616	ns
clickbench_q19/parquet	69889765	6.60167e+07	1.05867	ns
clickbench_q20/parquet	1237482438	1.19007e+09	1.03984	ns
clickbench_q21/parquet	1454923334	1.37927e+09	1.05485	ns
clickbench_q22/parquet	2446366209	2.47845e+09	0.987053	ns
clickbench_q23/parquet	8474265495	8.41857e+09	1.00662	ns
clickbench_q24/parquet	539427306	5.3128e+08	1.01533	ns
clickbench_q25/parquet	518074885	5.22446e+08	0.991633	ns
clickbench_q26/parquet	599564734	5.9736e+08	1.00369	ns
clickbench_q27/parquet	1709998673	1.6574e+09	1.03174	ns
clickbench_q28/parquet	11503328620	1.15089e+10	0.999517	ns
clickbench_q29/parquet	427489760	4.27969e+08	0.998881	ns
clickbench_q30/parquet	778196574	7.67879e+08	1.01344	ns
clickbench_q31/parquet	816064573	8.25005e+08	0.989164	ns
clickbench_q32/parquet	2886687289	2.77232e+09	1.04125	ns
clickbench_q33/parquet	2889379033	2.80511e+09	1.03004	ns
clickbench_q34/parquet	2901788885	2.80691e+09	1.0338	ns
clickbench_q35/parquet	889346868	8.71516e+08	1.02046	ns
clickbench_q36/parquet	182630054	1.71337e+08	1.06591	ns
clickbench_q37/parquet	88754603	8.61864e+07	1.0298	ns
clickbench_q38/parquet	117256703	1.12833e+08	1.03921	ns
clickbench_q39/parquet	333703252	3.30302e+08	1.0103	ns
clickbench_q40/parquet	51773874	4.86737e+07	1.06369	ns
clickbench_q41/parquet	50473650	4.89553e+07	1.03101	ns
clickbench_q42/parquet	68045125	6.81624e+07	0.99828	ns
clickbench_q00/vortex-file-compressed	2097306	2.01496e+06	1.04087	ns
clickbench_q01/vortex-file-compressed	28840862	2.71315e+07	1.063	ns
clickbench_q02/vortex-file-compressed	84040021	8.39559e+07	1.001	ns
clickbench_q03/vortex-file-compressed	78528988	7.75597e+07	1.0125	ns
clickbench_q04/vortex-file-compressed	628609289	6.2208e+08	1.0105	ns
clickbench_q05/vortex-file-compressed	656291225	6.51709e+08	1.00703	ns
clickbench_q06/vortex-file-compressed	2137519	2.14613e+06	0.99599	ns
clickbench_q07/vortex-file-compressed	42856314	4.1415e+07	1.0348	ns
clickbench_q08/vortex-file-compressed	752516320	7.81813e+08	0.962527	ns
clickbench_q09/vortex-file-compressed	954456371	9.43324e+08	1.0118	ns
clickbench_q10/vortex-file-compressed	216513096	2.2909e+08	0.945099	ns
clickbench_q11/vortex-file-compressed	238696549	2.5554e+08	0.934088	ns
clickbench_q12/vortex-file-compressed	601805579	5.86968e+08	1.02528	ns
clickbench_q13/vortex-file-compressed	918512303	8.9457e+08	1.02676	ns
clickbench_q14/vortex-file-compressed	597742009	5.91838e+08	1.00998	ns
clickbench_q15/vortex-file-compressed	772128484	7.56529e+08	1.02062	ns
clickbench_q16/vortex-file-compressed	1438349994	1.39471e+09	1.03129	ns
clickbench_q17/vortex-file-compressed	1316402539	1.31832e+09	0.998547	ns
clickbench_q18/vortex-file-compressed	2879090432	2.96451e+09	0.971187	ns
clickbench_q19/vortex-file-compressed	43780891	4.23308e+07	1.03426	ns
clickbench_q20/vortex-file-compressed	508906123	5.06504e+08	1.00474	ns
clickbench_q21/vortex-file-compressed	786976943	7.62595e+08	1.03197	ns
clickbench_q22/vortex-file-compressed	1900178234	1.8706e+09	1.01581	ns
clickbench_q23/vortex-file-compressed	3876166054	3.92467e+09	0.98764	ns
clickbench_q24/vortex-file-compressed	357982020	3.71763e+08	0.962931	ns
clickbench_q25/vortex-file-compressed	336874058	3.33457e+08	1.01025	ns
clickbench_q26/vortex-file-compressed	413242139	4.15802e+08	0.993843	ns
clickbench_q27/vortex-file-compressed	1364187010	1.34836e+09	1.01174	ns
clickbench_q28/vortex-file-compressed	10658023338	1.06431e+10	1.0014	ns
clickbench_q29/vortex-file-compressed	679952875	6.99629e+08	0.971877	ns
clickbench_q30/vortex-file-compressed	585587146	5.75904e+08	1.01681	ns
clickbench_q31/vortex-file-compressed	620989440	6.22303e+08	0.99789	ns
clickbench_q32/vortex-file-compressed	2766649620	2.81823e+09	0.981696	ns
clickbench_q33/vortex-file-compressed	2244492602	2.40779e+09	0.93218	ns
clickbench_q34/vortex-file-compressed	2241308423	2.3666e+09	0.947059	ns
clickbench_q35/vortex-file-compressed	964321209	1.00834e+09	0.956345	ns
clickbench_q36/vortex-file-compressed	52750564	6.30315e+07	0.836892	ns
clickbench_q37/vortex-file-compressed	49534466	5.36539e+07	0.923222	ns
clickbench_q38/vortex-file-compressed	42438641	4.93527e+07	0.859905	ns
clickbench_q39/vortex-file-compressed	87686080	1.0195e+08	0.860086	ns
clickbench_q40/vortex-file-compressed	30194421	3.15424e+07	0.957264	ns
clickbench_q41/vortex-file-compressed	29060659	3.08105e+07	0.943205	ns
clickbench_q42/vortex-file-compressed	37301045	4.09478e+07	0.910942	ns

a10y · 2025-01-22T18:09:29Z

encodings/alp/src/alp/compress.rs

+    let (encoded, exceptional_positions) = T::chunked_encode(values.as_slice::<T>(), exponents);
+
+    let encoded_array = PrimitiveArray::new(encoded, values.validity()).into_array();
+    let exceptional_positions = match values.logical_validity() {


Could we add 1-2 comments in this section just to make it clear what is going on?

danking (reply, Jan 30, 2025): LMK what you think, I added a comment above this line.

github-actions · 2025-01-30T20:15:41Z

Benchmarks: TPC-H

Table of Results

name	PR `65dcaee`	base `a7876ca`	ratio (PR/base)	unit

robert3005 · 2025-01-30T20:25:12Z

@danking you should make sure you run with `RUSTFLAGS='-C target-cpu=native'`. We had to axe it from our repo's cargo config since it messed up cross-compilation.
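One way to sanity-check that the flag took effect is to report which SIMD target features were enabled at compile time. This is only a sketch: `compiled_simd_features` is a hypothetical helper, the feature list is an arbitrary sample, and which features `target-cpu=native` turns on depends entirely on the machine doing the build.

```rust
/// Names of a few SIMD target features that were enabled when this code was
/// compiled. `cfg!` is evaluated at compile time, so the result reflects the
/// flags the compiler saw (for example via RUSTFLAGS), not the CPU at runtime.
fn compiled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        features.push("avx512f");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```

Printing this list from a benchmark binary built with and without the flag makes it easy to see whether the two builds differ.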

github-actions · 2025-01-30T20:58:13Z

Benchmarks: compress

Table of Results

name	PR `65dcaee`	base `a7876ca`	ratio (PR/base)	unit
compress time/wide table cols=10 chunks=1 rows=1000	4.11079e+06	4.22199e+06	0.973662	ns
compress time/wide table cols=10 chunks=1 rows=1000 throughput	0.029226	0.0284563	1.02705	bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000	738886	739146	0.999648	ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000 throughput	0.162599	0.162542	1.00035	bytes/ns
decompress time/wide table cols=10 chunks=1 rows=1000	413794	416725	0.992966	ns
decompress time/wide table cols=10 chunks=1 rows=1000 throughput	0.290343	0.2883	1.00708	bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000	271186	277187	0.978352	ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000 throughput	0.443024	0.433433	1.02213	bytes/ns
compress time/wide table cols=100 chunks=1 rows=1000	4.12314e+07	4.34149e+07	0.949707	ns
compress time/wide table cols=100 chunks=1 rows=1000 throughput	0.0291337	0.0276685	1.05296	bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000	7.70192e+06	8.5056e+06	0.905511	ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000 throughput	0.155964	0.141227	1.10435	bytes/ns
decompress time/wide table cols=100 chunks=1 rows=1000	4.01831e+06	4.10978e+06	0.977742	ns
decompress time/wide table cols=100 chunks=1 rows=1000 throughput	0.298937	0.292284	1.02277	bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000	2.74729e+06	2.89853e+06	0.947823	ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000 throughput	0.437239	0.414425	1.05505	bytes/ns
compress time/wide table cols=1000 chunks=1 rows=1000	4.25888e+08	4.30145e+08	0.990103	ns
compress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.0282046	0.0279255	1.01	bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000	9.2937e+07	9.51301e+07	0.976947	ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.129249	0.126269	1.0236	bytes/ns
decompress time/wide table cols=1000 chunks=1 rows=1000	7.54628e+07	7.96789e+07	0.947086	ns
decompress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.159178	0.150755	1.05587	bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000	3.17539e+07	3.36228e+07	0.944417	ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000 throughput	0.378285	0.357258	1.05885	bytes/ns
compress time/wide table cols=10 chunks=50 rows=1000	8.33279e+06	8.5026e+06	0.980028	ns
compress time/wide table cols=10 chunks=50 rows=1000 throughput	0.0151766	0.0148734	1.02038	bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000	1.11033e+06	1.13984e+06	0.974109	ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000 throughput	0.113897	0.110948	1.02658	bytes/ns
decompress time/wide table cols=10 chunks=50 rows=1000	433285	435006	0.996044	ns
decompress time/wide table cols=10 chunks=50 rows=1000 throughput	0.29187	0.290716	1.00397	bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000	279558	280037	0.998291	ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000 throughput	0.452368	0.451595	1.00171	bytes/ns
compress time/wide table cols=100 chunks=50 rows=1000	8.94013e+07	9.1106e+07	0.981289	ns
compress time/wide table cols=100 chunks=50 rows=1000 throughput	0.0140989	0.0138351	1.01907	bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000	1.35341e+07	1.45264e+07	0.93169	ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000 throughput	0.0931322	0.0867703	1.07332	bytes/ns
decompress time/wide table cols=100 chunks=50 rows=1000	4.18102e+06	4.25089e+06	0.983564	ns
decompress time/wide table cols=100 chunks=50 rows=1000 throughput	0.301472	0.296517	1.01671	bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000	2.86783e+06	2.9278e+06	0.979518	ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000 throughput	0.439518	0.430516	1.02091	bytes/ns
compress time/wide table cols=1000 chunks=50 rows=1000	9.59031e+08	9.80859e+08	0.977746	ns
compress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.0131387	0.0128464	1.02276	bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000	1.75425e+08	1.91682e+08	0.91519	ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.071828	0.0657363	1.09267	bytes/ns
decompress time/wide table cols=1000 chunks=50 rows=1000	8.06546e+07	8.96356e+07	0.899805	ns
decompress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.156227	0.140574	1.11135	bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000	3.71423e+07	3.86265e+07	0.961575	ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000 throughput	0.339249	0.326213	1.03996	bytes/ns
compress time/taxi	1.59083e+09	1.61659e+09	0.984068	ns
compress time/taxi throughput	0.295953	0.291238	1.01619	bytes/ns
parquet_rs-zstd compress time/taxi	1.8139e+09	1.84739e+09	0.981869	ns
parquet_rs-zstd compress time/taxi throughput	0.259557	0.254851	1.01847	bytes/ns
decompress time/taxi	3.8271e+08	3.79719e+08	1.00788	ns
decompress time/taxi throughput	1.2302	1.23989	0.992185	bytes/ns
parquet_rs-zstd decompress time/taxi	3.10568e+08	3.12995e+08	0.992244	ns
parquet_rs-zstd decompress time/taxi throughput	1.51597	1.50421	1.00782	bytes/ns
compress time/AirlineSentiment	285579	285481	1.00035	ns
compress time/AirlineSentiment throughput	0.00714337	0.00714584	0.999655	bytes/ns
parquet_rs-zstd compress time/AirlineSentiment	54749.9	56303.1	0.972414	ns
parquet_rs-zstd compress time/AirlineSentiment throughput	0.0372603	0.0362325	1.02837	bytes/ns
decompress time/AirlineSentiment	186888	189909	0.984093	ns
decompress time/AirlineSentiment throughput	0.0109156	0.010742	1.01616	bytes/ns
parquet_rs-zstd decompress time/AirlineSentiment	30898.9	31866.5	0.969634	ns
parquet_rs-zstd decompress time/AirlineSentiment throughput	0.0660219	0.064017	1.03132	bytes/ns
compress time/Arade	2.75493e+09	2.78523e+09	0.989121	ns
compress time/Arade throughput	0.285681	0.282573	1.011	bytes/ns
parquet_rs-zstd compress time/Arade	3.02583e+09	3.06865e+09	0.986047	ns
parquet_rs-zstd compress time/Arade throughput	0.260103	0.256474	1.01415	bytes/ns
decompress time/Arade	8.02333e+08	7.28471e+08	1.10139	ns
decompress time/Arade throughput	0.980926	1.08038	0.907942	bytes/ns
parquet_rs-zstd decompress time/Arade	6.93619e+08	6.80535e+08	1.01923	ns
parquet_rs-zstd decompress time/Arade throughput	1.13467	1.15649	0.981135	bytes/ns
compress time/Bimbo	1.2004e+10	1.20271e+10	0.998079	ns
compress time/Bimbo throughput	0.59325	0.59211	1.00192	bytes/ns
parquet_rs-zstd compress time/Bimbo	2.21634e+10	2.27723e+10	0.973258	ns
parquet_rs-zstd compress time/Bimbo throughput	0.321312	0.312719	1.02748	bytes/ns
decompress time/Bimbo	4.64528e+09	4.78511e+09	0.970778	ns
decompress time/Bimbo throughput	1.53303	1.48823	1.0301	bytes/ns
parquet_rs-zstd decompress time/Bimbo	3.26095e+09	4.02577e+09	0.810019	ns
parquet_rs-zstd decompress time/Bimbo throughput	2.18383	1.76894	1.23454	bytes/ns
compress time/CMSprovider	1.23465e+10	1.28623e+10	0.959898	ns
compress time/CMSprovider throughput	0.417053	0.400328	1.04178	bytes/ns
parquet_rs-zstd compress time/CMSprovider	1.82751e+10	1.89505e+10	0.964361	ns
parquet_rs-zstd compress time/CMSprovider throughput	0.281758	0.271716	1.03696	bytes/ns
decompress time/CMSprovider	4.63231e+09	4.67752e+09	0.990334	ns
decompress time/CMSprovider throughput	1.11158	1.10083	1.00976	bytes/ns
parquet_rs-zstd decompress time/CMSprovider	4.98212e+09	5.27342e+09	0.94476	ns
parquet_rs-zstd decompress time/CMSprovider throughput	1.03353	0.976436	1.05847	bytes/ns
compress time/Euro2016	2.05673e+09	2.14945e+09	0.956864	ns
compress time/Euro2016 throughput	0.191204	0.182956	1.04508	bytes/ns
parquet_rs-zstd compress time/Euro2016	1.56379e+09	1.59307e+09	0.981617	ns
parquet_rs-zstd compress time/Euro2016 throughput	0.251476	0.246853	1.01873	bytes/ns
decompress time/Euro2016	2.76666e+08	2.87648e+08	0.961822	ns
decompress time/Euro2016 throughput	1.4214	1.36714	1.03969	bytes/ns
parquet_rs-zstd decompress time/Euro2016	4.81894e+08	5.06167e+08	0.952045	ns
parquet_rs-zstd decompress time/Euro2016 throughput	0.81606	0.776926	1.05037	bytes/ns
compress time/Food	1.03987e+09	1.07218e+09	0.969866	ns
compress time/Food throughput	0.319963	0.310321	1.03107	bytes/ns
parquet_rs-zstd compress time/Food	1.05489e+09	1.10911e+09	0.951115	ns
parquet_rs-zstd compress time/Food throughput	0.315407	0.299989	1.0514	bytes/ns
decompress time/Food	1.81973e+08	1.92055e+08	0.947502	ns
decompress time/Food throughput	1.82841	1.73242	1.05541	bytes/ns
parquet_rs-zstd decompress time/Food	2.18871e+08	2.265e+08	0.966316	ns
parquet_rs-zstd decompress time/Food throughput	1.52016	1.46896	1.03486	bytes/ns
compress time/HashTags	2.48933e+09	2.57127e+09	0.968133	ns
compress time/HashTags throughput	0.323178	0.31288	1.03292	bytes/ns
parquet_rs-zstd compress time/HashTags	2.45418e+09	2.51587e+09	0.97548	ns
parquet_rs-zstd compress time/HashTags throughput	0.327807	0.319769	1.02514	bytes/ns
decompress time/HashTags	4.42347e+08	4.60031e+08	0.96156	ns
decompress time/HashTags throughput	1.81871	1.74879	1.03998	bytes/ns
parquet_rs-zstd decompress time/HashTags	7.64234e+08	8.39928e+08	0.90988	ns
parquet_rs-zstd decompress time/HashTags throughput	1.05269	0.957819	1.09905	bytes/ns
compress time/TPC-H l_comment chunked without fsst	2.97876e+09	3.43041e+09	0.868338	ns
compress time/TPC-H l_comment chunked without fsst throughput	0.0836586	0.0726439	1.15163	bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst	9.08086e+08	9.37191e+08	0.968944	ns
parquet_rs-zstd compress time/TPC-H l_comment chunked without fsst throughput	0.274422	0.265899	1.03205	bytes/ns
decompress time/TPC-H l_comment chunked without fsst	5.43717e+07	5.70223e+07	0.953516	ns
decompress time/TPC-H l_comment chunked without fsst throughput	4.58324	4.37019	1.04875	bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst	2.48868e+08	2.52631e+08	0.985106	ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked without fsst throughput	1.00133	0.986414	1.01512	bytes/ns
compress time/TPC-H l_comment chunked	9.98087e+08	1.04242e+09	0.957468	ns
compress time/TPC-H l_comment chunked throughput	0.249676	0.239057	1.04442	bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked	9.0321e+08	9.36567e+08	0.964384	ns
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput	0.275903	0.266077	1.03693	bytes/ns
decompress time/TPC-H l_comment chunked	1.02961e+08	1.0375e+08	0.992402	ns
decompress time/TPC-H l_comment chunked throughput	2.42031	2.40192	1.00766	bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked	2.48578e+08	2.52852e+08	0.983098	ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput	1.0025	0.985553	1.01719	bytes/ns
compress time/TPC-H l_comment canonical	9.9922e+08	1.03267e+09	0.967608	ns
compress time/TPC-H l_comment canonical throughput	0.249392	0.241314	1.03348	bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment canonical	9.11616e+08	9.22616e+08	0.988078	ns
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput	0.273358	0.270099	1.01207	bytes/ns
decompress time/TPC-H l_comment canonical	1.02431e+08	1.02982e+08	0.994644	ns
decompress time/TPC-H l_comment canonical throughput	2.43284	2.41981	1.00539	bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical	2.51196e+08	2.54699e+08	0.986246	ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput	0.992046	0.978402	1.01395	bytes/ns

The patches are now always non-nullable. This required `PrimitiveArray::patch` to gracefully handle non-nullable patches when the array is nullable. I modified the benchmarks to include patch manipulation time, but note that the test data has no patches. The benchmarks measure the overhead of `is_valid`. If we had test data where the invalid positions contained exceptional values, I would expect a modest improvement in both decompression and compression time.
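A rough sketch of applying non-nullable patches to a nullable array, assuming patches only target positions that are already valid so the array's validity can be left untouched. `SimpleArray` and this free-standing `patch` function are hypothetical simplifications, not the real `PrimitiveArray::patch` signature.

```rust
/// Hypothetical, simplified primitive array. `validity` is `None` for a
/// non-nullable array and `Some(mask)` for a nullable one.
#[derive(Debug, Clone, PartialEq)]
struct SimpleArray {
    values: Vec<f64>,
    validity: Option<Vec<bool>>,
}

/// Overwrite `array.values` at `patch_indices` with `patch_values`.
/// The patches are non-nullable: every patch value is a real value, so the
/// array's own validity mask is carried over unchanged (assumption: patches
/// never land on a null position).
fn patch(array: &SimpleArray, patch_indices: &[usize], patch_values: &[f64]) -> SimpleArray {
    assert_eq!(patch_indices.len(), patch_values.len());
    let mut patched = array.clone();
    for (&idx, &value) in patch_indices.iter().zip(patch_values) {
        patched.values[idx] = value;
    }
    patched
}
```

Under that assumption there is no need to merge two validity masks during decompression, which is where the hoped-for savings would come from.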

This reverts commit f26139f.

a10y · 2025-01-31T00:30:53Z

encodings/alp/src/alp/mod.rs

+            if let Some(fill_value) =
+                Self::first_non_patched_encoded_value(&encoded, &patch_indices)


Just use `unwrap_or_default()` to fall back to zero.

The original code also does this: https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L379-L385
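The suggestion can be sketched as follows, assuming the encoded values are `i64` and the patch indices are sorted in ascending order. The name `first_non_patched_encoded_value` comes from the diff, but the signature here and the `fill_patched_positions` helper are hypothetical.

```rust
/// First encoded value whose position is not a patch, or `None` if every
/// position is patched. Assumes `patch_indices` is sorted ascending.
fn first_non_patched_encoded_value(encoded: &[i64], patch_indices: &[usize]) -> Option<i64> {
    let mut patches = patch_indices.iter().peekable();
    for (idx, &value) in encoded.iter().enumerate() {
        if matches!(patches.peek(), Some(&&p) if p == idx) {
            // This position is patched; skip it and advance the patch cursor.
            patches.next();
        } else {
            return Some(value);
        }
    }
    None
}

/// Overwrite patched positions with a fill value so the encoded array stays
/// compressible. Falls back to zero when the entire chunk is patches.
fn fill_patched_positions(encoded: &mut [i64], patch_indices: &[usize]) {
    let fill = first_non_patched_encoded_value(encoded, patch_indices).unwrap_or_default();
    for &idx in patch_indices {
        encoded[idx] = fill;
    }
}
```

With `unwrap_or_default()` the all-patched case needs no separate `if let` branch, since the default for an integer type is zero.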

a10y · 2025-01-31T00:33:05Z

encodings/alp/src/alp/mod.rs

-    encoded_output.extend(chunk.iter().map(|v| {
-        let encoded = unsafe { T::encode_single_unchecked(*v, exp) };
-        let decoded = T::decode_single(encoded, exp);
-        let neq = (decoded != *v) as usize;
-        chunk_patch_count += neq;
-        encoded
-    }));


This loop structure is actually pretty important for performance.

You can refer to the original code where this is actually separated out into two loops:

Loop 1: calculate and materialize encoded + decoded vectors

Loop 2: find the exceptions by looping through the zipped vectors to find mismatches

https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L338-L346

https://github.com/cwida/ALP/blob/main/include/alp/encoder.hpp#L370-L376

I'm not sure I personally find the new version on the right more readable, but I want to make sure we keep a similar structure that the branch predictor likes.
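The two-loop structure described above could be sketched like this. It is a deliberately simplified model: real ALP uses a pair of exponents and a fast rounding trick, whereas the hypothetical `encode_single` and `decode_single` here just scale by a single power of ten.

```rust
/// Simplified ALP-style encode: scale by 10^exp and round to an integer.
fn encode_single(value: f64, exp: i32) -> i64 {
    (value * 10f64.powi(exp)).round() as i64
}

/// Inverse of `encode_single` for values that round-trip exactly.
fn decode_single(encoded: i64, exp: i32) -> f64 {
    encoded as f64 / 10f64.powi(exp)
}

/// Encode one chunk and return (encoded values, positions of exceptions).
fn encode_chunk(chunk: &[f64], exp: i32) -> (Vec<i64>, Vec<usize>) {
    // Loop 1: materialize the encoded and decoded vectors with no
    // data-dependent branches in the loop body.
    let encoded: Vec<i64> = chunk.iter().map(|&v| encode_single(v, exp)).collect();
    let decoded: Vec<f64> = encoded.iter().map(|&e| decode_single(e, exp)).collect();

    // Loop 2: walk the zipped original and decoded values and record the
    // positions that failed to round-trip; these become the patches.
    let exceptions: Vec<usize> = chunk
        .iter()
        .zip(&decoded)
        .enumerate()
        .filter(|(_, (original, round_tripped))| original != round_tripped)
        .map(|(idx, _)| idx)
        .collect();

    (encoded, exceptions)
}
```

Keeping the arithmetic in the first loop free of branches gives the compiler a simple body to vectorize, and the comparison work is isolated in the second pass.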

danking added the benchmark Run benchmarks on this branch label Jan 22, 2025

github-actions bot removed the benchmark Run benchmarks on this branch label Jan 22, 2025

a10y reviewed Jan 22, 2025

danking force-pushed the dk/alp-validity-in-encoded-only branch from b5b44e0 to ca17b75 Compare January 30, 2025 19:41

danking added the benchmark Run benchmarks on this branch label Jan 30, 2025

github-actions bot removed the benchmark Run benchmarks on this branch label Jan 30, 2025

danking added the benchmark Run benchmarks on this branch label Jan 30, 2025

github-actions bot removed the benchmark Run benchmarks on this branch label Jan 30, 2025

danking force-pushed the dk/alp-validity-in-encoded-only branch from 0351f19 to e5709ef Compare January 30, 2025 21:16

danking changed the title ~~Dk/alp validity in encoded only~~ feat: teach ALPArray to store validity only in the encoded array Jan 30, 2025

danking added 7 commits January 30, 2025 16:18

irrelevant comment

2e51664

unnecessary condition

f26139f

use values_slice instead of calling as_slice again

47fdf00

fix

fd12e57

Revert "unnecessary condition"

d6fc5aa

This reverts commit f26139f.

restore fill values

d8819da

fix tests

4837ce9

danking changed the title ~~feat: teach ALPArray to store validity only in the encoded array~~ feat: teach ALPArray to store validity only in the encoded array & other minor changes Jan 30, 2025

danking added 2 commits January 30, 2025 17:38

remove fill null zero

c2f4820

clippy

3a31385

danking added 3 commits January 30, 2025 17:40

restore test for all null

1d2550c

fix the null round trip test

eeea44e

sort fraction_valid

64ed4d4

danking requested review from a10y and lwwmanning January 30, 2025 23:08

final fixes

f4ef3cc

danking added the benchmark Run benchmarks on this branch label Jan 30, 2025

github-actions bot removed the benchmark Run benchmarks on this branch label Jan 30, 2025

a10y reviewed Jan 31, 2025

use zero as fill value if the entire chunk is patches

c2580f0

danking force-pushed the dk/alp-validity-in-encoded-only branch from e0e2528 to c2580f0 Compare February 3, 2025 16:38

danking closed this Feb 3, 2025
