CategoricalMatrix A.Tb reproducibility. #348
Conversation
```
@@ -10,24 +10,29 @@ void _transpose_matvec_${dropfirst}(
    F* res,
    Int res_size
) {
    #pragma omp parallel
    int num_threads = omp_get_max_threads();
    std::vector<F> all_res(num_threads * res_size, 0.0);
```
Am I understanding correctly that we are using ~N times the memory compared to before (N being the number of threads)?
For transpose matvec this can be a problem because the result is as large as the input. Therefore, if the user has a very large categorical matrix as the main part of X, it will require N times more memory than X.
Wasn't a new vector `restemp` allocated per thread before as well (see https://github.com/Quantco/tabmat/blob/main/src/tabmat/ext/cat_split_helpers-tmpl.cpp#L15)?
You are right. Then let's merge this. Thanks for the contribution!
Looks good.
The bit reproducibility of `CategoricalMatrix`'s `transpose_matvec` can't be guaranteed, since different threads can increment the same entry in the output vector; because floating-point addition is not associative, the output will not necessarily be identical from run to run. We can get this level of reproducibility if each slice of the input has one thread associated with it, whose partial results are then aggregated deterministically (parallelising over the number of categories when aggregating into the output). This is not entirely free (result below on my MacBook, which spawns 12 threads), so I'll leave it to regular users to determine whether this makes more sense behind a flag or is an acceptable trade-off:
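To make the two-phase scheme above concrete, here is a minimal, self-contained sketch (not tabmat's actual templated code, which uses OpenMP and `${dropfirst}` template substitution): each thread accumulates into its own private buffer, and the buffers are then reduced in a fixed thread order, so the floating-point additions happen in the same order on every run. The function name, `std::thread` usage, and index layout are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch of a deterministic transpose-matvec for a
// categorical matrix: indices[i] is the category (column) of row i,
// so A^T b accumulates b[i] into res[indices[i]].
std::vector<double> transpose_matvec_deterministic(
    const std::vector<std::size_t>& indices,
    const std::vector<double>& b,
    std::size_t n_categories,
    std::size_t num_threads
) {
    // Phase 1: each thread writes only to its own slice of all_res,
    // so no two threads ever touch the same entry concurrently.
    std::vector<double> all_res(num_threads * n_categories, 0.0);
    std::vector<std::thread> workers;
    const std::size_t n = indices.size();
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            double* local = all_res.data() + t * n_categories;
            const std::size_t begin = t * n / num_threads;
            const std::size_t end = (t + 1) * n / num_threads;
            for (std::size_t i = begin; i < end; ++i) {
                local[indices[i]] += b[i];
            }
        });
    }
    for (auto& w : workers) w.join();

    // Phase 2: reduce the per-thread buffers in a fixed order
    // (thread 0, 1, ..., N-1) for every output entry. The summation
    // order no longer depends on scheduling, so the result is
    // bit-reproducible across runs.
    std::vector<double> res(n_categories, 0.0);
    for (std::size_t j = 0; j < n_categories; ++j) {
        for (std::size_t t = 0; t < num_threads; ++t) {
            res[j] += all_res[t * n_categories + j];
        }
    }
    return res;
}
```

The phase-2 loop over `j` is the part that can itself be parallelised over categories without losing determinism, since each output entry is still summed in the same fixed thread order. The cost relative to a racy atomic-update version is the `num_threads * n_categories` scratch buffer plus the extra reduction pass, which matches the memory concern raised earlier in this thread.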
Checklist
- CHANGELOG.rst entry