CategoricalMatrix A.Tb reproducibility. #348

Merged: 4 commits merged Feb 28, 2024

@adityagoel4512 (Member) commented Feb 23, 2024

CategoricalMatrix's transpose_matvec is not guaranteed to be bit-for-bit reproducible, because different threads can increment the same entry of the output vector. Floating-point addition is not associative, so the order in which those increments land can change the result from run to run. We can get this level of reproducibility if each slice of the input is assigned to exactly one thread that accumulates into its own buffer, and the per-thread buffers are then aggregated deterministically (parallelising over the categories when aggregating into the output).
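A minimal sketch of the two-phase scheme described above, assuming a one-hot categorical matrix stored as one category index per row and OpenMP for threading. The function name transpose_matvec_deterministic and its signature are hypothetical and not tabmat's API; the PR's own kernel is the templated _transpose_matvec_${dropfirst}. Because the row slices depend on the team size, the result is only reproducible for a fixed number of threads. Without -fopenmp the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not tabmat's API): computes res = A^T b, where row i of
// the one-hot matrix A has a single 1 in column indices[i].
std::vector<double> transpose_matvec_deterministic(
    const std::vector<std::int32_t>& indices,
    const std::vector<double>& b,
    std::size_t n_categories) {
#ifdef _OPENMP
    const int max_threads = omp_get_max_threads();
#else
    const int max_threads = 1;
#endif
    const std::size_t n = indices.size();
    // Phase 1 scratch: one private accumulator of length n_categories per thread.
    std::vector<double> all_res(static_cast<std::size_t>(max_threads) * n_categories, 0.0);

#pragma omp parallel num_threads(max_threads)
    {
#ifdef _OPENMP
        const std::size_t tid = static_cast<std::size_t>(omp_get_thread_num());
        const std::size_t nt = static_cast<std::size_t>(omp_get_num_threads());
#else
        const std::size_t tid = 0;
        const std::size_t nt = 1;
#endif
        // Each thread owns a fixed, contiguous slice of rows and writes only to
        // its own buffer, so no two threads ever touch the same entry.
        const std::size_t begin = n * tid / nt;
        const std::size_t end = n * (tid + 1) / nt;
        double* local = all_res.data() + tid * n_categories;
        for (std::size_t i = begin; i < end; ++i) {
            local[indices[i]] += b[i];
        }
    }

    // Phase 2: parallelise over categories; each entry sums the per-thread
    // buffers in the same fixed order (thread 0, 1, ...) on every run.
    std::vector<double> res(n_categories, 0.0);
    const std::ptrdiff_t n_cat = static_cast<std::ptrdiff_t>(n_categories);
#pragma omp parallel for
    for (std::ptrdiff_t k = 0; k < n_cat; ++k) {
        double acc = 0.0;
        for (int t = 0; t < max_threads; ++t) {
            acc += all_res[static_cast<std::size_t>(t) * n_categories + static_cast<std::size_t>(k)];
        }
        res[static_cast<std::size_t>(k)] = acc;
    }
    return res;
}
```

The design trades memory for determinism: the scratch buffer holds max_threads * n_categories values, and no atomics or locks are needed because every write in phase 1 goes to a thread-private region.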

This is not entirely free (results below are from my MacBook, which spawns 12 threads), so I'll leave it to regular users to decide whether this belongs behind a flag or is an acceptable trade-off:

python src/tabmat/benchmark/main.py --one_cat 1000000 1000000 --n_iterations 50
After:
1  transpose-matvec  tabmat  0.004588  dense_smallcat
Before (main):
1  transpose-matvec  tabmat  0.003998  dense_smallcat
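For scale, the relative slowdown in this single run (both timings in the benchmark's own units) works out to:

$$\frac{0.004588 - 0.003998}{0.003998} = \frac{0.000590}{0.003998} \approx 0.148$$

That is roughly 15% slower on this benchmark configuration.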

Checklist

  • Added a CHANGELOG.rst entry

@@ -10,24 +10,29 @@ void _transpose_matvec_${dropfirst}(
F* res,
Int res_size
) {
#pragma omp parallel
int num_threads = omp_get_max_threads();
std::vector<F> all_res(num_threads * res_size, 0.0);

Am I understanding correctly that we are using ~N times the memory compared to before (N being the number of threads)?
For transpose matvec this can be a problem because the result is as large as the input. Therefore, if the user has a very large categorical matrix as the main part of X, it will require N times more memory than X.
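To put numbers on the allocation in the diff above (std::vector<F> all_res(num_threads * res_size, 0.0)), a small sketch follows. The helper name scratch_bytes is hypothetical, and the 12-thread, 1,000,000-category figures are illustrative assumptions rather than measurements from this PR; the static_assert also assumes an 8-byte double.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes held by the per-thread scratch buffer
// all_res(num_threads * res_size), one value of type F per entry.
template <typename F>
constexpr std::size_t scratch_bytes(std::size_t num_threads, std::size_t res_size) {
    return num_threads * res_size * sizeof(F);
}

// Illustrative (assumes 8-byte double): 12 threads and 1,000,000 categories
// need 96,000,000 bytes of scratch, versus 8,000,000 bytes for one output vector.
static_assert(scratch_bytes<double>(12, 1000000) == 96000000, "12 * 1e6 * 8 bytes");
```

The scratch size grows linearly in both the thread count and the output length (number of categories), independently of the number of rows.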

@adityagoel4512 (Member Author) replied Feb 26, 2024

[…]

You are right. Then let's merge this. Thanks for the contribution!

@MarcAntoineSchmidtQC (Member) left a comment

Looks good.

@MarcAntoineSchmidtQC merged commit 4ac3dc3 into main Feb 28, 2024
20 of 21 checks passed
@MarcAntoineSchmidtQC deleted the transpose-matmul-reproducibility branch February 28, 2024 22:09
@MatthiasSchmidtblaicherQC mentioned this pull request Mar 25, 2024