Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Use separate (larger) dataset for gram (and mean) matrices (#344)
## Description
Allow the gram matrix computation to use a different dataset from the one used for the RIB basis computation. A pre-tokenized dataset can also be used instead of an untokenized one, which skips the (fairly slow) tokenization step. Also added an option to store the computed gram matrix to a file.

## Motivation and Context
We noticed that results are much more sensitive to the number of samples used for the gram (PCA) matrices than to the number used for the Cs, and that the gram computation is also much cheaper.

## How Has This Been Tested?
Did runs and produced scaling plots. Added a test that makes sure this config option runs.

## Does this PR introduce a breaking change?
No. If no gram_dataset is given, the default is to use the same dataset as for the Cs.
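The fallback behaviour described above can be illustrated with a minimal sketch. This is not the code from this commit: `DatasetConfig`, `BuildConfig`, `resolve_gram_dataset`, and `gram_matrix` are hypothetical names chosen for illustration, only the `gram_dataset` option name comes from the description, and normalizing the gram matrix by the sample count is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DatasetConfig:
    """Hypothetical dataset description (not the repository's actual class)."""
    name: str
    n_samples: int


@dataclass
class BuildConfig:
    """Hypothetical build config with an optional separate gram dataset."""
    dataset: DatasetConfig
    gram_dataset: Optional[DatasetConfig] = None


def resolve_gram_dataset(cfg: BuildConfig) -> DatasetConfig:
    # If no gram_dataset is given, fall back to the dataset used for the Cs.
    return cfg.gram_dataset if cfg.gram_dataset is not None else cfg.dataset


def gram_matrix(activations: Sequence[Sequence[float]]) -> List[List[float]]:
    # Accumulate the outer product of each activation vector with itself.
    # Dividing by the number of samples is an assumption made for this sketch.
    n = len(activations)
    d = len(activations[0])
    gram = [[0.0] * d for _ in range(d)]
    for x in activations:
        for i in range(d):
            for j in range(d):
                gram[i][j] += x[i] * x[j] / n
    return gram


# Usage: a small dataset for the Cs, a larger one for the gram matrices.
cfg = BuildConfig(
    dataset=DatasetConfig("small", 100),
    gram_dataset=DatasetConfig("large", 10_000),
)
chosen = resolve_gram_dataset(cfg)
```

Because the gram accumulation is a single pass of outer products, it is cheap relative to the basis computation, which is consistent with the motivation for giving it a larger dataset.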
1 parent 65dc3d2, commit 4613bad
Showing 21 changed files with 2,723 additions and 133 deletions.