We are deprecating this repository. Please refer to this repository: https://github.com/mederrata/bayesianquilts/tree/main/bayesianquilts/models/spmf
Implemented using Tensorflow-probability.
This method differs from conventional hierarchical Poisson Matrix factorization methods primarily by sparsifying the encoding transformation rather than the decoding transformation. The encoding transformation is what computes a representation conditional on data. The decoding transformation takes the representation and produces predictive probability densities. By sparsifying the encoding, we make each representation coordinate a linear combination of a subset of original data features. Hence, inequalities placed on the representation transform directly and transparently into inequalities over the original features.
Using pip:
pip install git+https://github.com/mederrata/spmf.git
You'll find these examples under notebooks/ - for your convenience, here are links to open them in Google Colab. Note that you will need to install the package within Colab using
!pip install git+https://github.com/mederrata/spmf.git
-
Factorization of synthetic data with underlying linear structure
-
Factorization of synthetic data with underlying nonlinear structure
We have included a script that you might find to be useful. It is installed into your PATH
when using pip.
Usage:
usage: factorize_csv.py [-h] [-f [CSV_FILE]] [-e [EPOCH]] [-d [DIMENSION]]
[-b [BATCH_SIZE]] [-lr [LEARNING_RATE]]
[-c [CLIP_VALUE]] [-lt] [-rn]
Train PMF on CSV-formatted count matrix
optional arguments:
-h, --help show this help message and exit
-f [CSV_FILE], --csv-file [CSV_FILE]
Enter the CSV file
-e [EPOCH], --epoch [EPOCH]
Enter Epoch value: Default: 300
-d [DIMENSION], --dimension [DIMENSION]
Enter embedding dimension. Default: 2
-b [BATCH_SIZE], --batch-size [BATCH_SIZE]
Enter batch size. Default: 5000
-lr [LEARNING_RATE], --learning-rate [LEARNING_RATE]
Enter float. Default: 0.01
-c [CLIP_VALUE], --clip-value [CLIP_VALUE]
Gradient clip value. Default: 3.0
-lt, --log-transform Log-transform?
-rn, --row-normalize Row normalize based on counts?