starreco stands for State-of-The-Art Review Recommendation System.
starreco is a PyTorch Lightning implementation of a series of state-of-the-art (SOTA) deep learning rating-based recommendation systems. This repository also forms part of the literature review for the author's master's thesis.
- More than 20 recommendation models across 20 publications.
- Built on top of PyTorch Lightning.
- GPU-accelerated execution.
- Reduced memory usage for large sparse matrices.
- Simple and understandable code.
- Easy extension and code reusability.
Research model | Description | Reference |
--- | --- | --- |
MF | Matrix Factorization | [1] |
GMF | Generalized Matrix Factorization | [2] |
MLP | Multilayer Perceptrons | [2] |
NeuMF | Neural Matrix Factorization | [2] |
FM | Factorization Machine | [3] |
NeuFM | Neural Factorization Machine | [4] |
WDL | Wide & Deep Learning | [5] |
DeepFM | Deep Factorization Machine | [6] |
xDeepFM | Extreme Deep Factorization Machine | [7] |
FGCNN | Feature Generation by using Convolutional Neural Network | [8] |
ONCF | Outer Product-based Neural Collaborative Filtering | [9] |
CNNDCF | Convolutional Neural Network based Deep Collaborative Filtering | [10] |
ConvMF | Convolutional Matrix Factorization | [11] |
AutoRec | AutoRec | [12] |
DeepRec | DeepRec | [13] |
CFN | Collaborative Filtering Network | [14] |
CDAE | Collaborative Denoising AutoEncoder | [15] |
CCAE | Collaborative Convolutional AutoEncoder | [16] |
SDAECF | Stacked Denoising AutoEncoder for Collaborative Filtering | [17] |
mDACF | marginalized Denoising AutoEncoder Collaborative Filtering | [18] |
GMF++ | Generalized Matrix Factorization ++ | [19] |
MLP++ | Multilayer Perceptrons ++ | [19] |
NeuMF++ | Neural Matrix Factorization ++ | [20] |
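As a point of reference for the first row of the table, matrix factorization (MF) scores a user-item pair as the dot product of a user latent vector and an item latent vector. The snippet below is a minimal pure-Python sketch of that scoring rule only. The names `user_factors`, `item_factors`, and `predict` and all numeric values are hypothetical, and this is not starreco's implementation, which is built on PyTorch Lightning and learns the factors during training.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-dimensional latent factors, hand-picked for illustration.
user_factors = {"alice": [1.0, 0.5], "bob": [0.0, 2.0]}
item_factors = {"movie_a": [2.0, 1.0], "movie_b": [1.0, 0.0]}


def predict(user, item):
    """Predicted rating = dot product of the user and item factors."""
    return dot(user_factors[user], item_factors[item])


# Score every user-item pair.
predictions = {
    (user, item): predict(user, item)
    for user in user_factors
    for item in item_factors
}

for pair, score in predictions.items():
    print(pair, score)
```

The neural variants in the table (GMF, MLP, NeuMF) replace or extend this fixed dot product with learned layers.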
MovieLens Dataset: A movie rating dataset collected from the MovieLens website by the GroupLens Research Project at the University of Minnesota. The datasets were collected over various periods of time, depending on the size of the set. The MovieLens 1M Dataset has been chosen. It contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000.
Book-Crossing Dataset: The Book-Crossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August/September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. It contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit/implicit) about 271,379 books.
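The figures quoted for the two datasets show why sparse storage matters: only a small fraction of the user-item matrix holds a rating. The sketch below computes that fraction from the counts given above (the MovieLens item count is approximate) and then shows coordinate-style storage on a hypothetical toy matrix. It illustrates the general idea under those assumptions and is not starreco's own storage code.

```python
# Counts quoted in the dataset descriptions (MovieLens item count is approximate).
DATASETS = {
    "MovieLens 1M": {"users": 6_040, "items": 3_900, "ratings": 1_000_209},
    "Book-Crossing": {"users": 278_858, "items": 271_379, "ratings": 1_149_780},
}


def density(users, items, ratings):
    """Fraction of the user-item matrix that holds an observed rating."""
    return ratings / (users * items)


densities = {name: density(**stats) for name, stats in DATASETS.items()}

for name, value in densities.items():
    print(f"{name}: {value:.4%} of cells are filled")

# Coordinate (COO) storage keeps only the observed (user, item, rating) triples.
# Hypothetical toy matrix; 0 marks a missing rating here.
dense = [
    [5, 0, 0, 3],
    [0, 0, 4, 0],
    [0, 1, 0, 0],
]
coo = [
    (user, item, rating)
    for user, row in enumerate(dense)
    for item, rating in enumerate(row)
    if rating != 0
]
print(coo)
```

Storing triples costs memory proportional to the number of ratings rather than to users times items, which is the saving that matters at Book-Crossing scale.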
Create virtual environment
python3 -m virtualenv env # Python 3.6 and above
Activate virtual environment
source env/bin/activate # Linux
./env/Scripts/activate # Windows
Clone and install necessary python packages
git clone
pip install -r requirements.txt
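Before installing, it can help to confirm that the interpreter meets the stated minimum version and that the virtual environment is active. The following is a small optional check, not part of the repository; the helper names `python_is_supported` and `in_virtualenv` are hypothetical.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the setup step


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True when the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """True when running inside a venv/virtualenv environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


print("Python version OK:", python_is_supported())
print("Inside virtual environment:", in_virtualenv())
```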
import os
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning.callbacks import ModelCheckpoint
from starreco.modules import *
from import *
# data module
data_module = StarDataModule("ml-1m")
# module
module = MF([data_module.dataset.rating.num_users, data_module.dataset.rating.num_items],
            lr = 0.007629571188584098,
            weight_decay = 1.0643056040513936e-05)
# setup
# checkpoint callback
current_version = max(0, len(list(os.walk("checkpoints/mf")))-1)
checkpoint_callback = ModelCheckpoint(dirpath = f"checkpoints/mf/version_{current_version}",
                                      monitor = "val_loss",
                                      filename = "mf-{epoch:02d}-{train_loss:.4f}-{val_loss:.4f}")
# logger
logger = TensorBoardLogger("training_logs", name = "mf")
# trainer
trainer = Trainer(logger = logger,
                  gpus = -1 if torch.cuda.is_available() else None,
                  max_epochs = 100,
                  progress_bar_refresh_rate = 2,
                  callbacks = [checkpoint_callback])
# train
trainer.fit(module, data_module)
# evaluate
module_test = MF.load_from_checkpoint(checkpoint_callback.best_model_path)
trainer.test(module_test, datamodule = data_module)
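The `current_version` expression in the example relies on how `os.walk` behaves: it yields one entry for the root directory plus one per nested directory, and nothing at all when the root does not exist. The sketch below reproduces that counting in a temporary directory so the numbering can be checked without training anything. The helper name `next_version` is hypothetical.

```python
import os
import tempfile


def next_version(root):
    """Index for the next 'version_N' directory under root."""
    # len(...) counts root itself plus every nested directory, so
    # subtracting 1 gives the number of existing version directories.
    # A missing root yields nothing, which max() clamps to 0.
    return max(0, len(list(os.walk(root))) - 1)


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "checkpoints", "mf")
    versions = []
    for _ in range(3):
        version = next_version(root)
        versions.append(version)
        # Simulate a training run creating its checkpoint directory.
        os.makedirs(os.path.join(root, f"version_{version}"))

print(versions)
```

Because `os.walk` recurses, this count stays correct only while the version directories contain files and no subdirectories of their own.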
