# MF_PyTorch

Matrix Factorization with PyTorch.
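As a rough sketch, and not necessarily the exact model defined in `MF_PyTorch/main.py`, a matrix factorization recommender in PyTorch is usually built from user and item embedding tables whose dot product gives the predicted score:

```python
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    """Illustrative MF model: score(u, i) = <p_u, q_i> + b_u + b_i."""

    def __init__(self, n_users, n_items, latent_dim=8):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, latent_dim)
        self.item_emb = nn.Embedding(n_items, latent_dim)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)
        nn.init.normal_(self.user_emb.weight, std=0.01)
        nn.init.normal_(self.item_emb.weight, std=0.01)
        nn.init.zeros_(self.user_bias.weight)
        nn.init.zeros_(self.item_bias.weight)

    def forward(self, user_ids, item_ids):
        # Element-wise product summed over the latent dimension = dot product.
        dot = (self.user_emb(user_ids) * self.item_emb(item_ids)).sum(dim=1)
        return dot + self.user_bias(user_ids).squeeze(-1) + self.item_bias(item_ids).squeeze(-1)
```

With binarized labels (see Details below), the output of such a model is typically passed through `torch.sigmoid` and trained with a binary cross-entropy loss.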

## Environment

- Python: 3.6
- PyTorch: 1.5.1
- CUDA: 10.1
- Ubuntu: 18.04

## Dataset

The MovieLens 1M dataset is used. The rating data is included in `data/ml-1m`.
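The standard MovieLens 1M layout stores ratings as `UserID::MovieID::Rating::Timestamp` in `ratings.dat`; a minimal loading sketch (the column names below are my own, and the repository may parse the file differently) is:

```python
import pandas as pd

# ratings.dat uses "::" as the field separator, which requires the python engine.
ratings = pd.read_csv(
    "data/ml-1m/ratings.dat",
    sep="::",
    engine="python",
    names=["user_id", "movie_id", "rating", "timestamp"],
)
print(ratings.shape)  # roughly one million rows
```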

## Run the Code

```bash
$ python MF_PyTorch/main.py
```

## Details

For each user, the latest rating is used for testing and the second-latest rating for validation; the remaining ratings are used for training (see the sketch below). The hyperparameters (`batch_size`, `lr`, `latent_dim`, `l2_reg`) are tuned on the validation data in terms of nDCG. See `config.ini` for the range of each hyperparameter.
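A minimal sketch of this leave-one-out split, assuming the `ratings` DataFrame from the loading example above:

```python
# Order each user's ratings by time, then count backwards so that the most
# recent rating gets index 0 and the second most recent gets index 1.
ratings = ratings.sort_values(["user_id", "timestamp"])
rank_from_last = ratings.groupby("user_id").cumcount(ascending=False)

test = ratings[rank_from_last == 0]   # latest rating per user
val = ratings[rank_from_last == 1]    # second-latest rating per user
train = ratings[rank_from_last >= 2]  # everything else
```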

Although the original ratings range from 1 to 5, all of them are converted to 1. That is, we use binarized data in which movies rated by a user have score 1, while movies the user has not rated have score 0.
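As a sketch of how such implicit-feedback labels can be built (the helper below is illustrative; `n_negative` is the hyperparameter that also appears in the result directory names):

```python
import random

# Every observed (user, movie) pair gets label 1; unrated movies are
# sampled as negatives with label 0.
rated_by_user = train.groupby("user_id")["movie_id"].apply(set).to_dict()
all_movies = set(ratings["movie_id"].unique())

def sample_negatives(user_id, n_negative=4, seed=0):
    """Return movie ids the user has never rated, to be labeled 0."""
    rng = random.Random(seed + user_id)
    candidates = list(all_movies - rated_by_user.get(user_id, set()))
    return rng.sample(candidates, min(n_negative, len(candidates)))
```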

Running the code tunes the hyperparameters automatically. After training, the best hyperparameters and the HR/nDCG computed on the test data are displayed.
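Under this leave-one-out protocol, HR@k and nDCG@k per user reduce to simple expressions because there is exactly one relevant item; a hedged sketch:

```python
import math

def hr_and_ndcg(ranked_items, positive_item, top_k=10):
    """HR@k and nDCG@k for one user when exactly one item is relevant."""
    top = ranked_items[:top_k]
    if positive_item not in top:
        return 0.0, 0.0
    rank = top.index(positive_item)        # 0-based position in the ranking
    return 1.0, 1.0 / math.log2(rank + 2)  # IDCG = 1 for a single relevant item
```

The reported numbers would then be the averages of these per-user values over all users.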

Given a specific combination of hyperparameters, the corresponding training results are saved in `data/train_result/<hyperparameter combination>` (e.g., `data/train_result/batch_size_512-lr_0.005-latent_dim_8-l2_reg_1e-07-epoch_3-n_negative_4-top_k_10`). This directory contains a model file (`model.pth`) and a JSON file (`epoch_data.json`) that records per-epoch information. The JSON file looks as follows (epoch=3):

```json
[
    {
        "epoch": 0,
        "loss": 3275.6108826696873,
        "HR": 0.4460264900662252,
        "NDCG": 0.2433340828186714
    },
    {
        "epoch": 1,
        "loss": 1510.2559289187193,
        "HR": 0.6197019867549669,
        "NDCG": 0.3502363951558794
    },
    {
        "epoch": 2,
        "loss": 1320.9795952737331,
        "HR": 0.6700331125827814,
        "NDCG": 0.3889819661175262
    }
]
```
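The file can be inspected directly, e.g., to find the epoch with the best nDCG (the directory name below is a placeholder for one hyperparameter combination):

```python
import json

result_dir = "data/train_result/<hyperparameter combination>"  # placeholder
with open(f"{result_dir}/epoch_data.json") as f:
    epoch_data = json.load(f)

best = max(epoch_data, key=lambda e: e["NDCG"])
print(f"best epoch: {best['epoch']}  HR={best['HR']:.4f}  NDCG={best['NDCG']:.4f}")
```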