v-jinyi/Change Dataset used in newsrec to MIND #1153
Conversation
Using the MINDsmall dataset for training takes about 20 minutes per epoch, which is a little long for a quick start. A sample MINDdemo dataset is used in the Jupyter notebook; the file format is exactly the same as in MINDsmall and MINDlarge.
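For reference, a minimal sketch of inspecting MIND-format files with pandas, assuming the standard news.tsv/behaviors.tsv column layout; the local paths are placeholders, not the notebook's actual download location:

```python
import pandas as pd

# Assumed local paths after unzipping a MIND-format dataset (demo, small, or large).
NEWS_FILE = "MINDdemo_train/news.tsv"
BEHAVIORS_FILE = "MINDdemo_train/behaviors.tsv"

# news.tsv: one article per line, tab-separated.
news = pd.read_csv(
    NEWS_FILE,
    sep="\t",
    header=None,
    names=["news_id", "category", "subcategory", "title", "abstract",
           "url", "title_entities", "abstract_entities"],
)

# behaviors.tsv: one impression per line, tab-separated.
behaviors = pd.read_csv(
    BEHAVIORS_FILE,
    sep="\t",
    header=None,
    names=["impression_id", "user_id", "time", "history", "impressions"],
)

print(news.head())
print(behaviors.head())
```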
```diff
@@ -162,8 +162,10 @@ def test_naml_smoke(notebooks):
     )
     results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]

-    assert results["res_syn"]["group_auc"] == pytest.approx(0.5565, rel=TOL, abs=ABS_TOL)
-    assert results["res_syn"]["mean_mrr"] == pytest.approx(0.1811, rel=TOL, abs=ABS_TOL)
+    assert results["res_syn"]["group_auc"] == pytest.approx(
```
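For context, pytest.approx compares floats within a tolerance; a minimal self-contained example (the TOL and ABS_TOL values here are illustrative, not the repo's actual constants):

```python
import pytest

# Illustrative tolerances; the repo defines its own TOL / ABS_TOL constants.
TOL = 0.05      # relative tolerance (5%)
ABS_TOL = 0.05  # absolute tolerance

def test_metric_within_tolerance():
    group_auc = 0.57  # e.g. a metric read back from the executed notebook
    # Passes if the value is within 5% of 0.58 or within 0.05 of it.
    assert group_auc == pytest.approx(0.58, rel=TOL, abs=ABS_TOL)
```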
@yjw1029 there is something weird happening. I was running these tests and they are taking too long, a couple of hours on Prometheus (still hasn't finished). Then I created an NC24 machine and ran naml_MIND with 1 epoch. The training phase is taking long as well, and I don't see any GPU consumption.
Could you please check?
I ran the notebook recommenders/examples/00_quick_start/naml_MIND with the default parameters and 1 epoch; the training took Wall time: 12min 13s. The metrics were:
{'group_auc': 0.5825, 'mean_mrr': 0.2553, 'ndcg@5': 0.2802, 'ndcg@10': 0.3448}
Then I changed the batch size to 512 (hparams.batch_size = 512) and the training took longer: Wall time: 18min 50s. The metrics were:
{'group_auc': 0.5663, 'mean_mrr': 0.2403, 'ndcg@5': 0.2594, 'ndcg@10': 0.3283}
With bs=512 I expect the metrics to be lower, but the training time should be faster. Also, I still can't see memory consumption with nvidia-smi.
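One quick way to confirm whether TensorFlow (which the newsrec models use) actually sees the GPU is a small diagnostic like the sketch below; it assumes nvidia-smi is on the PATH and is independent of the notebook itself:

```python
import subprocess
import tensorflow as tf

# False means training is silently falling back to CPU.
print("GPU available to TensorFlow:", tf.test.is_gpu_available())

# Query current GPU memory usage outside of TensorFlow.
print(subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"]
).decode())
```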
Oh, I think I found the issue: epoch vs epochs :-)
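This is the kind of bug being described: if the keyword is misspelled, the hyperparameter object silently keeps its default. A hypothetical sketch (the class and attribute names are illustrative, not the repo's actual API):

```python
class HParams:
    def __init__(self, **kwargs):
        self.epochs = 8          # default number of training epochs
        self.batch_size = 64
        # A misspelled keyword ("epoch" instead of "epochs") just adds a new,
        # unused attribute and leaves the real default untouched.
        for key, value in kwargs.items():
            setattr(self, key, value)

hparams = HParams(epoch=1)       # intended to train for 1 epoch...
print(hparams.epochs)            # ...but still 8: training runs 8x longer than expected
```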
With a GPU, the time should be about 3 min per epoch. I have already fixed it; sorry for my mistake.
I will check the time with a batch size of 512.
Running pytest tests/smoke/test_notebooks_gpu.py::test_naml_smoke, will report the results.
The NAML model takes the news abstract, title, vert, and subvert as input, which is much bigger than the other newsrec models. It will cause an OOM error when batch_size is set to 512. I guess the longer training time may be caused by a shortage of GPU memory.
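A rough back-of-the-envelope comparison of per-sample input size; the sequence lengths and field counts below are assumed example values, not the repo's actual settings:

```python
# Assumed sequence lengths per news article (illustrative only).
title_len, abstract_len = 30, 50
vert_fields = 2            # category + subcategory ids
his_size, npratio = 50, 4  # clicked-history length, negative sampling ratio

# Tokens per impression for a title-only newsrec model.
title_only = (his_size + npratio + 1) * title_len

# Tokens per impression for NAML, which also encodes abstract, vert and subvert.
naml = (his_size + npratio + 1) * (title_len + abstract_len + vert_fields)

print(title_only, naml, naml / title_only)  # NAML input is ~2.7x larger here
```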
> running pytest tests/smoke/test_notebooks_gpu.py::test_naml_smoke, will report the results

On a K80 it takes 3090 s (around 50 min), which is too much for a smoke test. I haven't tried the integration tests yet, but if the NAML test runs 8 epochs, it would easily take more than 6 hours.
Could you please remove all the smoke tests for the news algos and reduce the integration tests to 1 epoch, so we get an affordable time? Also, if it is possible to increase the batch size so that NAML takes around 15-20 min in the integration test, that would be perfect.
To give you an estimate of what we had before: the integration test of all the GPU algos took a little over 2 h; I would say we should target 3 h max in total.
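A sketch of how an integration test could cap training at 1 epoch via papermill parameters; the fixture usage follows the pattern of the repo's existing tests, but the notebook key and parameter names are written from memory and should be treated as an approximation:

```python
import papermill as pm
import pytest

@pytest.mark.integration
@pytest.mark.gpu
def test_naml_integration(notebooks):  # 'notebooks' fixture maps names to notebook paths
    notebook_path = notebooks["naml_quickstart"]  # hypothetical key
    pm.execute_notebook(
        notebook_path,
        "output.ipynb",
        kernel_name="python3",
        # Cap the run at a single epoch so the whole GPU suite stays within budget.
        parameters=dict(epochs=1, batch_size=64, MIND_type="demo"),
    )
```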
I checked the test_naml_gpu time in the smoke tests again; it only takes 6 minutes.
And all the algos in the integration test only take 50 minutes.
However, the smoke test suite does take a lot of time overall, which may not be caused by the newsrec algos (we now use a sample demo dataset of MIND and run only 1 epoch). Could you please check the reason for the long smoke test again? @miguel-ferreira
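One way to see where the smoke-test time actually goes is pytest's built-in duration report; a minimal sketch invoking it from Python (the marker expression is an assumption about how the repo tags its smoke/GPU tests):

```python
import pytest

# Report the 20 slowest test phases (setup/call/teardown) for the GPU smoke tests.
pytest.main([
    "tests/smoke/test_notebooks_gpu.py",
    "-m", "smoke and gpu",   # assumed marker expression for the smoke GPU run
    "--durations=20",
])
```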
Description
Change the dataset used in the newsrec algorithms to MIND
Related Issues
#1152 (comment)
Checklist:
- This PR is made against the staging branch and not master.