diff --git a/README.md b/README.md
index bce19d0c04..f706b20dc2 100644
--- a/README.md
+++ b/README.md
@@ -56,7 +56,11 @@ The table below lists the recommender algorithms currently available in the repo
| Factorization Machine (FM) / Field-Aware FM (FFM) | [Python CPU](notebooks/02_model/fm_deep_dive.ipynb) | Content-Based Filtering | Algorithm that predict labels with user/item features |
| FastAI Embedding Dot Bias (FAST) | [Python CPU / Python GPU](notebooks/00_quick_start/fastai_movielens.ipynb) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items |
| LightGBM/Gradient Boosting Tree* | [Python CPU](notebooks/00_quick_start/lightgbm_tinycriteo.ipynb) / [PySpark](notebooks/02_model/mmlspark_lightgbm_criteo.ipynb) | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems |
+| Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | [Python CPU / Python GPU](notebooks/00_quick_start/lstur_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling |
+| Neural Recommendation with Attentive Multi-View Learning (NAML)* | [Python CPU / Python GPU](notebooks/00_quick_start/naml_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning |
| Neural Collaborative Filtering (NCF) | [Python CPU / Python GPU](notebooks/00_quick_start/ncf_movielens.ipynb) | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback |
+| Neural Recommendation with Personalized Attention (NPA)* | [Python CPU / Python GPU](notebooks/00_quick_start/npa_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with personalized attention network |
+| Neural Recommendation with Multi-Head Self-Attention (NRMS)* | [Python CPU / Python GPU](notebooks/00_quick_start/nrms_synthetic.ipynb) | Content-Based Filtering | Neural recommendation algorithm with multi-head self-attention |
| Restricted Boltzmann Machines (RBM) | [Python CPU / Python GPU](notebooks/00_quick_start/rbm_movielens.ipynb) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback |
| Riemannian Low-rank Matrix Completion (RLRMC)* | [Python CPU](notebooks/00_quick_start/rlrmc_movielens.ipynb) | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption. |
| Simple Algorithm for Recommendation (SAR)* | [Python CPU](notebooks/00_quick_start/sar_movielens.ipynb) | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset |
diff --git a/notebooks/00_quick_start/README.md b/notebooks/00_quick_start/README.md
index 685252abdd..b78bfafdbd 100644
--- a/notebooks/00_quick_start/README.md
+++ b/notebooks/00_quick_start/README.md
@@ -11,7 +11,11 @@ data preparation, model building, and model evaluation by using the utility func
| [dkn](dkn_synthetic.ipynb) | Synthetic Data | Python CPU, GPU | Utilizing the Deep Knowledge-Aware Network (DKN) [2] algorithm for news recommendations using information from a knowledge graph, in a Python+GPU (TensorFlow) environment.
| [fastai](fastai_movielens.ipynb) | MovieLens | Python CPU, GPU | Utilizing FastAI recommender to predict movie ratings in a Python+GPU (PyTorch) environment.
| [lightgbm](lightgbm_tinycriteo.ipynb) | Criteo | Python CPU | Utilizing LightGBM Boosting Tree to predict whether or not a user has clicked on an e-commerce ad |
+| [lstur](lstur_synthetic.ipynb) | Synthetic Data | Python CPU, GPU | Utilizing the Neural News Recommendation with Long- and Short-term User Representations (LSTUR) [9] algorithm for news recommendation, in a Python+GPU (TensorFlow) environment.
+| [naml](naml_synthetic.ipynb) | Synthetic Data | Python CPU, GPU | Utilizing the Neural News Recommendation with Attentive Multi-View Learning (NAML) [7] algorithm for news recommendation using news vertical, subvertical, title and body information, in a Python+GPU (TensorFlow) environment.
| [ncf](ncf_movielens.ipynb) | MovieLens | Python CPU, GPU | Utilizing Neural Collaborative Filtering (NCF) [1] to predict movie ratings in a Python+GPU (TensorFlow) environment.
+| [npa](npa_synthetic.ipynb) | Synthetic Data | Python CPU, GPU | Utilizing the Neural News Recommendation with Personalized Attention (NPA) [10] algorithm for news recommendation, in a Python+GPU (TensorFlow) environment.
+| [nrms](nrms_synthetic.ipynb) | Synthetic Data | Python CPU, GPU | Utilizing the Neural News Recommendation with Multi-Head Self-Attention (NRMS) [8] algorithm for news recommendation, in a Python+GPU (TensorFlow) environment.
| [rbm](rbm_movielens.ipynb)| MovieLens | Python CPU, GPU | Utilizing the Restricted Boltzmann Machine (rbm) [4] to predict movie ratings in a Python+GPU (TensorFlow) environment.
| [rlrmc](rlrmc_movielens.ipynb) | Movielens | Python CPU | Utilizing the Riemannian Low-rank Matrix Completion (RLRMC) [6] to predict movie rating in a Python+CPU environment
| [sar](sar_movielens.ipynb) | MovieLens | Python CPU | Utilizing Simple Algorithm for Recommendation (SAR) algorithm to predict movie ratings in a Python+CPU environment.
@@ -23,6 +27,10 @@ data preparation, model building, and model evaluation by using the utility func
[2] _DKN: Deep Knowledge-Aware Network for News Recommendation_, Hongwei Wang, Fuzheng Zhang, Xing Xie and Minyi Guo. WWW 2018.
[3] _xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems_, Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie and Guangzhong Sun. KDD 2018.
[4] _Restricted Boltzmann Machines for Collaborative Filtering_, Ruslan Salakhutdinov, Andriy Mnih and Geoffrey Hinton. ICML 2007.
-[5] _Wide & Deep Learning for Recommender Systems_, Heng-Tze Cheng et al., arXiv:1606.07792 2016.
-[6] _A unified framework for structured low-rank matrix learning_, Pratik Jawanpuria and Bamdev Mishra, In International Conference on Machine Learning, 2018.
+[5] _Wide & Deep Learning for Recommender Systems_, Heng-Tze Cheng et al., arXiv:1606.07792 2016.
+[6] _A unified framework for structured low-rank matrix learning_, Pratik Jawanpuria and Bamdev Mishra, In International Conference on Machine Learning, 2018.
+[7] _NAML: Neural News Recommendation with Attentive Multi-View Learning_, Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie. IJCAI 2019.
+[8] _NRMS: Neural News Recommendation with Multi-Head Self-Attention_, Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang and Xing Xie. EMNLP-IJCNLP 2019.
+[9] _LSTUR: Neural News Recommendation with Long- and Short-term User Representations_, Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu and Xing Xie. ACL 2019.
+[10] _NPA: Neural News Recommendation with Personalized Attention_, Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie. KDD 2019, ADS track.
diff --git a/notebooks/00_quick_start/lstur_synthetic.ipynb b/notebooks/00_quick_start/lstur_synthetic.ipynb
new file mode 100644
index 0000000000..f496a06648
--- /dev/null
+++ b/notebooks/00_quick_start/lstur_synthetic.ipynb
@@ -0,0 +1,350 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# LSTUR: Neural News Recommendation with Long- and Short-term User Representations\n",
+ "LSTUR \\[1\\] is a news recommendation approach capturing users' both long-term preferences and short-term interests. The core of LSTUR is a news encoder and a user encoder. In the news encoder, we learn representations of news from their titles. In user encoder, we propose to learn long-term\n",
+ "user representations from the embeddings of their IDs. In addition, we propose to learn short-term user representations from their recently browsed news via GRU network. Besides, we propose two methods to combine\n",
+ "long-term and short-term user representations. The first one is using the long-term user representation to initialize the hidden state of the GRU network in short-term user representation. The second one is concatenating both\n",
+ "long- and short-term user representations as a unified user vector.\n",
+ "\n",
+ "## Properties of LSTUR:\n",
+ "- LSTUR captures users' both long-term and short term preference.\n",
+ "- It uses embeddings of users' IDs to learn long-term user representations.\n",
+ "- It uses users' recently browsed news via GRU network to learn short-term user representations.\n",
+ "\n",
+ "## Data format:\n",
+ "\n",
+ "### train data\n",
+ "One simple example:
\n",
+ "\n",
+ "`1 0 0 0 0 Impression:0 User:2903 CandidateNews0:27006,11901,21668,9856,16156,21390,1741,2003,16983,8164 CandidateNews1:8377,10423,9960,5485,20494,7553,1251,17232,4745,9178 CandidateNews2:1607,26414,25830,16156,15337,16461,4004,6230,17841,10704 CandidateNews3:17323,20324,27855,16156,2934,14673,551,0,0,0 CandidateNews4:7172,3596,25442,21596,26195,4745,17988,16461,1741,76 ClickedNews0:11362,8205,22501,9349,12911,20324,1238,11362,26422,19185 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one positive instance and n negative instances in a same impression. The format is like:
\n",
+ "\n",
+ "`[label0] ... [labeln] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] ... [CandidateNewsn:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`\n",
+ "\n",
+ "
\n",
+ "\n",
+ "It contains several parts seperated by space, i.e. label part, Impression part ``, User part ``, CandidateNews part, ClickedHistory part. CandidateNews part describes the target news article we are going to score in this instance, it is represented by (aligned) title words. To take a quick example, a news title may be : `Trump to deliver State of the Union address next week` , then the title words value may be `CandidateNewsi:34,45,334,23,12,987,3456,111,456,432`.
\n",
+ "ClickedNewsk describe the k-th news article the user ever clicked and the format is the same as candidate news. Words are aligned in news title. We use a fixed length to describe an article, if the title is less than the fixed length, just pad it with zeros.\n",
+ "\n",
+ "### test data\n",
+ "One simple example:
\n",
+ "`1 Impression:0 User:6446 CandidateNews0:18707,23848,13490,10948,21385,11606,1251,16591,827,28081 ClickedNews0:27838,7376,16567,28518,119,21248,7598,9349,20324,9349 ClickedNews1:7969,9783,1741,2549,27104,14669,14777,21343,7667,20324 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one instance. The format is like:
\n",
+ "\n",
+ "`[label] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`\n",
+ "
"
+ ]
+ },
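+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the format above concrete, the next cell sketches a tiny, hypothetical parser for one training line. It is only an illustration of the layout described here; the notebook itself relies on `NewsIterator` (imported below) to read these files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch only: parse one training line of the format described above.\n",
+ "# The actual data loading in this notebook is handled by NewsIterator.\n",
+ "def parse_train_line(line):\n",
+ "    parts = {'labels': [], 'candidates': {}, 'clicked': {}}\n",
+ "    for tok in line.strip().split():\n",
+ "        if ':' not in tok:\n",
+ "            parts['labels'].append(int(tok))   # leading 1/0 labels\n",
+ "            continue\n",
+ "        key, value = tok.split(':', 1)\n",
+ "        if key == 'Impression':\n",
+ "            parts['impression'] = int(value)\n",
+ "        elif key == 'User':\n",
+ "            parts['user'] = int(value)\n",
+ "        elif key.startswith('CandidateNews'):\n",
+ "            parts['candidates'][key] = [int(w) for w in value.split(',')]\n",
+ "        elif key.startswith('ClickedNews'):\n",
+ "            parts['clicked'][key] = [int(w) for w in value.split(',')]\n",
+ "    return parts"
+ ]
+ },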
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Global settings and imports"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "System version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) \n",
+ "[GCC 7.3.0]\n",
+ "Tensorflow version: 1.12.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "sys.path.append(\"../../\")\n",
+ "import os\n",
+ "from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources \n",
+ "from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams\n",
+ "from reco_utils.recommender.newsrec.models.lstur import LSTURModel\n",
+ "from reco_utils.recommender.newsrec.IO.news_iterator import NewsIterator\n",
+ "import papermill as pm\n",
+ "from tempfile import TemporaryDirectory\n",
+ "import tensorflow as tf\n",
+ "\n",
+ "print(\"System version: {}\".format(sys.version))\n",
+ "print(\"Tensorflow version: {}\".format(tf.__version__))\n",
+ "\n",
+ "tmpdir = TemporaryDirectory()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Download and load data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 21.2k/21.2k [00:01<00:00, 12.4kKB/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "data_path = tmpdir.name\n",
+ "yaml_file = os.path.join(data_path, r'lstur.yaml')\n",
+ "train_file = os.path.join(data_path, r'train.txt')\n",
+ "valid_file = os.path.join(data_path, r'test.txt')\n",
+ "wordEmb_file = os.path.join(data_path, r'embedding.npy')\n",
+ "if not os.path.exists(yaml_file):\n",
+ " download_deeprec_resources(r'https://recodatasets.blob.core.windows.net/newsrec/', data_path, 'lstur.zip')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create hyper-parameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": [
+ "parameters"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "epochs=5\n",
+ "seed=42"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[('attention_hidden_dim', 200), ('batch_size', 64), ('body_size', None), ('cnn_activation', 'relu'), ('data_format', 'news'), ('dense_activation', None), ('doc_size', 10), ('dropout', 0.2), ('epochs', 5), ('filter_num', 400), ('gru_unit', 400), ('head_dim', 100), ('head_num', 4), ('his_size', 50), ('iterator_type', None), ('learning_rate', 0.0001), ('loss', 'cross_entropy_loss'), ('metrics', ['group_auc', 'mean_mrr', 'ndcg@5;10']), ('npratio', 4), ('optimizer', 'adam'), ('show_step', 100000), ('subvert_emb_dim', 100), ('subvert_num', None), ('title_size', None), ('type', 'ini'), ('user_emb_dim', 50), ('user_num', 10338), ('vert_emb_dim', 100), ('vert_num', None), ('window_size', 3), ('wordEmb_file', '/tmp/tmp2e1es2so/embedding.npy'), ('word_emb_dim', 100), ('word_size', 28929)]\n"
+ ]
+ }
+ ],
+ "source": [
+ "hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=epochs)\n",
+ "print(hparams)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "iterator = NewsIterator"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train the LSTUR model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = LSTURModel(hparams, iterator, seed=seed)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5094, 'mean_mrr': 0.1628, 'ndcg@5': 0.1497, 'ndcg@10': 0.2118}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(model.run_eval(valid_file))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
+ " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "at epoch 1\n",
+ "train info: logloss loss:1.6093932487526719\n",
+ "eval info: group_auc:0.5476, mean_mrr:0.1704, ndcg@10:0.236, ndcg@5:0.1733\n",
+ "at epoch 1 , train time: 12.5 eval time: 8.1\n",
+ "at epoch 2\n",
+ "train info: logloss loss:1.5449624518958889\n",
+ "eval info: group_auc:0.5537, mean_mrr:0.1816, ndcg@10:0.2515, ndcg@5:0.1843\n",
+ "at epoch 2 , train time: 10.2 eval time: 8.1\n",
+ "at epoch 3\n",
+ "train info: logloss loss:1.5040684374011293\n",
+ "eval info: group_auc:0.561, mean_mrr:0.1816, ndcg@10:0.2515, ndcg@5:0.1821\n",
+ "at epoch 3 , train time: 10.3 eval time: 8.0\n",
+ "at epoch 4\n",
+ "train info: logloss loss:1.4703012485893405\n",
+ "eval info: group_auc:0.5649, mean_mrr:0.1823, ndcg@10:0.2511, ndcg@5:0.1866\n",
+ "at epoch 4 , train time: 10.3 eval time: 7.9\n",
+ "at epoch 5\n",
+ "train info: logloss loss:1.4386503496948553\n",
+ "eval info: group_auc:0.5667, mean_mrr:0.1827, ndcg@10:0.2465, ndcg@5:0.1898\n",
+ "at epoch 5 , train time: 10.3 eval time: 8.1\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.fit(train_file, valid_file)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5667, 'mean_mrr': 0.1827, 'ndcg@5': 0.1898, 'ndcg@10': 0.2465}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/ipykernel_launcher.py:3: DeprecationWarning: Function record is deprecated and will be removed in verison 1.0.0 (current version 0.19.1). Please see `scrapbook.glue` (nteract-scrapbook) as a replacement for this functionality.\n",
+ " This is separate from the ipykernel package so we can avoid doing imports until\n"
+ ]
+ },
+ {
+ "data": {
+ "application/papermill.record+json": {
+ "res_syn": {
+ "group_auc": 0.5667,
+ "mean_mrr": 0.1827,
+ "ndcg@10": 0.2465,
+ "ndcg@5": 0.1898
+ }
+ }
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "res_syn = model.run_eval(valid_file)\n",
+ "print(res_syn)\n",
+ "pm.record(\"res_syn\", res_syn)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Reference\n",
+ "\\[1\\] Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu and Xing Xie: Neural News Recommendation with Long- and Short-term User Representations, ACL 2019
"
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Tags",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/notebooks/00_quick_start/naml_synthetic.ipynb b/notebooks/00_quick_start/naml_synthetic.ipynb
new file mode 100644
index 0000000000..ca0b9e9b00
--- /dev/null
+++ b/notebooks/00_quick_start/naml_synthetic.ipynb
@@ -0,0 +1,357 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# NAML: Neural News Recommendation with Attentive Multi-View Learning\n",
+ "NAML \\[1\\] is a multi-view news recommendation approach. The core of NAML is a news encoder and a user encoder. The newsencoder is composed of a title encoder, a body encoder, a vert encoder and a subvert encoder. The CNN-based title encoder and body encoder learn title and body representations by capturing words semantic information. After getting news title, body, vert and subvert representations, an attention network is used to aggregate those vectors. In the user encoder, we learn representations of users from their browsed news. Besides, we apply additive attention to learn more informative news and user representations by selecting important words and news.\n",
+ "\n",
+ "## Properties of NAML:\n",
+ "- NAML is a multi-view neural news recommendation approach.\n",
+ "- It uses news title, news body, news vert and news subvert to get news repersentations. And it uses user historical behaviors to learn user representations.\n",
+ "- NAML uses additive attention to learn informative news and user representations by selecting important words and news.\n",
+ "\n",
+ "## Data format:\n",
+ "\n",
+ "### train data\n",
+ "One simple example:
\n",
+ "\n",
+ "`1 0 0 0 0 Impression:0 User:502 CandidateTitle0:17917,36557,47926,32224,24113,48923,19086,5636,3703,0... CandidateBody0:17024,53305,8832,29800,9787,4068,48731,48923,19086,38699,5766,22487,38336,29800,8548,39128,33457,7789,\n",
+ "30543,7482,8548,49004,53305,22999,32747,21103,11799,5766,4868,17115,7482,15118,48731,2025,7789,23336,7789,48731,19086,\n",
+ "10630,11128,36557,3703,47354,611,7789,19086,5636,51521,30706... CandidateVert0:14... CandidateSubvert0:219... ClickedTitle0:48,33405,35198,5969,5636,35845,850,48731,46799,24113... ClickedBody0:36557,67,34519,24113,8548,48,33405,35198,5969,14340,7053,850,8823,9498,46799,24113,12506,32747,31130,\n",
+ "3074,48731,20869,14264,38289,37310,7789,36557,34967,48731,36916,23321,3595,48731,47354,4868,15719,7482,12771,50693,\n",
+ "47354,17523,48,20918,17900,35198,48731,20869,1220,14264,7789... ClickedVert0:14... ClickedSubvert0:99... `\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one positive instance and n negative instances in a same impression. The format is like:
\n",
+ "\n",
+ "`[label0] ... [labeln] [Impression:i] [User:u] [CandidateTitle0:w1,w2,w3,...] ... [CandidateBody0:w1,w2 ..] ... [CandidateVert0:v] ... [CandidateSubvert0:s] ... [ClickedTitle0:w1,w2,w3,...] ... [ClickedBody0:w1,w2,w3,...] ... [ClickedVert0:v] ... [ClickedSubvert0:s] ...`\n",
+ "\n",
+ "
\n",
+ "\n",
+ "It contains several parts seperated by space, i.e. label part, Impression part ``, User part ``, CandidateNews part, ClickedHistory part. CandidateNews part describes the target news article we are going to score in this instance. It is represented by (aligned) title words, body words, news vertical index and subvertical index. To take a quick example, a news title may be : `Trump to deliver State of the Union address next week` , then the title words value may be `CandidateTitlei:34,45,334,23,12,987,3456,111,456,432`.
\n",
+ "ClickedHistory describe the k-th news article the user ever clicked and the format is the same as candidate news. Every clicked news has title words, body words, vertical and subvertical. We use a fixed length to describe an article, if the title or body is less than the fixed length, just pad it with zeros.\n",
+ "\n",
+ "### test data\n",
+ "One simple example:
\n",
+ "`1 Impression:0 User:1529 CandidateTitle0:5327,18658,13846,6439,611,50611,0,0,0,0 CandidateBody0:13846,3197,27902,225,5327,45008,29145,7789,509,7395,11502,36557,13846,23680,26492,38072,20507,5636,\n",
+ "4247,32747,50132,7482,41049,32747,43022,50611,35979,7789,1191,36557,52870,21622,48148,42737,48731,36557,13846,23680,\n",
+ "13173,7482,13848,38072,20507,7789,41675,36875,51461,12348,21045,42160 CandidateVert0:14 CandidateSubvert0:19 ClickedTitle0:9079,3703,32747,8546,19377,50184,32747,24026,40010,49754 ... ClickedBody0:26061,48731,8576,7789,8683,9079,5636,45084,46805,3703,509,43036,11502,28883,9498,18450,32747,8546,33405,\n",
+ "35647,50184,7482,41143,8220,43618,38072,35198,43390,28057,32552,45245,10764,16247,4221,41038,36557,43683,46805,7789,\n",
+ "29727,2179,51003,34797,897,21045,12974,23382,46287,48731,15206 ... ClickedVert0:14 ... ClickedSubvert0:219 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one instance. The format is like:
\n",
+ "\n",
+ "`[label] [Impression:i] [User:u] [CandidateTitle0:w1,w2,w3,...] [CandidateBody0:w1,w2,w3,...] [CandidateVert0:v] [CandidateSubvert0:s] [ClickedTitle0:w1,w2,w3,...] ... [ClickedBody0:w1,w2,w3,...] ... [ClickedVert0:v] ... [ClickedSubvert0:s] ...`\n",
+ "
"
+ ]
+ },
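+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough illustration of the attentive multi-view aggregation described above (and not the actual NAMLModel code), the hypothetical sketch below combines title, body, vert and subvert vectors into a single news vector with additive attention."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch only: additive attention over per-view news vectors.\n",
+ "import numpy as np\n",
+ "\n",
+ "def additive_attention(views, W, q):\n",
+ "    # views: (n_views, dim) title/body/vert/subvert vectors\n",
+ "    # W: (dim, att_dim) projection, q: (att_dim,) attention query\n",
+ "    scores = np.tanh(views @ W) @ q              # one score per view\n",
+ "    weights = np.exp(scores - scores.max())\n",
+ "    weights = weights / weights.sum()            # softmax over views\n",
+ "    return weights @ views                       # attended news vector"
+ ]
+ },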
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Global settings and imports"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "System version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) \n",
+ "[GCC 7.3.0]\n",
+ "Tensorflow version: 1.12.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "sys.path.append(\"../../\")\n",
+ "from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources \n",
+ "from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams\n",
+ "from reco_utils.recommender.newsrec.models.naml import NAMLModel\n",
+ "from reco_utils.recommender.newsrec.IO.naml_iterator import NAMLIterator\n",
+ "import papermill as pm\n",
+ "from tempfile import TemporaryDirectory\n",
+ "import tensorflow as tf\n",
+ "import os\n",
+ "\n",
+ "print(\"System version: {}\".format(sys.version))\n",
+ "print(\"Tensorflow version: {}\".format(tf.__version__))\n",
+ "\n",
+ "tmpdir = TemporaryDirectory()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Download and load data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 72.6k/72.6k [00:04<00:00, 18.0kKB/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "data_path = tmpdir.name\n",
+ "yaml_file = os.path.join(data_path, r'naml.yaml')\n",
+ "train_file = os.path.join(data_path, r'train.txt')\n",
+ "valid_file = os.path.join(data_path, r'test.txt')\n",
+ "wordEmb_file = os.path.join(data_path, r'embedding.npy')\n",
+ "if not os.path.exists(yaml_file):\n",
+ " download_deeprec_resources(r'https://recodatasets.blob.core.windows.net/newsrec/', data_path, 'naml.zip')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create hyper-parameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": [
+ "parameters"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "epochs=5\n",
+ "seed=42"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[('attention_hidden_dim', 100), ('batch_size', 64), ('body_size', 50), ('cnn_activation', 'relu'), ('data_format', 'naml'), ('dense_activation', 'relu'), ('doc_size', None), ('dropout', 0.2), ('epochs', 5), ('filter_num', 200), ('gru_unit', 400), ('head_dim', 100), ('head_num', 4), ('his_size', 50), ('iterator_type', None), ('learning_rate', 0.0001), ('loss', 'cross_entropy_loss'), ('metrics', ['group_auc', 'mean_mrr', 'ndcg@5;10']), ('npratio', 4), ('optimizer', 'adam'), ('show_step', 100000), ('subvert_emb_dim', 100), ('subvert_num', 249), ('title_size', 10), ('type', 'ini'), ('user_emb_dim', 50), ('user_num', 10329), ('vert_emb_dim', 100), ('vert_num', 17), ('window_size', 3), ('wordEmb_file', '/tmp/tmps2ts1yn8/embedding.npy'), ('word_emb_dim', 100), ('word_size', 54071)]\n"
+ ]
+ }
+ ],
+ "source": [
+ "hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=epochs)\n",
+ "print(hparams)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "iterator = NAMLIterator"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train the NAML model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = NAMLModel(hparams, iterator, seed=seed)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5298, 'mean_mrr': 0.1727, 'ndcg@5': 0.1698, 'ndcg@10': 0.2353}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(model.run_eval(valid_file))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
+ " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "at epoch 1\n",
+ "train info: logloss loss:1.584658507424958\n",
+ "eval info: group_auc:0.5576, mean_mrr:0.1886, ndcg@10:0.2516, ndcg@5:0.1954\n",
+ "at epoch 1 , train time: 45.3 eval time: 36.2\n",
+ "at epoch 2\n",
+ "train info: logloss loss:1.5453801481091247\n",
+ "eval info: group_auc:0.5612, mean_mrr:0.1909, ndcg@10:0.2565, ndcg@5:0.1997\n",
+ "at epoch 2 , train time: 39.2 eval time: 36.3\n",
+ "at epoch 3\n",
+ "train info: logloss loss:1.5066348367807816\n",
+ "eval info: group_auc:0.5598, mean_mrr:0.1979, ndcg@10:0.2628, ndcg@5:0.2056\n",
+ "at epoch 3 , train time: 39.5 eval time: 36.2\n",
+ "at epoch 4\n",
+ "train info: logloss loss:1.4670472329976607\n",
+ "eval info: group_auc:0.5619, mean_mrr:0.2004, ndcg@10:0.2642, ndcg@5:0.2071\n",
+ "at epoch 4 , train time: 39.6 eval time: 36.3\n",
+ "at epoch 5\n",
+ "train info: logloss loss:1.4319027910427171\n",
+ "eval info: group_auc:0.5599, mean_mrr:0.2027, ndcg@10:0.268, ndcg@5:0.2065\n",
+ "at epoch 5 , train time: 39.2 eval time: 36.3\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.fit(train_file, valid_file)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5599, 'mean_mrr': 0.2027, 'ndcg@5': 0.2065, 'ndcg@10': 0.268}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/ipykernel_launcher.py:3: DeprecationWarning: Function record is deprecated and will be removed in verison 1.0.0 (current version 0.19.1). Please see `scrapbook.glue` (nteract-scrapbook) as a replacement for this functionality.\n",
+ " This is separate from the ipykernel package so we can avoid doing imports until\n"
+ ]
+ },
+ {
+ "data": {
+ "application/papermill.record+json": {
+ "res_syn": {
+ "group_auc": 0.5599,
+ "mean_mrr": 0.2027,
+ "ndcg@10": 0.268,
+ "ndcg@5": 0.2065
+ }
+ }
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "res_syn = model.run_eval(valid_file)\n",
+ "print(res_syn)\n",
+ "pm.record(\"res_syn\", res_syn)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Reference\n",
+ "\\[1\\] Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie: Neural News Recommendation with Attentive Multi-View Learning, IJCAI 2019
"
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Tags",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/notebooks/00_quick_start/npa_synthetic.ipynb b/notebooks/00_quick_start/npa_synthetic.ipynb
new file mode 100644
index 0000000000..528e8776f5
--- /dev/null
+++ b/notebooks/00_quick_start/npa_synthetic.ipynb
@@ -0,0 +1,348 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# NPA: Neural News Recommendation with Personalized Attention\n",
+ "NPA \\[1\\] is a news recommendation model with personalized attention. The core of NPA is a news representation model and a user representation model. In the news representation model we use a CNN network to learn hidden representations of news articles based on their titles. In the user representation model we learn the representations of users based on the representations of their clicked news articles. In addition, a word-level and a news-level personalized attention are used to capture different informativeness for different users.\n",
+ "\n",
+ "## Properties of NPA:\n",
+ "- NPA is a content-based news recommendation method.\n",
+ "- It uses a CNN network to learn news representation. And it learns user representations from their clicked news articles.\n",
+ "- A word-level personalized attention is used to help NPA attend to important words for different users.\n",
+ "- A news-level personalized attention is used to help NPA attend to important historical clicked news for different users.\n",
+ "\n",
+ "## Data format:\n",
+ "\n",
+ "### train data\n",
+ "One simple example:
\n",
+ "\n",
+ "`1 0 0 0 0 Impression:0 User:2903 CandidateNews0:27006,11901,21668,9856,16156,21390,1741,2003,16983,8164 CandidateNews1:8377,10423,9960,5485,20494,7553,1251,17232,4745,9178 CandidateNews2:1607,26414,25830,16156,15337,16461,4004,6230,17841,10704 CandidateNews3:17323,20324,27855,16156,2934,14673,551,0,0,0 CandidateNews4:7172,3596,25442,21596,26195,4745,17988,16461,1741,76 ClickedNews0:11362,8205,22501,9349,12911,20324,1238,11362,26422,19185 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one positive instance and n negative instances in a same impression. The format is like:
\n",
+ "\n",
+ "`[label0] ... [labeln] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] ... [CandidateNewsn:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`\n",
+ "\n",
+ "
\n",
+ "\n",
+ "It contains several parts seperated by space, i.e. label part, Impression part ``, User part ``, CandidateNews part, ClickedHistory part. CandidateNews part describes the target news article we are going to score in this instance, it is represented by (aligned) title words. To take a quick example, a news title may be : `Trump to deliver State of the Union address next week` , then the title words value may be `CandidateNewsi:34,45,334,23,12,987,3456,111,456,432`.
\n",
+ "ClickedNewsk describe the k-th news article the user ever clicked and the format is the same as candidate news. Words are aligned in news title. We use a fixed length to describe an article, if the title is less than the fixed length, just pad it with zeros.\n",
+ "\n",
+ "### test data\n",
+ "One simple example:
\n",
+ "`1 Impression:0 User:6446 CandidateNews0:18707,23848,13490,10948,21385,11606,1251,16591,827,28081 ClickedNews0:27838,7376,16567,28518,119,21248,7598,9349,20324,9349 ClickedNews1:7969,9783,1741,2549,27104,14669,14777,21343,7667,20324 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one instance. The format is like:
\n",
+ "\n",
+ "`[label] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`\n",
+ "
"
+ ]
+ },
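+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The hypothetical sketch below (an illustration, not the NPAModel implementation) shows the personalized-attention idea in its simplest form: a query vector derived from the user embedding attends over word or news representations, so different users weight the same content differently."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch only: personalized attention with a user-specific query.\n",
+ "import numpy as np\n",
+ "\n",
+ "def personalized_attention(reps, user_query):\n",
+ "    # reps: (n, dim) word or news vectors; user_query: (dim,) from the user embedding\n",
+ "    scores = reps @ user_query / np.sqrt(reps.shape[1])\n",
+ "    weights = np.exp(scores - scores.max())\n",
+ "    weights = weights / weights.sum()            # softmax over positions\n",
+ "    return weights @ reps                        # user-aware weighted summary"
+ ]
+ },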
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Global settings and imports"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "System version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) \n",
+ "[GCC 7.3.0]\n",
+ "Tensorflow version: 1.12.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "sys.path.append(\"../../\")\n",
+ "from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources \n",
+ "from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams\n",
+ "from reco_utils.recommender.newsrec.models.npa import NPAModel\n",
+ "from reco_utils.recommender.newsrec.IO.news_iterator import NewsIterator\n",
+ "import papermill as pm\n",
+ "from tempfile import TemporaryDirectory\n",
+ "import tensorflow as tf\n",
+ "import os\n",
+ "\n",
+ "print(\"System version: {}\".format(sys.version))\n",
+ "print(\"Tensorflow version: {}\".format(tf.__version__))\n",
+ "\n",
+ "tmpdir = TemporaryDirectory()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Download and load data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 21.2k/21.2k [00:01<00:00, 12.1kKB/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "data_path = tmpdir.name\n",
+ "yaml_file = os.path.join(data_path, r'npa.yaml')\n",
+ "train_file = os.path.join(data_path, r'train.txt')\n",
+ "valid_file = os.path.join(data_path, r'test.txt')\n",
+ "wordEmb_file = os.path.join(data_path, r'embedding.npy')\n",
+ "if not os.path.exists(yaml_file):\n",
+ " download_deeprec_resources(r'https://recodatasets.blob.core.windows.net/newsrec/', data_path, 'npa.zip')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create hyper-parameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": [
+ "parameters"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "epochs=5\n",
+ "seed=42"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[('attention_hidden_dim', 200), ('batch_size', 64), ('body_size', None), ('cnn_activation', 'relu'), ('data_format', 'news'), ('dense_activation', None), ('doc_size', 10), ('dropout', 0.2), ('epochs', 5), ('filter_num', 400), ('gru_unit', 400), ('head_dim', 100), ('head_num', 4), ('his_size', 50), ('iterator_type', None), ('learning_rate', 0.0001), ('loss', 'cross_entropy_loss'), ('metrics', ['group_auc', 'mean_mrr', 'ndcg@5;10']), ('npratio', 4), ('optimizer', 'adam'), ('show_step', 100000), ('subvert_emb_dim', 100), ('subvert_num', None), ('title_size', None), ('type', 'ini'), ('user_emb_dim', 50), ('user_num', 10338), ('vert_emb_dim', 100), ('vert_num', None), ('window_size', 3), ('wordEmb_file', '/tmp/tmpciqa91mp/embedding.npy'), ('word_emb_dim', 100), ('word_size', 28929)]\n"
+ ]
+ }
+ ],
+ "source": [
+ "hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=epochs)\n",
+ "print(hparams)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "iterator = NewsIterator"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train the NPA model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = NPAModel(hparams, iterator, seed=seed)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5101, 'mean_mrr': 0.163, 'ndcg@5': 0.1523, 'ndcg@10': 0.2126}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(model.run_eval(valid_file))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
+ " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "at epoch 1\n",
+ "train info: logloss loss:1.6403130025279766\n",
+ "eval info: group_auc:0.5368, mean_mrr:0.1625, ndcg@10:0.2249, ndcg@5:0.1567\n",
+ "at epoch 1 , train time: 15.8 eval time: 9.0\n",
+ "at epoch 2\n",
+ "train info: logloss loss:1.5913376968734119\n",
+ "eval info: group_auc:0.5413, mean_mrr:0.1714, ndcg@10:0.232, ndcg@5:0.1681\n",
+ "at epoch 2 , train time: 13.2 eval time: 9.2\n",
+ "at epoch 3\n",
+ "train info: logloss loss:1.548547458162113\n",
+ "eval info: group_auc:0.5522, mean_mrr:0.1722, ndcg@10:0.2406, ndcg@5:0.1668\n",
+ "at epoch 3 , train time: 13.3 eval time: 9.2\n",
+ "at epoch 4\n",
+ "train info: logloss loss:1.4391646930149624\n",
+ "eval info: group_auc:0.5581, mean_mrr:0.1776, ndcg@10:0.2441, ndcg@5:0.1683\n",
+ "at epoch 4 , train time: 13.2 eval time: 8.9\n",
+ "at epoch 5\n",
+ "train info: logloss loss:1.3466039570010437\n",
+ "eval info: group_auc:0.5583, mean_mrr:0.1741, ndcg@10:0.2462, ndcg@5:0.1676\n",
+ "at epoch 5 , train time: 13.2 eval time: 9.0\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.fit(train_file, valid_file)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5583, 'mean_mrr': 0.1741, 'ndcg@5': 0.1676, 'ndcg@10': 0.2462}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/ipykernel_launcher.py:3: DeprecationWarning: Function record is deprecated and will be removed in verison 1.0.0 (current version 0.19.1). Please see `scrapbook.glue` (nteract-scrapbook) as a replacement for this functionality.\n",
+ " This is separate from the ipykernel package so we can avoid doing imports until\n"
+ ]
+ },
+ {
+ "data": {
+ "application/papermill.record+json": {
+ "res_syn": {
+ "group_auc": 0.5583,
+ "mean_mrr": 0.1741,
+ "ndcg@10": 0.2462,
+ "ndcg@5": 0.1676
+ }
+ }
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "res_syn = model.run_eval(valid_file)\n",
+ "print(res_syn)\n",
+ "pm.record(\"res_syn\", res_syn)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Reference\n",
+ "\\[1\\] Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie: NPA: Neural News Recommendation with Personalized Attention, KDD 2019, ADS track.
"
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Tags",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/notebooks/00_quick_start/nrms_synthetic.ipynb b/notebooks/00_quick_start/nrms_synthetic.ipynb
new file mode 100644
index 0000000000..2508841df1
--- /dev/null
+++ b/notebooks/00_quick_start/nrms_synthetic.ipynb
@@ -0,0 +1,370 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# NRMS: Neural News Recommendation with Multi-Head Self-Attention\n",
+ "NRMS \\[1\\] is a neural news recommendation approach with multi-head selfattention. The core of NRMS is a news encoder and a user encoder. In the newsencoder, a multi-head self-attentions is used to learn news representations from news titles by modeling the interactions between words. In the user encoder, we learn representations of users from their browsed news and use multihead self-attention to capture the relatedness between the news. Besides, we apply additive\n",
+ "attention to learn more informative news and user representations by selecting important words and news.\n",
+ "\n",
+ "## Properties of NRMS:\n",
+ "- NRMS is a content-based neural news recommendation approach.\n",
+ "- It uses multi-self attention to learn news representations by modeling the iteractions between words and learn user representations by capturing the relationship between user browsed news.\n",
+ "- NRMS uses additive attentions to learn informative news and user representations by selecting important words and news.\n",
+ "\n",
+ "## Data format:\n",
+ "\n",
+ "### train data\n",
+ "One simple example:
\n",
+ "\n",
+ "`1 0 0 0 0 Impression:0 User:2903 CandidateNews0:27006,11901,21668,9856,16156,21390,1741,2003,16983,8164 CandidateNews1:8377,10423,9960,5485,20494,7553,1251,17232,4745,9178 CandidateNews2:1607,26414,25830,16156,15337,16461,4004,6230,17841,10704 CandidateNews3:17323,20324,27855,16156,2934,14673,551,0,0,0 CandidateNews4:7172,3596,25442,21596,26195,4745,17988,16461,1741,76 ClickedNews0:11362,8205,22501,9349,12911,20324,1238,11362,26422,19185 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one positive instance and n negative instances in a same impression. The format is like:
\n",
+ "\n",
+ "`[label0] ... [labeln] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] ... [CandidateNewsn:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`\n",
+ "\n",
+ "
\n",
+ "\n",
+ "It contains several parts seperated by space, i.e. label part, Impression part ``, User part ``, CandidateNews part, ClickedHistory part. CandidateNews part describes the target news article we are going to score in this instance, it is represented by (aligned) title words. To take a quick example, a news title may be : `Trump to deliver State of the Union address next week` , then the title words value may be `CandidateNewsi:34,45,334,23,12,987,3456,111,456,432`.
\n",
+ "ClickedNewsk describe the k-th news article the user ever clicked and the format is the same as candidate news. Words are aligned in news title. We use a fixed length to describe an article, if the title is less than the fixed length, just pad it with zeros.\n",
+ "\n",
+ "### test data\n",
+ "One simple example:
\n",
+ "`1 Impression:0 User:6446 CandidateNews0:18707,23848,13490,10948,21385,11606,1251,16591,827,28081 ClickedNews0:27838,7376,16567,28518,119,21248,7598,9349,20324,9349 ClickedNews1:7969,9783,1741,2549,27104,14669,14777,21343,7667,20324 ...`\n",
+ "
\n",
+ "\n",
+ "In general, each line in data file represents one instance. The format is like:
\n",
+ "\n",
+ "`[label] [Impression:i] [User:u] [CandidateNews0:w1,w2,w3,...] [ClickedNews0:w1,w2,w3,...] ...`\n",
+ "
"
+ ]
+ },
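+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough illustration of the encoder idea (not the NRMSModel implementation, which uses learned projections and several heads plus additive attention), the hypothetical sketch below applies single-head scaled dot-product self-attention to a sequence of word vectors."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch only: single-head scaled dot-product self-attention.\n",
+ "import numpy as np\n",
+ "\n",
+ "def self_attention(X):\n",
+ "    # X: (seq_len, dim) word vectors of one news title\n",
+ "    scores = X @ X.T / np.sqrt(X.shape[1])\n",
+ "    weights = np.exp(scores - scores.max(axis=1, keepdims=True))\n",
+ "    weights = weights / weights.sum(axis=1, keepdims=True)   # row-wise softmax\n",
+ "    return weights @ X                           # contextualized word vectors"
+ ]
+ },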
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Global settings and imports"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+ " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "System version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) \n",
+ "[GCC 7.3.0]\n",
+ "Tensorflow version: 1.12.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "sys.path.append(\"../../\")\n",
+ "from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources \n",
+ "from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams\n",
+ "from reco_utils.recommender.newsrec.models.nrms import NRMSModel\n",
+ "from reco_utils.recommender.newsrec.IO.news_iterator import NewsIterator\n",
+ "import papermill as pm\n",
+ "from tempfile import TemporaryDirectory\n",
+ "import tensorflow as tf\n",
+ "import os\n",
+ "\n",
+ "print(\"System version: {}\".format(sys.version))\n",
+ "print(\"Tensorflow version: {}\".format(tf.__version__))\n",
+ "\n",
+ "tmpdir = TemporaryDirectory()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Download and load data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 21.2k/21.2k [00:01<00:00, 12.5kKB/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "data_path = tmpdir.name\n",
+ "yaml_file = os.path.join(data_path, r'nrms.yaml')\n",
+ "train_file = os.path.join(data_path, r'train.txt')\n",
+ "valid_file = os.path.join(data_path, r'test.txt')\n",
+ "wordEmb_file = os.path.join(data_path, r'embedding.npy')\n",
+ "if not os.path.exists(yaml_file):\n",
+ " download_deeprec_resources(r'https://recodatasets.blob.core.windows.net/newsrec/', data_path, 'nrms.zip')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create hyper-parameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": [
+ "parameters"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "epochs=10\n",
+ "seed=42"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[('attention_hidden_dim', 200), ('batch_size', 64), ('body_size', None), ('cnn_activation', None), ('data_format', 'news'), ('dense_activation', None), ('doc_size', 10), ('dropout', 0.2), ('epochs', 10), ('filter_num', 200), ('gru_unit', 400), ('head_dim', 50), ('head_num', 3), ('his_size', 50), ('iterator_type', None), ('learning_rate', 0.0001), ('loss', 'cross_entropy_loss'), ('metrics', ['group_auc', 'mean_mrr', 'ndcg@5;10']), ('npratio', 4), ('optimizer', 'adam'), ('show_step', 100000), ('subvert_emb_dim', 100), ('subvert_num', None), ('title_size', None), ('type', 'ini'), ('user_emb_dim', 50), ('user_num', 10338), ('vert_emb_dim', 100), ('vert_num', None), ('window_size', 3), ('wordEmb_file', '/tmp/tmp0zkt9uou/embedding.npy'), ('word_emb_dim', 100), ('word_size', 28929)]\n"
+ ]
+ }
+ ],
+ "source": [
+ "hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=epochs)\n",
+ "print(hparams)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "iterator = NewsIterator"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train the NRMS model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model = NRMSModel(hparams, iterator, seed=seed)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.4641, 'mean_mrr': 0.1571, 'ndcg@5': 0.154, 'ndcg@10': 0.201}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(model.run_eval(valid_file))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
+ " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "at epoch 1\n",
+ "train info: logloss loss:1.617180992632496\n",
+ "eval info: group_auc:0.5564, mean_mrr:0.178, ndcg@10:0.2377, ndcg@5:0.1697\n",
+ "at epoch 1 , train time: 12.2 eval time: 8.5\n",
+ "at epoch 2\n",
+ "train info: logloss loss:1.5781418518144257\n",
+ "eval info: group_auc:0.5673, mean_mrr:0.1783, ndcg@10:0.2406, ndcg@5:0.1747\n",
+ "at epoch 2 , train time: 9.5 eval time: 8.5\n",
+ "at epoch 3\n",
+ "train info: logloss loss:1.561437344064518\n",
+ "eval info: group_auc:0.5739, mean_mrr:0.1806, ndcg@10:0.2433, ndcg@5:0.1781\n",
+ "at epoch 3 , train time: 9.5 eval time: 8.2\n",
+ "at epoch 4\n",
+ "train info: logloss loss:1.5440349160408486\n",
+ "eval info: group_auc:0.577, mean_mrr:0.1813, ndcg@10:0.2447, ndcg@5:0.182\n",
+ "at epoch 4 , train time: 9.7 eval time: 8.3\n",
+ "at epoch 5\n",
+ "train info: logloss loss:1.5250257662364415\n",
+ "eval info: group_auc:0.5764, mean_mrr:0.1834, ndcg@10:0.2502, ndcg@5:0.181\n",
+ "at epoch 5 , train time: 9.5 eval time: 8.4\n",
+ "at epoch 6\n",
+ "train info: logloss loss:1.5041815392825069\n",
+ "eval info: group_auc:0.5777, mean_mrr:0.1874, ndcg@10:0.2537, ndcg@5:0.1896\n",
+ "at epoch 6 , train time: 9.7 eval time: 8.2\n",
+ "at epoch 7\n",
+ "train info: logloss loss:1.4824992778349897\n",
+ "eval info: group_auc:0.5817, mean_mrr:0.1921, ndcg@10:0.2589, ndcg@5:0.1916\n",
+ "at epoch 7 , train time: 9.7 eval time: 8.2\n",
+ "at epoch 8\n",
+ "train info: logloss loss:1.4605322935143294\n",
+ "eval info: group_auc:0.5815, mean_mrr:0.1957, ndcg@10:0.2608, ndcg@5:0.1984\n",
+ "at epoch 8 , train time: 9.6 eval time: 8.4\n",
+ "at epoch 9\n",
+ "train info: logloss loss:1.4382321717787763\n",
+ "eval info: group_auc:0.5831, mean_mrr:0.2005, ndcg@10:0.2667, ndcg@5:0.1996\n",
+ "at epoch 9 , train time: 9.6 eval time: 8.4\n",
+ "at epoch 10\n",
+ "train info: logloss loss:1.4156336020450202\n",
+ "eval info: group_auc:0.5845, mean_mrr:0.202, ndcg@10:0.2655, ndcg@5:0.1977\n",
+ "at epoch 10 , train time: 9.5 eval time: 8.4\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.fit(train_file, valid_file)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'group_auc': 0.5845, 'mean_mrr': 0.202, 'ndcg@5': 0.1977, 'ndcg@10': 0.2655}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/data/anaconda/envs/reco_gpu/lib/python3.6/site-packages/ipykernel_launcher.py:3: DeprecationWarning: Function record is deprecated and will be removed in verison 1.0.0 (current version 0.19.1). Please see `scrapbook.glue` (nteract-scrapbook) as a replacement for this functionality.\n",
+ " This is separate from the ipykernel package so we can avoid doing imports until\n"
+ ]
+ },
+ {
+ "data": {
+ "application/papermill.record+json": {
+ "res_syn": {
+ "group_auc": 0.5845,
+ "mean_mrr": 0.202,
+ "ndcg@10": 0.2655,
+ "ndcg@5": 0.1977
+ }
+ }
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "res_syn = model.run_eval(valid_file)\n",
+ "print(res_syn)\n",
+ "pm.record(\"res_syn\", res_syn)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Reference\n",
+ "\\[1\\] Wu et al. \"Neural News Recommendation with Multi-Head Self-Attention.\" in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
"
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Tags",
+ "kernelspec": {
+ "display_name": "Python (reco_gpu)",
+ "language": "python",
+ "name": "reco_gpu"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/reco_utils/README.md b/reco_utils/README.md
index a874ed8197..b7ac3e6077 100644
--- a/reco_utils/README.md
+++ b/reco_utils/README.md
@@ -58,6 +58,7 @@ The recommender submodule contains implementations of various algorithms that ca
* FastAI
* LightGBM
* NCF
+* NewsRec (includes LSTUR, NAML, NPA and NRMS)
* RBM
* RLRMC
* SAR
diff --git a/reco_utils/recommender/deeprec/deeprec_utils.py b/reco_utils/recommender/deeprec/deeprec_utils.py
index e53f68b476..c7b3928803 100644
--- a/reco_utils/recommender/deeprec/deeprec_utils.py
+++ b/reco_utils/recommender/deeprec/deeprec_utils.py
@@ -466,10 +466,10 @@ def mrr_score(y_true, y_score):
Returns:
numpy.ndarray: mrr scores.
"""
- order = np.argsort(y_score, axis=1)[:, ::-1]
+ order = np.argsort(y_score)[::-1]
y_true = np.take(y_true, order)
- rr_score = y_true / (np.arange(np.shape(y_true)[1]) + 1)
- return np.sum(rr_score, axis=1) / np.sum(y_true, axis=1)
+ rr_score = y_true / (np.arange(len(y_true)) + 1)
+ return np.sum(rr_score) / np.sum(y_true)
def ndcg_score(y_true, y_score, k=10):
@@ -513,12 +513,13 @@ def dcg_score(y_true, y_score, k=10):
Returns:
numpy.ndarray: dcg scores.
"""
- k = min(np.shape(y_true)[1], k)
- order = np.argsort(y_score, axis=1)[:, ::-1]
- y_true = np.take(y_true, order[:, :k])
+ k = min(np.shape(y_true)[-1], k)
+ order = np.argsort(y_score)[::-1]
+ y_true = np.take(y_true, order[:k])
gains = 2 ** y_true - 1
- discounts = np.log2(np.arange(np.shape(y_true)[1]) + 2)
- return np.sum(gains / discounts, axis=1)
+ discounts = np.log2(np.arange(len(y_true)) + 2)
+ return np.sum(gains / discounts)
+
def cal_metric(labels, preds, metrics):
@@ -553,7 +554,12 @@ def cal_metric(labels, preds, metrics):
f1 = f1_score(np.asarray(labels), pred)
res["f1"] = round(f1, 4)
elif metric == "mean_mrr":
- mean_mrr = np.mean(mrr_score(labels, preds))
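+            # mrr_score now scores one impression at a time, so average it over impressions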
+ mean_mrr = np.mean(
+ [
+ mrr_score(each_labels, each_preds)
+ for each_labels, each_preds in zip(labels, preds)
+ ]
+ )
res["mean_mrr"] = round(mean_mrr, 4)
elif metric.startswith("ndcg"): # format like: ndcg@2;4;6;8
ndcg_list = [1, 2]
@@ -561,7 +567,12 @@ def cal_metric(labels, preds, metrics):
if len(ks) > 1:
ndcg_list = [int(token) for token in ks[1].split(';')]
for k in ndcg_list:
- ndcg_temp = np.mean(ndcg_score(labels, preds, k))
+ ndcg_temp = np.mean(
+ [
+ ndcg_score(each_labels, each_preds, k)
+ for each_labels, each_preds in zip(labels, preds)
+ ]
+ )
res["ndcg@{0}".format(k)] = round(ndcg_temp, 4)
elif metric.startswith("hit"): # format like: hit@2;4;6;8
hit_list = [1, 2]
diff --git a/reco_utils/recommender/newsrec/IO/__init__.py b/reco_utils/recommender/newsrec/IO/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reco_utils/recommender/newsrec/IO/naml_iterator.py b/reco_utils/recommender/newsrec/IO/naml_iterator.py
new file mode 100644
index 0000000000..770fcebbee
--- /dev/null
+++ b/reco_utils/recommender/newsrec/IO/naml_iterator.py
@@ -0,0 +1,255 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import tensorflow as tf
+import numpy as np
+
+from reco_utils.recommender.deeprec.IO.iterator import BaseIterator
+
+__all__ = ["NAMLIterator"]
+
+
+class NAMLIterator(BaseIterator):
+ """Train data loader for NAML model.
+ The model require a special type of data format, where each instance contains a label, impresion id, user id,
+ the candidate news articles and user's clicked news article. Articles are represented by title words,
+ body words, verts and subverts.
+
+ Iterator will not load the whole data into memory. Instead, it loads data into memory
+ per mini-batch, so that large files can be used as input data.
+
+ Attributes:
+        col_spliter (str): column splitter in one line.
+        ID_spliter (str): ID splitter in one line.
+        batch_size (int): the number of samples in one batch.
+        doc_size (int): max number of words in a news title.
+        his_size (int): max number of clicked news articles in a user's click history.
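+
+    Example:
+        With npratio=0, one input line looks roughly like this (the ids are made up
+        for illustration; see parser_one_line for the exact fields that are parsed):
+        1 Impression:0 User:12 CandidateTitle:5,8,0 ClickedTitle:3,7,9 CandidateBody:2,4 ClickedBody:1,6 CandidateVert:2 ClickedVert:1 CandidateSubvert:4 ClickedSubvert:3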
+ """
+
+ def __init__(self, hparams, npratio=0, col_spliter=" ", ID_spliter="%"):
+ """Initialize an iterator. Create necessary placeholders for the model.
+
+ Args:
+            hparams (obj): Global hyper-parameters. Some key settings, such as batch_size and doc_size, are there.
+            npratio (int): the number of negative samples paired with each positive one; the label vector has npratio + 1 entries.
+            col_spliter (str): column splitter in one line.
+            ID_spliter (str): ID splitter in one line.
+ """
+ self.col_spliter = col_spliter
+ self.ID_spliter = ID_spliter
+ self.batch_size = hparams.batch_size
+ self.doc_size = hparams.doc_size
+ self.his_size = hparams.his_size
+ self.npratio = npratio
+
+ def parser_one_line(self, line):
+ """Parse one string line into feature values.
+
+ Args:
+ line (str): a string indicating one instance.
+
+ Returns:
+            list: Parsed results including label, impression id, user id,
+            candidate_title_index, clicked_title_index, candidate_body_index,
+            clicked_body_index, candidate_vert_index, clicked_vert_index,
+            candidate_subvert_index, clicked_subvert_index.
+ """
+ words = line.strip().split(self.ID_spliter)
+
+ cols = words[0].strip().split(self.col_spliter)
+ label = [float(i) for i in cols[: self.npratio + 1]]
+ candidate_title_index = []
+ click_title_index = []
+ candidate_body_index = []
+ click_body_index = []
+ candidate_vert_index = []
+ click_vert_index = []
+ candidate_subvert_index = []
+ click_subvert_index = []
+ imp_index = []
+ user_index = []
+
+ for news in cols[self.npratio + 1 :]:
+ tokens = news.split(":")
+ if "Impression" in tokens[0]:
+ imp_index.append(int(tokens[1]))
+ elif "User" in tokens[0]:
+ user_index.append(int(tokens[1]))
+ elif "CandidateTitle" in tokens[0]:
+                # word indices start from 0
+ candidate_title_index.append([int(i) for i in tokens[1].split(",")])
+ elif "ClickedTitle" in tokens[0]:
+ click_title_index.append([int(i) for i in tokens[1].split(",")])
+ elif "CandidateBody" in tokens[0]:
+ candidate_body_index.append([int(i) for i in tokens[1].split(",")])
+ elif "ClickedBody" in tokens[0]:
+ click_body_index.append([int(i) for i in tokens[1].split(",")])
+ elif "CandidateVert" in tokens[0]:
+ candidate_vert_index.append([int(tokens[1])])
+ elif "ClickedVert" in tokens[0]:
+ click_vert_index.append([int(tokens[1])])
+ elif "CandidateSubvert" in tokens[0]:
+ candidate_subvert_index.append([int(tokens[1])])
+ elif "ClickedSubvert" in tokens[0]:
+ click_subvert_index.append([int(tokens[1])])
+ else:
+ print(tokens[0])
+ raise ValueError("data format is wrong")
+
+ return (
+ label,
+ imp_index,
+ user_index,
+ candidate_title_index,
+ click_title_index,
+ candidate_body_index,
+ click_body_index,
+ candidate_vert_index,
+ click_vert_index,
+ candidate_subvert_index,
+ click_subvert_index,
+ )
+
+ def load_data_from_file(self, infile):
+ """Read and parse data from a file.
+
+ Args:
+ infile (str): text input file. Each line in this file is an instance.
+
+ Returns:
+            obj: An iterator that yields parsed results, in the format of a graph feed_dict.
+ """
+ label_list = []
+ imp_indexes = []
+ user_indexes = []
+ candidate_title_indexes = []
+ click_title_indexes = []
+ candidate_body_indexes = []
+ click_body_indexes = []
+ candidate_vert_indexes = []
+ click_vert_indexes = []
+ candidate_subvert_indexes = []
+ click_subvert_indexes = []
+ cnt = 0
+
+ with tf.gfile.GFile(infile, "r") as rd:
+ while True:
+ line = rd.readline()
+ if not line:
+ break
+
+ (
+ label,
+ imp_index,
+ user_index,
+ candidate_title_index,
+ click_title_index,
+ candidate_body_index,
+ click_body_index,
+ candidate_vert_index,
+ click_vert_index,
+ candidate_subvert_index,
+ click_subvert_index,
+ ) = self.parser_one_line(line)
+
+ candidate_title_indexes.append(candidate_title_index)
+ click_title_indexes.append(click_title_index)
+ candidate_body_indexes.append(candidate_body_index)
+ click_body_indexes.append(click_body_index)
+ candidate_vert_indexes.append(candidate_vert_index)
+ click_vert_indexes.append(click_vert_index)
+ candidate_subvert_indexes.append(candidate_subvert_index)
+ click_subvert_indexes.append(click_subvert_index)
+ imp_indexes.append(imp_index)
+ user_indexes.append(user_index)
+ label_list.append(label)
+
+ cnt += 1
+ if cnt >= self.batch_size:
+ yield self._convert_data(
+ label_list,
+ imp_indexes,
+ user_indexes,
+ candidate_title_indexes,
+ click_title_indexes,
+ candidate_body_indexes,
+ click_body_indexes,
+ candidate_vert_indexes,
+ click_vert_indexes,
+ candidate_subvert_indexes,
+ click_subvert_indexes,
+ )
+ label_list = []
+ imp_indexes = []
+ user_indexes = []
+ candidate_title_indexes = []
+ click_title_indexes = []
+ candidate_body_indexes = []
+ click_body_indexes = []
+ candidate_vert_indexes = []
+ click_vert_indexes = []
+ candidate_subvert_indexes = []
+ click_subvert_indexes = []
+ cnt = 0
+
+ def _convert_data(
+ self,
+ label_list,
+ imp_indexes,
+ user_indexes,
+ candidate_title_indexes,
+ click_title_indexes,
+ candidate_body_indexes,
+ click_body_indexes,
+ candidate_vert_indexes,
+ click_vert_indexes,
+ candidate_subvert_indexes,
+ click_subvert_indexes,
+ ):
+ """Convert data into numpy arrays that are good for further model operation.
+
+ Args:
+ label_list (list): a list of ground-truth labels.
+ imp_indexes (list): a list of impression indexes.
+ user_indexes (list): a list of user indexes.
+            candidate_title_indexes (list): the candidate news titles' word indices.
+            click_title_indexes (list): word indices of the user's clicked news titles.
+            candidate_body_indexes (list): the candidate news bodies' word indices.
+            click_body_indexes (list): word indices of the user's clicked news bodies.
+            candidate_vert_indexes (list): the candidate news vert indexes.
+            click_vert_indexes (list): vert indexes of the user's clicked news.
+            candidate_subvert_indexes (list): the candidate news subvert indexes.
+            click_subvert_indexes (list): subvert indexes of the user's clicked news.
+
+ Returns:
+            dict: A dictionary containing multiple numpy arrays that are convenient for further operations.
+ """
+
+ labels = np.asarray(label_list, dtype=np.float32)
+ imp_indexes = np.asarray(imp_indexes, dtype=np.int32)
+ user_indexes = np.asarray(user_indexes, dtype=np.int32)
+ candidate_title_index_batch = np.asarray(
+ candidate_title_indexes, dtype=np.int64
+ )
+ click_title_index_batch = np.asarray(click_title_indexes, dtype=np.int64)
+ candidate_body_index_batch = np.asarray(candidate_body_indexes, dtype=np.int64)
+ click_body_index_batch = np.asarray(click_body_indexes, dtype=np.int64)
+ candidate_vert_index_batch = np.asarray(candidate_vert_indexes, dtype=np.int64)
+ click_vert_index_batch = np.asarray(click_vert_indexes, dtype=np.int64)
+ candidate_subvert_index_batch = np.asarray(
+ candidate_subvert_indexes, dtype=np.int64
+ )
+ click_subvert_index_batch = np.asarray(click_subvert_indexes, dtype=np.int64)
+ return {
+ "impression_index_batch": imp_indexes,
+ "user_index_batch": user_indexes,
+ "clicked_title_batch": click_title_index_batch,
+ "clicked_body_batch": click_body_index_batch,
+ "clicked_vert_batch": click_vert_index_batch,
+ "clicked_subvert_batch": click_subvert_index_batch,
+ "candidate_title_batch": candidate_title_index_batch,
+ "candidate_body_batch": candidate_body_index_batch,
+ "candidate_vert_batch": candidate_vert_index_batch,
+ "candidate_subvert_batch": candidate_subvert_index_batch,
+ "labels": labels,
+ }
diff --git a/reco_utils/recommender/newsrec/IO/news_iterator.py b/reco_utils/recommender/newsrec/IO/news_iterator.py
new file mode 100644
index 0000000000..82f76bc98a
--- /dev/null
+++ b/reco_utils/recommender/newsrec/IO/news_iterator.py
@@ -0,0 +1,160 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import tensorflow as tf
+import numpy as np
+
+from reco_utils.recommender.deeprec.IO.iterator import BaseIterator
+
+__all__ = ["NewsIterator"]
+
+
+class NewsIterator(BaseIterator):
+ """Train data loader for the NRMS NPA LSTUR model.
+ Those model require a special type of data format, where each instance contains a label, impresion id, user id,
+ the candidate news articlesand user's clicked news article. Articles are represented by title words.
+
+ Iterator will not load the whole data into memory. Instead, it loads data into memory
+ per mini-batch, so that large files can be used as input data.
+
+ Attributes:
+        col_spliter (str): column splitter in one line.
+        ID_spliter (str): ID splitter in one line.
+        batch_size (int): the number of samples in one batch.
+        doc_size (int): max number of words in a news title.
+        his_size (int): max number of clicked news articles in a user's click history.
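+
+    Example:
+        With npratio=0, one input line looks roughly like this (the ids are made up
+        for illustration; see parser_one_line for the exact fields that are parsed):
+        1 Impression:0 User:12 CandidateNews:5,8,0 ClickedNews:3,7,9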
+ """
+
+ def __init__(self, hparams, npratio=0, col_spliter=" ", ID_spliter="%"):
+ """Initialize an iterator. Create necessary placeholders for the model.
+
+ Args:
+            hparams (obj): Global hyper-parameters. Some key settings, such as batch_size and doc_size, are there.
+            npratio (int): the number of negative samples paired with each positive one; the label vector has npratio + 1 entries.
+            col_spliter (str): column splitter in one line.
+            ID_spliter (str): ID splitter in one line.
+ """
+ self.col_spliter = col_spliter
+ self.ID_spliter = ID_spliter
+ self.batch_size = hparams.batch_size
+ self.doc_size = hparams.doc_size
+ self.his_size = hparams.his_size
+ self.npratio = npratio
+
+ def parser_one_line(self, line):
+ """Parse one string line into feature values.
+
+ Args:
+ line (str): a string indicating one instance
+
+ Returns:
+            list: Parsed results including label, impression id, user id,
+ candidate_news_index, clicked_news_index.
+ """
+ words = line.strip().split(self.ID_spliter)
+
+ cols = words[0].strip().split(self.col_spliter)
+ label = [float(i) for i in cols[: self.npratio + 1]]
+ candidate_news_index = []
+ click_news_index = []
+ imp_index = []
+ user_index = []
+
+ for news in cols[self.npratio + 1 :]:
+ tokens = news.split(":")
+ if "Impression" in tokens[0]:
+ imp_index.append(int(tokens[1]))
+ elif "User" in tokens[0]:
+ user_index.append(int(tokens[1]))
+ elif "CandidateNews" in tokens[0]:
+                # word indices start from 0
+ candidate_news_index.append([int(i) for i in tokens[1].split(",")])
+ elif "ClickedNews" in tokens[0]:
+ click_news_index.append([int(i) for i in tokens[1].split(",")])
+ else:
+ raise ValueError("data format is wrong")
+
+ return (label, imp_index, user_index, candidate_news_index, click_news_index)
+
+ def load_data_from_file(self, infile):
+ """Read and parse data from a file.
+
+ Args:
+ infile (str): text input file. Each line in this file is an instance.
+
+ Returns:
+            obj: An iterator that yields parsed results, in the format of a graph feed_dict.
+ """
+ label_list = []
+ imp_indexes = []
+ user_indexes = []
+ candidate_news_indexes = []
+ click_news_indexes = []
+ cnt = 0
+
+ with tf.gfile.GFile(infile, "r") as rd:
+ for line in rd:
+
+ (
+ label,
+ imp_index,
+ user_index,
+ candidate_news_index,
+ click_news_index,
+ ) = self.parser_one_line(line)
+
+ candidate_news_indexes.append(candidate_news_index)
+ click_news_indexes.append(click_news_index)
+ imp_indexes.append(imp_index)
+ user_indexes.append(user_index)
+ label_list.append(label)
+
+ cnt += 1
+ if cnt >= self.batch_size:
+ yield self._convert_data(
+ label_list,
+ imp_indexes,
+ user_indexes,
+ candidate_news_indexes,
+ click_news_indexes,
+ )
+ candidate_news_indexes = []
+ click_news_indexes = []
+ label_list = []
+ imp_indexes = []
+ user_indexes = []
+ cnt = 0
+
+ def _convert_data(
+ self,
+ label_list,
+ imp_indexes,
+ user_indexes,
+ candidate_news_indexes,
+ click_news_indexes,
+ ):
+ """Convert data into numpy arrays that are good for further model operation.
+
+ Args:
+ label_list (list): a list of ground-truth labels.
+ imp_indexes (list): a list of impression indexes.
+ user_indexes (list): a list of user indexes.
+            candidate_news_indexes (list): the candidate news articles' word indices.
+            click_news_indexes (list): word indices of the user's clicked news articles.
+
+ Returns:
+            dict: A dictionary containing multiple numpy arrays that are convenient for further operations.
+ """
+
+ labels = np.asarray(label_list, dtype=np.float32)
+ imp_indexes = np.asarray(imp_indexes, dtype=np.int32)
+ user_indexes = np.asarray(user_indexes, dtype=np.int32)
+ candidate_news_index_batch = np.asarray(candidate_news_indexes, dtype=np.int64)
+ click_news_index_batch = np.asarray(click_news_indexes, dtype=np.int64)
+ return {
+ "impression_index_batch": imp_indexes,
+ "user_index_batch": user_indexes,
+ "clicked_news_batch": click_news_index_batch,
+ "candidate_news_batch": candidate_news_index_batch,
+ "labels": labels,
+ }
diff --git a/reco_utils/recommender/newsrec/__init__.py b/reco_utils/recommender/newsrec/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reco_utils/recommender/newsrec/models/__init__.py b/reco_utils/recommender/newsrec/models/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reco_utils/recommender/newsrec/models/base_model.py b/reco_utils/recommender/newsrec/models/base_model.py
new file mode 100644
index 0000000000..07f20abbc0
--- /dev/null
+++ b/reco_utils/recommender/newsrec/models/base_model.py
@@ -0,0 +1,285 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+from os.path import join
+import abc
+import time
+import numpy as np
+import tensorflow as tf
+from tensorflow import keras
+from reco_utils.recommender.deeprec.deeprec_utils import cal_metric
+
+
+__all__ = ["BaseModel"]
+
+
+class BaseModel:
+ """Basic class of models
+
+ Attributes:
+ hparams (obj): A tf.contrib.training.HParams object, hold the entire set of hyperparameters.
+ iterator_creator_train (obj): An iterator to load the data in trainning steps.
+ iterator_creator_train (obj): An iterator to load the data in testing steps.
+ graph (obj): An optional graph.
+ seed (int): Random seed.
+ """
+
+ def __init__(self, hparams, iterator_creator, seed=None):
+ """Initializing the model. Create common logics which are needed by all deeprec models, such as loss function,
+ parameter set.
+
+ Args:
+ hparams (obj): A tf.contrib.training.HParams object, hold the entire set of hyperparameters.
+ iterator_creator_train (obj): An iterator to load the data in trainning steps.
+ iterator_creator_train (obj): An iterator to load the data in testing steps.
+ graph (obj): An optional graph.
+ seed (int): Random seed.
+ """
+ self.seed = seed
+ tf.set_random_seed(seed)
+ np.random.seed(seed)
+
+ self.train_iterator = iterator_creator(hparams, hparams.npratio)
+ self.test_iterator = iterator_creator(hparams, 0)
+
+ self.hparams = hparams
+ self.model, self.scorer = self._build_graph()
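+        # _build_graph (implemented by subclasses) returns two Keras models that share weights:
+        # self.model ranks npratio+1 candidates with a softmax and is used for training, while
+        # self.scorer scores a single candidate with a sigmoid and is used for evaluation.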
+
+ self.loss = self._get_loss()
+ self.train_optimizer = self._get_opt()
+
+ self.model.compile(loss=self.loss, optimizer=self.train_optimizer)
+
+        # set GPU memory to grow on demand
+ gpu_options = tf.GPUOptions(allow_growth=True)
+
+ @abc.abstractmethod
+ def _build_graph(self):
+ """Subclass will implement this."""
+ pass
+
+ @abc.abstractmethod
+ def _get_input_label_from_iter(self, batch_data):
+ """Subclass will implement this"""
+ pass
+
+ def _get_loss(self):
+ """Make loss function, consists of data loss and regularization loss
+
+ Returns:
+ obj: Loss function or loss function name
+ """
+ if self.hparams.loss == "cross_entropy_loss":
+ data_loss = "categorical_crossentropy"
+ elif self.hparams.loss == "log_loss":
+ data_loss = "binary_crossentropy"
+ else:
+ raise ValueError("this loss not defined {0}".format(self.hparams.loss))
+ return data_loss
+
+ def _get_opt(self):
+ """Get the optimizer according to configuration. Usually we will use Adam.
+ Returns:
+ obj: An optimizer.
+ """
+ lr = self.hparams.learning_rate
+ optimizer = self.hparams.optimizer
+
+ if optimizer == "adam":
+ train_opt = keras.optimizers.Adam(lr=lr)
+
+ return train_opt
+
+ def _get_pred(self, logit, task):
+ """Make final output as prediction score, according to different tasks.
+
+ Args:
+ logit (obj): Base prediction value.
+ task (str): A task (values: regression/classification)
+
+ Returns:
+ obj: Transformed score
+ """
+ if task == "regression":
+ pred = tf.identity(logit)
+ elif task == "classification":
+ pred = tf.sigmoid(logit)
+ else:
+ raise ValueError(
+ "method must be regression or classification, but now is {0}".format(
+ task
+ )
+ )
+ return pred
+
+ def train(self, train_batch_data):
+ """Go through the optimization step once with training data in feed_dict.
+
+ Args:
+ sess (obj): The model session object.
+ feed_dict (dict): Feed values to train the model. This is a dictionary that maps graph elements to values.
+
+ Returns:
+ list: A list of values, including update operation, total loss, data loss, and merged summary.
+ """
+ train_input, train_label = self._get_input_label_from_iter(train_batch_data)
+ rslt = self.model.train_on_batch(train_input, train_label)
+ return rslt
+
+ def eval(self, eval_batch_data):
+ """Evaluate the data in feed_dict with current model.
+
+ Args:
+ sess (obj): The model session object.
+ feed_dict (dict): Feed values for evaluation. This is a dictionary that maps graph elements to values.
+
+ Returns:
+ list: A list of evaluated results, including total loss value, data loss value,
+ predicted scores, and ground-truth labels.
+ """
+ eval_input, eval_label = self._get_input_label_from_iter(eval_batch_data)
+ imp_index = eval_input[0]
+ pred_rslt = self.scorer.predict_on_batch(eval_input)
+
+ return pred_rslt, eval_label, imp_index
+
+ def fit(self, train_file, valid_file, test_file=None):
+ """Fit the model with train_file. Evaluate the model on valid_file per epoch to observe the training status.
+ If test_file is not None, evaluate it too.
+
+ Args:
+ train_file (str): training data set.
+ valid_file (str): validation set.
+ test_file (str): test set.
+
+ Returns:
+ obj: An instance of self.
+ """
+
+ for epoch in range(1, self.hparams.epochs + 1):
+ step = 0
+ self.hparams.current_epoch = epoch
+ epoch_loss = 0
+ train_start = time.time()
+
+ for batch_data_input in self.train_iterator.load_data_from_file(train_file):
+ step_result = self.train(batch_data_input)
+ step_data_loss = step_result
+
+ epoch_loss += step_data_loss
+ step += 1
+ if step % self.hparams.show_step == 0:
+ print(
+ "step {0:d} , total_loss: {1:.4f}, data_loss: {2:.4f}".format(
+ step, epoch_loss, step_data_loss
+ )
+ )
+
+ train_end = time.time()
+ train_time = train_end - train_start
+
+ eval_start = time.time()
+
+ train_info = ",".join(
+ [
+ str(item[0]) + ":" + str(item[1])
+ for item in [("logloss loss", epoch_loss / step)]
+ ]
+ )
+
+ eval_res = self.run_eval(valid_file)
+ eval_info = ", ".join(
+ [
+ str(item[0]) + ":" + str(item[1])
+ for item in sorted(eval_res.items(), key=lambda x: x[0])
+ ]
+ )
+ if test_file is not None:
+ test_res = self.run_eval(test_file)
+ test_info = ", ".join(
+ [
+ str(item[0]) + ":" + str(item[1])
+ for item in sorted(test_res.items(), key=lambda x: x[0])
+ ]
+ )
+ eval_end = time.time()
+ eval_time = eval_end - eval_start
+
+ if test_file is not None:
+ print(
+ "at epoch {0:d}".format(epoch)
+ + "\ntrain info: "
+ + train_info
+ + "\neval info: "
+ + eval_info
+ + "\ntest info: "
+ + test_info
+ )
+ else:
+ print(
+ "at epoch {0:d}".format(epoch)
+ + "\ntrain info: "
+ + train_info
+ + "\neval info: "
+ + eval_info
+ )
+ print(
+ "at epoch {0:d} , train time: {1:.1f} eval time: {2:.1f}".format(
+ epoch, train_time, eval_time
+ )
+ )
+
+ return self
+
+ def group_labels(self, labels, preds, group_keys):
+ """Devide labels and preds into several group according to values in group keys.
+
+ Args:
+ labels (list): ground truth label list.
+ preds (list): prediction score list.
+ group_keys (list): group key list.
+
+ Returns:
+            all_labels: labels after grouping.
+            all_preds: preds after grouping.
+
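+        Example (illustrative):
+            labels=[1, 0, 0, 1], preds=[0.9, 0.2, 0.3, 0.7], group_keys=[1, 1, 2, 2]
+            gives all_labels=[[1, 0], [0, 1]] and all_preds=[[0.9, 0.2], [0.3, 0.7]]
+            (the order of the groups is not guaranteed).
+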
+ """
+
+ all_keys = list(set(group_keys))
+ group_labels = {k: [] for k in all_keys}
+ group_preds = {k: [] for k in all_keys}
+
+ for l, p, k in zip(labels, preds, group_keys):
+ group_labels[k].append(l)
+ group_preds[k].append(p)
+
+ all_labels = []
+ all_preds = []
+ for k in all_keys:
+ all_labels.append(group_labels[k])
+ all_preds.append(group_preds[k])
+
+ return all_labels, all_preds
+
+ def run_eval(self, filename):
+ """Evaluate the given file and returns some evaluation metrics.
+
+ Args:
+ filename (str): A file name that will be evaluated.
+
+ Returns:
+            dict: A dictionary of evaluation metrics.
+ """
+ preds = []
+ labels = []
+ imp_indexes = []
+
+ for batch_data_input in self.test_iterator.load_data_from_file(filename):
+ step_pred, step_labels, step_imp_index = self.eval(batch_data_input)
+ preds.extend(np.reshape(step_pred, -1))
+ labels.extend(np.reshape(step_labels, -1))
+ imp_indexes.extend(np.reshape(step_imp_index, -1))
+
+ group_labels, group_preds = self.group_labels(labels, preds, imp_indexes)
+ res = cal_metric(group_labels, group_preds, self.hparams.metrics)
+ return res
diff --git a/reco_utils/recommender/newsrec/models/layers.py b/reco_utils/recommender/newsrec/models/layers.py
new file mode 100644
index 0000000000..73084288f1
--- /dev/null
+++ b/reco_utils/recommender/newsrec/models/layers.py
@@ -0,0 +1,338 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import tensorflow as tf
+import tensorflow.keras as keras
+from tensorflow.keras import layers
+from tensorflow.keras import backend as K
+
+
+class AttLayer2(layers.Layer):
+ """Soft alignment attention implement.
+
+ Attributes:
+ dim (int): attention hidden dim
+ """
+
+ def __init__(self, dim=200, seed=0, **kwargs):
+ """Initialization steps for AttLayer2.
+
+ Args:
+ dim (int): attention hidden dim
+ """
+
+ self.dim = dim
+ self.seed = seed
+ super(AttLayer2, self).__init__(**kwargs)
+
+ def build(self, input_shape):
+ """Initialization for variables in AttLayer2
+        There are three variables in AttLayer2, i.e. W, b and q.
+
+ Args:
+ input_shape (obj): shape of input tensor.
+ """
+
+ assert len(input_shape) == 3
+ dim = self.dim
+ self.W = self.add_weight(
+ name="W",
+ shape=(int(input_shape[-1]), dim),
+ initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ trainable=True,
+ )
+ self.b = self.add_weight(
+ name="b",
+ shape=(dim,),
+ initializer=keras.initializers.Zeros(),
+ trainable=True,
+ )
+ self.q = self.add_weight(
+ name="q",
+ shape=(dim, 1),
+ initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ trainable=True,
+ )
+ super(AttLayer2, self).build(input_shape) # be sure you call this somewhere!
+
+ def call(self, inputs, mask=None, **kwargs):
+ """Core implemention of soft attention
+
+ Args:
+ inputs (obj): input tensor.
+
+ Returns:
+ obj: weighted sum of input tensors.
+ """
+
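+        # attention score per position: e_i = q^T tanh(W h_i + b)
+        # attention weight: alpha_i = exp(e_i) / sum_j exp(e_j)  (a softmax over positions)
+        # output: sum_i alpha_i * h_i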
+ attention = K.tanh(K.dot(inputs, self.W) + self.b)
+ attention = K.dot(attention, self.q)
+
+ attention = K.squeeze(attention, axis=2)
+ attention = K.exp(attention)
+ attention_weight = attention / (
+ K.sum(attention, axis=-1, keepdims=True) + K.epsilon()
+ )
+
+ attention_weight = K.expand_dims(attention_weight)
+ weighted_input = inputs * attention_weight
+ return K.sum(weighted_input, axis=1)
+
+ def compute_mask(self, input, input_mask=None):
+ """Compte output mask value
+
+ Args:
+ input (obj): input tensor.
+ input_mask: input mask
+
+ Returns:
+ obj: output mask.
+ """
+ return None
+
+ def compute_output_shape(self, input_shape):
+ """Compute shape of output tensor
+
+ Args:
+ input_shape (tuple): shape of input tensor.
+
+ Returns:
+ tuple: shape of output tensor.
+ """
+ return input_shape[0], input_shape[-1]
+
+
+class SelfAttention(layers.Layer):
+ """Multi-head self attention implement.
+
+ Args:
+ multiheads (int): The number of heads.
+        head_dim (int): Dimension of each head.
+        mask_right (bool): Whether to mask right words.
+
+ Returns:
+ obj: Weighted sum after attention.
+ """
+
+ def __init__(self, multiheads, head_dim, seed=0, mask_right=False, **kwargs):
+ """Initialization steps for AttLayer2.
+
+ Args:
+ multiheads (int): The number of heads.
+            head_dim (int): Dimension of each head.
+            mask_right (bool): Whether to mask right words.
+ """
+
+ self.multiheads = multiheads
+ self.head_dim = head_dim
+ self.output_dim = multiheads * head_dim
+ self.mask_right = mask_right
+ self.seed = seed
+ super(SelfAttention, self).__init__(**kwargs)
+
+ def compute_output_shape(self, input_shape):
+ """Compute shape of output tensor.
+
+ Returns:
+ tuple: output shape tuple.
+ """
+
+ return (input_shape[0][0], input_shape[0][1], self.output_dim)
+
+ def build(self, input_shape):
+ """Initialization for variables in SelfAttention.
+        There are three variables in SelfAttention, i.e. WQ, WK and WV.
+ WQ is used for linear transformation of query.
+ WK is used for linear transformation of key.
+ WV is used for linear transformation of value.
+
+ Args:
+ input_shape (obj): shape of input tensor.
+ """
+
+ self.WQ = self.add_weight(
+ name="WQ",
+ shape=(int(input_shape[0][-1]), self.output_dim),
+ initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ trainable=True,
+ )
+ self.WK = self.add_weight(
+ name="WK",
+ shape=(int(input_shape[1][-1]), self.output_dim),
+ initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ trainable=True,
+ )
+ self.WV = self.add_weight(
+ name="WV",
+ shape=(int(input_shape[2][-1]), self.output_dim),
+ initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ trainable=True,
+ )
+ super(SelfAttention, self).build(input_shape)
+
+ def Mask(self, inputs, seq_len, mode="add"):
+ """Mask operation used in multi-head self attention
+
+ Args:
+            inputs (obj): input tensor to be masked.
+            seq_len (obj): length of each sequence in the batch.
+            mode (str): mode of mask, either "mul" or "add".
+
+ Returns:
+ obj: tensors after masking.
+ """
+
+        if seq_len is None:
+ return inputs
+ else:
+ mask = K.one_hot(indices=seq_len[:, 0], num_classes=K.shape(inputs)[1])
+ mask = 1 - K.cumsum(mask, axis=1)
+
+ for _ in range(len(inputs.shape) - 2):
+ mask = K.expand_dims(mask, 2)
+
+ if mode == "mul":
+ return inputs * mask
+ elif mode == "add":
+ return inputs - (1 - mask) * 1e12
+
+ def call(self, QKVs):
+ """Core logic of multi-head self attention.
+
+ Args:
+            QKVs (list): inputs of multi-head self-attention, i.e. query, key and value.
+
+ Returns:
+            obj: output tensors.
+ """
+ if len(QKVs) == 3:
+ Q_seq, K_seq, V_seq = QKVs
+ Q_len, V_len = None, None
+ elif len(QKVs) == 5:
+ Q_seq, K_seq, V_seq, Q_len, V_len = QKVs
+ Q_seq = K.dot(Q_seq, self.WQ)
+ Q_seq = K.reshape(
+ Q_seq, shape=(-1, K.shape(Q_seq)[1], self.multiheads, self.head_dim)
+ )
+ Q_seq = K.permute_dimensions(Q_seq, pattern=(0, 2, 1, 3))
+
+ K_seq = K.dot(K_seq, self.WK)
+ K_seq = K.reshape(
+ K_seq, shape=(-1, K.shape(K_seq)[1], self.multiheads, self.head_dim)
+ )
+ K_seq = K.permute_dimensions(K_seq, pattern=(0, 2, 1, 3))
+
+ V_seq = K.dot(V_seq, self.WV)
+ V_seq = K.reshape(
+ V_seq, shape=(-1, K.shape(V_seq)[1], self.multiheads, self.head_dim)
+ )
+ V_seq = K.permute_dimensions(V_seq, pattern=(0, 2, 1, 3))
+
+ A = K.batch_dot(Q_seq, K_seq, axes=[3, 3]) / K.sqrt(
+ K.cast(self.head_dim, dtype="float32")
+ )
+ A = K.permute_dimensions(
+ A, pattern=(0, 3, 2, 1)
+ ) # A.shape=[batch_size,K_sequence_length,Q_sequence_length,self.multiheads]
+
+ A = self.Mask(A, V_len, "add")
+ A = K.permute_dimensions(A, pattern=(0, 3, 2, 1))
+
+ if self.mask_right:
+ ones = K.ones_like(A[:1, :1])
+            lower_triangular = tf.matrix_band_part(ones, num_lower=-1, num_upper=0)
+ mask = (ones - lower_triangular) * 1e12
+ A = A - mask
+ A = K.softmax(A)
+
+ O_seq = K.batch_dot(A, V_seq, axes=[3, 2])
+ O_seq = K.permute_dimensions(O_seq, pattern=(0, 2, 1, 3))
+
+ O_seq = K.reshape(O_seq, shape=(-1, K.shape(O_seq)[1], self.output_dim))
+ O_seq = self.Mask(O_seq, Q_len, "mul")
+ return O_seq
+
+ def get_config(self):
+ """ add multiheads, multiheads and mask_right into layer config.
+
+ Returns:
+ dict: config of SelfAttention layer.
+ """
+ config = super(SelfAttention, self).get_config()
+ config.update(
+ {
+ "multiheads": self.multiheads,
+ "head_dim": self.multiheads,
+ "mask_right": self.mask_right,
+ }
+ )
+ return config
+
+
+def PersonalizedAttentivePooling(dim1, dim2, dim3, seed=0):
+ """Soft alignment attention implement.
+
+ Attributes:
+ dim1 (int): first dimention of value shape.
+ dim2 (int): second dimention of value shape.
+ dim3 (int): shape of query
+
+ Returns:
+ weighted summary of inputs value.
+ """
+ vecs_input = keras.Input(shape=(dim1, dim2), dtype="float32")
+ query_input = keras.Input(shape=(dim3,), dtype="float32")
+
+ user_vecs = layers.Dropout(0.2)(vecs_input)
+ user_att = layers.Dense(
+ dim3,
+ activation="tanh",
+ kernel_initializer=keras.initializers.glorot_uniform(seed=seed),
+ bias_initializer=keras.initializers.Zeros(),
+ )(user_vecs)
+ user_att2 = layers.Dot(axes=-1)([query_input, user_att])
+ user_att2 = layers.Activation("softmax")(user_att2)
+ user_vec = layers.Dot((1, 1))([user_vecs, user_att2])
+
+ model = keras.Model([vecs_input, query_input], user_vec)
+ return model
+
+
+class ComputeMasking(layers.Layer):
+ """Compute if inputs contains zero value.
+
+ Returns:
+ bool tensor: True for values not equal to zero.
+ """
+
+ def __init__(self, **kwargs):
+ super(ComputeMasking, self).__init__(**kwargs)
+
+ def call(self, inputs, **kwargs):
+ mask = K.not_equal(inputs, 0)
+ return K.cast(mask, K.floatx())
+
+ def compute_output_shape(self, input_shape):
+ return input_shape
+
+
+class OverwriteMasking(layers.Layer):
+ """Set values at spasific positions to zero.
+
+ Args:
+ inputs (list): value tensor and mask tensor.
+
+ Returns:
+ obj: tensor after setting values to zero.
+ """
+
+ def __init__(self, **kwargs):
+ super(OverwriteMasking, self).__init__(**kwargs)
+
+ def build(self, input_shape):
+ super(OverwriteMasking, self).build(input_shape)
+
+ def call(self, inputs, **kwargs):
+ return inputs[0] * K.expand_dims(inputs[1])
+
+ def compute_output_shape(self, input_shape):
+ return input_shape[0]
diff --git a/reco_utils/recommender/newsrec/models/lstur.py b/reco_utils/recommender/newsrec/models/lstur.py
new file mode 100644
index 0000000000..7f6f045860
--- /dev/null
+++ b/reco_utils/recommender/newsrec/models/lstur.py
@@ -0,0 +1,221 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import numpy as np
+import tensorflow as tf
+import tensorflow.keras as keras
+from tensorflow.keras import layers
+
+
+from reco_utils.recommender.newsrec.models.base_model import BaseModel
+from reco_utils.recommender.newsrec.models.layers import (
+ AttLayer2,
+ ComputeMasking,
+ OverwriteMasking,
+)
+
+__all__ = ["LSTURModel"]
+
+
+class LSTURModel(BaseModel):
+ """LSTUR model(Neural News Recommendation with Multi-Head Self-Attention)
+
+ Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu and Xing Xie:
+ Neural News Recommendation with Long- and Short-term User Representations, ACL 2019
+
+ Attributes:
+ word2vec_embedding (numpy.array): Pretrained word embedding matrix.
+ hparam (obj): Global hyper-parameters.
+ """
+
+ def __init__(self, hparams, iterator_creator, seed=None):
+ """Initialization steps for LSTUR.
+        Compared with the BaseModel, LSTUR needs a word embedding matrix.
+        After creating the word embedding matrix, BaseModel's __init__ method will be called.
+
+ Args:
+            hparams (obj): Global hyper-parameters. Some key settings, such as type and gru_unit, are there.
+            iterator_creator (obj): LSTUR data loader class used for both train and test data.
+            seed (int): Random seed.
+ """
+
+ self.word2vec_embedding = self._init_embedding(hparams.wordEmb_file)
+ self.hparam = hparams
+
+ super().__init__(hparams, iterator_creator, seed=seed)
+
+ def _init_embedding(self, file_path):
+ """Load pre-trained embeddings as a constant tensor.
+
+ Args:
+ file_path (str): the pre-trained embeddings filename.
+
+ Returns:
+ np.array: A constant numpy array.
+ """
+ return np.load(file_path).astype(np.float32)
+
+ def _get_input_label_from_iter(self, batch_data):
+ input_feat = [
+ batch_data["impression_index_batch"],
+ batch_data["user_index_batch"],
+ batch_data["clicked_news_batch"],
+ batch_data["candidate_news_batch"],
+ ]
+ input_label = batch_data["labels"]
+ return input_feat, input_label
+
+ def _build_graph(self):
+ """Build LSTUR model and scorer.
+
+ Returns:
+ obj: a model used to train.
+ obj: a model used to evaluate and inference.
+ """
+
+ model, scorer = self._build_lstur()
+ return model, scorer
+
+ def _build_userencoder(self, titleencoder, type="ini"):
+ """The main function to create user encoder of LSTUR.
+
+ Args:
+            titleencoder (obj): the news encoder of LSTUR.
+            type (str): user encoder type, either "ini" or "con".
+
+ Return:
+ obj: the user encoder of LSTUR.
+ """
+ hparams = self.hparams
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.doc_size), dtype="int32"
+ )
+ user_indexes = keras.Input(shape=(1,), dtype="int32")
+
+ user_embedding_layer = layers.Embedding(
+ hparams.user_num,
+ hparams.gru_unit,
+ trainable=True,
+ embeddings_initializer="zeros",
+ )
+
+ long_u_emb = layers.Reshape((hparams.gru_unit,))(
+ user_embedding_layer(user_indexes)
+ )
+ click_title_presents = layers.TimeDistributed(titleencoder)(his_input_title)
+
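+        # "ini": initialize the GRU over the click history with the long-term user embedding.
+        # "con": concatenate the short-term GRU output with the long-term embedding and
+        #        project the result back to gru_unit dimensions.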
+ if type == "ini":
+ user_present = layers.GRU(
+ hparams.gru_unit,
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ recurrent_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ bias_initializer=keras.initializers.Zeros(),
+ )(
+ layers.Masking(mask_value=0.0)(click_title_presents),
+ initial_state=[long_u_emb],
+ )
+ elif type == "con":
+ short_uemb = layers.GRU(
+ hparams.gru_unit,
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ recurrent_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ bias_initializer=keras.initializers.Zeros(),
+ )(layers.Masking(mask_value=0.0)(click_title_presents))
+
+ user_present = layers.Concatenate()([short_uemb, long_u_emb])
+ user_present = layers.Dense(
+ hparams.gru_unit,
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ )(user_present)
+
+ model = keras.Model(
+ [his_input_title, user_indexes], user_present, name="user_encoder"
+ )
+ return model
+
+ def _build_newsencoder(self, embedding_layer):
+ """The main function to create news encoder of LSTUR.
+
+ Args:
+ embedding_layer(obj): a word embedding layer.
+
+ Return:
+ obj: the news encoder of LSTUR.
+ """
+ hparams = self.hparams
+ sequences_input_title = keras.Input(shape=(hparams.doc_size,), dtype="int32")
+ embedded_sequences_title = embedding_layer(sequences_input_title)
+
+ y = layers.Dropout(hparams.dropout)(embedded_sequences_title)
+ y = layers.Conv1D(
+ hparams.filter_num,
+ hparams.window_size,
+ activation=hparams.cnn_activation,
+ padding="same",
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ )(y)
+ y = layers.Dropout(hparams.dropout)(y)
+ y = layers.Masking()(
+ OverwriteMasking()([y, ComputeMasking()(sequences_input_title)])
+ )
+ pred_title = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(y)
+
+ model = keras.Model(sequences_input_title, pred_title, name="news_encoder")
+ return model
+
+ def _build_lstur(self):
+ """The main function to create LSTUR's logic. The core of LSTUR
+ is a user encoder and a news encoder.
+
+ Returns:
+ obj: a model used to train.
+ obj: a model used to evaluate and inference.
+ """
+ hparams = self.hparams
+
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.doc_size), dtype="int32"
+ )
+ pred_input_title = keras.Input(
+ shape=(hparams.npratio + 1, hparams.doc_size), dtype="int32"
+ )
+ pred_input_title_one = keras.Input(shape=(1, hparams.doc_size,), dtype="int32")
+ pred_title_reshape = layers.Reshape((hparams.doc_size,))(pred_input_title_one)
+ imp_indexes = keras.Input(shape=(1,), dtype="int32")
+ user_indexes = keras.Input(shape=(1,), dtype="int32")
+
+ embedding_layer = layers.Embedding(
+ hparams.word_size,
+ hparams.word_emb_dim,
+ weights=[self.word2vec_embedding],
+ trainable=True,
+ )
+
+ titleencoder = self._build_newsencoder(embedding_layer)
+ userencoder = self._build_userencoder(titleencoder, type=hparams.type)
+ newsencoder = titleencoder
+
+ user_present = userencoder([his_input_title, user_indexes])
+ news_present = layers.TimeDistributed(newsencoder)(pred_input_title)
+ news_present_one = newsencoder(pred_title_reshape)
+
+ preds = layers.Dot(axes=-1)([news_present, user_present])
+ preds = layers.Activation(activation="softmax")(preds)
+
+ pred_one = layers.Dot(axes=-1)([news_present_one, user_present])
+ pred_one = layers.Activation(activation="sigmoid")(pred_one)
+
+ model = keras.Model(
+ [imp_indexes, user_indexes, his_input_title, pred_input_title], preds
+ )
+ scorer = keras.Model(
+ [imp_indexes, user_indexes, his_input_title, pred_input_title_one], pred_one
+ )
+
+ return model, scorer
diff --git a/reco_utils/recommender/newsrec/models/naml.py b/reco_utils/recommender/newsrec/models/naml.py
new file mode 100644
index 0000000000..b40bc24a94
--- /dev/null
+++ b/reco_utils/recommender/newsrec/models/naml.py
@@ -0,0 +1,369 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import numpy as np
+import tensorflow as tf
+import tensorflow.keras as keras
+from tensorflow.keras import layers
+
+
+from reco_utils.recommender.newsrec.models.base_model import BaseModel
+from reco_utils.recommender.newsrec.models.layers import AttLayer2
+
+__all__ = ["NAMLModel"]
+
+
+class NAMLModel(BaseModel):
+ """NAML model(Neural News Recommendation with Attentive Multi-View Learning)
+
+ Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie,
+ Neural News Recommendation with Attentive Multi-View Learning, IJCAI 2019
+
+ Attributes:
+ word2vec_embedding (numpy.array): Pretrained word embedding matrix.
+ hparam (obj): Global hyper-parameters.
+ """
+
+ def __init__(self, hparams, iterator_creator, seed=None):
+ """Initialization steps for NAML.
+        Compared with the BaseModel, NAML needs a word embedding matrix.
+        After creating the word embedding matrix, BaseModel's __init__ method will be called.
+
+ Args:
+            hparams (obj): Global hyper-parameters. Some key settings, such as filter_num, are there.
+            iterator_creator (obj): NAML data loader class used for both train and test data.
+            seed (int): Random seed.
+ """
+
+ self.word2vec_embedding = self._init_embedding(hparams.wordEmb_file)
+ self.hparam = hparams
+
+ super().__init__(hparams, iterator_creator, seed=seed)
+
+ def _get_input_label_from_iter(self, batch_data):
+ input_feat = [
+ batch_data["impression_index_batch"],
+ batch_data["user_index_batch"],
+ batch_data["clicked_title_batch"],
+ batch_data["clicked_body_batch"],
+ batch_data["clicked_vert_batch"],
+ batch_data["clicked_subvert_batch"],
+ batch_data["candidate_title_batch"],
+ batch_data["candidate_body_batch"],
+ batch_data["candidate_vert_batch"],
+ batch_data["candidate_subvert_batch"]
+ ]
+ input_label = batch_data["labels"]
+ return input_feat, input_label
+
+ def _init_embedding(self, file_path):
+ """Load pre-trained embeddings as a constant tensor.
+
+ Args:
+ file_path (str): the pre-trained embeddings filename.
+
+ Returns:
+ np.array: A constant numpy array.
+ """
+ return np.load(file_path).astype(np.float32)
+
+ def _build_graph(self):
+ """Build NAML model and scorer.
+
+ Returns:
+ obj: a model used to train.
+ obj: a model used to evaluate and inference.
+ """
+
+ model, scorer = self._build_naml()
+ return model, scorer
+
+ def _build_userencoder(self, newsencoder):
+ """The main function to create user encoder of NAML.
+
+ Args:
+ newsencoder(obj): the news encoder of NAML.
+
+ Return:
+ obj: the user encoder of NAML.
+ """
+ hparams = self.hparams
+ his_input_title_body_verts = keras.Input(
+ shape=(hparams.his_size, hparams.title_size + hparams.body_size + 2),
+ dtype="int32",
+ )
+
+ click_news_presents = layers.TimeDistributed(newsencoder)(
+ his_input_title_body_verts
+ )
+ user_present = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(click_news_presents)
+
+ model = keras.Model(
+ his_input_title_body_verts, user_present, name="user_encoder"
+ )
+ return model
+
+ def _build_newsencoder(self, embedding_layer):
+ """The main function to create news encoder of NAML.
+        The news encoder is composed of a title encoder, a body encoder, a vert encoder and a subvert encoder.
+
+ Args:
+ embedding_layer(obj): a word embedding layer.
+
+ Return:
+ obj: the news encoder of NAML.
+ """
+ hparams = self.hparams
+ input_title_body_verts = keras.Input(
+ shape=(hparams.title_size + hparams.body_size + 2,), dtype="int32"
+ )
+
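+        # Each packed news vector is laid out as:
+        #   [0, title_size)                      title word ids
+        #   [title_size, title_size + body_size) body word ids
+        #   [title_size + body_size]             vert (category) id
+        #   [title_size + body_size + 1]         subvert (subcategory) id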
+ sequences_input_title = layers.Lambda(lambda x: x[:, : hparams.title_size])(
+ input_title_body_verts
+ )
+ sequences_input_body = layers.Lambda(
+ lambda x: x[:, hparams.title_size : hparams.title_size + hparams.body_size]
+ )(input_title_body_verts)
+ input_vert = layers.Lambda(
+ lambda x: x[
+ :,
+ hparams.title_size
+ + hparams.body_size : hparams.title_size
+ + hparams.body_size
+ + 1,
+ ]
+ )(input_title_body_verts)
+ input_subvert = layers.Lambda(
+ lambda x: x[:, hparams.title_size + hparams.body_size + 1 :]
+ )(input_title_body_verts)
+
+ title_repr = self._build_titleencoder(embedding_layer)(sequences_input_title)
+ body_repr = self._build_bodyencoder(embedding_layer)(sequences_input_body)
+ vert_repr = self._build_vertencoder()(input_vert)
+ subvert_repr = self._build_subvertencoder()(input_subvert)
+
+ concate_repr = layers.Concatenate(axis=-2)(
+ [title_repr, body_repr, vert_repr, subvert_repr]
+ )
+ news_repr = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(concate_repr)
+
+ model = keras.Model(input_title_body_verts, news_repr, name="news_encoder")
+ return model
+
+ def _build_titleencoder(self, embedding_layer):
+ """build title encoder of NAML news encoder.
+
+ Args:
+ embedding_layer(obj): a word embedding layer.
+
+ Return:
+ obj: the title encoder of NAML.
+ """
+ hparams = self.hparams
+ sequences_input_title = keras.Input(shape=(hparams.title_size,), dtype="int32")
+ embedded_sequences_title = embedding_layer(sequences_input_title)
+
+ y = layers.Dropout(hparams.dropout)(embedded_sequences_title)
+ y = layers.Conv1D(
+ hparams.filter_num,
+ hparams.window_size,
+ activation=hparams.cnn_activation,
+ padding="same",
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed)
+ )(y)
+ y = layers.Dropout(hparams.dropout)(y)
+ pred_title = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(y)
+ pred_title = layers.Reshape((1, hparams.filter_num))(pred_title)
+
+ model = keras.Model(sequences_input_title, pred_title, name="title_encoder")
+ return model
+
+ def _build_bodyencoder(self, embedding_layer):
+ """build body encoder of NAML news encoder.
+
+ Args:
+ embedding_layer(obj): a word embedding layer.
+
+ Return:
+ obj: the body encoder of NAML.
+ """
+ hparams = self.hparams
+ sequences_input_body = keras.Input(shape=(hparams.body_size,), dtype="int32")
+ embedded_sequences_body = embedding_layer(sequences_input_body)
+
+ y = layers.Dropout(hparams.dropout)(embedded_sequences_body)
+ y = layers.Conv1D(
+ hparams.filter_num,
+ hparams.window_size,
+ activation=hparams.cnn_activation,
+ padding="same",
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed)
+ )(y)
+ y = layers.Dropout(hparams.dropout)(y)
+ pred_body = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(y)
+ pred_body = layers.Reshape((1, hparams.filter_num))(pred_body)
+
+ model = keras.Model(sequences_input_body, pred_body, name="body_encoder")
+ return model
+
+ def _build_vertencoder(self):
+ """build vert encoder of NAML news encoder.
+
+ Return:
+ obj: the vert encoder of NAML.
+ """
+ hparams = self.hparams
+ input_vert = keras.Input(shape=(1,), dtype="int32")
+
+ vert_embedding = layers.Embedding(
+ hparams.vert_num, hparams.vert_emb_dim, trainable=True
+ )
+
+ vert_emb = vert_embedding(input_vert)
+ pred_vert = layers.Dense(
+ hparams.filter_num,
+ activation=hparams.dense_activation,
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed)
+ )(vert_emb)
+ pred_vert = layers.Reshape((1, hparams.filter_num))(pred_vert)
+
+ model = keras.Model(input_vert, pred_vert, name="vert_encoder")
+ return model
+
+ def _build_subvertencoder(self):
+ """build subvert encoder of NAML news encoder.
+
+ Return:
+ obj: the subvert encoder of NAML.
+ """
+ hparams = self.hparams
+ input_subvert = keras.Input(shape=(1,), dtype="int32")
+
+ subvert_embedding = layers.Embedding(
+ hparams.subvert_num, hparams.subvert_emb_dim, trainable=True
+ )
+
+ subvert_emb = subvert_embedding(input_subvert)
+ pred_subvert = layers.Dense(
+ hparams.filter_num,
+ activation=hparams.dense_activation,
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed)
+ )(subvert_emb)
+ pred_subvert = layers.Reshape((1, hparams.filter_num))(pred_subvert)
+
+ model = keras.Model(input_subvert, pred_subvert, name="subvert_encoder")
+ return model
+
+ def _build_naml(self):
+ """The main function to create NAML's logic. The core of NAML
+ is a user encoder and a news encoder.
+
+ Returns:
+ obj: a model used to train.
+ obj: a model used to evaluate and predict.
+ """
+ hparams = self.hparams
+
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.title_size), dtype="int32"
+ )
+ his_input_body = keras.Input(
+ shape=(hparams.his_size, hparams.body_size), dtype="int32"
+ )
+ his_input_vert = keras.Input(shape=(hparams.his_size, 1), dtype="int32")
+ his_input_subvert = keras.Input(shape=(hparams.his_size, 1), dtype="int32")
+
+ pred_input_title = keras.Input(
+ shape=(hparams.npratio + 1, hparams.title_size), dtype="int32"
+ )
+ pred_input_body = keras.Input(
+ shape=(hparams.npratio + 1, hparams.body_size), dtype="int32"
+ )
+ pred_input_vert = keras.Input(shape=(hparams.npratio + 1, 1), dtype="int32")
+ pred_input_subvert = keras.Input(shape=(hparams.npratio + 1, 1), dtype="int32")
+
+ pred_input_title_one = keras.Input(
+ shape=(1, hparams.title_size,), dtype="int32"
+ )
+ pred_input_body_one = keras.Input(shape=(1, hparams.body_size,), dtype="int32")
+ pred_input_vert_one = keras.Input(shape=(1, 1), dtype="int32")
+ pred_input_subvert_one = keras.Input(shape=(1, 1), dtype="int32")
+
+ his_title_body_verts = layers.Concatenate(axis=-1)(
+ [his_input_title, his_input_body, his_input_vert, his_input_subvert]
+ )
+
+ pred_title_body_verts = layers.Concatenate(axis=-1)(
+ [pred_input_title, pred_input_body, pred_input_vert, pred_input_subvert]
+ )
+
+ pred_title_body_verts_one = layers.Concatenate(axis=-1)(
+ [
+ pred_input_title_one,
+ pred_input_body_one,
+ pred_input_vert_one,
+ pred_input_subvert_one,
+ ]
+ )
+ pred_title_body_verts_one = layers.Reshape((-1,))(pred_title_body_verts_one)
+
+ imp_indexes = keras.Input(shape=(1,), dtype="int32")
+ user_indexes = keras.Input(shape=(1,), dtype="int32")
+
+ embedding_layer = layers.Embedding(
+ hparams.word_size,
+ hparams.word_emb_dim,
+ weights=[self.word2vec_embedding],
+ trainable=True,
+ )
+
+ newsencoder = self._build_newsencoder(embedding_layer)
+ userencoder = self._build_userencoder(newsencoder)
+
+ user_present = userencoder(his_title_body_verts)
+ news_present = layers.TimeDistributed(newsencoder)(pred_title_body_verts)
+ news_present_one = newsencoder(pred_title_body_verts_one)
+
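+        # Training head: softmax over the (npratio + 1) candidates of an impression.
+        # Scoring head: sigmoid click probability for a single candidate news article.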
+ preds = layers.Dot(axes=-1)([news_present, user_present])
+ preds = layers.Activation(activation="softmax")(preds)
+
+ pred_one = layers.Dot(axes=-1)([news_present_one, user_present])
+ pred_one = layers.Activation(activation="sigmoid")(pred_one)
+
+ model = keras.Model(
+ [
+ imp_indexes,
+ user_indexes,
+ his_input_title,
+ his_input_body,
+ his_input_vert,
+ his_input_subvert,
+ pred_input_title,
+ pred_input_body,
+ pred_input_vert,
+ pred_input_subvert,
+ ],
+ preds,
+ )
+
+ scorer = keras.Model(
+ [
+ imp_indexes,
+ user_indexes,
+ his_input_title,
+ his_input_body,
+ his_input_vert,
+ his_input_subvert,
+ pred_input_title_one,
+ pred_input_body_one,
+ pred_input_vert_one,
+ pred_input_subvert_one,
+ ],
+ pred_one,
+ )
+
+ return model, scorer
diff --git a/reco_utils/recommender/newsrec/models/npa.py b/reco_utils/recommender/newsrec/models/npa.py
new file mode 100644
index 0000000000..6f43c043de
--- /dev/null
+++ b/reco_utils/recommender/newsrec/models/npa.py
@@ -0,0 +1,230 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import numpy as np
+import tensorflow as tf
+import tensorflow.keras as keras
+from tensorflow.keras import layers
+
+
+from reco_utils.recommender.newsrec.models.base_model import BaseModel
+from reco_utils.recommender.newsrec.models.layers import PersonalizedAttentivePooling
+
+__all__ = ["NPAModel"]
+
+
+class NPAModel(BaseModel):
+ """NPA model(Neural News Recommendation with Attentive Multi-View Learning)
+
+ Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie:
+ NPA: Neural News Recommendation with Personalized Attention, KDD 2019, ADS track.
+
+ Attributes:
+ word2vec_embedding (numpy.array): Pretrained word embedding matrix.
+ hparam (obj): Global hyper-parameters.
+ """
+
+ def __init__(self, hparams, iterator_creator, seed=None):
+ """Initialization steps for MANL.
+ Compared with the BaseModel, NPA need word embedding.
+ After creating word embedding matrix, BaseModel's __init__ method will be called.
+
+ Args:
+ hparams (obj): Global hyper-parameters. Some key setttings such as filter_num are there.
+ iterator_creator_train(obj): NPA data loader class for train data.
+ iterator_creator_test(obj): NPA data loader class for test and validation data
+ """
+
+ self.word2vec_embedding = self._init_embedding(hparams.wordEmb_file)
+ self.hparam = hparams
+
+ super().__init__(hparams, iterator_creator, seed=seed)
+
+ def _init_embedding(self, file_path):
+ """Load pre-trained embeddings as a constant tensor.
+
+ Args:
+ file_path (str): the pre-trained embeddings filename.
+
+ Returns:
+ np.array: A constant numpy array.
+ """
+ return np.load(file_path).astype(np.float32)
+
+ def _get_input_label_from_iter(self, batch_data):
+ input_feat = [
+ batch_data["impression_index_batch"],
+ batch_data["user_index_batch"],
+ batch_data["clicked_news_batch"],
+ batch_data["candidate_news_batch"],
+ ]
+ input_label = batch_data["labels"]
+ return input_feat, input_label
+
+ def _build_graph(self):
+ """Build NPA model and scorer.
+
+ Returns:
+ obj: a model used to train.
+            obj: a model used to evaluate and predict.
+ """
+
+ model, scorer = self._build_npa()
+ return model, scorer
+
+ def _build_userencoder(self, titleencoder, user_embedding_layer):
+ """The main function to create user encoder of NPA.
+
+ Args:
+            titleencoder(obj): the news encoder of NPA.
+            user_embedding_layer(obj): the user embedding layer.
+
+ Return:
+ obj: the user encoder of NPA.
+ """
+ hparams = self.hparams
+
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.doc_size), dtype="int32"
+ )
+ user_indexes = keras.Input(shape=(1,), dtype="int32")
+
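+        # Append the user index to every clicked-title sequence so the TimeDistributed
+        # news encoder can look up the user embedding for personalized attention.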
+ nuser_id = layers.Reshape((1, 1))(user_indexes)
+ repeat_uids = layers.Concatenate(axis=-2)([nuser_id] * hparams.his_size)
+ his_title_uid = layers.Concatenate(axis=-1)([his_input_title, repeat_uids])
+
+ click_title_presents = layers.TimeDistributed(titleencoder)(his_title_uid)
+
+ u_emb = layers.Reshape((hparams.user_emb_dim,))(
+ user_embedding_layer(user_indexes)
+ )
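+        # The projected user embedding acts as the attention query over the
+        # clicked-news vectors (personalized attentive pooling).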
+ user_present = PersonalizedAttentivePooling(
+ hparams.his_size,
+ hparams.filter_num,
+ hparams.attention_hidden_dim,
+ seed=self.seed,
+ )([click_title_presents, layers.Dense(hparams.attention_hidden_dim)(u_emb)])
+
+ model = keras.Model(
+ [his_input_title, user_indexes], user_present, name="user_encoder"
+ )
+ return model
+
+ def _build_newsencoder(self, embedding_layer, user_embedding_layer):
+ """The main function to create news encoder of NPA.
+
+ Args:
+            embedding_layer(obj): a word embedding layer.
+            user_embedding_layer(obj): the user embedding layer.
+
+ Return:
+ obj: the news encoder of NPA.
+ """
+ hparams = self.hparams
+ sequence_title_uindex = keras.Input(
+ shape=(hparams.doc_size + 1,), dtype="int32"
+ )
+
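+        # The input packs doc_size title word ids followed by one user id; slice them
+        # back into the word sequence and the user index.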
+ sequences_input_title = layers.Lambda(lambda x: x[:, : hparams.doc_size])(
+ sequence_title_uindex
+ )
+ user_index = layers.Lambda(lambda x: x[:, hparams.doc_size :])(
+ sequence_title_uindex
+ )
+
+ u_emb = layers.Reshape((hparams.user_emb_dim,))(
+ user_embedding_layer(user_index)
+ )
+ embedded_sequences_title = embedding_layer(sequences_input_title)
+
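+        # 1D convolution over the word embeddings captures local (window_size) context
+        # for each position in the title.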
+ y = layers.Dropout(hparams.dropout)(embedded_sequences_title)
+ y = layers.Conv1D(
+ hparams.filter_num,
+ hparams.window_size,
+ activation=hparams.cnn_activation,
+ padding="same",
+ bias_initializer=keras.initializers.Zeros(),
+ kernel_initializer=keras.initializers.glorot_uniform(seed=self.seed),
+ )(y)
+ y = layers.Dropout(hparams.dropout)(y)
+
+ pred_title = PersonalizedAttentivePooling(
+ hparams.doc_size,
+ hparams.filter_num,
+ hparams.attention_hidden_dim,
+ seed=self.seed,
+ )([y, layers.Dense(hparams.attention_hidden_dim)(u_emb)])
+
+ # pred_title = Reshape((1, feature_size))(pred_title)
+ model = keras.Model(sequence_title_uindex, pred_title, name="news_encoder")
+ return model
+
+ def _build_npa(self):
+ """The main function to create NPA's logic. The core of NPA
+ is a user encoder and a news encoder.
+
+ Returns:
+ obj: a model used to train.
+ obj: a model used to evaluate and predict.
+ """
+ hparams = self.hparams
+
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.doc_size), dtype="int32"
+ )
+ pred_input_title = keras.Input(
+ shape=(hparams.npratio + 1, hparams.doc_size), dtype="int32"
+ )
+ pred_input_title_one = keras.Input(shape=(1, hparams.doc_size,), dtype="int32")
+ pred_title_one_reshape = layers.Reshape((hparams.doc_size,))(
+ pred_input_title_one
+ )
+ imp_indexes = keras.Input(shape=(1,), dtype="int32")
+ user_indexes = keras.Input(shape=(1,), dtype="int32")
+
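+        # As in the user encoder, append the user index to each candidate title so the
+        # news encoder can apply per-user personalized attention.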
+ nuser_index = layers.Reshape((1, 1))(user_indexes)
+ repeat_uindex = layers.Concatenate(axis=-2)(
+ [nuser_index] * (hparams.npratio + 1)
+ )
+ pred_title_uindex = layers.Concatenate(axis=-1)(
+ [pred_input_title, repeat_uindex]
+ )
+ pred_title_uindex_one = layers.Concatenate()(
+ [pred_title_one_reshape, user_indexes]
+ )
+
+ embedding_layer = layers.Embedding(
+ hparams.word_size,
+ hparams.word_emb_dim,
+ weights=[self.word2vec_embedding],
+ trainable=True,
+ )
+
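+        # Trainable user embedding table (zero-initialized), shared by the news encoder
+        # and the user encoder to build personalized attention queries.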
+ user_embedding_layer = layers.Embedding(
+ hparams.user_num,
+ hparams.user_emb_dim,
+ trainable=True,
+ embeddings_initializer="zeros",
+ )
+
+ titleencoder = self._build_newsencoder(embedding_layer, user_embedding_layer)
+ userencoder = self._build_userencoder(titleencoder, user_embedding_layer)
+ newsencoder = titleencoder
+
+ user_present = userencoder([his_input_title, user_indexes])
+
+ news_present = layers.TimeDistributed(newsencoder)(pred_title_uindex)
+ news_present_one = newsencoder(pred_title_uindex_one)
+
+ preds = layers.Dot(axes=-1)([news_present, user_present])
+ preds = layers.Activation(activation="softmax")(preds)
+
+ pred_one = layers.Dot(axes=-1)([news_present_one, user_present])
+ pred_one = layers.Activation(activation="sigmoid")(pred_one)
+
+ model = keras.Model(
+ [imp_indexes, user_indexes, his_input_title, pred_input_title], preds
+ )
+ scorer = keras.Model(
+ [imp_indexes, user_indexes, his_input_title, pred_input_title_one], pred_one
+ )
+
+ return model, scorer
diff --git a/reco_utils/recommender/newsrec/models/nrms.py b/reco_utils/recommender/newsrec/models/nrms.py
new file mode 100644
index 0000000000..b74629df04
--- /dev/null
+++ b/reco_utils/recommender/newsrec/models/nrms.py
@@ -0,0 +1,173 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import numpy as np
+import tensorflow as tf
+import tensorflow.keras as keras
+from tensorflow.keras import layers
+
+
+from reco_utils.recommender.newsrec.models.base_model import BaseModel
+from reco_utils.recommender.newsrec.models.layers import AttLayer2, SelfAttention
+
+__all__ = ["NRMSModel"]
+
+
+class NRMSModel(BaseModel):
+ """NRMS model(Neural News Recommendation with Multi-Head Self-Attention)
+
+ Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang,and Xing Xie, "Neural News
+ Recommendation with Multi-Head Self-Attention" in Proceedings of the 2019 Conference
+ on Empirical Methods in Natural Language Processing and the 9th International Joint Conference
+ on Natural Language Processing (EMNLP-IJCNLP)
+
+ Attributes:
+ word2vec_embedding (numpy.array): Pretrained word embedding matrix.
+ hparam (obj): Global hyper-parameters.
+ """
+
+ def __init__(self, hparams, iterator_creator, seed=None):
+ """Initialization steps for NRMS.
+        Compared with the BaseModel, NRMS needs a word embedding matrix.
+        After creating the word embedding matrix, BaseModel's __init__ method will be called.
+
+        Args:
+            hparams (obj): Global hyper-parameters. Some key settings such as head_num and head_dim are there.
+            iterator_creator (obj): NRMS data loader class for train, test and validation data.
+ """
+
+ self.word2vec_embedding = self._init_embedding(hparams.wordEmb_file)
+ self.hparam = hparams
+
+ super().__init__(hparams, iterator_creator, seed=seed)
+
+ def _init_embedding(self, file_path):
+ """Load pre-trained embeddings as a constant tensor.
+
+ Args:
+ file_path (str): the pre-trained embeddings filename.
+
+ Returns:
+ np.array: A constant numpy array.
+ """
+ return np.load(file_path).astype(np.float32)
+
+ def _get_input_label_from_iter(self, batch_data):
+ input_feat = [
+ batch_data["impression_index_batch"],
+ batch_data["user_index_batch"],
+ batch_data["clicked_news_batch"],
+ batch_data["candidate_news_batch"],
+ ]
+ input_label = batch_data["labels"]
+ return input_feat, input_label
+
+ def _build_graph(self):
+ """Build NRMS model and scorer.
+
+ Returns:
+ obj: a model used to train.
+            obj: a model used to evaluate and predict.
+ """
+ hparams = self.hparams
+ model, scorer = self._build_nrms()
+ return model, scorer
+
+ def _build_userencoder(self, titleencoder):
+ """The main function to create user encoder of NRMS.
+
+ Args:
+ titleencoder(obj): the news encoder of NRMS.
+
+ Return:
+ obj: the user encoder of NRMS.
+ """
+ hparams = self.hparams
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.doc_size), dtype="int32"
+ )
+
+ click_title_presents = layers.TimeDistributed(titleencoder)(his_input_title)
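+        # Multi-head self-attention over the clicked-news vectors: the same tensor is
+        # passed three times as query, key and value. AttLayer2 then pools the attended
+        # sequence into a single user vector.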
+ y = SelfAttention(hparams.head_num, hparams.head_dim, seed=self.seed)(
+ [click_title_presents] * 3
+ )
+ user_present = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(y)
+
+ model = keras.Model(his_input_title, user_present, name="user_encoder")
+ return model
+
+ def _build_newsencoder(self, embedding_layer):
+ """The main function to create news encoder of NRMS.
+
+ Args:
+ embedding_layer(obj): a word embedding layer.
+
+ Return:
+ obj: the news encoder of NRMS.
+ """
+ hparams = self.hparams
+ sequences_input_title = keras.Input(shape=(hparams.doc_size,), dtype="int32")
+ embedded_sequences_title = embedding_layer(sequences_input_title)
+
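+        # Word-level multi-head self-attention followed by attention pooling (AttLayer2)
+        # produces a fixed-size title representation.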
+ y = layers.Dropout(hparams.dropout)(embedded_sequences_title)
+ y = SelfAttention(hparams.head_num, hparams.head_dim, seed=self.seed)([y, y, y])
+ y = layers.Dropout(hparams.dropout)(y)
+ pred_title = AttLayer2(hparams.attention_hidden_dim, seed=self.seed)(y)
+
+ model = keras.Model(sequences_input_title, pred_title, name="news_encoder")
+ return model
+
+ def _build_nrms(self):
+ """The main function to create NRMS's logic. The core of NRMS
+ is a user encoder and a news encoder.
+
+ Returns:
+ obj: a model used to train.
+            obj: a model used to evaluate and predict.
+ """
+ hparams = self.hparams
+
+ his_input_title = keras.Input(
+ shape=(hparams.his_size, hparams.doc_size), dtype="int32"
+ )
+ pred_input_title = keras.Input(
+ shape=(hparams.npratio + 1, hparams.doc_size), dtype="int32"
+ )
+ pred_input_title_one = keras.Input(shape=(1, hparams.doc_size,), dtype="int32")
+ pred_title_one_reshape = layers.Reshape((hparams.doc_size,))(
+ pred_input_title_one
+ )
+
+ imp_indexes = keras.Input(shape=(1,), dtype="int32")
+ user_indexes = keras.Input(shape=(1,), dtype="int32")
+
+ embedding_layer = layers.Embedding(
+ hparams.word_size,
+ hparams.word_emb_dim,
+ weights=[self.word2vec_embedding],
+ trainable=True,
+ )
+
+ titleencoder = self._build_newsencoder(embedding_layer)
+ userencoder = self._build_userencoder(titleencoder)
+ newsencoder = titleencoder
+
+ user_present = userencoder(his_input_title)
+ news_present = layers.TimeDistributed(newsencoder)(pred_input_title)
+ news_present_one = newsencoder(pred_title_one_reshape)
+
+ preds = layers.Dot(axes=-1)([news_present, user_present])
+ preds = layers.Activation(activation="softmax")(preds)
+
+ pred_one = layers.Dot(axes=-1)([news_present_one, user_present])
+ pred_one = layers.Activation(activation="sigmoid")(pred_one)
+
+ model = keras.Model(
+ [imp_indexes, user_indexes, his_input_title, pred_input_title], preds
+ )
+ scorer = keras.Model(
+ [imp_indexes, user_indexes, his_input_title, pred_input_title_one], pred_one
+ )
+
+ return model, scorer
diff --git a/reco_utils/recommender/newsrec/newsrec_utils.py b/reco_utils/recommender/newsrec/newsrec_utils.py
new file mode 100644
index 0000000000..e6b2f48d52
--- /dev/null
+++ b/reco_utils/recommender/newsrec/newsrec_utils.py
@@ -0,0 +1,287 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+
+import tensorflow as tf
+import six
+import os
+from sklearn.metrics import (
+ roc_auc_score,
+ log_loss,
+ mean_squared_error,
+ accuracy_score,
+ f1_score,
+)
+import numpy as np
+import yaml
+import zipfile
+from reco_utils.dataset.download_utils import maybe_download
+from reco_utils.recommender.deeprec.deeprec_utils import (
+ flat_config,
+ load_yaml,
+ load_dict,
+)
+import json
+import pickle as pkl
+
+
+def check_type(config):
+ """Check that the config parameters are the correct type
+
+ Args:
+ config (dict): Configuration dictionary.
+
+ Raises:
+ TypeError: If the parameters are not the correct type.
+ """
+
+ int_parameters = [
+ "word_size",
+ "his_size",
+ "doc_size",
+ "title_size",
+ "body_size",
+ "vert_num",
+ "subvert_num",
+ "npratio",
+ "word_emb_dim",
+ "attention_hidden_dim",
+ "epochs",
+ "batch_size",
+ "show_step",
+ "save_epoch",
+ "head_num",
+ "head_dim",
+ "user_num",
+ "filter_num",
+ "window_size",
+ "gru_unit",
+ "user_emb_dim",
+ "vert_emb_dim",
+ "subvert_emb_dim",
+ ]
+ for param in int_parameters:
+ if param in config and not isinstance(config[param], int):
+ raise TypeError("Parameters {0} must be int".format(param))
+
+ float_parameters = ["learning_rate", "dropout"]
+ for param in float_parameters:
+ if param in config and not isinstance(config[param], float):
+ raise TypeError("Parameters {0} must be float".format(param))
+
+ str_parameters = [
+ "wordEmb_file",
+ "method",
+ "loss",
+ "optimizer",
+ "cnn_activation",
+ "dense_activation" "type",
+ ]
+ for param in str_parameters:
+ if param in config and not isinstance(config[param], str):
+ raise TypeError("Parameters {0} must be str".format(param))
+
+ list_parameters = ["layer_sizes", "activation"]
+ for param in list_parameters:
+ if param in config and not isinstance(config[param], list):
+ raise TypeError("Parameters {0} must be list".format(param))
+
+
+def check_nn_config(f_config):
+ """Check neural networks configuration.
+
+ Args:
+ f_config (dict): Neural network configuration.
+
+ Raises:
+ ValueError: If the parameters are not correct.
+ """
+
+ if f_config["model_type"] in ["nrms", "NRMS"]:
+ required_parameters = [
+ "doc_size",
+ "his_size",
+ "user_num",
+ "wordEmb_file",
+ "word_size",
+ "npratio",
+ "data_format",
+ "word_emb_dim",
+ # nrms
+ "head_num",
+ "head_dim",
+ # attention
+ "attention_hidden_dim",
+ "loss",
+ "data_format",
+ "dropout",
+ ]
+
+ elif f_config["model_type"] in ["naml", "NAML"]:
+ required_parameters = [
+ "title_size",
+ "body_size",
+ "his_size",
+ "user_num",
+ "vert_num",
+ "subvert_num",
+ "wordEmb_file",
+ "word_size",
+ "npratio",
+ "data_format",
+ "word_emb_dim",
+ "vert_emb_dim",
+ "subvert_emb_dim",
+ # naml
+ "filter_num",
+ "cnn_activation",
+ "window_size",
+ "dense_activation",
+ # attention
+ "attention_hidden_dim",
+ "loss",
+ "data_format",
+ "dropout",
+ ]
+ elif f_config["model_type"] in ["lstur", "LSTUR"]:
+ required_parameters = [
+ "doc_size",
+ "his_size",
+ "user_num",
+ "wordEmb_file",
+ "word_size",
+ "npratio",
+ "data_format",
+ "word_emb_dim",
+ # lstur
+ "gru_unit",
+ "type",
+ "filter_num",
+ "cnn_activation",
+ "window_size",
+ # attention
+ "attention_hidden_dim",
+ "loss",
+ "data_format",
+ "dropout",
+ ]
+ elif f_config["model_type"] in ["npa", "NPA"]:
+ required_parameters = [
+ "doc_size",
+ "his_size",
+ "user_num",
+ "wordEmb_file",
+ "word_size",
+ "npratio",
+ "data_format",
+ "word_emb_dim",
+ # npa
+ "user_emb_dim",
+ "filter_num",
+ "cnn_activation",
+ "window_size",
+ # attention
+ "attention_hidden_dim",
+ "loss",
+ "data_format",
+ "dropout",
+ ]
+ else:
+ required_parameters = []
+
+ # check required parameters
+ for param in required_parameters:
+ if param not in f_config:
+ raise ValueError("Parameters {0} must be set".format(param))
+
+ if f_config["model_type"] in ["nrms", "NRMS", "lstur", "LSTUR"]:
+ if f_config["data_format"] != "news":
+ raise ValueError(
+ "For nrms and naml model, data format must be 'news', but your set is {0}".format(
+ f_config["data_format"]
+ )
+ )
+ elif f_config["model_type"] in ["naml", "NAML"]:
+ if f_config["data_format"] != "naml":
+ raise ValueError(
+ "For nrms and naml model, data format must be 'naml', but your set is {0}".format(
+ f_config["data_format"]
+ )
+ )
+
+ check_type(f_config)
+
+
+def create_hparams(flags):
+ """Create the model hyperparameters.
+
+ Args:
+ flags (dict): Dictionary with the model requirements.
+
+ Returns:
+ obj: Hyperparameter object in TF (tf.contrib.training.HParams).
+ """
+ return tf.contrib.training.HParams(
+ # data
+ data_format=flags.get("data_format", None),
+ iterator_type=flags.get("iterator_type", None),
+ # models
+ wordEmb_file=flags.get("wordEmb_file", None),
+ doc_size=flags.get("doc_size", None),
+ title_size=flags.get("title_size", None),
+ body_size=flags.get("body_size", None),
+ word_emb_dim=flags.get("word_emb_dim", None),
+ word_size=flags.get("word_size", None),
+ user_num=flags.get("user_num", None),
+ vert_num=flags.get("vert_num", None),
+ subvert_num=flags.get("subvert_num", None),
+ his_size=flags.get("his_size", None),
+ npratio=flags.get("npratio"),
+ dropout=flags.get("dropout", 0.0),
+ attention_hidden_dim=flags.get("attention_hidden_dim", 200),
+ # nrms
+ head_num=flags.get("head_num", 4),
+ head_dim=flags.get("head_dim", 100),
+ # naml
+ cnn_activation=flags.get("cnn_activation", None),
+ dense_activation=flags.get("dense_activation", None),
+ filter_num=flags.get("filter_num", 200),
+ window_size=flags.get("window_size", 3),
+ vert_emb_dim=flags.get("vert_emb_dim", 100),
+ subvert_emb_dim=flags.get("subvert_emb_dim", 100),
+ # lstur
+ gru_unit=flags.get("gru_unit", 400),
+ type=flags.get("type", "ini"),
+ # npa
+ user_emb_dim=flags.get("user_emb_dim", 50),
+ # train
+ learning_rate=flags.get("learning_rate", 0.001),
+ loss=flags.get("loss", None),
+ optimizer=flags.get("optimizer", "adam"),
+ epochs=flags.get("epochs", 10),
+ batch_size=flags.get("batch_size", 1),
+ # show info
+ show_step=flags.get("show_step", 1),
+ metrics=flags.get("metrics", None),
+ )
+
+
+def prepare_hparams(yaml_file=None, **kwargs):
+ """Prepare the model hyperparameters and check that all have the correct value.
+
+ Args:
+ yaml_file (str): YAML file as configuration.
+
+ Returns:
+ obj: Hyperparameter object in TF (tf.contrib.training.HParams).
+ """
+ if yaml_file is not None:
+ config = load_yaml(yaml_file)
+ config = flat_config(config)
+ else:
+ config = {}
+
+ config.update(kwargs)
+
+ check_nn_config(config)
+ return create_hparams(config)
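+
+
+# Illustrative usage (assumed file names, as in the tests added in this change):
+#   hparams = prepare_hparams("nrms.yaml", wordEmb_file="embedding.npy", epochs=1)
+# YAML values are loaded first; keyword arguments override them before validation.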
diff --git a/tests/conftest.py b/tests/conftest.py
index 98b27f9bd7..2424b78260 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -210,6 +210,18 @@ def notebooks():
"slirec_quickstart": os.path.join(
folder_notebooks, "00_quick_start", "sequential_recsys_amazondataset.ipynb"
),
+ "nrms_quickstart": os.path.join(
+ folder_notebooks, "00_quick_start", "nrms_synthetic.ipynb"
+ ),
+ "naml_quickstart": os.path.join(
+ folder_notebooks, "00_quick_start", "naml_synthetic.ipynb"
+ ),
+ "lstur_quickstart": os.path.join(
+ folder_notebooks, "00_quick_start", "lstur_synthetic.ipynb"
+ ),
+ "npa_quickstart": os.path.join(
+ folder_notebooks, "00_quick_start", "npa_synthetic.ipynb"
+ ),
"data_split": os.path.join(
folder_notebooks, "01_prepare_data", "data_split.ipynb"
),
diff --git a/tests/integration/test_notebooks_gpu.py b/tests/integration/test_notebooks_gpu.py
index 8b69829a46..787811ce41 100644
--- a/tests/integration/test_notebooks_gpu.py
+++ b/tests/integration/test_notebooks_gpu.py
@@ -43,7 +43,11 @@ def test_ncf_integration(notebooks, size, epochs, expected_values, seed):
OUTPUT_NOTEBOOK,
kernel_name=KERNEL_NAME,
parameters=dict(
- TOP_K=10, MOVIELENS_DATA_SIZE=size, EPOCHS=epochs, BATCH_SIZE=512, SEED=seed,
+ TOP_K=10,
+ MOVIELENS_DATA_SIZE=size,
+ EPOCHS=epochs,
+ BATCH_SIZE=512,
+ SEED=seed,
),
)
results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
@@ -84,7 +88,11 @@ def test_ncf_deep_dive_integration(
OUTPUT_NOTEBOOK,
kernel_name=KERNEL_NAME,
parameters=dict(
- TOP_K=10, MOVIELENS_DATA_SIZE=size, EPOCHS=epochs, BATCH_SIZE=batch_size, SEED=seed,
+ TOP_K=10,
+ MOVIELENS_DATA_SIZE=size,
+ EPOCHS=epochs,
+ BATCH_SIZE=batch_size,
+ SEED=seed,
),
)
results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
@@ -139,20 +147,16 @@ def test_fastai_integration(notebooks, size, epochs, expected_values):
15,
10,
{
- "res_syn": {
- "auc": 0.9716,
- "logloss": 0.699,
- },
- "res_real": {
- "auc": 0.749,
- "logloss": 0.4926,
- },
+ "res_syn": {"auc": 0.9716, "logloss": 0.699,},
+ "res_real": {"auc": 0.749, "logloss": 0.4926,},
},
42,
)
],
)
-def test_xdeepfm_integration(notebooks, syn_epochs, criteo_epochs, expected_values, seed):
+def test_xdeepfm_integration(
+ notebooks, syn_epochs, criteo_epochs, expected_values, seed
+):
notebook_path = notebooks["xdeepfm_quickstart"]
pm.execute_notebook(
notebook_path,
@@ -170,7 +174,9 @@ def test_xdeepfm_integration(notebooks, syn_epochs, criteo_epochs, expected_valu
for key, value in expected_values.items():
assert results[key]["auc"] == pytest.approx(value["auc"], rel=TOL, abs=ABS_TOL)
- assert results[key]["logloss"] == pytest.approx(value["logloss"], rel=TOL, abs=ABS_TOL)
+ assert results[key]["logloss"] == pytest.approx(
+ value["logloss"], rel=TOL, abs=ABS_TOL
+ )
@pytest.mark.integration
@@ -215,6 +221,7 @@ def test_wide_deep_integration(notebooks, size, steps, expected_values, seed, tm
for key, value in expected_values.items():
assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)
+
@pytest.mark.sequential
@pytest.mark.integration
@pytest.mark.gpu
@@ -226,17 +233,14 @@ def test_wide_deep_integration(notebooks, size, steps, expected_values, seed, tm
os.path.join("tests", "resources", "deeprec", "slirec"),
10,
400,
- {
- "res_syn": {
- "auc": 0.7183,
- "logloss": 0.6045,
- },
- },
+ {"res_syn": {"auc": 0.7183, "logloss": 0.6045,},},
2019,
)
],
)
-def test_slirec_quickstart_integration(notebooks, yaml_file, data_path, epochs, batch_size, expected_values, seed):
+def test_slirec_quickstart_integration(
+ notebooks, yaml_file, data_path, epochs, batch_size, expected_values, seed
+):
notebook_path = notebooks["slirec_quickstart"]
params = {
@@ -252,4 +256,189 @@ def test_slirec_quickstart_integration(notebooks, yaml_file, data_path, epochs,
results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
for key, value in expected_values.items():
assert results[key]["auc"] == pytest.approx(value["auc"], rel=TOL, abs=ABS_TOL)
- assert results[key]["logloss"] == pytest.approx(value["logloss"], rel=TOL, abs=ABS_TOL)
+ assert results[key]["logloss"] == pytest.approx(
+ value["logloss"], rel=TOL, abs=ABS_TOL
+ )
+
+
+@pytest.mark.nrms
+@pytest.mark.integration
+@pytest.mark.gpu
+@pytest.mark.parametrize(
+ "epochs, seed, expected_values",
+ [
+ (
+ 10,
+ 42,
+ {
+ "res_syn": {
+ "group_auc": 0.5845,
+ "mean_mrr": 0.202,
+ "ndcg@5": 0.1977,
+ "ndcg@10": 0.2655,
+ },
+ },
+ )
+ ],
+)
+def test_nrms_quickstart_integration(notebooks, epochs, seed, expected_values):
+ notebook_path = notebooks["nrms_quickstart"]
+
+ params = {
+ "epochs": epochs,
+ "seed": seed,
+ }
+ pm.execute_notebook(
+ notebook_path, OUTPUT_NOTEBOOK, kernel_name=KERNEL_NAME, parameters=params
+ )
+ results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
+ for key, value in expected_values.items():
+ assert results[key]["group_auc"] == pytest.approx(
+ value["group_auc"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["mean_mrr"] == pytest.approx(
+ value["mean_mrr"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@5"] == pytest.approx(
+ value["ndcg@5"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@10"] == pytest.approx(
+ value["ndcg@10"], rel=TOL, abs=ABS_TOL
+ )
+
+
+@pytest.mark.naml
+@pytest.mark.integration
+@pytest.mark.gpu
+@pytest.mark.parametrize(
+ "epochs, seed, expected_values",
+ [
+ (
+ 5,
+ 42,
+ {
+ "res_syn": {
+ "group_auc": 0.5667,
+ "mean_mrr": 0.1827,
+ "ndcg@5": 0.1898,
+ "ndcg@10": 0.2465,
+ },
+ },
+ )
+ ],
+)
+def test_naml_quickstart_integration(notebooks, epochs, seed, expected_values):
+    notebook_path = notebooks["naml_quickstart"]
+
+ params = {
+ "epochs": epochs,
+ "seed": seed,
+ }
+ pm.execute_notebook(
+ notebook_path, OUTPUT_NOTEBOOK, kernel_name=KERNEL_NAME, parameters=params
+ )
+ results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
+ for key, value in expected_values.items():
+ assert results[key]["group_auc"] == pytest.approx(
+ value["group_auc"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["mean_mrr"] == pytest.approx(
+ value["mean_mrr"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@5"] == pytest.approx(
+ value["ndcg@5"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@10"] == pytest.approx(
+ value["ndcg@10"], rel=TOL, abs=ABS_TOL
+ )
+
+
+@pytest.mark.lstur
+@pytest.mark.integration
+@pytest.mark.gpu
+@pytest.mark.parametrize(
+ "epochs, seed, expected_values",
+ [
+ (
+ 5,
+ 42,
+ {
+ "res_syn": {
+ "group_auc": 0.5599,
+ "mean_mrr": 0.2027,
+ "ndcg@5": 0.2065,
+ "ndcg@10": 0.268,
+ },
+ },
+ )
+ ],
+)
+def test_lstur_quickstart_integration(notebooks, epochs, seed, expected_values):
+ notebook_path = notebooks["lstur_quickstart"]
+
+ params = {
+ "epochs": epochs,
+ "seed": seed,
+ }
+ pm.execute_notebook(
+ notebook_path, OUTPUT_NOTEBOOK, kernel_name=KERNEL_NAME, parameters=params
+ )
+ results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
+ for key, value in expected_values.items():
+ assert results[key]["group_auc"] == pytest.approx(
+ value["group_auc"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["mean_mrr"] == pytest.approx(
+ value["mean_mrr"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@5"] == pytest.approx(
+ value["ndcg@5"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@10"] == pytest.approx(
+ value["ndcg@10"], rel=TOL, abs=ABS_TOL
+ )
+
+
+@pytest.mark.npa
+@pytest.mark.integration
+@pytest.mark.gpu
+@pytest.mark.parametrize(
+ "epochs, seed, expected_values",
+ [
+ (
+ 5,
+ 42,
+ {
+ "res_syn": {
+ "group_auc": 0.5583,
+ "mean_mrr": 0.1741,
+ "ndcg@5": 0.1676,
+ "ndcg@10": 0.2462,
+ },
+ },
+ )
+ ],
+)
+def test_npa_quickstart_integration(notebooks, epochs, seed, expected_values):
+ notebook_path = notebooks["npa_quickstart"]
+
+ params = {
+ "epochs": epochs,
+ "seed": seed,
+ }
+ pm.execute_notebook(
+ notebook_path, OUTPUT_NOTEBOOK, kernel_name=KERNEL_NAME, parameters=params
+ )
+ results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
+ for key, value in expected_values.items():
+ assert results[key]["group_auc"] == pytest.approx(
+ value["group_auc"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["mean_mrr"] == pytest.approx(
+ value["mean_mrr"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@5"] == pytest.approx(
+ value["ndcg@5"], rel=TOL, abs=ABS_TOL
+ )
+ assert results[key]["ndcg@10"] == pytest.approx(
+ value["ndcg@10"], rel=TOL, abs=ABS_TOL
+ )
diff --git a/tests/smoke/test_newsrec_model.py b/tests/smoke/test_newsrec_model.py
new file mode 100644
index 0000000000..31c984327e
--- /dev/null
+++ b/tests/smoke/test_newsrec_model.py
@@ -0,0 +1,113 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import pytest
+import os
+import papermill as pm
+from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams
+from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources
+from reco_utils.recommender.newsrec.models.base_model import BaseModel
+from reco_utils.recommender.newsrec.models.nrms import NRMSModel
+from reco_utils.recommender.newsrec.models.naml import NAMLModel
+from reco_utils.recommender.newsrec.models.lstur import LSTURModel
+from reco_utils.recommender.newsrec.models.npa import NPAModel
+from reco_utils.recommender.newsrec.IO.news_iterator import NewsIterator
+from reco_utils.recommender.newsrec.IO.naml_iterator import NAMLIterator
+
+
+@pytest.mark.smoke
+@pytest.mark.gpu
+def test_model_nrms(tmp):
+ yaml_file = os.path.join(tmp, 'nrms.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "nrms.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ assert hparams is not None
+
+ iterator = NewsIterator
+ model = NRMSModel(hparams, iterator)
+
+ assert model.run_eval(valid_file) is not None
+ assert isinstance(model.fit(train_file, valid_file), BaseModel)
+
+@pytest.mark.smoke
+@pytest.mark.gpu
+def test_model_naml(tmp):
+ yaml_file = os.path.join(tmp, 'naml.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "naml.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ assert hparams is not None
+
+ iterator = NAMLIterator
+ model = NAMLModel(hparams, iterator)
+
+ assert model.run_eval(valid_file) is not None
+ assert isinstance(model.fit(train_file, valid_file), BaseModel)
+
+@pytest.mark.smoke
+@pytest.mark.gpu
+def test_model_lstur(tmp):
+ yaml_file = os.path.join(tmp, 'lstur.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "lstur.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ assert hparams is not None
+
+ iterator = NewsIterator
+ model = LSTURModel(hparams, iterator)
+
+ assert model.run_eval(valid_file) is not None
+ assert isinstance(model.fit(train_file, valid_file), BaseModel)
+
+
+@pytest.mark.smoke
+@pytest.mark.gpu
+def test_model_npa(tmp):
+ yaml_file = os.path.join(tmp, 'npa.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "npa.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ assert hparams is not None
+
+ iterator = NewsIterator
+ model = NPAModel(hparams, iterator)
+
+ assert model.run_eval(valid_file) is not None
+ assert isinstance(model.fit(train_file, valid_file), BaseModel)
\ No newline at end of file
diff --git a/tests/unit/test_newsrec_model.py b/tests/unit/test_newsrec_model.py
new file mode 100644
index 0000000000..5007fc28c4
--- /dev/null
+++ b/tests/unit/test_newsrec_model.py
@@ -0,0 +1,104 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import pytest
+import os
+from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams
+from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources
+
+from reco_utils.recommender.newsrec.models.nrms import NRMSModel
+from reco_utils.recommender.newsrec.models.naml import NAMLModel
+from reco_utils.recommender.newsrec.models.lstur import LSTURModel
+from reco_utils.recommender.newsrec.models.npa import NPAModel
+from reco_utils.recommender.newsrec.IO.news_iterator import NewsIterator
+from reco_utils.recommender.newsrec.IO.naml_iterator import NAMLIterator
+
+@pytest.fixture
+def resource_path():
+ return os.path.dirname(os.path.realpath(__file__))
+
+@pytest.mark.gpu
+def test_nrms_component_definition(tmp):
+ yaml_file = os.path.join(tmp, 'nrms.yaml')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "nrms.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ iterator = NewsIterator
+ model = NRMSModel(hparams, iterator)
+
+ assert model.model is not None
+ assert model.scorer is not None
+ assert model.loss is not None
+ assert model.train_optimizer is not None
+
+
+@pytest.mark.gpu
+def test_naml_component_definition(tmp):
+ yaml_file = os.path.join(tmp, 'naml.yaml')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "naml.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ iterator = NAMLIterator
+ model = NAMLModel(hparams, iterator)
+
+ assert model.model is not None
+ assert model.scorer is not None
+ assert model.loss is not None
+ assert model.train_optimizer is not None
+
+
+@pytest.mark.gpu
+def test_npa_component_definition(tmp):
+ yaml_file = os.path.join(tmp, 'npa.yaml')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "npa.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ iterator = NewsIterator
+ model = NPAModel(hparams, iterator)
+
+ assert model.model is not None
+ assert model.scorer is not None
+ assert model.loss is not None
+ assert model.train_optimizer is not None
+
+@pytest.mark.gpu
+def test_lstur_component_definition(tmp):
+ yaml_file = os.path.join(tmp, 'lstur.yaml')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "lstur.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ iterator = NewsIterator
+ model = LSTURModel(hparams, iterator)
+
+ assert model.model is not None
+ assert model.scorer is not None
+ assert model.loss is not None
+ assert model.train_optimizer is not None
\ No newline at end of file
diff --git a/tests/unit/test_newsrec_utils.py b/tests/unit/test_newsrec_utils.py
new file mode 100644
index 0000000000..34bfe3efae
--- /dev/null
+++ b/tests/unit/test_newsrec_utils.py
@@ -0,0 +1,110 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import pytest
+import os
+import tensorflow as tf
+from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams, load_yaml
+from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources
+
+from reco_utils.recommender.newsrec.IO.news_iterator import NewsIterator
+from reco_utils.recommender.newsrec.IO.naml_iterator import NAMLIterator
+
+@pytest.fixture
+def resource_path():
+ return os.path.dirname(os.path.realpath(__file__))
+
+
+@pytest.mark.parametrize(
+ "must_exist_attributes", ["word_size", "data_format", "word_emb_dim"]
+)
+@pytest.mark.gpu
+def test_prepare_hparams(must_exist_attributes, tmp):
+ yaml_file = os.path.join(tmp, 'nrms.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "nrms.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1)
+ assert hasattr(hparams, must_exist_attributes)
+
+@pytest.mark.gpu
+def test_load_yaml_file(tmp):
+ yaml_file = os.path.join(tmp, 'nrms.yaml')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "nrms.zip",
+ )
+ config = load_yaml(yaml_file)
+ assert config is not None
+
+@pytest.mark.gpu
+def test_news_iterator(tmp):
+ yaml_file = os.path.join(tmp, 'nrms.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "nrms.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1, batch_size=512)
+ train_iterator = NewsIterator(hparams, hparams.npratio)
+ test_iterator = NewsIterator(hparams, 0)
+
+ assert train_iterator is not None
+ for res in train_iterator.load_data_from_file(train_file):
+ assert isinstance(res, dict)
+ assert len(res) == 5
+ break
+
+ assert test_iterator is not None
+ for res in test_iterator.load_data_from_file(valid_file):
+ assert isinstance(res, dict)
+ assert len(res) == 5
+ break
+
+
+@pytest.mark.gpu
+def test_naml_iterator(tmp):
+ yaml_file = os.path.join(tmp, 'naml.yaml')
+ train_file = os.path.join(tmp, 'train.txt')
+ valid_file = os.path.join(tmp, 'test.txt')
+ wordEmb_file = os.path.join(tmp, 'embedding.npy')
+
+ if not os.path.exists(yaml_file):
+ download_deeprec_resources(
+ "https://recodatasets.blob.core.windows.net/newsrec/",
+ tmp,
+ "naml.zip",
+ )
+
+ hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=1, batch_size=1024)
+ train_iterator = NAMLIterator(hparams, hparams.npratio)
+ test_iterator = NAMLIterator(hparams, 0)
+
+ assert train_iterator is not None
+ for res in train_iterator.load_data_from_file(train_file):
+ assert isinstance(res, dict)
+ assert len(res) == 11
+ break
+
+ assert test_iterator is not None
+ for res in test_iterator.load_data_from_file(valid_file):
+ assert isinstance(res, dict)
+ assert len(res) == 11
+ break