Merge pull request #1415 from microsoft/zhangya
Zhangya
miguelgfierro authored Jun 15, 2021
2 parents a566e9f + e5cc9aa commit 85d696c
Showing 78 changed files with 1,875 additions and 1,434 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -23,7 +23,7 @@ Here are the basic steps to get started with your first contribution. Please rea
5. Install development requirements. `pip install -r dev-requirements.txt`
6. Create a test that replicates the issue.
7. Make code changes.
8. Ensure unit tests pass and code style / formatting is consistent (see [wiki](https://github.com/Microsoft/Recommenders/wiki/Coding-Guidelines#python-and-docstrings-style) for more details).
8. Ensure that unit tests pass and code style / formatting is consistent (see the [coding guidelines](https://github.com/Microsoft/Recommenders/wiki/Coding-Guidelines#python-and-docstrings-style) for more details). In particular, make sure that there is a docstring for every function and class you add and that it conforms to the [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html); a minimal docstring sketch is shown below.
9. Create a pull request against **staging** branch.

Once the features included in a [milestone](https://github.com/microsoft/recommenders/milestones) are completed, we will merge staging into main. See the wiki for more detail about our [merge strategy](https://github.com/microsoft/recommenders/wiki/Strategy-to-merge-the-code-to-main-branch).
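For reference, a minimal sketch of a Google-style docstring of the kind the guidelines ask for (the function itself is hypothetical):

    def hit_ratio(hits, total_users):
        """Compute the fraction of users with at least one relevant recommendation.

        Args:
            hits (int): Number of users who received at least one relevant item.
            total_users (int): Total number of users evaluated.

        Returns:
            float: Hit ratio in the range [0, 1].

        Raises:
            ValueError: If ``total_users`` is not positive.
        """
        if total_users <= 0:
            raise ValueError("total_users must be positive")
        return hits / total_users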
11 changes: 11 additions & 0 deletions docs/.readthedocs.yaml
@@ -0,0 +1,11 @@
version: 2

# Build from the docs/ directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Explicitly set the version of Python and its requirements
python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt
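As a quick local sanity check before pushing, the configuration can be parsed and its declared keys verified; a minimal sketch, assuming PyYAML is installed and the script is run from the repository root:

    # Sketch only: parse docs/.readthedocs.yaml and verify the declared keys.
    import yaml  # PyYAML, pinned in the docs requirements as pyyaml>=5.4.1,<6

    with open("docs/.readthedocs.yaml") as f:
        cfg = yaml.safe_load(f)

    assert cfg["version"] == 2
    assert cfg["sphinx"]["configuration"] == "docs/source/conf.py"
    assert cfg["python"]["version"] == 3.7
    print("Read the Docs configuration parsed OK")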
3 changes: 3 additions & 0 deletions docs/README.md
@@ -2,7 +2,9 @@

To set up the documentation, first you need to install the dependencies of the full environment. To do so, please follow [SETUP.md](../SETUP.md). Then type:

conda create -n reco_full python=3.6 cudatoolkit=10.0 "cudnn>=7.6"
conda activate reco_full
pip install .[all]
pip install sphinx_rtd_theme


@@ -11,3 +13,4 @@ To build the documentation as HTML:
cd docs
make html

To contribute to this repository, please follow our [coding guidelines](https://github.com/Microsoft/Recommenders/wiki/Coding-Guidelines). See also the [reStructuredText documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html) for the syntax of docstrings.
34 changes: 34 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,34 @@
numpy>=1.14
pandas>1.0.3,<2
scipy>=1.0.0,<2
tqdm>=4.31.1,<5
matplotlib>=2.2.2,<4
scikit-learn>=0.22.1,<1
numba>=0.38.1,<1
lightfm>=1.15,<2
lightgbm>=2.2.1,<3
memory_profiler>=0.54.0,<1
nltk>=3.4,<4
pydocumentdb>=2.3.3,<3
pymanopt>=0.2.5,<1
seaborn>=0.8.1,<1
transformers>=2.5.0,<5
bottleneck>=1.2.1,<2
category_encoders>=1.3.0,<2
jinja2>=2,<3
pyyaml>=5.4.1,<6
requests>=2.0.0,<3
cornac>=1.1.2,<2
scikit-surprise>=0.19.1,<=1.1.1
retrying>=1.3.3
azure.mgmt.cosmosdb>=0.8.0,<1
hyperopt>=0.1.2,<1
ipykernel>=4.6.1,<5
jupyter>=1,<2
locust>=1,<2
papermill>=2.1.2,<3
scrapbook>=0.5.0,<1.0.0
nvidia-ml-py3>=7.352.0
tensorflow-gpu>=1.15.0,<2
torch==1.2.0
fastai>=1.0.46,<2
19 changes: 0 additions & 19 deletions docs/source/azureml.rst

This file was deleted.

7 changes: 7 additions & 0 deletions docs/source/common.rst
@@ -18,6 +18,13 @@ GPU utilities
:members:


Kubernetes utilities
===============================

.. automodule:: reco_utils.common.k8s_utils
:members:


Notebook utilities
===============================

113 changes: 102 additions & 11 deletions docs/source/dataset.rst
@@ -1,41 +1,132 @@
.. _dataset:

Dataset module
**************************
##############

Recommendation datasets and related utilities

Recommendation datasets
===============================
***********************

.. automodule:: reco_utils.dataset.movielens
Amazon Reviews
==============

`Amazon Reviews dataset <https://snap.stanford.edu/data/web-Amazon.html>`_ consists of reviews from Amazon.
The data span a period of 18 years, including ~35 million reviews up to March 2013. Reviews include product and user
information, ratings, and a plaintext review.

:Citation:

J. McAuley and J. Leskovec, "Hidden factors and hidden topics: understanding rating dimensions with review text",
RecSys, 2013.

.. automodule:: reco_utils.dataset.amazon_reviews
:members:

CORD-19
=======

`COVID-19 Open Research Dataset (CORD-19) <https://azure.microsoft.com/en-us/services/open-datasets/catalog/covid-19-open-research/>`_ is a full-text
and metadata dataset of COVID-19 and coronavirus-related scholarly articles optimized
for machine readability and made available for use by the global research community.

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups
to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of
over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and the
coronavirus family of viruses for use by the global research community.

This dataset is intended to mobilize researchers to apply recent advances in natural language processing
to generate new insights in support of the fight against this infectious disease.

:Citation:

Wang, L.L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Eide, D.,
Funk, K., Kinney, R., Liu, Z., Merrill, W. and Mooney, P. "Cord-19: The COVID-19 Open Research Dataset.", 2020.


.. automodule:: reco_utils.dataset.covid_utils
:members:

Criteo
======

`Criteo dataset <https://www.kaggle.com/c/criteo-display-ad-challenge/overview>`_, released by Criteo Labs, is an online advertising dataset that contains feature values and click feedback
for millions of display ads. Every ad has 40 attributes: the first is the label, where a value of 1 indicates
that the ad was clicked and 0 that it was not. The remaining attributes consist of 13 integer columns and
26 categorical columns.

.. automodule:: reco_utils.dataset.criteo
:members:
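As a quick illustration, a minimal loading sketch; it assumes the module exposes ``load_pandas_df`` with a ``size`` argument ("sample" or "full"), as the repository's Criteo notebooks do, which may differ in other versions:

    # Sketch only: load_pandas_df and the "sample" size are assumed from the
    # Criteo example notebooks; check reco_utils.dataset.criteo for the exact API.
    from reco_utils.dataset.criteo import load_pandas_df

    df = load_pandas_df(size="sample")  # "sample" is a small subset; "full" downloads the complete data
    print(df.shape)  # columns: label, 13 integer features, 26 categorical features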

MIND
====

`MIcrosoft News Dataset (MIND) <https://msnews.github.io/>`_ is a large-scale dataset for news recommendation research. It was collected from
anonymized behavior logs of the Microsoft News website.

MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users.
Every news article contains rich textual content including title, abstract, body, category and entities.
Each impression log contains the click events, non-clicked events and historical news click behaviors of the user before
that impression. To protect user privacy, each user was de-linked from the production system when securely hashed into an anonymized ID.

:Citation:

Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu
and Ming Zhou, "MIND: A Large-scale Dataset for News Recommendation", ACL, 2020.



.. automodule:: reco_utils.dataset.mind
:members:
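For illustration only, a hedged sketch of downloading the small variant; ``download_mind`` and its signature are taken from the repository's MIND example notebooks and may differ here:

    # Sketch only: download_mind is assumed from the MIND example notebooks;
    # it is expected to return the paths of the downloaded train and validation zips.
    from reco_utils.dataset.mind import download_mind

    train_zip, valid_zip = download_mind(size="small", dest_path="./mind_small")
    print(train_zip, valid_zip)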

MovieLens
=========

The `MovieLens datasets <https://grouplens.org/datasets/movielens/>`_, first released in 1998,
describe people's expressed preferences
for movies. These preferences take the form of `<user, item, rating, timestamp>` tuples,
each the result of a person expressing a preference (a 0-5 star rating) for a movie
at a particular time.

The collection comes in several sizes:

* MovieLens 100k: 100,000 ratings from 1000 users on 1700 movies.
* MovieLens 1M: 1 million ratings from 6000 users on 4000 movies.
* MovieLens 10M: 10 million ratings from 72000 users on 10000 movies.
* MovieLens 20M: 20 million ratings from 138000 users on 27000 movies.

:Citation:

F. M. Harper and J. A. Konstan. "The MovieLens Datasets: History and Context".
ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19,
DOI=http://dx.doi.org/10.1145/2827872, 2015.

.. automodule:: reco_utils.dataset.movielens
:members:
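A minimal usage sketch; ``load_pandas_df`` and its ``size``/``header`` arguments are assumed from the repository's quick-start notebooks:

    # Sketch only: load_pandas_df is assumed from the MovieLens quick-start notebooks;
    # it downloads the chosen MovieLens size and returns a pandas DataFrame.
    from reco_utils.dataset.movielens import load_pandas_df

    df = load_pandas_df(
        size="100k",  # one of "100k", "1m", "10m", "20m"
        header=["userID", "itemID", "rating", "timestamp"],
    )
    print(df.head())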

Download utilities
===============================
******************

.. automodule:: reco_utils.dataset.download_utils
:members:


Cosmos CLI
===============================
Cosmos CLI utilities
*********************

.. automodule:: reco_utils.dataset.cosmos_cli
:members:


Pandas dataframe utils
===============================
Pandas dataframe utilities
***************************

.. automodule:: reco_utils.dataset.pandas_df_utils
:members:


Splitter utilities
===============================
******************

.. automodule:: reco_utils.dataset.python_splitters
:members:
@@ -48,14 +139,14 @@ Splitter utilities


Sparse utilities
===============================
****************

.. automodule:: reco_utils.dataset.sparse
:members:


Knowledge graph utilities
===============================
*************************

.. automodule:: reco_utils.dataset.wikidata
:members:
1 change: 0 additions & 1 deletion docs/source/index.rst
@@ -11,7 +11,6 @@ evaluating recommender systems.
:maxdepth: 1
:caption: Contents:

AzureML <azureml>
Common <common>
Dataset <dataset>
Evaluation <evaluation>
