Zhangya #1415

Merged
merged 71 commits into from
Jun 15, 2021

Changes from 59 commits
c694ae6
update doc source
YanZhangADS May 26, 2021
8a16f00
fix format
YanZhangADS May 27, 2021
25fee43
fix format
YanZhangADS May 27, 2021
49788d7
remove log file and fix type
YanZhangADS May 27, 2021
2ab6c88
Merge branch 'pipeline_release' into zhangya
YanZhangADS May 27, 2021
0740f71
remove log file
YanZhangADS May 27, 2021
87b6e61
update rst file
YanZhangADS May 27, 2021
ef43004
modify instructions
YanZhangADS Jun 1, 2021
47d3c98
update
YanZhangADS Jun 2, 2021
61d2be5
Miguel/review docs (#1425)
miguelgfierro Jun 3, 2021
501c76a
requirment files
YanZhangADS Jun 3, 2021
bb69060
add gpu packages
YanZhangADS Jun 3, 2021
3fd4b4c
add sublevel
YanZhangADS Jun 3, 2021
7a3cc44
fix
YanZhangADS Jun 3, 2021
a55fa78
Merge branch 'pipeline_release' into zhangya
YanZhangADS Jun 3, 2021
94d23fc
Andreas/docs (#1426)
anargyri Jun 3, 2021
33cb2d8
Merge branch 'zhangya' of github.com:microsoft/recommenders into zhangya
YanZhangADS Jun 3, 2021
628e88f
update algos
YanZhangADS Jun 3, 2021
c76d679
Fix DeepRec titles; add levels under DeepRec docs (#1428)
anargyri Jun 4, 2021
a98e8cf
Merge branch 'pipeline_release' into zhangya
YanZhangADS Jun 4, 2021
9c7b4e8
improve requirement.txt
YanZhangADS Jun 4, 2021
5653e0e
mock imports
YanZhangADS Jun 4, 2021
0062924
fix
YanZhangADS Jun 4, 2021
e261a90
fix
YanZhangADS Jun 4, 2021
588e056
remove mock
YanZhangADS Jun 5, 2021
6ad6d9f
add package
YanZhangADS Jun 5, 2021
639387c
fix imports
YanZhangADS Jun 7, 2021
110e62a
Miguel/review doc2 (#1430)
miguelgfierro Jun 7, 2021
ed0b726
fix
YanZhangADS Jun 8, 2021
b165432
fix
YanZhangADS Jun 8, 2021
6b8bd1c
fix
YanZhangADS Jun 8, 2021
83033d9
fix
YanZhangADS Jun 8, 2021
4fa688a
fix
YanZhangADS Jun 8, 2021
956da84
fix
YanZhangADS Jun 8, 2021
e44212f
fix
YanZhangADS Jun 8, 2021
9614f58
fix
YanZhangADS Jun 8, 2021
3262bc0
fix
YanZhangADS Jun 8, 2021
162ce73
fix
YanZhangADS Jun 8, 2021
516fd53
fix
YanZhangADS Jun 8, 2021
f57cee9
fix
YanZhangADS Jun 8, 2021
bca42bb
fix
YanZhangADS Jun 8, 2021
e96e347
fix
YanZhangADS Jun 8, 2021
07c9efd
fix
YanZhangADS Jun 8, 2021
de28173
fix
YanZhangADS Jun 8, 2021
a3a7072
fix
YanZhangADS Jun 8, 2021
11e7bef
Update README.md
anargyri Jun 9, 2021
61ee0f6
Update CONTRIBUTING.md
anargyri Jun 9, 2021
8ad3cdb
Update CONTRIBUTING.md
anargyri Jun 9, 2021
0417347
merge
YanZhangADS Jun 10, 2021
7ddb282
Miguel/review doc3 (#1436)
miguelgfierro Jun 10, 2021
b6f4f15
fix docstring
YanZhangADS Jun 10, 2021
726d183
fix
YanZhangADS Jun 10, 2021
70aa90f
fix docstring
YanZhangADS Jun 10, 2021
10a894c
Fix return types
anargyri Jun 10, 2021
f9d69df
Merge branch 'pipeline_release' into zhangya
YanZhangADS Jun 10, 2021
032ff56
fix naming
YanZhangADS Jun 14, 2021
d2cf13c
fix format
YanZhangADS Jun 14, 2021
16d9b37
fix format
YanZhangADS Jun 14, 2021
75f94eb
Miguel/review doc4 (#1440)
miguelgfierro Jun 14, 2021
d6da292
change numpy.array to numpy.ndarray
YanZhangADS Jun 14, 2021
ab07117
change from numpy.array to numpy.ndarray
YanZhangADS Jun 14, 2021
bb833ba
fix
YanZhangADS Jun 14, 2021
85e949a
More nitpicking
anargyri Jun 14, 2021
8db2532
Merge branch 'zhangya' of github.com:microsoft/recommenders into zhangya
anargyri Jun 14, 2021
665bbeb
fix
YanZhangADS Jun 14, 2021
8a58b95
Merge branch 'zhangya' of github.com:microsoft/recommenders into zhangya
YanZhangADS Jun 14, 2021
2960f05
fix
YanZhangADS Jun 14, 2021
be7c65e
fix
YanZhangADS Jun 14, 2021
f47b016
fix
YanZhangADS Jun 14, 2021
ea97b50
change return type from obj to list
YanZhangADS Jun 14, 2021
e5cc9aa
change obj to object
YanZhangADS Jun 14, 2021
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -23,7 +23,7 @@ Here are the basic steps to get started with your first contribution. Please rea
5. Install development requirements. `pip install -r dev-requirements.txt`
6. Create a test that replicates the issue.
7. Make code changes.
8. Ensure unit tests pass and code style / formatting is consistent (see [wiki](https://github.com/Microsoft/Recommenders/wiki/Coding-Guidelines#python-and-docstrings-style) for more details).
8. Ensure that unit tests pass and code style / formatting is consistent (see the [coding guidelines](https://github.com/Microsoft/Recommenders/wiki/Coding-Guidelines#python-and-docstrings-style) for more details). In particular, make sure that there is a docstring for every function and class you add and that it conforms to the [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
9. Create a pull request against **staging** branch.

Once the features included in a [milestone](https://github.com/microsoft/recommenders/milestones) are completed, we will merge staging into main. See the wiki for more detail about our [merge strategy](https://github.com/microsoft/recommenders/wiki/Strategy-to-merge-the-code-to-main-branch).
11 changes: 11 additions & 0 deletions docs/.readthedocs.yaml
@@ -0,0 +1,11 @@
version: 2

# Build from the docs/ directory with Sphinx
sphinx:
configuration: docs/source/conf.py

# Explicitly set the version of Python and its requirements
python:
version: 3.7
install:
- requirements: docs/requirements.txt
3 changes: 3 additions & 0 deletions docs/README.md
@@ -2,7 +2,9 @@

To set up the documentation, first install the dependencies of the full environment by following [SETUP.md](../SETUP.md). Then type:

conda create -n reco_full python=3.6 cudatoolkit=10.0 "cudnn>=7.6"
conda activate reco_full
pip install ".[all]"
pip install sphinx_rtd_theme


@@ -11,3 +13,4 @@ To build the documentation as HTML:
cd docs
make html

To contribute to this repository, please follow our [coding guidelines](https://github.com/Microsoft/Recommenders/wiki/Coding-Guidelines). See also the [reStructuredText documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html) for the syntax of docstrings.
34 changes: 34 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,34 @@
numpy>=1.14
pandas>1.0.3,<2
scipy>=1.0.0,<2
tqdm>=4.31.1,<5
matplotlib>=2.2.2,<4
scikit-learn>=0.22.1,<1
numba>=0.38.1,<1
lightfm>=1.15,<2
lightgbm>=2.2.1,<3
memory_profiler>=0.54.0,<1
nltk>=3.4,<4
pydocumentdb>=2.3.3,<3
pymanopt>=0.2.5,<1
seaborn>=0.8.1,<1
transformers>=2.5.0,<5
bottleneck>=1.2.1,<2
category_encoders>=1.3.0,<2
jinja2>=2,<3
pyyaml>=5.4.1,<6
requests>=2.0.0,<3
cornac>=1.1.2,<2
scikit-surprise>=0.19.1,<=1.1.1
retrying>=1.3.3
azure.mgmt.cosmosdb>=0.8.0,<1
hyperopt>=0.1.2,<1
ipykernel>=4.6.1,<5
jupyter>=1,<2
locust>=1,<2
papermill>=2.1.2,<3
scrapbook>=0.5.0,<1.0.0
nvidia-ml-py3>=7.352.0
tensorflow-gpu>=1.15.0,<2
torch==1.2.0
fastai>=1.0.46,<2
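
A malformed version specifier in a requirements file (for example, two version clauses without a separating comma) typically only surfaces when pip parses the file. The sketch below is a simplified, standard-library-only sanity check, not a full PEP 508 parser; the function name `check_requirement` and both regular expressions are illustrative assumptions and are not part of this PR.

```python
import re

# One version clause: a comparison operator followed by a version string.
# Assumption: this simplified pattern ignores extras, markers and URLs.
CLAUSE = re.compile(r"^(==|>=|<=|!=|~=|>|<)\s*[0-9A-Za-z.*+!]+$")
# A package name followed by whatever specifier text remains.
NAME_AND_SPEC = re.compile(r"^([A-Za-z0-9._-]+)\s*(.*)$")


def check_requirement(line):
    """Return True if every comma-separated version clause looks well formed."""
    match = NAME_AND_SPEC.match(line.strip())
    if match is None:
        return False
    spec = match.group(2)
    if not spec:
        # A bare package name with no version constraint is acceptable.
        return True
    return all(CLAUSE.match(clause.strip()) is not None for clause in spec.split(","))


if __name__ == "__main__":
    for req in ["pandas>1.0.3,<2", "torch==1.2.0", "pydocumentdb>=2.3.3<3"]:
        print(req, "->", "ok" if check_requirement(req) else "malformed")
```

Running such a check over `docs/requirements.txt` before a documentation build would flag a clause like `>=2.3.3<3`, where the comma between the lower and upper bound is missing.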
19 changes: 0 additions & 19 deletions docs/source/azureml.rst

This file was deleted.

7 changes: 7 additions & 0 deletions docs/source/common.rst
@@ -18,6 +18,13 @@ GPU utilities
:members:


Kubernetes utilities
===============================

.. automodule:: reco_utils.common.k8s_utils
:members:


Notebook utilities
===============================

113 changes: 102 additions & 11 deletions docs/source/dataset.rst
@@ -1,41 +1,132 @@
.. _dataset:

Dataset module
**************************
##############

Recommendation datasets and related utilities

Recommendation datasets
===============================
***********************

.. automodule:: reco_utils.dataset.movielens
Amazon Reviews
==============

The `Amazon Reviews dataset <https://snap.stanford.edu/data/web-Amazon.html>`_ consists of reviews from Amazon.
The data span a period of 18 years, including ~35 million reviews up to March 2013. Reviews include product and user
information, ratings, and a plaintext review.

:Citation:

J. McAuley and J. Leskovec, "Hidden factors and hidden topics: understanding rating dimensions with review text",
RecSys, 2013.

.. automodule:: reco_utils.dataset.amazon_reviews
:members:

CORD-19
=======

`COVID-19 Open Research Dataset (CORD-19) <https://azure.microsoft.com/en-us/services/open-datasets/catalog/covid-19-open-research/>`_ is a full-text
and metadata dataset of COVID-19 and coronavirus-related scholarly articles optimized
for machine readability and made available for use by the global research community.

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups
to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of
over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and the
coronavirus family of viruses for use by the global research community.

This dataset is intended to mobilize researchers to apply recent advances in natural language processing
to generate new insights in support of the fight against this infectious disease.

:Citation:

Wang, L.L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Eide, D.,
Funk, K., Kinney, R., Liu, Z., Merrill, W. and Mooney, P. "Cord-19: The COVID-19 Open Research Dataset.", 2020.


.. automodule:: reco_utils.dataset.covid_utils
:members:

Criteo
======

The `Criteo dataset <https://www.kaggle.com/c/criteo-display-ad-challenge/overview>`_, released by Criteo Labs, is an online advertising dataset that contains feature values and click feedback
for millions of display ads. Every ad has 40 attributes. The first attribute is the label, where a value of 1 means
that the ad was clicked on and a value of 0 means that it was not. The remaining attributes consist of 13 integer columns and
26 categorical columns.
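
The column layout described above (1 label, 13 integer features, 26 categorical features) can be sketched as a header builder. This is a minimal illustration; the function name `criteo_header` and the column names `label`, `int00`, `cat00`, and so on are hypothetical and are not taken from `reco_utils.dataset.criteo`.

```python
N_INT_COLS = 13  # integer feature columns, as stated in the description
N_CAT_COLS = 26  # categorical feature columns, as stated in the description


def criteo_header(label="label", int_prefix="int", cat_prefix="cat"):
    """Build a 40-column header: label first, then integer, then categorical columns.

    The naming scheme is an assumption made for illustration only.
    """
    int_cols = [f"{int_prefix}{i:02d}" for i in range(N_INT_COLS)]
    cat_cols = [f"{cat_prefix}{i:02d}" for i in range(N_CAT_COLS)]
    return [label] + int_cols + cat_cols


if __name__ == "__main__":
    header = criteo_header()
    print(len(header), header[:3], header[-1])
```

Keeping the label in the first position mirrors the order given in the description, so the header can be passed directly as column names when reading the raw file.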

.. automodule:: reco_utils.dataset.criteo
:members:

MIND
====

The `MIcrosoft News Dataset (MIND) <https://msnews.github.io/>`_ is a large-scale dataset for news recommendation research. It was collected from
anonymized behavior logs of the Microsoft News website.

MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users.
Every news article contains rich textual content including title, abstract, body, category and entities.
Each impression log contains the click events, non-clicked events and historical news click behaviors of this user before
this impression. To protect user privacy, each user was de-linked from the production system when securely hashed into an anonymized ID.
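
As a rough illustration of how click and non-click events in one impression log can be separated, the sketch below assumes that an impression is stored as space-separated tokens of the form `<news_id>-<flag>`, with flag `1` for a click and `0` for a non-click. That encoding and the function name `split_impressions` are assumptions made for this example; check the dataset documentation before relying on them.

```python
def split_impressions(impressions):
    """Split an impression string into clicked and non-clicked news IDs.

    Assumption: tokens look like "N123-1" (clicked) or "N456-0" (not clicked).
    """
    clicked, not_clicked = [], []
    for token in impressions.split():
        # rpartition keeps any hyphens inside the news ID intact.
        news_id, _, flag = token.rpartition("-")
        if flag == "1":
            clicked.append(news_id)
        else:
            not_clicked.append(news_id)
    return clicked, not_clicked


if __name__ == "__main__":
    print(split_impressions("N1-1 N2-0 N3-0"))
```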

:Citation:

Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu
and Ming Zhou, "MIND: A Large-scale Dataset for News Recommendation", ACL, 2020.



.. automodule:: reco_utils.dataset.mind
:members:

MovieLens
=========

The `MovieLens datasets <https://grouplens.org/datasets/movielens/>`_, first released in 1998,
describe people's expressed preferences
for movies. These preferences take the form of `<user, item, rating, timestamp>` tuples,
each the result of a person expressing a preference (a 0-5 star rating) for a movie
at a particular time.
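
The `<user, item, rating, timestamp>` tuple described above can be sketched as a small typed record. This is a minimal example that assumes a tab-separated line, as used by the `u.data` file of MovieLens 100k; other sizes use different separators. The names `Rating` and `parse_rating` are hypothetical and are not part of `reco_utils.dataset.movielens`.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One expressed preference: who rated what, how, and when."""

    user: int
    item: int
    rating: float
    timestamp: int


def parse_rating(line, sep="\t"):
    """Parse one raw line into a Rating (separator is an assumption)."""
    user, item, rating, timestamp = line.rstrip("\n").split(sep)
    return Rating(int(user), int(item), float(rating), int(timestamp))


if __name__ == "__main__":
    print(parse_rating("196\t242\t3\t881250949"))
```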

The datasets come in several sizes:

* MovieLens 100k: 100,000 ratings from 1000 users on 1700 movies.
* MovieLens 1M: 1 million ratings from 6000 users on 4000 movies.
* MovieLens 10M: 10 million ratings from 72000 users on 10000 movies.
* MovieLens 20M: 20 million ratings from 138000 users on 27000 movies.

:Citation:

F. M. Harper and J. A. Konstan. "The MovieLens Datasets: History and Context".
ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19,
DOI=http://dx.doi.org/10.1145/2827872, 2015.

.. automodule:: reco_utils.dataset.movielens
:members:

Download utilities
===============================
******************

.. automodule:: reco_utils.dataset.download_utils
:members:


Cosmos CLI
===============================
Cosmos CLI utilities
*********************

.. automodule:: reco_utils.dataset.cosmos_cli
:members:


Pandas dataframe utils
===============================
Pandas dataframe utilities
***************************

.. automodule:: reco_utils.dataset.pandas_df_utils
:members:


Splitter utilities
===============================
******************

.. automodule:: reco_utils.dataset.python_splitters
:members:
@@ -48,14 +139,14 @@ Splitter utilities


Sparse utilities
===============================
****************

.. automodule:: reco_utils.dataset.sparse
:members:


Knowledge graph utilities
===============================
*************************

.. automodule:: reco_utils.dataset.wikidata
:members:
1 change: 0 additions & 1 deletion docs/source/index.rst
@@ -11,7 +11,6 @@ evaluating recommender systems.
:maxdepth: 1
:caption: Contents:

AzureML <azureml>
Common <common>
Dataset <dataset>
Evaluation <evaluation>