Add model M-RNN, and update the docs #127

Merged · 12 commits · May 21, 2023
41 changes: 22 additions & 19 deletions README.md
@@ -146,25 +146,26 @@ mae = cal_mae(imputation, X_intact, indicating_mask)  # calculate mean absolute error
## ❖ Available Algorithms
PyPOTS supports imputation, classification, clustering, and forecasting tasks on multivariate time series with missing values. The algorithms currently available for these four tasks are cataloged in the table below, which has one section per task. The paper references are listed at the bottom of this README file; please refer to them for more details.

| ***`Imputation`*** | 🚥 | 🚥 | 🚥 |
|:----------------------:|:-----------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|
| **Type** | **Abbr.** | **Full name of the algorithm/model/paper** | **Year** |
| Neural Net | SAITS | Self-Attention-based Imputation for Time Series [^1] | 2023 |
| Neural Net | Transformer | Attention is All you Need [^2];<br>Self-Attention-based Imputation for Time Series [^1];<br><sub>Note: proposed in [^2], and re-implemented as an imputation model in [^1].</sub> | 2017 |
| Neural Net | BRITS | Bidirectional Recurrent Imputation for Time Series [^3] | 2018 |
| Neural Net | M-RNN | Multi-directional Recurrent Neural Network [^9] | 2019 |
| Naive | LOCF | Last Observation Carried Forward | - |
| ***`Classification`*** | 🚥 | 🚥 | 🚥 |
| **Type** | **Abbr.** | **Full name of the algorithm/model/paper** | **Year** |
| Neural Net | BRITS | Bidirectional Recurrent Imputation for Time Series [^3] | 2018 |
| Neural Net | GRU-D | Recurrent Neural Networks for Multivariate Time Series with Missing Values [^4] | 2018 |
| Neural Net | Raindrop | Graph-Guided Network for Irregularly Sampled Multivariate Time Series [^5] | 2022 |
| ***`Clustering`*** | 🚥 | 🚥 | 🚥 |
| **Type** | **Abbr.** | **Full name of the algorithm/model/paper** | **Year** |
| Neural Net | CRLI | Clustering Representation Learning on Incomplete time-series data [^6] | 2021 |
| Neural Net | VaDER | Variational Deep Embedding with Recurrence [^7] | 2019 |
| ***`Forecasting`*** | 🚥 | 🚥 | 🚥 |
| **Type** | **Abbr.** | **Full name of the algorithm/model/paper** | **Year** |
| Probabilistic | BTTF | Bayesian Temporal Tensor Factorization [^8] | 2021 |
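
Below is a minimal usage sketch for the newly added M-RNN imputer, following the same workflow as the quick-start example earlier in this README (the one ending with `cal_mae`). The constructor arguments shown here (`n_steps`, `n_features`, `rnn_hidden_size`, `epochs`) are assumed to mirror the BRITS interface; please check the class docstring for the exact signature.

```python
import numpy as np

from pypots.data import mcar, masked_fill
from pypots.imputation import MRNN
from pypots.utils.metrics import cal_mae

# Toy data: 100 samples, 48 time steps, 37 features.
X = np.random.randn(100, 48, 37)
# Hold out 10% of the observed values as ground truth for evaluation.
X_intact, X, missing_mask, indicating_mask = mcar(X, 0.1)
X = masked_fill(X, 1 - missing_mask, np.nan)
dataset = {"X": X}

# Hyperparameters below are illustrative assumptions, not tuned values.
mrnn = MRNN(n_steps=48, n_features=37, rnn_hidden_size=64, epochs=5)
mrnn.fit(dataset)                  # train on the partially observed data
imputation = mrnn.impute(dataset)  # fill in the missing values
mae = cal_mae(imputation, X_intact, indicating_mask)  # error on the held-out values
```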


## ❖ Citing PyPOTS
@@ -254,6 +255,8 @@ Thank you all for your attention! 😃
[^6]: Ma, Q., Chen, C., Li, S., & Cottrell, G. W. (2021). [Learning Representations for Incomplete Time Series Clustering](https://ojs.aaai.org/index.php/AAAI/article/view/17070). *AAAI 2021*.
[^7]: Jong, J.D., Emon, M.A., Wu, P., Karki, R., Sood, M., Godard, P., Ahmad, A., Vrooman, H.A., Hofmann-Apitius, M., & Fröhlich, H. (2019). [Deep learning for clustering of multivariate clinical patient trajectories with missing values](https://academic.oup.com/gigascience/article/8/11/giz134/5626377). *GigaScience*.
[^8]: Chen, X., & Sun, L. (2021). [Bayesian Temporal Factorization for Multidimensional Time Series Prediction](https://arxiv.org/abs/1910.06366). *IEEE transactions on pattern analysis and machine intelligence*.
[^9]: Yoon, J., Zame, W. R., & van der Schaar, M. (2019). [Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks](https://ieeexplore.ieee.org/document/8485748). *IEEE Transactions on Biomedical Engineering*.


<details>
<summary>🏠 Visits</summary>
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -128,6 +128,7 @@ Task Type Algorithm
Imputation Neural Network SAITS (Self-Attention-based Imputation for Time Series) 2022 :cite:`du2023SAITS`
Imputation Neural Network Transformer 2017 :cite:`vaswani2017Transformer`, :cite:`du2023SAITS`
Imputation, Classification Neural Network BRITS (Bidirectional Recurrent Imputation for Time Series) 2018 :cite:`cao2018BRITS`
Imputation Neural Network M-RNN (Multi-directional Recurrent Neural Network) 2019 :cite:`yoon2019MRNN`
Imputation Naive LOCF (Last Observation Carried Forward) / /
Classification Neural Network GRU-D 2018 :cite:`che2018GRUD`
Classification Neural Network Raindrop 2022 :cite:`zhang2022Raindrop`
@@ -136,6 +137,7 @@ Clustering Neural Network VaDER (Variational Deep Embedding with Recurrence)
Forecasting Probabilistic BTTF (Bayesian Temporal Tensor Factorization) 2021 :cite:`chen2021BTMF`
============================== ================ ========================================================================= ====== =========


❖ Citing PyPOTS
^^^^^^^^^^^^^^^^
9 changes: 9 additions & 0 deletions docs/pypots.imputation.rst
@@ -28,6 +28,15 @@ pypots.imputation.brits module
:show-inheritance:
:inherited-members:

pypots.imputation.mrnn module
------------------------------

.. automodule:: pypots.imputation.mrnn
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pypots.imputation.locf module
-----------------------------

12 changes: 12 additions & 0 deletions docs/references.bib
@@ -12,6 +12,18 @@ @article{cao2018BRITS
keywords = {Computer Science - Machine Learning,Statistics - Machine Learning}
}

@article{yoon2019MRNN,
author = {Yoon, Jinsung and Zame, William R. and van der Schaar, Mihaela},
journal = {IEEE Transactions on Biomedical Engineering},
title = {Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks},
year = {2019},
volume = {66},
number = {5},
pages = {1477--1490},
doi = {10.1109/TBME.2018.2874712}
}


@article{che2018GRUD,
title = {Recurrent {{Neural Networks}} for {{Multivariate Time Series}} with {{Missing Values}}},
author = {Che, Zhengping and Purushotham, Sanjay and Cho, Kyunghyun and Sontag, David and Liu, Yan},
2 changes: 1 addition & 1 deletion pypots/__init__.py
@@ -22,7 +22,7 @@
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
__version__ = "0.1.0"
__version__ = "0.1.1"


__all__ = [
6 changes: 4 additions & 2 deletions pypots/imputation/__init__.py
@@ -9,10 +9,12 @@
from .locf import LOCF
from .saits import SAITS
from .transformer import Transformer
from .mrnn import MRNN

__all__ = [
"BRITS",
"Transformer",
"SAITS",
"Transformer",
"BRITS",
"MRNN",
"LOCF",
]
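
With this export in place, `MRNN` should be importable from `pypots.imputation` alongside the other imputation models; a hypothetical smoke test (not part of this diff):

```python
# Hypothetical check that the new export sits next to the existing models.
from pypots.imputation import BRITS, LOCF, MRNN, SAITS, Transformer

for model_class in (SAITS, Transformer, BRITS, MRNN, LOCF):
    print(model_class.__name__)
```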
13 changes: 13 additions & 0 deletions pypots/imputation/mrnn/__init__.py
@@ -0,0 +1,13 @@
"""

"""

# Created by Wenjie Du <[email protected]>
# License: GPL-v3

from .model import MRNN


__all__ = [
"MRNN",
]
46 changes: 46 additions & 0 deletions pypots/imputation/mrnn/data.py
@@ -0,0 +1,46 @@
"""
Dataset class for model MRNN.
"""

# Created by Wenjie Du <[email protected]>
# License: GPL-v3

from typing import Union

from ..brits.data import DatasetForBRITS


class DatasetForMRNN(DatasetForBRITS):
    """Dataset class for the imputation model M-RNN, reusing the data processing of ``DatasetForBRITS``.

    Parameters
    ----------
    data : dict or str,
        The dataset for model input, which should be a dictionary with keys 'X' and 'y',
        or a path string locating a data file.
        If it is a dict, X should be array-like with shape [n_samples, sequence length (time steps), n_features],
        which is the time-series input data and may contain missing values, and y should be array-like with
        shape [n_samples], holding the classification labels of X.
        If it is a path string, the path should point to a data file, e.g. an h5 file, which contains
        key-value pairs like a dict and includes the keys 'X' and 'y'.

    return_labels : bool, default = True,
        Whether to return labels in __getitem__() if they exist in the given data. For example, if `True`,
        the Dataset class returns labels in __getitem__() during the training of classification models;
        otherwise, labels are not included in the returned data. This parameter exists because the same
        Dataset class is used for the training/validating/testing stages. Big datasets stored in h5 files
        already have both X and y saved, but labels should not be read from the file during validating and
        testing with _fetch_data_from_file(), which works for all three stages. Hence this parameter is
        needed for the distinction.

    file_type : str, default = "h5py",
        The type of the given file if `data` is a path string.
    """

    def __init__(
        self,
        data: Union[dict, str],
        return_labels: bool = True,
        file_type: str = "h5py",
    ):
        super().__init__(data, return_labels, file_type)
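
For context, a minimal sketch of how the new `DatasetForMRNN` class could be used, assuming it behaves like its parent `DatasetForBRITS`: a PyTorch-compatible Dataset whose `__getitem__` returns the pre-processed tensors for the bidirectional RNN, so it can be wrapped in a standard `DataLoader`.

```python
import numpy as np
from torch.utils.data import DataLoader

from pypots.imputation.mrnn.data import DatasetForMRNN

# Toy data: 32 samples, 48 time steps, 10 features, with some missing values.
X = np.random.randn(32, 48, 10)
X[X < -1.5] = np.nan

# Labels ('y') are optional for imputation, hence return_labels=False.
dataset = DatasetForMRNN({"X": X}, return_labels=False)

loader = DataLoader(dataset, batch_size=8, shuffle=True)
for batch in loader:
    # Each batch holds whatever __getitem__ returns (sample indices plus the
    # forward/backward tensors prepared by the BRITS-style data pipeline).
    print(len(batch))
    break
```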