
Tools for time-series exploratory data analysis, cleaning, and preparation for machine learning applications, plus rapid prototyping of time-series classification models and applications.


LaverdeS/Multivariate-Time-Series-Classification

Multivariate-Time-Series-Classification

The purpose of this repo is to provide tools for time-series Exploratory Data Analysis (EDA) and data-preparation pipelines for machine learning applications and research with eye-tracking data (gaze and pupil dilation). The initial processing and transformation blocks help researchers rapidly prototype data applications and carry out first-hand data cleaning, visualization, and chained transformations.

The toolbox is organized into modules found in the python folder. Each tool belongs to one of the following families:

  • 🌠 Preprocessing tools: a data loader, DataFrame constructors, and transformation functions to format, standardize, and normalize the data.
  • 🎨 Visualizing tools: plotting methods that assist in EDA of time-series data and in reporting.
  • 🪓 Purging tools: methods used to clean data points from time-series features and to detect, visualize, and remove outliers with statistical methods such as the Median Absolute Deviation (MAD) and the Interquartile Range (IQR).
  • 👁️ Classifying tools: the BinaryTimeSeriesClassifier class, with methods to train univariate or multivariate time-series classifiers on eye-tracking data (pupil dilation, gaze path, ...). Training strategies/models: RandomForestClassifier (using tabularization and/or feature extractors) and ROCKET-CNN. The tabularization and feature-extractor transformations can also be applied as preprocessing steps for ROCKET-CNN and other time-series classification models from sktime.

☕ Getting Started

  • Create a conda environment to work in and activate it, e.g. one named 'ts-tools':

    conda create -n ts-tools python=3.7
    conda activate ts-tools

  • Install the requirements from requirements.txt using the Python package manager pip:

    pip install -r requirements.txt

πŸ‘ Quick Tour

You can run this Jupyter notebook locally, or this Google Colab notebook, to quickly try some of the methods used in one of the use cases: Keeping an Eye on Tinder.

The following code summarizes how to chain methods from the toolbox to perform a clean EDA and data preparation in 6 steps. The input to this pipeline is a set of .json files, and the output is a .csv file containing ML-ready data. This amounts to a data pipeline, plus some methods for visualizing time-series data:

from IPython.display import HTML  # renders the plots inside a Jupyter notebook

from python.preprocessing import json_data_to_dataframe, add_relative_to_baseline
from python.preprocessing import normalize_lengths, normalize_float_resolution_ts, standarize
from python.visualizing import plot_collection
from python.purging import detect_and_remove_blinking_from, remove_outliers_mad_single_feature

# Load the data and purge blinking artifacts
df = json_data_to_dataframe(path='sample-data/tinder')
df = detect_and_remove_blinking_from(df, ['pupil_dilation', 'baseline'])

# Visualize the data
pupil_collection = [(rating, series_i) for rating, series_i in zip(df.rating, df.pupil_dilation)]
HTML(plot_collection(4, pupil_collection).to_html())

# Add calculated fields and normalize lengths
df = add_relative_to_baseline('pupil_dilation', df)
df['relative_pupil_dilation'] = normalize_lengths(df.relative_pupil_dilation.tolist())

# Remove extreme outliers using MAD, normalize the float resolution and standardize the time series
df = remove_outliers_mad_single_feature(df, column='relative_pupil_dilation')
df = normalize_float_resolution_ts(df, columns=['pupil_dilation', 'relative_pupil_dilation', 'baseline'])
df.relative_pupil_dilation = df.relative_pupil_dilation.apply(standarize)

# Visualize the transformed data
relative_pupil_collection = [(rating, series_i) for rating, series_i in zip(df.rating, df.relative_pupil_dilation)]
HTML(plot_collection(4, relative_pupil_collection).to_html())

# Save to disk
df.to_csv("ml-ready-data.csv")
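The saved CSV stores each list-valued series cell as text, so reading it back with pandas returns strings rather than lists. The sketch below shows one way to restore them with ast.literal_eval. The function name load_ml_ready_csv is hypothetical, and it assumes the series cells hold plain Python lists.

```python
import ast
import io

import pandas as pd


def load_ml_ready_csv(path_or_buffer, series_columns):
    """Read a CSV written by DataFrame.to_csv and parse stringified lists back into lists."""
    df = pd.read_csv(path_or_buffer, index_col=0)  # column 0 holds the saved index
    for column in series_columns:
        # to_csv wrote each list as its repr, e.g. "[0.5, -0.5]"
        df[column] = df[column].apply(ast.literal_eval)
    return df


# Round trip through an in-memory buffer instead of "ml-ready-data.csv"
original = pd.DataFrame({"rating": [1, 0],
                         "relative_pupil_dilation": [[0.5, -0.5], [1.0, -1.0]]})
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
restored = load_ml_ready_csv(buffer, ["relative_pupil_dilation"])
print(restored["relative_pupil_dilation"].tolist())  # [[0.5, -0.5], [1.0, -1.0]]
```

If the cells hold NumPy arrays instead, their text form (e.g. [ 0.5 -0.5]) has no commas and literal_eval will not parse it, so converting the arrays to lists with .tolist() before saving is the safer choice.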

🧰 Toolbox

The modules inside the python directory provide the following 17 functions, plus the methods of the BinaryTimeSeriesClassifier class:

🌠 Preprocessing

  • json_data_to_dataframe
  • min_listoflists_length
  • max_listoflists_length
  • standarize
  • normalize_lengths
  • normalize_float_resolution_ts
  • normalize_float_resolution
  • add_relative_to_baseline
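The preprocessing names above suggest a few simple operations. The following sketch shows one plausible version of three of them, under stated assumptions: z-score standardization for standarize, truncation to the shortest series for normalize_lengths (min_listoflists_length hints at this, but the repository may pad or resample instead), and rounding for normalize_float_resolution. All function names here are hypothetical.

```python
import numpy as np


def zscore(series):
    """Standardize one series to zero mean and unit (population) standard deviation."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    if std == 0:
        # Constant series: centre it and avoid dividing by zero
        return x - x.mean()
    return (x - x.mean()) / std


def truncate_to_shortest(list_of_series):
    """Cut every series to the length of the shortest one (assumed length-normalization rule)."""
    shortest = min(len(s) for s in list_of_series)
    return [list(s[:shortest]) for s in list_of_series]


def round_series(series, decimals=4):
    """Fix the float resolution of a series by rounding every sample."""
    return [round(float(v), decimals) for v in series]
```

Truncation keeps only real samples but discards the tail of longer recordings; padding or resampling keeps all the data at the cost of adding synthetic values, so the right choice depends on whether the end of each trial carries signal.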

🎨 Visualizing

  • plot_collection
  • plot_outliers_in

🪓 Purging

  • detect_and_remove_blinking_from
  • mad_method
  • remove_outliers_mad_single_feature
  • iqr_method
  • iqr_analysis
  • count_outliers
  • purge_iter_iqr_method
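The name purge_iter_iqr_method suggests applying the IQR rule repeatedly, since removing extreme values shrinks the quartile range and can expose new outliers. A minimal sketch of that general idea follows; it is not the repository's code, and the function name, the k = 1.5 fence factor, and the max_iter cap are assumptions.

```python
import numpy as np


def iterative_iqr_purge(values, k=1.5, max_iter=10):
    """Repeatedly drop values outside Tukey's fences until none remain or max_iter is hit."""
    x = np.asarray(values, dtype=float)
    for _ in range(max_iter):
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        keep = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
        if keep.all():
            break  # converged: no outliers under the current quartiles
        x = x[keep]
    return x
```

On [10, 11, 12, 13, 14, 20, 100], the first pass removes 100 and the second removes 20, leaving [10, 11, 12, 13, 14]. The max_iter cap matters because repeated purging can keep eroding the tails of skewed data.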

πŸ‘οΈ Classyfing (BinaryTimeSeriesClassifier)

  • _prepare_binary_labels
  • _load_data_from_path
  • build_training_data
  • save_training_data
  • train [tabularization, rocket, feature-extractor]
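Tabularization means treating each time step of an equal-length series as its own feature column, so that a standard tabular classifier can be trained on it. The sketch below illustrates the idea with scikit-learn's RandomForestClassifier. It is not the BinaryTimeSeriesClassifier implementation: the helper names are hypothetical, and the rating-threshold rule in prepare_binary_labels is an assumed labelling scheme.

```python
import numpy as np


def prepare_binary_labels(ratings, threshold):
    """Map each rating to 1 if it is above `threshold`, else 0 (hypothetical labelling rule)."""
    return np.asarray([1 if r > threshold else 0 for r in ratings])


def tabularize(list_of_series):
    """Stack equal-length series into a 2-D array: one row per series, one column per time step."""
    return np.vstack([np.asarray(s, dtype=float) for s in list_of_series])


def train_tabular_random_forest(list_of_series, labels, seed=0):
    """Fit a random forest on the tabularized series."""
    # Imported here so the helpers above also work without scikit-learn installed
    from sklearn.ensemble import RandomForestClassifier

    X = tabularize(list_of_series)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X, labels)
    return clf
```

Because every row must have the same number of columns, the series need to be length-normalized first (for example with normalize_lengths). A call such as train_tabular_random_forest(series_list, prepare_binary_labels(ratings, threshold=3)) returns a fitted model whose predict method accepts the output of tabularize for new series of the same length.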

💼 Use Cases

The following projects use the tools provided by this repository and can serve as a foundation for similar work.

  • Keeping an Eye on Tinder: Towards Automated Detection of Partner Selection via Pupillary Data from Eye-tracker and Smartphone Cameras
  • Eye-D: Identifying Users by their Gaze and Pupil Diameter Data while Drawing Patterns

πŸ“ License

The GNU General Public License: Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Note that "free" here refers to freedom, not price. Building this repository took several hours; this time and effort were invested in the spirit of providing the research community with useful tools for their eye-tracking projects. Everyone is welcome to contribute. If you find this repository useful and want to support the author, you can Buy Me a Coffee!