The purpose of this repo is to provide tools for time-series Exploratory Data Analysis (EDA) and data preparation pipelines for machine learning applications and research with eye-tracking data, namely gaze and pupil dilation. The initial processing and transformation blocks help researchers rapidly prototype data applications and perform first-hand data cleaning, visualization, and chained transformations.
The toolbox is organized into modules found in the python folder. Each tool belongs to one of the following families:
- Preprocessing tools: a data loader, DataFrame constructors, and transformation functions to format, standardize, and normalize the data.
- Visualizing tools: plotting methods that assist in EDA of time-series data and in reporting.
- Purging tools: methods used to clean data points from time-series features and to detect, visualize, and remove outliers with statistical methods such as the Median Absolute Deviation (MAD) and the Interquartile Range (IQR).
- Classifying tools: the BinaryTimeSeriesClassifier class, with methods to train univariate or multivariate time-series classifiers on eye-tracking data (pupil dilation, gaze path, ...). Training strategies/models: RandomForestClassifier (using tabularization and/or feature extractors) and ROCKET-CNN. The tabularization and feature-extractor transformations can also be applied as preprocessing steps for ROCKET-CNN and other time-series classification models from sktime.
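The MAD and IQR outlier rules mentioned for the purging tools can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's mad_method or iqr_method: the function names mad_outliers and iqr_outliers are hypothetical, and the cutoffs (a modified z-score above 3.5, and Tukey's 1.5 × IQR fences) are common textbook choices assumed here.

```python
import numpy as np

def mad_outliers(values, threshold=3.5):
    """Flag points whose modified z-score, based on the MAD, exceeds threshold."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # At least half the points equal the median: the score is undefined,
        # so this sketch flags nothing.
        return np.zeros(x.shape, dtype=bool)
    # 0.6745 (the 0.75 quantile of the standard normal) rescales the MAD so the
    # score is comparable to a classic z-score for normally distributed data.
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold

def iqr_outliers(values, k=1.5):
    """Flag points outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# Toy pupil-diameter samples with one spike and one drop (made-up values).
pupil = [3.1, 3.2, 3.0, 3.3, 3.1, 9.8, 3.2, 0.4]
print(np.flatnonzero(mad_outliers(pupil)))  # → [5 7]
print(np.flatnonzero(iqr_outliers(pupil)))  # → [5 7]
```

Both rules are built on medians and quantiles, so a single extreme artifact does not inflate the cutoff the way it would with a mean and standard deviation. Removing the flagged points is then a boolean mask, e.g. `np.asarray(pupil)[~mad_outliers(pupil)]`.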
- Create a conda environment to work in and activate it, e.g. an environment named 'ts-tools':
conda create -n ts-tools python=3.7
conda activate ts-tools
- Install the requirements listed in requirements.txt with the Python package manager pip:
pip install -r requirements.txt
You can run this Jupyter notebook locally, or this Google Colab notebook, to quickly try some of the methods used in one of the use cases: Keeping an Eye on Tinder.
The following code summarizes how to chain methods from the toolbox to perform a clean EDA and data preparation in 6 steps. The input to this pipeline is a set of .json files and the output is a .csv file containing ML-ready data. Together with a few methods for visualizing time-series data, this is the equivalent of a data pipeline:
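from IPython.display import HTML
from python.preprocessing import json_data_to_dataframe, add_relative_to_baseline
from python.preprocessing import normalize_lengths, normalize_float_resolution_ts, standarize
from python.visualizing import plot_collection
# detect_and_remove_blinking_from is assumed to live in python.purging, next to the other purging tools
from python.purging import detect_and_remove_blinking_from, remove_outliers_mad_single_feature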
# Load and purge the data from blinking values
df = json_data_to_dataframe(path='sample-data/tinder')
df = detect_and_remove_blinking_from(df, ['pupil_dilation', 'baseline'])
# Visualize the data
pupil_collection = [(rating, series_i) for rating, series_i in zip(df.rating, df.pupil_dilation)]
HTML(plot_collection(4, pupil_collection).to_html())
# Add calculated fields and normalize lengths
df = add_relative_to_baseline('pupil_dilation', df)
df['relative_pupil_dilation'] = normalize_lengths(df.relative_pupil_dilation.tolist())
# Remove extreme outliers using MAD, normalize float resolution and standarize time-series
df = remove_outliers_mad_single_feature(df, column='relative_pupil_dilation')
df = normalize_float_resolution_ts(df, columns=['pupil_dilation', 'relative_pupil_dilation', 'baseline'])
df.relative_pupil_dilation = df.relative_pupil_dilation.apply(standarize)
# Visualize the transformed data
relative_pupil_collection = [(rating, series_i) for rating, series_i in zip(df.rating, df.relative_pupil_dilation)]
HTML(plot_collection(4, relative_pupil_collection).to_html())
# Save to disk
df.to_csv("ml-ready-data.csv")
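The repository's helpers hide their internals, so as a rough, self-contained illustration of what the purging, length-normalization, float-resolution, and standardization steps do to a DataFrame whose cells hold lists of samples, the sketch below reimplements them with pandas and NumPy. Every function in it is hypothetical and every choice is an assumption: blinks are taken to be samples equal to 0.0, "relative to baseline" is taken to mean subtracting the mean baseline value, lengths are normalized by truncating to the shortest series, float resolution means rounding to 3 decimals, and standardization is a per-series z-score. The repository's functions may behave differently.

```python
import numpy as np
import pandas as pd

def drop_blinks(series, blink_value=0.0):
    """Remove samples equal to blink_value (assumed blink marker)."""
    return [v for v in series if v != blink_value]

def relative_to_baseline(series, baseline):
    """Express each sample as its difference from the mean baseline value."""
    base = float(np.mean(baseline))
    return [v - base for v in series]

def truncate_to_min_length(list_of_series):
    """Cut every series to the length of the shortest one."""
    n = min(len(s) for s in list_of_series)
    return [list(s[:n]) for s in list_of_series]

def zscore(series):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

# Made-up toy data: one row per trial, each cell holds a list of samples.
df = pd.DataFrame({
    "rating": [1, 0],
    "pupil_dilation": [[3.2, 3.4, 0.0, 3.6, 3.8], [2.9, 3.0, 3.1]],
    "baseline": [[3.0, 3.0, 3.1], [2.8, 2.9, 3.0]],
})

# Purge blinks, then compute the baseline-relative series.
df["pupil_dilation"] = df["pupil_dilation"].apply(drop_blinks)
df["relative_pupil_dilation"] = pd.Series(
    [relative_to_baseline(s, b) for s, b in zip(df["pupil_dilation"], df["baseline"])],
    index=df.index,
)
# Normalize lengths, round to a fixed float resolution, then standardize.
df["relative_pupil_dilation"] = pd.Series(
    truncate_to_min_length(df["relative_pupil_dilation"].tolist()), index=df.index
)
df["relative_pupil_dilation"] = df["relative_pupil_dilation"].apply(
    lambda s: np.round(np.asarray(s, dtype=float), 3).tolist()
)
df["relative_pupil_dilation"] = df["relative_pupil_dilation"].apply(lambda s: zscore(s).tolist())

# to_csv without a path returns the CSV text; pass a filename to write to disk.
csv_text = df.to_csv(index=False)
print(df["relative_pupil_dilation"].tolist())
```

Keeping each trial as a list inside a single cell mirrors the column-per-feature layout used in the pipeline above; the trade-off is that list cells are serialized as strings in the .csv and must be parsed back when reloading.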
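The following developer tools are part of the modules inside the python directory: the first 17 are preprocessing, visualizing, and purging methods, and the remaining ones belong to the BinaryTimeSeriesClassifier class: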
json_data_to_dataframe
min_listoflists_length
max_listoflists_length
standarize
normalize_lengths
normalize_float_resolution_ts
normalize_float_resolution
add_relative_to_baseline
plot_collection
plot_outliers_in
detect_and_remove_blinking_from
mad_method
remove_outliers_mad_single_feature
iqr_method
iqr_analysis
count_outliers
purge_iter_iqr_method
_prepare_binary_labels
_load_data_from_path
build_training_data
save_training_data
train (strategies: tabularization, rocket, feature-extractor)
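The tabularization and feature-extractor strategies of train both turn each time series into one row of a feature table that a standard classifier such as a random forest can consume. The NumPy sketch below illustrates the two ideas; tabularize and extract_features are hypothetical names, and the chosen statistics (mean, standard deviation, minimum, maximum, linear slope) are an assumption rather than the feature set used by BinaryTimeSeriesClassifier.

```python
import numpy as np

def tabularize(list_of_series):
    """Stack equal-length series into a 2-D (n_samples, n_timepoints) matrix."""
    lengths = {len(s) for s in list_of_series}
    if len(lengths) != 1:
        raise ValueError("tabularization needs equal-length series; normalize lengths first")
    return np.vstack([np.asarray(s, dtype=float) for s in list_of_series])

def extract_features(list_of_series):
    """Summarize each series with a few statistics (one row per series)."""
    rows = []
    for s in list_of_series:
        x = np.asarray(s, dtype=float)
        t = np.arange(len(x))
        slope = np.polyfit(t, x, 1)[0]  # coefficient of the linear trend
        rows.append([x.mean(), x.std(), x.min(), x.max(), slope])
    return np.array(rows)

# Made-up series: rising, flat, and falling.
series = [[0.0, 0.5, 1.0, 1.5], [1.0, 1.0, 1.0, 1.0], [2.0, 1.0, 0.0, -1.0]]
X_tab = tabularize(series)          # one column per time point
X_feat = extract_features(series)   # one column per summary statistic
print(X_tab.shape, X_feat.shape)    # → (3, 4) (3, 5)
```

Either matrix, together with a label vector y, can then be passed to scikit-learn's `RandomForestClassifier().fit(X, y)`. Tabularization keeps every time point as a feature and therefore requires aligned, equal-length series, whereas the feature-extraction route yields a fixed-size row per series regardless of its length.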
The following projects use the tools provided by this repository and can serve as a foundation for similar work:
- Keeping an Eye on Tinder: Towards Automated Detection of Partner Selection via Pupillary Data from Eye-tracker and Smartphone Cameras
- Eye-D: Identifying Users by their Gaze and Pupil Diameter Data while Drawing Patterns
The GNU General Public License: Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Note that "free" here refers to freedom, not price. Building this repository took several hours, invested in the spirit of providing the research community with useful tools for their eye-tracking projects. Everyone is welcome to contribute. If you find this repository useful and want to support the author, you can Buy Me a Coffee!