Author: Yap Jheng Khin
The notebook is available at https://www.kaggle.com/polarbearyap/speeddating-part-ii
Note that this is a continuation of Part I, which was done here. I have also discovered many mistakes in Part I, so Part II serves as an improvement and a postmortem.
The mistakes that I made in Part I are:
- Preprocessing the whole dataset at once, which causes train-test contamination.
- Performing plain cross-validation instead of nested cross-validation.
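Both mistakes can be illustrated with a minimal, standard-library-only sketch. It is not the code used in this notebook: the toy 1-D data, the k-NN classifier, the grid of `k` values, and all function names are assumptions made for illustration. The scaler statistics are learned from the training indices only, and the hyperparameter is chosen in an inner loop that never sees the outer test fold.

```python
import random
import statistics

def k_folds(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_scaler(values):
    """Learn standardization statistics from the given (training) values only."""
    return statistics.fmean(values), statistics.pstdev(values) or 1.0

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (1-D feature, odd k)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def evaluate(xs, ys, train, test, k):
    """Fit scaler and k-NN on `train` only, then return accuracy on `test`."""
    mean, std = fit_scaler([xs[i] for i in train])  # no test data leaks in here

    def scale(v):
        return (v - mean) / std

    tx = [scale(xs[i]) for i in train]
    ty = [ys[i] for i in train]
    hits = sum(knn_predict(tx, ty, scale(xs[i]), k) == ys[i] for i in test)
    return hits / len(test)

def nested_cv(xs, ys, k_grid, outer_k=5, inner_k=3):
    """Outer loop estimates performance; inner loop selects the hyperparameter."""
    outer_scores = []
    for outer_train, outer_test in k_folds(len(xs), outer_k):

        def inner_score(k):
            scores = []
            for in_tr, in_val in k_folds(len(outer_train), inner_k):
                tr = [outer_train[i] for i in in_tr]
                val = [outer_train[i] for i in in_val]
                scores.append(evaluate(xs, ys, tr, val, k))
            return statistics.fmean(scores)

        best_k = max(k_grid, key=inner_score)  # tuned on outer-train data only
        outer_scores.append(evaluate(xs, ys, outer_train, outer_test, best_k))
    return outer_scores

# Toy data (made up): two classes whose feature means differ by 3.
random.seed(0)
labels = [i % 2 for i in range(60)]
features = [random.gauss(3.0 * y, 1.0) for y in labels]
scores = nested_cv(features, labels, k_grid=[1, 3, 5])
print(scores)
```

In scikit-learn, the same idea is usually expressed by putting the preprocessing steps in a `Pipeline`, wrapping that in `GridSearchCV`, and passing the search object to `cross_val_score`.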
My learning expectations for Part II are:
- Discover various ways to detect correlated features.
- Perform feature selection to reduce model complexity.
- Apply nested cross-validation to tasks such as hyperparameter tuning.
- Discover XAI techniques that can be used in explaining black box models.
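As one simple illustration of the first two goals, the sketch below detects correlated features with pairwise Pearson correlation and greedily drops any feature that is too strongly correlated with one already kept. This is only one of several possible approaches and not necessarily the one used later in this notebook. The column values, the `funny_x2` column, and the 0.9 threshold are made up for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy columns: `funny_x2` is an exact multiple of `funny`.
toy = {
    "funny": [6, 8, 5, 9, 7, 4],
    "funny_x2": [12, 16, 10, 18, 14, 8],
    "music": [3, 9, 4, 2, 8, 7],
}
selected = drop_correlated(toy)
print(selected)
```

The greedy filter depends on column order, so the feature that appears first is the one that survives. Pearson correlation also only captures linear relationships between numeric features.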
This data was gathered from participants in experimental speed dating events from 2002-2004.
During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.
The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information.
In total, the dataset contains 56 features that have already undergone preprocessing, such as 'd_funny', which represents the corresponding attribute in discretized form. The dataset also includes 'has_null', which indicates whether a sample contains any null values. Features prefixed with 'expected_' describe the participants' expectations towards their partners.
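To make these derived features concrete, here is a small sketch of how a null flag and a discretized rating could be computed from raw values. The bin boundaries, bin labels, and sample rows are hypothetical and chosen only for illustration. The actual binning used by the dataset publisher may differ.

```python
from bisect import bisect_left

# Hypothetical bin upper bounds and labels for a 0-10 rating.
UPPER_BOUNDS = [5, 8, 10]
BIN_LABELS = ["[0-5]", "[6-8]", "[9-10]"]

def discretize(value):
    """Map a 0-10 rating to the label of its bin; missing values stay missing."""
    if value is None:
        return None
    if not 0 <= value <= 10:
        raise ValueError(f"rating out of range: {value}")
    return BIN_LABELS[bisect_left(UPPER_BOUNDS, value)]

def add_derived_features(row):
    """Return a copy of `row` with `has_null` and `d_funny` added."""
    out = dict(row)
    out["has_null"] = int(any(v is None for v in row.values()))
    out["d_funny"] = discretize(row["funny"])
    return out

# Made-up sample rows; None stands for a missing value.
rows = [
    {"funny": 7, "age": 24},
    {"funny": None, "age": 27},
    {"funny": 10, "age": None},
]
derived = [add_derived_features(r) for r in rows]
for r in derived:
    print(r)
```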
Feature Type | Examples |
---|---|
age-related features | age, age_o, d_age |
unknown feature | wave |
field | field_sociology, field_money |
interest-related features | shopping, music |
partner-related features | intelligence_partner, funny_partner |
race-related features | race, importance_same_race |
features about partner's preference | pref_o_intelligence, pref_o_ambitious |
features about partner's rating of self | intelligence_o, funny_o |
features about self's preference | ambition_important, funny_important |
features about self-rating | funny, intelligence |
Published by: Joaquin Vanschoren @ 2016 on https://www.openml.org/d/40536
This dataset is also available on Kaggle.
- Raymond Fisman; Sheena S. Iyengar; Emir Kamenica; Itamar Simonson. Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment. The Quarterly Journal of Economics, Volume 121, Issue 2, 1 May 2006, Pages 673–697, https://doi.org/10.1162/qjec.2006.121.2.673