polarBearYap/speeddating_AI2

speeddating_AI2

General

Author: Yap Jheng Khin

The notebook is available at https://www.kaggle.com/polarbearyap/speeddating-part-ii

Note that this is a continuation of Part I. I have also discovered many mistakes in Part I, so Part II serves as an improvement and a postmortem.

The mistakes that I made in Part I are:

  • Preprocessing the whole dataset at once, which causes train-test contamination.
  • Performing plain cross-validation instead of nested cross-validation.
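
The first mistake can be illustrated with a minimal, library-free sketch. It is not the notebook's code: the toy `ages` values, the split, and the helper names `fit_scaler` and `transform` are assumptions made for illustration. The point is that scaling statistics should be learned from the training split only and then reused on the test split.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation from the given values."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 if the variance is zero
    return mu, sigma


def transform(values, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature; the last two rows play the role of the test split.
ages = [22.0, 24.0, 26.0, 28.0, 40.0, 42.0]
train, test = ages[:4], ages[4:]

# Contaminated: statistics are computed on train and test together,
# so information about the test rows leaks into preprocessing.
leaky_mu, leaky_sigma = fit_scaler(ages)

# Clean: statistics come from the training split only and are reused on test.
clean_mu, clean_sigma = fit_scaler(train)
train_scaled = transform(train, clean_mu, clean_sigma)
test_scaled = transform(test, clean_mu, clean_sigma)
```

In a library such as scikit-learn, the same idea is usually expressed by placing the preprocessing steps inside a pipeline that is fitted on each training fold.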

My learning expectations for Part II are:

  • Discover various ways to detect correlated features.
  • Perform feature selection to reduce model complexity.
  • Apply nested cross-validation in areas such as hyperparameter tuning.
  • Discover XAI techniques that can be used to explain black-box models.
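
As a rough illustration of nested cross-validation, the sketch below tunes a hyperparameter in an inner loop and estimates performance in an outer loop that the tuning never sees. Everything in it is an assumption for illustration, not the notebook's method: the toy 1-D data, the tiny k-nearest-neighbour classifier, and the helper names `k_folds`, `knn_predict`, `accuracy`, and `nested_cv`.

```python
def k_folds(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest 1-D neighbours (labels are 0 or 1)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def nested_cv(xs, ys, candidate_ks, outer_k=3, inner_k=2):
    """Return one score per outer fold; k is tuned on outer-training data only."""
    outer_scores = []
    for outer_train, outer_test in k_folds(len(xs), outer_k):
        tr_x = [xs[i] for i in outer_train]
        tr_y = [ys[i] for i in outer_train]

        def inner_score(k):
            # Inner loop: mean validation accuracy on the outer-training data.
            scores = [
                accuracy([tr_x[i] for i in in_tr], [tr_y[i] for i in in_tr],
                         [tr_x[i] for i in in_val], [tr_y[i] for i in in_val], k)
                for in_tr, in_val in k_folds(len(tr_x), inner_k)
            ]
            return sum(scores) / len(scores)

        best_k = max(candidate_ks, key=inner_score)
        te_x = [xs[i] for i in outer_test]
        te_y = [ys[i] for i in outer_test]
        # Outer loop: score the tuned model on data the tuning never saw.
        outer_scores.append(accuracy(tr_x, tr_y, te_x, te_y, best_k))
    return outer_scores


# Hypothetical toy data: the label is 1 when the feature exceeds 0.5.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75, 0.05, 0.95]
ys = [1 if x > 0.5 else 0 for x in xs]
scores = nested_cv(xs, ys, candidate_ks=[1, 3])
```

The outer scores estimate how the whole procedure generalizes, tuning included. Plain cross-validation reports the score that was used to pick the hyperparameter, which tends to be optimistic.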

Metadata

  • This data was gathered from participants in experimental speed dating events from 2002-2004.

  • During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.

  • The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information.

Attribute Information

In total, 56 features in the dataset have undergone preprocessing, such as 'd_funny', which presents the corresponding attribute in discrete form. The dataset also includes 'has_null', which indicates whether a sample contains null values. Features prefixed with 'expected_' describe the participants' expectations of their partners.
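
A flag like 'has_null' could be derived as in the minimal sketch below. This is an assumption about how such a feature might be built, not the dataset's actual preprocessing: the helper name `has_null`, the toy rows, and the treatment of `None` and NaN as missing are all illustrative.

```python
import math


def has_null(row):
    """Return 1 if any value in the row is missing (None or NaN), else 0."""
    return int(any(
        value is None or (isinstance(value, float) and math.isnan(value))
        for value in row.values()
    ))


# Hypothetical rows using column names from the dataset.
rows = [
    {"age": 24.0, "funny": 7.0, "shopping": 5.0},
    {"age": None, "funny": 8.0, "shopping": 3.0},
    {"age": 27.0, "funny": float("nan"), "shopping": 6.0},
]
flags = [has_null(row) for row in rows]
```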

| Feature type | Examples |
| --- | --- |
| age-related features | age, age_o, d_age |
| unknown feature | wave |
| field | field_sociology, field_money |
| interest-related features | shopping, music |
| partner-related features | intelligence_partner, funny_partner |
| race-related features | race, importance_same_race |
| features about partner's preference | pref_o_intelligence, pref_o_ambitious |
| features about partner's rating on self | intelligence_o, funny_o |
| features about self's preference | ambition_important, funny_important |
| features about self's rating on herself/himself | funny, intelligence |

Source of the Dataset

Relevant Paper

  • Raymond Fisman; Sheena S. Iyengar; Emir Kamenica; Itamar Simonson. Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment. The Quarterly Journal of Economics, Volume 121, Issue 2, 1 May 2006, Pages 673–697, https://doi.org/10.1162/qjec.2006.121.2.673

About

SpeedDating Machine Learning Project - Part II
