Hai To edited this page Dec 14, 2018 · 2 revisions

Welcome

In this repo we explore the Yelp dataset from Kaggle.

Data Set

  • an excerpt of Yelp's business, review and user data
  • for 11 metropolitan areas across 4 countries
  • 5,200,000 user reviews on 174,000 businesses

Potential Analysis

Market / Businesses

  1. Can we identify "trending" areas / neighbourhoods?
    • hip areas like Neukölln, Kreuzberg etc.
  2. Are the clusters of areas homogeneous in e.g. price level / categories / styles?
    • think of posh (Friedrichstrasse) vs. trashy (Warschauerstr.)
  3. Would a retailer's store fit more into certain areas?
    • given a list of retailer venues, can we predict check-ins (or even sales)?
  4. Are there businesses that synergize?

Users

  1. Do users influence their friends (to try / visit venues)?
    • can we identify influencers?
  2. Can we classify users into archetypes?
    • e.g. foodies, hipsters, fashionistas, parents, yuppies, etc.
  3. Can we recommend businesses / venues to users?
    • predict a user's rating for a business
  4. Can we predict whether / which businesses a user will visit next?
    • based on behaviour, preferences, friends etc.
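
As a possible starting point for the rating-prediction question above, here is a minimal sketch of a bias-only baseline: a predicted rating is the global mean plus a per-business offset plus a per-user offset. This is not a method the repo commits to, only a common first baseline to compare later models against. The function names (`fit_baseline`, `predict`), the `reg` shrinkage parameter and the `(user_id, business_id, stars)` tuple layout are assumptions made for illustration; the actual column names in the Kaggle files may differ.

```python
from collections import defaultdict


def fit_baseline(ratings, reg=5.0):
    """Fit a bias-only baseline from (user_id, business_id, stars) tuples.

    `reg` is a hypothetical shrinkage term: it pulls the offsets of users
    and businesses with few reviews towards zero.
    """
    global_mean = sum(stars for _, _, stars in ratings) / len(ratings)

    # Per-business offset: shrunk mean deviation from the global mean.
    biz_sum, biz_cnt = defaultdict(float), defaultdict(int)
    for _, biz, stars in ratings:
        biz_sum[biz] += stars - global_mean
        biz_cnt[biz] += 1
    biz_bias = {b: biz_sum[b] / (reg + biz_cnt[b]) for b in biz_sum}

    # Per-user offset: shrunk mean of what the business offset leaves unexplained.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for user, biz, stars in ratings:
        user_sum[user] += stars - global_mean - biz_bias[biz]
        user_cnt[user] += 1
    user_bias = {u: user_sum[u] / (reg + user_cnt[u]) for u in user_sum}

    return global_mean, user_bias, biz_bias


def predict(model, user_id, business_id):
    """Predict a star rating, clipped to the 1-5 range."""
    global_mean, user_bias, biz_bias = model
    raw = (global_mean
           + user_bias.get(user_id, 0.0)       # unseen user: no offset
           + biz_bias.get(business_id, 0.0))   # unseen business: no offset
    return min(5.0, max(1.0, raw))
```

Unseen users or businesses fall back to the global mean, which gives a simple cold-start behaviour. Anything more elaborate (friends, categories, location) would have to beat this baseline to be worth the extra complexity.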