The official repository for Team Lil Data, CMPT 732 final project at Simon Fraser University.
The project site is available at https://www.devxia.com/Lil-Data
2015 sample data
: sample Twitch stream dataset, shared by Mr. Cong Zhang ([email protected])
analysis
: Python code that analyzes the 2015 and 2018 Twitch datasets and the GiantBomb.com dataset, mostly using Apache Spark
most_popular_category.py
: Evaluate the popularity of a category from four different perspectives
Best_time_of_stream.py
: Evaluate the best time of day for streaming
data collecting
: Crawler code that collects data from the official Twitch API every 30 minutes and fetches information on all games from GiantBomb.com
crawl_twitch.py
: Uses the Twitch API to access and download live streaming data from the Twitch database. The crawler ran with multiple threads every half hour, from 11/13/2018 to 11/26/2018, and collected about 50 gigabytes of data in total
fetch_games.py
: Uses the API provided by GiantBomb.com to collect game data from its database
get_giantbomb_genre.py
: Using each game's guid, collected from the GiantBomb API, calls another GiantBomb API to get detailed game information, including genres
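The half-hourly, multi-threaded collection described above can be sketched roughly as follows. This is a minimal illustration and not the code in crawl_twitch.py: `fetch_page`, `page_ids`, and `save` are hypothetical stand-ins for whatever performs the HTTP request, enumerates the result pages, and writes a snapshot to disk.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POLL_INTERVAL_SECONDS = 30 * 60  # one snapshot every half hour, as described above


def collect_snapshot(fetch_page, page_ids, max_workers=8):
    """Fetch all pages of one snapshot concurrently, returning results in page order.

    fetch_page is a hypothetical callable (page_id -> parsed response); a real
    crawler would wrap an HTTP request to the Twitch API here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_page, page_ids))


def run_crawler(fetch_page, page_ids, save, snapshots,
                interval=POLL_INTERVAL_SECONDS,
                sleep=time.sleep, clock=time.monotonic):
    """Take `snapshots` snapshots whose start times are `interval` seconds apart."""
    for i in range(snapshots):
        started = clock()
        save(i, collect_snapshot(fetch_page, page_ids))
        if i < snapshots - 1:
            # Sleep only for what is left of the interval so snapshots do not drift.
            sleep(max(0.0, interval - (clock() - started)))
```

Passing `sleep` and `clock` as parameters is only a convenience so the loop can be exercised without waiting 30 minutes.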
ETL
: Extract, transform, load code written in Python, using Apache Spark
twitch_raw_data_clean.py
: grab useful features for stream objects and channel objects from dirty JSON files
twitch_dataframe_ETL.py
: reconstruct the dataframe using a customized schema
giantbomb_game_info_ETL.py
: grab useful features for game objects from dirty JSON files
join_with_giantbomb.py
: join stream with game, creating a new table that includes both stream and game information
read_guid.py
: get game genres from dirty JSON files
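As a rough illustration of the cleaning step (grabbing useful stream and channel features from dirty JSON), here is a plain-Python sketch. It is a stand-in for the Spark code in this folder, not a copy of it, and every field name below is an assumption; the real Twitch payload and the project's schema may differ.

```python
import json

# Hypothetical field names, chosen for illustration only.
STREAM_FIELDS = ("stream_id", "game", "viewers", "created_at")
CHANNEL_FIELDS = ("channel_id", "name", "followers")


def clean_record(line):
    """Parse one raw line; return (stream, channel) dicts, or None if unusable."""
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return None  # malformed JSON
    if not isinstance(raw, dict) or not isinstance(raw.get("channel"), dict):
        return None  # the nested channel object is missing
    stream = {key: raw.get(key) for key in STREAM_FIELDS}
    channel = {key: raw["channel"].get(key) for key in CHANNEL_FIELDS}
    if stream["stream_id"] is None or channel["channel_id"] is None:
        return None  # records without ids cannot be joined later
    stream["channel_id"] = channel["channel_id"]  # keep a key for later joins
    return stream, channel


def clean_lines(lines):
    """Keep only the lines that yield a usable (stream, channel) pair."""
    return [rec for rec in map(clean_record, lines) if rec is not None]
```

Dropping unusable lines instead of raising keeps one corrupt record from aborting a whole batch, which matters when the raw files are known to be dirty.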
frontend
: web frontend written in JavaScript, mostly in React.js
docs
: production build of the React web frontend
Yarn is our package manager of choice for the frontend project, not to be confused with Apache Hadoop YARN since this is also a "Big Data" project.
To set up the environment for the frontend app (/frontend), run yarn install or just yarn.
To start the app, run yarn start.
To perform static type checking, run yarn flow.
Under 'results' folder
- Top 20 most popular categories (https://github.com/harrisonxia/Lil-Data/blob/master/Analysis/popular_categories_time_frames.py), ranked by:
  - Number of reviews on GiantBomb (all of which are 0, though)
  - Number of live streams broadcasting a game in this category
  - Number of viewers watching a game in this category
  - Number of followers of channels that broadcast this category of game
- Best time frame to broadcast a stream (https://github.com/harrisonxia/Lil-Data/blob/master/Analysis/Best_time_of_stream.py), evaluated by:
  - Number of streams:
    - Number of streams in each time frame on each day
    - Total number of streams in each time frame over the entire data collection period
    - Trend in the number of streams for each time frame across all days
  - Number of viewers:
    - Number of viewers in each time frame on each day
    - Total number of viewers in each time frame over the entire data collection period
    - Trend in the number of viewers for each time frame across all days
- Viewer distribution across days of the week: https://github.com/harrisonxia/Lil-Data/blob/master/Analysis/views_in_dow.py
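To make the time-frame and day-of-week results concrete, the aggregation can be sketched in plain Python as below. This is an illustration under assumptions, not the Spark code linked above: the 4-hour frame width, the `(timestamp, viewers)` record shape, and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

FRAME_HOURS = 4  # assumed width of one time frame; the project may use another
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def time_frame(ts):
    """Label the fixed-width frame of the day that contains ts."""
    start = (ts.hour // FRAME_HOURS) * FRAME_HOURS
    return f"{start:02d}:00-{start + FRAME_HOURS:02d}:00"


def aggregate(records):
    """records: iterable of (ISO timestamp string, viewer count), one per observed stream."""
    per_day = defaultdict(lambda: {"streams": 0, "viewers": 0})  # (day, frame) -> counts
    total = defaultdict(lambda: {"streams": 0, "viewers": 0})    # frame -> counts
    by_weekday = defaultdict(int)                                # weekday -> viewers
    for stamp, viewers in records:
        ts = datetime.fromisoformat(stamp)
        frame = time_frame(ts)
        for bucket in (per_day[(ts.date().isoformat(), frame)], total[frame]):
            bucket["streams"] += 1
            bucket["viewers"] += viewers
        by_weekday[WEEKDAYS[ts.weekday()]] += viewers
    return dict(per_day), dict(total), dict(by_weekday)
```

Reading the per-day counts for one frame in date order gives the trend across days, and the totals give the sum over the whole collection period.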