YouTube Data Extraction: Project Overview

This repository is about extracting data from YouTube using the developer API. The goal is to analyze the #EndSARS trend that made waves around the world.

Task

A script that extracts YouTube data to analyze the #EndSARS trend. The script meets the following requirements:

  • Filter out channels and playlists.
  • Get only videos published this year.
  • Include only videos between 4 and 20 minutes long.
  • Keep the search query generic so it can be changed.
  • Save the output with the filename current_timestamp_youtube_data.
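The filtering requirements above map almost directly onto parameters of the YouTube Data API's `search.list` endpoint — `videoDuration="medium"` is the API's own bucket for videos between 4 and 20 minutes. A minimal sketch, assuming a helper named `build_search_params` that is not part of the original script:

```python
from datetime import datetime, timezone

def build_search_params(query: str, max_results: int = 50) -> dict:
    """Assemble search().list() keyword arguments that satisfy the task's filters."""
    year_start = datetime(datetime.now(timezone.utc).year, 1, 1, tzinfo=timezone.utc)
    return {
        "q": query,                   # generic: the search query can be changed
        "part": "snippet",
        "type": "video",              # filters out channels and playlists
        "videoDuration": "medium",    # the API's 4-20 minute bucket
        "publishedAfter": year_start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "maxResults": max_results,
    }

params = build_search_params("#endsars")
```

Once the API client is built, the dict can be passed straight through as `youtube.search().list(**params).execute()`.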

Each video in the output should contain the following attributes:

  • the time the video was published
  • the video ID
  • the title of the video
  • the description
  • the URL of the video thumbnail
  • the number of views
  • the number of likes
  • the number of dislikes
  • the number of comments
  • a column that builds the video URL from the video ID
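With pandas (listed in the packages below), the derived video-URL column is a one-line string concatenation on the video-ID column. The column names here are illustrative assumptions, not the script's actual schema:

```python
import pandas as pd

# Illustrative rows; real values come from the API responses.
df = pd.DataFrame({
    "published_at": ["2020-10-12T09:30:00Z"],
    "video_id": ["dQw4w9WgXcQ"],
    "title": ["Sample title"],
    "view_count": [1000],
})

# Build the video URL from the video ID, as the task requires.
df["video_url"] = "https://www.youtube.com/watch?v=" + df["video_id"]
```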

Code and resources used

Python version: 3.6
Packages: json, pandas
OS: macOS Catalina
Tooling: virtual environment, requirements.txt
Articles that helped complete the script

Activities done:

  1. Create Credentials

Ideally, the first step is to create a credential and generate authorization. All that's needed is a Google (Gmail) account; use this link to create one. Getting authorization from Google is instant and not too stressful.

  2. Set up your environment

The kind of environment you use is key. There are a number of environments data engineers can deploy for this exercise, e.g. Google Colab, Jupyter Notebook, etc. I used Jupyter Notebook because I am conversant with it and because everything I do is saved on my local computer. With Google Colab, you need an internet connection to create your workspace and access your files; in situations where your internet is unstable, that delays execution of your project.

  3. Install the necessary packages

By default, Jupyter Notebook doesn't come with pre-installed packages for interacting with YouTube. I had to install the Google API client ($ pip install --upgrade google-api-python-client) and the authentication client ($ pip install google-auth). Before installing these libraries, check your Python version so you install compatible versions of the client and authentication libraries.

  4. Define your own module

The next step I took was to hide my credentials by writing a separate module in my local project, then importing that module in my script to read the credentials. First, I created a package folder called dir containing __init__.py, and in the same folder a second script named key.py whose only content is credentials = {"DEVELOPER_KEY": "xxxxxxxx"}.
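The layout described above can be sketched as follows. The files are created programmatically here only so the example is self-contained; in practice the two files are written by hand and the placeholder key is replaced with a real one:

```python
import sys
from pathlib import Path

# Recreate the layout from the step above: a folder `dir` holding
# __init__.py and key.py with the developer key.
pkg = Path("dir")
pkg.mkdir(exist_ok=True)
(pkg / "__init__.py").touch()
(pkg / "key.py").write_text('credentials = {"DEVELOPER_KEY": "xxxxxxxx"}\n')

# The main script then imports the key instead of hard-coding it:
sys.path.insert(0, str(Path.cwd()))
from dir.key import credentials
DEVELOPER_KEY = credentials["DEVELOPER_KEY"]
```

Adding dir/ to .gitignore keeps the real key out of the repository.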

  5. Define a function for the script

I proceeded to define a function to complete the task.
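One part of that function is the output-naming requirement (current_timestamp_youtube_data). A minimal sketch — the helper name, the exact timestamp format, and the .csv extension are assumptions, not the script's actual choices:

```python
from datetime import datetime

def output_filename(suffix: str = "youtube_data") -> str:
    """Build the current_timestamp_youtube_data filename required by the task."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{timestamp}_{suffix}.csv"
```

The collected DataFrame can then be saved with something like df.to_csv(output_filename(), index=False).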

  6. Download and save as .py

Jupyter Notebook stores work as .ipynb files, but to save it as a script, i.e. .py, I downloaded the file as Python (.py).

  7. Create requirements.txt

I used VS Code to open my .py file, create a virtual environment, and generate the requirements.txt. If you're new to virtual environments, this material could be of help if you want to use VS Code.
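From a terminal in the project folder, that setup can be sketched as below. The .venv name is an assumption, and the install line needs network access, so it is left commented out:

```shell
# Create and activate an isolated environment for the project.
python3 -m venv .venv
. .venv/bin/activate

# Install the project's dependencies (run this part when online):
# pip install --upgrade google-api-python-client google-auth pandas

# Freeze whatever is installed into requirements.txt.
pip freeze > requirements.txt
```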
