YouTube Data Extraction: Project Overview

This repository extracts data from YouTube using the YouTube Data API. The goal is to analyze the #endsars trend, which drew attention around the world.

Task

A script that extracts YouTube data to analyze the #endsars trend. The script should do the following:

Filter out channels and playlists.
Get only videos published this year.
Include only videos that are between 4 and 20 minutes long.
Be generic, so that the search query can be changed.
Write the output to a file named current_timestamp_youtube_data.
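
The requirements above map fairly directly onto parameters of the YouTube Data API v3 search.list endpoint. The following is a minimal sketch, not the repository's actual code: the function names build_search_params and output_filename, the timestamp format, and the csv extension are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def build_search_params(query, now=None, max_results=50):
    """Return keyword arguments for a YouTube Data API v3 search.list call."""
    now = now or datetime.now(timezone.utc)
    # "Published this year" is read here as: on or after 1 January of the current year (UTC).
    start_of_year = datetime(now.year, 1, 1, tzinfo=timezone.utc)
    return {
        "q": query,                  # search term, passed in so it can be changed
        "part": "snippet",
        "type": "video",             # excludes channels and playlists
        "videoDuration": "medium",   # API bucket for videos of 4 to 20 minutes
        "publishedAfter": start_of_year.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "maxResults": max_results,   # search.list accepts values from 0 to 50
    }


def output_filename(now=None, extension="csv"):
    """Build a name of the form <current_timestamp>_youtube_data.<extension>."""
    now = now or datetime.now()
    return "{}_youtube_data.{}".format(now.strftime("%Y%m%d_%H%M%S"), extension)
```

The optional now argument exists only so the functions can be checked against a fixed date; in normal use it can be left out.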

Each video should have the following attributes:

the time video was published
the video id
the title of the video
description
the URL of the video thumbnail
number of views
number of likes
number of dislikes
number of comments
a column that builds the video URL from the video id
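
As an illustration of how these attributes could be collected, the sketch below flattens one item from a videos.list response (requested with part="snippet,statistics") into a flat dictionary. The function name video_to_row and the output column names are hypothetical; the input keys follow the API's video resource. The dislike count is read defensively because it may be missing from the response.

```python
def _to_int(value):
    """The API returns counts as strings; convert them, keeping None for missing values."""
    return int(value) if value is not None else None


def video_to_row(item):
    """Flatten one videos.list item into the attributes listed above."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    video_id = item["id"]
    return {
        "published_at": snippet.get("publishedAt"),
        "video_id": video_id,
        "title": snippet.get("title"),
        "description": snippet.get("description"),
        "thumbnail_url": snippet.get("thumbnails", {}).get("default", {}).get("url"),
        "views": _to_int(stats.get("viewCount")),
        "likes": _to_int(stats.get("likeCount")),
        "dislikes": _to_int(stats.get("dislikeCount")),  # may be absent from the response
        "comments": _to_int(stats.get("commentCount")),
        # Extra column: the watch URL built from the video id.
        "video_url": "https://www.youtube.com/watch?v={}".format(video_id),
    }
```

A list of such rows can be passed to pandas.DataFrame to get one column per attribute.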

Code and resources used

Python version: 3.6
Packages: json, pandas
OS: macOS Catalina
Web Framework: virtual environment, requirements.txt
Articles that help complete the script

Activities done:

Create Credentials

Ideally, the first step is to create a credential and generate authorization. All you need is a Google (Gmail) account. Use this link to create the account. Getting authorization from Google is instant and not stressful.

Set up your environment

The choice of environment is key. There are a number of environments data engineers can use for this exercise, e.g. Google Colab or Jupyter Notebook. I used Jupyter Notebook because I am conversant with it and because everything I do is saved on my local computer. With Google Colab, you need an internet connection to create your workspace and to access your files, so an unstable connection delays the execution of your project.

Install the necessary packages

By default, Jupyter Notebook doesn't come with pre-installed packages for interacting with YouTube. I had to install the Google API client ($ pip install --upgrade google-api-python-client) and the authentication client ($ pip install google-auth). Before installing these libraries, check your Python version so that you install compatible versions of the client and authentication libraries.
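
Before installing, it can help to confirm the interpreter version and see which packages are already importable. This is a minimal sketch; the function names are hypothetical, and googleapiclient and google.auth are the import names provided by the two pip packages mentioned above.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the module can be found without actually importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised for dotted names such as "google.auth" when the parent package is missing.
        return False


def check_environment(modules=("googleapiclient", "google.auth", "pandas")):
    """Map each module name to whether it is importable in this interpreter."""
    return {name: is_installed(name) for name in modules}


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    for name, found in check_environment().items():
        print(name, "installed" if found else "missing")
```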

Define your own module

The next step I took was to keep my credentials out of the main script by putting them in a separate local module. I then imported that module in my script and read the credentials from it. First, I created a Python package folder called dir containing an __init__.py file, and a second script in the same folder, named key.py, whose only content is credentials = {"DEVELOPER_KEY": "xxxxxxxx"}.
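
The import line used in the original script is not shown here. As one possible way to read the key, the sketch below loads key.py from an explicit path and returns the DEVELOPER_KEY entry; the function name load_developer_key and the default path are assumptions.

```python
import importlib.util


def load_developer_key(path="key.py"):
    """Load the credentials dict defined in key.py and return the developer key."""
    spec = importlib.util.spec_from_file_location("key", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs key.py, which defines `credentials`
    return module.credentials["DEVELOPER_KEY"]
```

Whichever import style is used, key.py should stay out of version control (for example through .gitignore) so the key is never pushed.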

Define a function for the script

I proceeded to define a function to complete the task.
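
The function itself is not reproduced in this README. A minimal sketch of what such a function might look like, assuming a client created with googleapiclient.discovery.build, is given below. The name fetch_videos is hypothetical, and the sketch reads only the first page of results.

```python
def fetch_videos(youtube, search_params):
    """Run search.list, then videos.list for the video ids that came back.

    `youtube` is assumed to be a YouTube Data API v3 client, e.g.
        from googleapiclient.discovery import build
        youtube = build("youtube", "v3", developerKey=DEVELOPER_KEY)
    """
    search_response = youtube.search().list(**search_params).execute()

    # Keep only video results; search results can also be channels or playlists.
    video_ids = [
        item["id"]["videoId"]
        for item in search_response.get("items", [])
        if item.get("id", {}).get("kind") == "youtube#video"
    ]
    if not video_ids:
        return []

    # search.list does not return statistics, so ask videos.list for the details.
    details = youtube.videos().list(
        part="snippet,statistics",
        id=",".join(video_ids),
    ).execute()
    return details.get("items", [])
```

Taking the client as an argument keeps the function free of credentials and lets it be exercised with a stand-in object instead of a live API call.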

Download and save in .py

Jupyter Notebook stores files in .ipynb format. To save the work as a script, i.e. a .py file, I downloaded the notebook as Python (.py).
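
The download menu is what was used here. For reference, a notebook is a JSON file, so the same result can be approximated with the json package already listed above. This is a rough sketch with a hypothetical function name; it copies code cells only and does not handle notebook magics such as lines starting with % or !.

```python
import json


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a .ipynb file to a .py file; return the cell count."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        chunks.append(source.rstrip("\n"))

    with open(script_path, "w", encoding="utf-8") as handle:
        handle.write("\n\n".join(chunks) + "\n")
    return len(chunks)
```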

Create requirements.txt

I opened my .py file in VS Code and used it to create a virtual environment and the requirements.txt file. If you're new to virtual environments, this material could help if you want to use VS Code.
