Dicksonchin93/toxic_comment_classification

Toxic Comment Classification

This is my code for the Toxic Comment Classification competition hosted on Kaggle. It is heavily modified from the base code here

To download the datasets, run get_data.sh

The Task

The dataset consists of comments from Wikipedia’s talk page edits. It is a large collection of comments that human raters have labeled for toxic behavior. The types of toxicity are:

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate

The Approach

The approach is to build an ensemble model that predicts, for each comment, a probability for each type of toxicity. A full explanation of my approach is documented here
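As a rough illustration of what ensembling per-label probabilities can look like, the sketch below averages the outputs of several models. It is not the method used in this repository: the function name average_ensemble, the optional weights, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

# The six toxicity types listed in "The Task".
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def average_ensemble(predictions, weights=None):
    """Blend per-model probability matrices of shape (n_comments, n_labels).

    Hypothetical helper: a plain (optionally weighted) average, which is only
    one of many possible ways to combine models.
    """
    stacked = np.stack(predictions, axis=0)  # (n_models, n_comments, n_labels)
    if stacked.shape[-1] != len(LABELS):
        raise ValueError("each prediction matrix needs one column per label")
    # Averaging over the model axis keeps every value inside [0, 1].
    return np.average(stacked, axis=0, weights=weights)


# Made-up probabilities from two imaginary models for two comments.
model_a = np.array([[0.9, 0.1, 0.8, 0.0, 0.7, 0.1],
                    [0.1, 0.0, 0.0, 0.0, 0.1, 0.0]])
model_b = np.array([[0.7, 0.3, 0.6, 0.2, 0.5, 0.1],
                    [0.3, 0.0, 0.2, 0.0, 0.1, 0.0]])

blended = average_ensemble([model_a, model_b])
for label, probability in zip(LABELS, blended[0]):
    print(f"{label}: {probability:.2f}")
```

Passing weights, for example average_ensemble([model_a, model_b], weights=[3, 1]), would favor the first model; how to choose such weights is outside the scope of this sketch.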

Install Pre-requisites

Run install.sh, then run pip install -r requirements.txt

Tips

  • Make sure the comments are preprocessed the same way each embedding's training text was, so that the highest possible percentage of words can be matched to embedding vectors
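One way to check the tip above is to measure how many words in the comments have a vector in the embedding under a given preprocessing function. The sketch below is a minimal version of that check; embedding_coverage, the tiny vocabulary, and the sample comments are hypothetical, and whitespace tokenization is assumed only to keep the example short.

```python
from collections import Counter


def embedding_coverage(texts, embedding_vocab, preprocess):
    """Return (vocabulary coverage, token coverage) for the given texts.

    Hypothetical helper: `preprocess` is any str -> str function, and
    `embedding_vocab` is the set of words that have an embedding vector.
    """
    counts = Counter(
        token for text in texts for token in preprocess(text).split()
    )
    if not counts:
        return 0.0, 0.0
    known_words = sum(1 for word in counts if word in embedding_vocab)
    known_tokens = sum(n for word, n in counts.items() if word in embedding_vocab)
    return known_words / len(counts), known_tokens / sum(counts.values())


# Assumed example: an embedding that only contains lowercase words.
uncased_vocab = {"you", "are", "great", "not"}
comments = ["You are GREAT", "you are not great"]

raw = embedding_coverage(comments, uncased_vocab, preprocess=lambda text: text)
lowered = embedding_coverage(comments, uncased_vocab, preprocess=str.lower)
print("no preprocessing:", raw)
print("lowercased:", lowered)
```

In this toy case, lowercasing matches how the assumed embedding was built, so coverage rises. For an embedding trained on cased text, the same step could lower coverage instead, which is why the check is worth running per embedding.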
