abduhsalam/Hate-Speech-Detection

The Dataset for Hate Speech Detection in Indonesian

(Dataset untuk Deteksi Ujaran Kebencian dalam Bahasa Indonesia)

Dataset
The dataset has two columns, label and tweet, and consists of 713 tweets in Indonesian. Each label is either Non_HS or HS: Non_HS marks a "non-hate-speech" tweet and HS marks a "hate-speech" tweet.

  • Number of Non_HS tweets: 453
  • Number of HS tweets: 260

Since this dataset is unbalanced, you might need to over-sample or down-sample it to create a balanced dataset.

The dataset may be used freely, but if you publish a paper or other publication that uses the dataset, please cite this publication:
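
For the class imbalance noted above (453 Non_HS versus 260 HS), random over-sampling is one simple option. The following is a minimal sketch using only the Python standard library. The function name `oversample`, the `(label, tweet)` tuple layout, and the placeholder rows are assumptions made for illustration; they are not part of the dataset or of any particular library.

```python
import random
from collections import Counter, defaultdict


def oversample(rows, seed=0):
    """Randomly duplicate minority-class rows until every class
    is as large as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the (label, tweet) pairs by label.
    by_label = defaultdict(list)
    for label, tweet in rows:
        by_label[label].append((label, tweet))

    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Placeholder rows that only mimic the class counts stated above;
# replace them with the real (label, tweet) pairs from the dataset.
rows = [("Non_HS", f"tweet {i}") for i in range(453)]
rows += [("HS", f"tweet {i}") for i in range(260)]

balanced = oversample(rows)
counts = Counter(label for label, _ in balanced)
print(counts["Non_HS"], counts["HS"])  # 453 453
```

If you evaluate a classifier on a held-out split, apply over-sampling to the training split only, so that duplicated tweets do not leak into the test data.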

Preprocessing

  • Case folding (lowercase; remove numbers, punctuation, and extra whitespace)
  • Tokenization
  • Stopword Removal
  • Stemming
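
The four steps above could be chained roughly as follows. This is a sketch under stated assumptions, not the pipeline actually used to prepare the dataset: the names `case_fold` and `preprocess` are hypothetical, the stopword list is a tiny illustrative sample, and stemming defaults to a no-op so the example runs with the standard library alone.

```python
import re
import string

# Tiny illustrative stopword list (an assumption, not the list
# used for this dataset).
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def case_fold(text):
    """Lowercase, then strip digits, punctuation, and extra whitespace."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)
    text = text.translate(_PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text, stopwords=STOPWORDS, stem=lambda token: token):
    """Case folding -> tokenization -> stopword removal -> stemming.

    `stem` is a pluggable callable; the default leaves tokens unchanged.
    """
    tokens = case_fold(text).split()                     # whitespace tokenization
    tokens = [t for t in tokens if t not in stopwords]   # stopword removal
    return [stem(t) for t in tokens]                     # stemming


tokens = preprocess("Ini  adalah 2 CONTOH tweet, yang DIPROSES!")
print(tokens)  # ['adalah', 'contoh', 'tweet', 'diproses']
```

To get real stemming, pass an Indonesian stemmer as the `stem` argument, for example the `stem` method of a stemmer object from a library such as Sastrawi, if you have it installed.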
