Kaggle-Movie-Review

Sentiment analysis on a movie review data set using NLTK, scikit-learn, and some of the Weka classifiers.

Goal: To predict the sentiment of reviews using basic classification algorithms and to compare the results while varying different parameters.

Dataset: The data was taken from the original Pang and Lee movie review corpus, which is based on reviews from the Rotten Tomatoes web site and was later also used in a Kaggle competition. train.tsv contains the phrases and their associated sentiment labels. test.tsv contains just the phrases.
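
As a rough illustration of reading the training file, the sketch below parses tab-separated rows into (phrase, label) pairs. The column names `PhraseId`, `SentenceId`, `Phrase`, and `Sentiment` are an assumption about the train.tsv layout, and the sample rows are made up for the example rather than taken from the corpus.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the assumed train.tsv layout (not real corpus data).
SAMPLE_TSV = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "1\t1\ta charming and often affecting journey\t3\n"
    "2\t1\tcharming\t3\n"
    "3\t2\tdull and lifeless\t1\n"
)


def load_phrases(handle):
    """Read (phrase, label) pairs from a tab-separated file object."""
    # QUOTE_NONE keeps stray quote characters in phrases from confusing the parser.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["Phrase"], int(row["Sentiment"])) for row in reader]


rows = load_phrases(io.StringIO(SAMPLE_TSV))
label_counts = Counter(label for _, label in rows)
print(len(rows), dict(label_counts))
```

For the real file, the same helper could be called as `load_phrases(open("train.tsv", newline="", encoding="utf-8"))`, assuming the file sits in the working directory.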

Feature sets used: unigram features (bag of words), bigrams, negation, POS (part-of-speech) tags, and features based on sentiment lexicons such as LIWC, the opinion lexicon, and the subjectivity lexicon (SL).
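
To make the first three feature sets concrete, here is a minimal sketch of unigram, bigram, and negation features built as NLTK-style feature dictionaries. The function names, the feature key format, the small negation word list, and the rule that negation ends at the next punctuation mark are all assumptions for illustration, not necessarily what this repository implements.

```python
# Assumed, deliberately small word lists; a real run would likely use larger ones.
NEGATION_WORDS = {"not", "no", "never", "n't", "cannot"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def unigram_features(tokens):
    """Bag of words: one boolean feature per distinct token."""
    return {f"has({t})": True for t in tokens}


def bigram_features(tokens):
    """One boolean feature per adjacent token pair."""
    return {f"bigram({a} {b})": True for a, b in zip(tokens, tokens[1:])}


def negation_features(tokens):
    """Prefix tokens with NOT_ from a negation word up to the next punctuation mark."""
    feats = {}
    negated = False
    for t in tokens:
        if t in CLAUSE_END:
            negated = False
        elif t in NEGATION_WORDS:
            negated = True
        else:
            key = f"NOT_{t}" if negated else t
            feats[f"has({key})"] = True
    return feats


tokens = "this movie is not good , but fun".split()
print(sorted(negation_features(tokens)))
```

Here "good" becomes `has(NOT_good)` while "fun", which follows the comma, stays unnegated.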

NLTK-based classifier algorithms: Naive Bayes, Generalized Iterative Scaling (GIS), and Improved Iterative Scaling (IIS).
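
A minimal sketch of training these NLTK classifiers on feature dictionaries follows. In NLTK, GIS and IIS are selected through the `algorithm` argument of `MaxentClassifier.train`. The tiny training set, the two-label scheme, and the `max_iter` value are hypothetical and only keep the example short.

```python
import nltk

# Hypothetical toy training set of (feature dict, label) pairs.
train_set = [
    ({"has(good)": True}, "pos"),
    ({"has(great)": True}, "pos"),
    ({"has(bad)": True}, "neg"),
    ({"has(dull)": True}, "neg"),
]

# Naive Bayes needs no iteration settings.
nb_classifier = nltk.NaiveBayesClassifier.train(train_set)
nb_prediction = nb_classifier.classify({"has(good)": True})

# Maximum entropy model fitted with Generalized Iterative Scaling;
# pass algorithm="IIS" for Improved Iterative Scaling instead.
gis_classifier = nltk.MaxentClassifier.train(
    train_set, algorithm="GIS", trace=0, max_iter=5
)
gis_prediction = gis_classifier.classify({"has(bad)": True})

print(nb_prediction, gis_prediction)
```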

scikit-learn classifiers: Random Forest, MultinomialNB, BernoulliNB, Logistic Regression, SGDClassifier, SVC, LinearSVC, NuSVC, Decision Tree Classifier.
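
Comparing several of these classifiers on the same features could look roughly like the sketch below. It uses `CountVectorizer` with unigrams and bigrams as a stand-in for the feature sets above, which is an assumption, and the phrases and labels are invented toy data. The score shown is training accuracy on that toy data, so it only demonstrates the loop, not a meaningful evaluation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; real runs would use the phrases from train.tsv.
phrases = [
    "a charming and affecting journey",
    "good fun from start to finish",
    "dull and lifeless",
    "a bad , boring mess",
]
labels = [1, 1, 0, 0]

models = {
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
}

scores = {}
for name, clf in models.items():
    # Each pipeline gets its own vectorizer so the models stay independent.
    pipe = make_pipeline(CountVectorizer(ngram_range=(1, 2)), clf)
    pipe.fit(phrases, labels)
    scores[name] = pipe.score(phrases, labels)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Swapping in held-out data or `sklearn.model_selection.cross_val_score` would give a fairer comparison than scoring on the training phrases.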

Weka classifiers: Naive Bayes, Random Forest.