Sentiment analysis on a movie review dataset using NLTK, scikit-learn and some of the Weka classifiers
Goal: To predict the sentiment of reviews using basic classification algorithms and to compare the results obtained by varying different parameters.
Dataset: The data comes from the original Pang and Lee movie review corpus, which is based on reviews from the Rotten Tomatoes website and was later also used in a Kaggle competition. train.tsv contains the phrases and their associated sentiment labels; test.tsv contains only the phrases.
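A minimal sketch of reading train.tsv with Python's standard csv module. It assumes the Kaggle layout: a tab-separated file whose header row includes the columns Phrase and Sentiment (integer labels). load_phrases is a hypothetical helper, and the two sample rows are made up for illustration, not taken from the corpus.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_phrases(stream: TextIO) -> List[Tuple[str, int]]:
    """Return (phrase, label) pairs from a train.tsv-style stream.

    Assumes a header row containing 'Phrase' and 'Sentiment' columns (Kaggle layout).
    """
    # QUOTE_NONE: phrases may contain raw quote characters that must not be treated as CSV quoting
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["Phrase"], int(row["Sentiment"])) for row in reader]


# Made-up rows in the assumed layout
sample = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "1\t1\ta gripping , funny film\t4\n"
    "2\t1\tdull and overlong\t1\n"
)
pairs = load_phrases(io.StringIO(sample))
print(pairs)  # [('a gripping , funny film', 4), ('dull and overlong', 1)]
```

For the real file, open it with `open("train.tsv", newline="")` (adjusting the encoding to match your copy) and pass the handle to load_phrases. test.tsv has no Sentiment column, so it would need a variant that reads only Phrase.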
Feature sets used: unigram features (bag of words), bigrams, negation, POS (part-of-speech) tags, and features based on sentiment lexicons such as LIWC, the opinion lexicon and the subjectivity lexicon (SL).
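The unigram, bigram, negation and lexicon features can be sketched as NLTK-style feature dictionaries (feature name mapped to value). This uses one common scheme, assumed rather than taken from this project: words after a negation word get a NOT_ prefix until the next punctuation mark, and lexicon features are simple counts of positive and negative words. The tiny POSITIVE and NEGATIVE sets are hypothetical stand-ins for the real lexicons; POS features (for example from nltk.pos_tag) are left out to keep the sketch dependency-free.

```python
import re
from typing import Dict, List

# Hypothetical mini-lexicons for illustration; a real run would load the
# opinion lexicon or subjectivity lexicon word lists instead.
POSITIVE = {"good", "great", "funny"}
NEGATIVE = {"bad", "boring", "dull"}
NEGATION_WORDS = {"not", "no", "never"}
PUNCT = {".", ",", "!", "?", ";"}


def tokenize(text: str) -> List[str]:
    # Lowercase, then split into word tokens and single punctuation marks
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def mark_negation(tokens: List[str]) -> List[str]:
    # Prefix every word after a negation word with NOT_ until the next punctuation mark
    out: List[str] = []
    negating = False
    for tok in tokens:
        if tok in PUNCT:
            negating = False
            out.append(tok)
        elif tok in NEGATION_WORDS or tok.endswith("n't"):
            negating = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out


def extract_features(text: str) -> Dict[str, object]:
    words = [t for t in mark_negation(tokenize(text)) if t not in PUNCT]
    feats: Dict[str, object] = {}
    for w in words:  # unigram (bag-of-words) presence features
        feats[f"contains({w})"] = True
    for first, second in zip(words, words[1:]):  # adjacent-word bigram features
        feats[f"bigram({first} {second})"] = True
    # Lexicon counts look at each word without its NOT_ prefix
    bases = [w[4:] if w.startswith("NOT_") else w for w in words]
    feats["pos_count"] = sum(base in POSITIVE for base in bases)
    feats["neg_count"] = sum(base in NEGATIVE for base in bases)
    return feats


print(sorted(extract_features("Not good, but funny.")))
```

Here "good" becomes contains(NOT_good) while "funny", which follows the comma, stays un-negated. Whether negated lexicon words should flip polarity is a design choice this sketch does not make.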
NLTK-based classifiers: Naive Bayes, and Maximum Entropy trained with the Generalized Iterative Scaling (GIS) and Improved Iterative Scaling (IIS) algorithms.
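NLTK itself provides nltk.NaiveBayesClassifier.train and nltk.classify.MaxentClassifier.train, where the latter's algorithm argument selects GIS or IIS. To show what Naive Bayes does with feature dictionaries, here is a from-scratch multinomial variant with add-one smoothing. It is a simplified illustration, not NLTK's implementation (which models feature values differently): it looks only at feature names, so it assumes every key in a dict marks a present feature. TinyNaiveBayes and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

FeatureSet = Dict[str, object]


class TinyNaiveBayes:
    """Multinomial Naive Bayes over the keys of feature dicts, with add-one smoothing."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def train(self, data: List[Tuple[FeatureSet, str]]) -> "TinyNaiveBayes":
        for feats, label in data:
            self.label_counts[label] += 1
            for name in feats:  # each present feature counts as one occurrence
                self.feature_counts[label][name] += 1
                self.vocab.add(name)
        return self

    def classify(self, feats: FeatureSet) -> Optional[str]:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior P(label)
            for name in feats:
                if name in self.vocab:  # features unseen in training are skipped
                    score += math.log((counts[name] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_set = [
    ({"contains(great)": True, "contains(film)": True}, "pos"),
    ({"contains(boring)": True, "contains(film)": True}, "neg"),
    ({"contains(great)": True, "contains(acting)": True}, "pos"),
    ({"contains(dull)": True, "contains(plot)": True}, "neg"),
]
clf = TinyNaiveBayes().train(train_set)
print(clf.classify({"contains(great)": True}))  # pos
```

Working in log space avoids underflow when a phrase has many features, and add-one smoothing keeps a feature that never appeared with one label from zeroing out that label's score.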
scikit-learn classifiers: Random Forest, MultinomialNB, BernoulliNB, Logistic Regression, SGDClassifier, SVC, LinearSVC, NuSVC and Decision Tree Classifier.
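Comparing several classifiers fairly means evaluating them on identical splits. In scikit-learn this is usually done with sklearn.model_selection.cross_val_score(estimator, X, y, cv=5); the dependency-free sketch below shows the same k-fold idea for any train function that returns a predictor. k_fold_accuracy and the toy candidates are hypothetical, and a fixed shuffle seed is assumed so that every candidate sees the same folds.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Example = Tuple[Any, Any]  # (features, label)
Predictor = Callable[[Any], Any]
TrainFn = Callable[[List[Example]], Predictor]


def accuracy(gold: Sequence[Any], predicted: Sequence[Any]) -> float:
    # Fraction of positions where the prediction matches the gold label
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def k_fold_accuracy(data: Sequence[Example], train_fn: TrainFn, k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k folds; each fold is held out once while the rest train the model."""
    if len(data) < k:
        raise ValueError("need at least k examples so every fold is non-empty")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed: every candidate sees the same folds
    folds = [items[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = train_fn(train)
        gold = [y for _, y in held_out]
        scores.append(accuracy(gold, [predict(x) for x, _ in held_out]))
    return sum(scores) / k


# Toy comparison: the label is the parity of x; "always 0" ignores its input
data = [(n, n % 2) for n in range(20)]
candidates = {
    "parity rule": lambda train: (lambda x: x % 2),
    "always 0": lambda train: (lambda x: 0),
}
results = {name: k_fold_accuracy(data, fn, k=5) for name, fn in candidates.items()}
print(results)  # {'parity rule': 1.0, 'always 0': 0.5}
```

Varying a parameter (for example the feature set or a regularization strength) then amounts to adding one entry per setting to candidates and reading off the mean accuracies.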
Weka classifiers: Naive Bayes and Random Forest.
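Weka classifiers usually read data in ARFF format, so features computed in Python have to be exported first. A minimal sketch of a dense ARFF writer, assuming numeric features and attribute and label names without spaces or special characters (which would need quoting); to_arff and the sample rows are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_arff(
    relation: str,
    attributes: Sequence[str],
    rows: List[Tuple[Dict[str, float], str]],
    labels: Sequence[str],
) -> str:
    """Serialize (feature dict, label) rows as a dense ARFF document."""
    lines = [f"@RELATION {relation}", ""]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # Nominal class attribute, written last
    lines.append("@ATTRIBUTE class {" + ",".join(labels) + "}")
    lines += ["", "@DATA"]
    for feats, label in rows:
        # Missing features default to 0 so every row has one value per attribute
        values = [str(feats.get(name, 0)) for name in attributes]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


rows = [
    ({"pos_count": 2, "neg_count": 0}, "pos"),
    ({"pos_count": 0, "neg_count": 1}, "neg"),
]
arff = to_arff("sentiment", ["pos_count", "neg_count"], rows, ["neg", "pos"])
print(arff)
```

By default, the Weka Explorer treats the last attribute as the class, which is why class is written last. For large bag-of-words vocabularies, Weka's sparse ARFF format would be far more compact than this dense layout.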