CSYE7200_FinalProject_Team2_Spring2017

This is the CSYE7200 final project for Team 2, Spring 2017.

Team members:

Yuan Ying - [email protected]

Mushtaq Rizvi - [email protected]

Wei Huang - [email protected]

Jinjin Zhang - [email protected]

Sentiment Analysis on Tweets

Planning presentation

Final presentation

Abstract

The goal of this project is to process real Twitter datasets and extract meaningful insights through sentiment analysis. We use the Twitter API to gather tweets and information about their users. Sentiment analysis determines whether a text is neutral, positive, or negative. Because Twitter restricts each tweet to at most 140 characters, users' comments tend to be straightforward. In addition, because of Twitter's wide reach, many people include hashtags in their tweets to attract attention. Twitter is therefore rich in sentiment and a good platform for examining people's feedback. This project uses the Twitter Search and Streaming APIs through Spark Streaming to retrieve the tweets, the Stanford NLP library to detect sentiment, and Apache Zeppelin to visualize the results.

Methodology

1. Tweets acquired through the Search API arrive in JSON format, with a maximum of 100 tweets per request. We built a JSON parser to parse them correctly and filter out the attributes that are not required.
2. Tweets are filtered further by language and geolocation.
3. Special characters are removed to increase the accuracy of the sentiment scores.
4. Stanford NLP calculates a sentiment score that indicates whether a particular tweet is positive or negative.
5. Spark Streaming receives the stream of tweets and performs analyses such as popular hashtags and location-based sentiment scores.
6. Finally, Apache Zeppelin visualizes the results: bar plots show the popular hashtags and their counts, and a Leaflet geo map shows the sentiment score of a particular location by latitude and longitude.
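
As an illustration of the special-character removal step, here is a minimal sketch in plain Scala. The object name `TweetCleaner` and the regular expressions are assumptions made for this example; the rules the project actually applies may differ.

```scala
object TweetCleaner {
  // Illustrative patterns only (assumptions, not the project's actual rules).
  private val UrlPattern = """https?://\S+""".r
  // Anything that is not a letter, digit, whitespace, or basic punctuation.
  private val SpecialCharPattern = """[^\p{L}\p{N}\s'.,!?]""".r
  private val WhitespacePattern = """\s+""".r

  /** Removes URLs and special characters, then collapses repeated whitespace. */
  def clean(text: String): String = {
    val withoutUrls = UrlPattern.replaceAllIn(text, " ")
    val withoutSpecials = SpecialCharPattern.replaceAllIn(withoutUrls, " ")
    WhitespacePattern.replaceAllIn(withoutSpecials, " ").trim
  }
}
```

For example, `TweetCleaner.clean("Loving #Boston today!!! http://t.co/abc @mbta :)")` yields `Loving Boston today!!! mbta`. Sentence punctuation is kept in this sketch on the assumption that it can still help a sentiment model.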

Inputs and Outputs

If the user inputs a city or company name, the system calculates the sentiment score for the city's weather or the company's stock over the last 7 days.
If the user inputs a keyword, the system generates the top 10 popular hashtags in a 5-minute window. It also shows the tweet count and sentiment score for each hashtag, and the results can be visualized as a bar chart in Apache Zeppelin.
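
The hashtag output described above can be sketched with plain Scala collections. This is not the project's Spark Streaming code: the types `ScoredTweet` and `HashtagSummary`, their field names, and the choice to average sentiment per hashtag are all assumptions made for illustration.

```scala
object HashtagStats {
  // Hypothetical record types; names and fields are assumptions.
  final case class ScoredTweet(text: String, timestampMs: Long, sentiment: Double)
  final case class HashtagSummary(tag: String, count: Int, avgSentiment: Double)

  private val HashtagPattern = """#(\w+)""".r

  /** Returns the most frequent hashtags among tweets in (nowMs - windowMs, nowMs]. */
  def topHashtags(
      tweets: Seq[ScoredTweet],
      nowMs: Long,
      windowMs: Long = 5 * 60 * 1000L, // 5-minute window
      limit: Int = 10                  // top 10 hashtags
  ): Seq[HashtagSummary] = {
    val recent = tweets.filter(t => t.timestampMs > nowMs - windowMs && t.timestampMs <= nowMs)

    // One (hashtag, sentiment) pair per distinct hashtag in each tweet.
    val tagged = for {
      tweet <- recent
      tag <- HashtagPattern.findAllMatchIn(tweet.text).map(_.group(1).toLowerCase).toList.distinct
    } yield (tag, tweet.sentiment)

    tagged
      .groupBy(_._1)
      .toSeq
      .map { case (tag, hits) => HashtagSummary(tag, hits.size, hits.map(_._2).sum / hits.size) }
      .sortBy(s => (-s.count, s.tag)) // most frequent first, ties broken alphabetically
      .take(limit)
  }
}
```

In a streaming setting the same grouping would typically run over a sliding window rather than an in-memory `Seq`; the sketch only shows the shape of the computation.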

The following arguments can be passed when running the jar:
1. hashtags => Runs popularHashTags without a keyword
2. hashtags New York => Runs popularHashTags with the keyword "New York"
3. map => Runs popularLocations without a keyword
4. map New York => Runs popularLocations with the keyword "New York"
5. weather => Runs a 10-city weather comparison
6. weather New York => Returns the sentiment score for New York weather
7. stock => Runs a 10-company stock comparison
8. stock Bank of America => Returns the sentiment score for Bank of America stock
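
One way to interpret these arguments is sketched below. The names `CliArgs`, `Mode`, and `Command` are hypothetical, and joining the remaining words into a single keyword (so that `New York` arrives as one keyword) is an assumption about how multi-word input is handled, not a description of the project's actual entry point.

```scala
object CliArgs {
  // Hypothetical model of the four modes listed above.
  sealed trait Mode
  case object Hashtags extends Mode
  case object LocationMap extends Mode
  case object Weather extends Mode
  case object Stock extends Mode

  final case class Command(mode: Mode, keyword: Option[String])

  /** Parses the first argument as the mode and joins the rest into an optional keyword. */
  def parse(args: Array[String]): Either[String, Command] = {
    // Assumption: all words after the mode form one keyword, e.g. "Bank of America".
    val keyword = Some(args.drop(1).mkString(" ").trim).filter(_.nonEmpty)

    args.headOption.map(_.toLowerCase) match {
      case Some("hashtags") => Right(Command(Hashtags, keyword))
      case Some("map")      => Right(Command(LocationMap, keyword))
      case Some("weather")  => Right(Command(Weather, keyword))
      case Some("stock")    => Right(Command(Stock, keyword))
      case Some(other)      => Left(s"Unknown mode: $other")
      case None             => Left("Missing mode: expected hashtags, map, weather, or stock")
    }
  }
}
```

Returning an `Either` keeps invalid input explicit: the caller can print the `Left` message and exit instead of failing later inside a job.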

If you are using SBT, you can run the project as:
1. sbt "run hashtags"
2. sbt "run hashtags New York"
3. etc...

Continuous integration

This project uses CircleCI as its continuous integration tool.

Current Status:
