Centiment is a service that performs sentiment analysis of tweets using Google's Natural Language APIs. It was designed with the goal of searching for cryptocurrency tweets, but can be used to analyze and aggregate sentiments for any search terms.
- It will search Twitter for tweets matching the configured search terms, and store the aggregate "sentiment" (negative, neutral or positive) and magnitude each time it runs a search.
- Search terms can be easily added without writing code via `cmd/centimentd/search.toml`
- The aggregate results are made available via a REST API.
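As a rough illustration of what an "aggregate sentiment" can look like, the sketch below averages per-tweet scores and reports the population variance and standard deviation. The `aggregate` function and `Summary` type are hypothetical names invented for this example; they are not taken from Centiment's source, and the service's actual aggregation may differ.

```go
package main

import (
	"fmt"
	"math"
)

// Summary holds aggregate statistics for one search run.
// Hypothetical type for illustration only.
type Summary struct {
	Count    int
	Score    float64 // mean of the per-tweet scores
	Variance float64 // population variance of the scores
	StdDev   float64 // square root of Variance
}

// aggregate computes the mean, population variance and standard deviation
// of per-tweet sentiment scores (the Natural Language API reports scores
// between -1.0 and 1.0).
func aggregate(scores []float64) Summary {
	n := len(scores)
	if n == 0 {
		return Summary{}
	}
	var sum float64
	for _, s := range scores {
		sum += s
	}
	mean := sum / float64(n)

	var sq float64
	for _, s := range scores {
		d := s - mean
		sq += d * d
	}
	variance := sq / float64(n)
	return Summary{Count: n, Score: mean, Variance: variance, StdDev: math.Sqrt(variance)}
}

func main() {
	s := aggregate([]float64{-0.5, 0, 0.5, 1})
	fmt.Printf("count=%d score=%.3f variance=%.4f stdDev=%.4f\n",
		s.Count, s.Score, s.Variance, s.StdDev)
}
```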
The goal is to see whether written sentiment about cryptocurrencies correlates with prices - e.g. does a negative sentiment predict or otherwise reinforce a drop in price?
Centiment relies on Google's Natural Language APIs and Firestore, but otherwise can run anywhere provided it can reach these services.
At a minimum, you'll need to:
- Install the Google Cloud SDK & create a new project with billing enabled.
- Create a new Firestore instance & enable the Natural Language API via the Google Cloud API Dashboard.
- Create a new Twitter application & retrieve your API credentials.
- Install the Firebase SDK via `npm install -g firebase-tools`
You can run Centiment locally with a properly configured Go toolchain and Service Account credentials saved locally.
# Fetch Centiment & its dependencies
go get github.com/elithrar/centiment/...
# Initialize the Firebase SDK & create the required indexes
centiment/ $ firebase login
centiment/ $ firebase deploy --only firestore:indexes
# Set the required configuration as env. variables, or pass via flags (see: `centiment --help`)
export TWITTER_CONSUMER_KEY="key"; \
export TWITTER_CONSUMER_SECRET="secret"; \
export TWITTER_ACCESS_TOKEN="at"; \
export TWITTER_ACCESS_KEY="ak"; \
export CENTIMENT_PROJECT_ID="your-gcp-project-id"; \
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/creds.json";
# Run centimentd (the server) in the foreground, provided it's on your PATH:
$ centimentd
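Before starting the server, it can help to confirm that every variable from the steps above is actually set. The following is a minimal pre-flight sketch, not part of Centiment itself; `missingEnv` is a hypothetical helper, and the list simply repeats the variable names exported above.

```go
package main

import (
	"fmt"
	"os"
)

// requiredEnv repeats the variable names from the setup steps above.
var requiredEnv = []string{
	"TWITTER_CONSUMER_KEY",
	"TWITTER_CONSUMER_SECRET",
	"TWITTER_ACCESS_TOKEN",
	"TWITTER_ACCESS_KEY",
	"CENTIMENT_PROJECT_ID",
	"GOOGLE_APPLICATION_CREDENTIALS",
}

// missingEnv returns the names in keys that lookup reports as unset or empty.
// lookup has the same signature as os.LookupEnv so it can be swapped in tests.
func missingEnv(keys []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, k := range keys {
		if v, ok := lookup(k); !ok || v == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredEnv, os.LookupEnv); len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "missing environment variables:", missing)
		os.Exit(1)
	}
	fmt.Println("all required environment variables are set")
}
```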
App Engine Flexible makes running Centiment fairly easy: no need to set up or secure an environment.
- `git clone` or `go get` this repository: `git clone https://github.com/elithrar/centiment.git`
- Copy `app.example.yaml` to `app.yaml` and add your Twitter API keys under `env_variables`
- Important: don't check these credentials into your source code! The `.gitignore` file included in the repo should help to prevent that.
The service can then be deployed via:
centiment $ cd cmd/centimentd
cmd/centimentd $ gcloud app deploy
Some notes on running this yourself:
- The default `app.example.yaml` included alongside is designed to use the minimum set of resources on App Engine Flex. Centiment is extremely efficient (it's written in Go) and runs quickly on a single CPU core + 600MB RAM. At the time of writing (Jan 2018), running a 1 CPU / 1GB RAM / 10GB disk App Engine Flex instance costs ~USD$44/month.
- Cloud Function pricing is fairly cheap for our use-case: if you're running a search every 10 minutes, that's 6 times an hour * 730 hours per month = 4380 invocations per search term per month. That falls into the free tier of Cloud Functions pricing.
- The Natural Language API is where the majority of the costs will lie if you choose to run Centiment more aggressively (more tweets, more often). Searching for up to 50 tweets (per search term) every 10 minutes is 219,000 Sentiment Analysis records per month, which costs USD$219 per search term per month (as of Jan 2018), before accounting for the small free tier (first 5k records).
Note: Make sure to do the math before tweaking the `CENTIMENT_RUN_INTERVAL` or `CENTIMENT_MAX_TWEETS` environment variables, or adding additional search terms to `cmd/centimentd/search.toml`.
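The arithmetic in the notes above can be sketched as a small calculator. This is an estimate only: the function names are hypothetical, the free tier is ignored, and the rate of USD$1 per 1,000 records is an assumption chosen because it reproduces the Jan 2018 figure quoted above. Check current pricing before relying on it.

```go
package main

import "fmt"

// hoursPerMonth is the same approximation used in the notes above.
const hoursPerMonth = 730

// runsPerMonth returns how many searches run per search term per month
// for a given interval in minutes. It returns 0 for non-positive intervals.
func runsPerMonth(intervalMinutes int) int {
	if intervalMinutes <= 0 {
		return 0
	}
	return hoursPerMonth * 60 / intervalMinutes
}

// recordsPerMonth is the upper bound on tweets analysed per term per month.
func recordsPerMonth(intervalMinutes, maxTweets int) int {
	return runsPerMonth(intervalMinutes) * maxTweets
}

// monthlyCostUSD estimates the Natural Language API cost, ignoring the free tier.
// usdPer1000 is an assumed rate, not a quoted price.
func monthlyCostUSD(records int, usdPer1000 float64) float64 {
	return float64(records) / 1000 * usdPer1000
}

func main() {
	for _, terms := range []int{1, 3} {
		records := recordsPerMonth(10, 50) * terms
		fmt.Printf("%d term(s): %d records, ~USD$%.0f/month\n",
			terms, records, monthlyCostUSD(records, 1.0))
	}
}
```

Halving the interval or doubling the tweet limit doubles the record count, so the cost scales linearly with both settings and with the number of search terms.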
In order to make analysis easier, you can import data directly into BigQuery after each run via a Cloud Function that is triggered from every database write.
You'll need to:
- Create a BigQuery dataset called "Centiment" and a table called "sentiments". You can opt to use different names, but you will need to make sure to use `config:set` within the Firebase SDK so that our function works.
# Create an empty table with our schema using the bq CLI tool (installed with the gcloud SDK)
centiment/ $ bq mk --schema bigquery.schema.json -t centiment.sentiments
- Install the Firebase SDK so that we can deploy the Cloud Function with the Firestore trigger.
centiment $ cd _functions
# Log into your Google Cloud Platform account
_functions $ firebase login
# Set the dataset and table names
_functions $ firebase functions:config:set centiment.dataset="Centiment" centiment.table="sentiments"
# Deploy this specific function.
_functions $ firebase deploy --only functions:sentimentsToBQ
# Done!
TODO(matt): Create a Dockerfile for this - FROM alpine:latest
If you're running Centiment elsewhere, you'll need to provide the application with credentials to reach Firestore and the Natural Language APIs by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the location of your credentials file.
Further, the `Store` interface allows you to provide alternate backend datastores (e.g. PostgreSQL), if you want to run Centiment on alternative infrastructure.
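To show the general pattern of swapping a datastore behind an interface, here is a minimal in-memory sketch. Note that `SentimentStore`, its `Save` and `Latest` methods, and the cut-down `Sentiment` type are all hypothetical stand-ins made up for this example: the real `Store` interface is defined in the repository and its method set may look quite different.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Sentiment is a cut-down, hypothetical record for this example.
type Sentiment struct {
	Topic string
	Score float64
}

// SentimentStore is a hypothetical stand-in for Centiment's Store interface;
// consult the repository for the real method set.
type SentimentStore interface {
	Save(s Sentiment) error
	Latest(topic string) (Sentiment, error)
}

// ErrNotFound is returned when no record exists for a topic.
var ErrNotFound = errors.New("no sentiment stored for topic")

// memStore keeps only the most recent Sentiment per topic, in memory.
type memStore struct {
	mu     sync.RWMutex
	latest map[string]Sentiment
}

func newMemStore() *memStore {
	return &memStore{latest: make(map[string]Sentiment)}
}

// Save overwrites the stored record for s.Topic.
func (m *memStore) Save(s Sentiment) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.latest[s.Topic] = s
	return nil
}

// Latest returns the stored record for topic, or ErrNotFound.
func (m *memStore) Latest(topic string) (Sentiment, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	s, ok := m.latest[topic]
	if !ok {
		return Sentiment{}, ErrNotFound
	}
	return s, nil
}

func main() {
	var store SentimentStore = newMemStore()
	if err := store.Save(Sentiment{Topic: "bitcoin", Score: 0.12}); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	s, err := store.Latest("bitcoin")
	fmt.Println(s.Topic, s.Score, err)
}
```

A PostgreSQL-backed implementation would satisfy the same interface with SQL queries in place of the map, leaving the callers unchanged.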
Centiment exposes its analysis as JSON via a REST API. Requests are not authenticated by default.
# Get the latest sentiments for the named currency ("bitcoin", in this case)
GET /sentiments/bitcoin
[
  {
    "id": "lwnXwJmNbxRoE0mzXff0",
    "topic": "bitcoin",
    "slug": "bitcoin",
    "query": "bitcoin OR BTC OR #bitcoin OR #BTC -filter:retweets",
    "count": 154,
    "score": 0.11818181921715863,
    "stdDev": 0.3425117817511681,
    "variance": 0.11731432063835981,
    "fetchedAt": "2018-02-12T05:24:15.44671Z"
  }
]
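A small Go client for this endpoint might look like the sketch below. The `Sentiment` struct mirrors the fields of the sample response above; the `parseSentiments` and `fetch` helpers are hypothetical, and the base URL `http://localhost:8080` is an assumption that you should replace with wherever your instance is listening.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// Sentiment mirrors the fields in the sample response above.
type Sentiment struct {
	ID        string    `json:"id"`
	Topic     string    `json:"topic"`
	Slug      string    `json:"slug"`
	Query     string    `json:"query"`
	Count     int       `json:"count"`
	Score     float64   `json:"score"`
	StdDev    float64   `json:"stdDev"`
	Variance  float64   `json:"variance"`
	FetchedAt time.Time `json:"fetchedAt"`
}

// parseSentiments decodes a JSON array of sentiment records.
func parseSentiments(data []byte) ([]Sentiment, error) {
	var out []Sentiment
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// fetch issues a GET request and decodes the response body.
func fetch(url string) ([]Sentiment, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseSentiments(body)
}

func main() {
	// Assumption: centimentd is reachable here; adjust for your deployment.
	const baseURL = "http://localhost:8080"
	sentiments, err := fetch(baseURL + "/sentiments/bitcoin")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		os.Exit(1)
	}
	for _, s := range sentiments {
		fmt.Printf("%s: score=%.3f stdDev=%.3f over %d tweets at %s\n",
			s.Topic, s.Score, s.StdDev, s.Count, s.FetchedAt.Format(time.RFC3339))
	}
}
```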
PRs are welcome, but any non-trivial changes should be raised as an issue first to discuss the design and avoid having your hard work rejected!
Suggestions for contributors:
- Additional sentiment analysis adapters (e.g. Azure Cognitive Services, IBM Watson)
- Alternative backend datastores
BSD licensed. See the LICENSE file for details.