GitHub - koradir/cil2017-tweets_kogoluki

First, extract twitter-datasets.zip.

To build a co-occurrence matrix, run the following commands in order. Note that the cooc.py script takes a few minutes to run and prints the number of tweets processed as it goes.

    build_vocab.sh
    cut_vocab.sh
    python3 pickle_vocab.py
    python3 cooc.py
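For readers unfamiliar with the term, the idea behind a co-occurrence matrix can be sketched in a few lines. This is a minimal illustration, not the repository's cooc.py: the function name `build_cooc`, the whitespace tokenization, and the choice to count every ordered pair of in-vocabulary tokens within a tweet (including a token with itself) are all assumptions.

```python
from collections import Counter


def build_cooc(tweets, vocab):
    """Count how often two vocabulary words appear in the same tweet.

    `vocab` maps a word to its integer index. The result maps an
    (index, index) pair to its count, i.e. a sparse matrix.
    """
    counts = Counter()
    for tweet in tweets:
        # Keep only tokens that are in the vocabulary (assumed: split on whitespace).
        ids = [vocab[token] for token in tweet.split() if token in vocab]
        # Count every ordered pair of tokens within the tweet.
        for i in ids:
            for j in ids:
                counts[(i, j)] += 1
    return counts


# Hypothetical toy data, for illustration only.
vocab = {"good": 0, "morning": 1, "day": 2}
counts = build_cooc(["good morning", "good day today"], vocab)
```

Here "today" is ignored because it is not in the toy vocabulary, and the pair ("good", "good") is counted once per tweet that contains the word.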

Then to calculate the word vectors: python3 glove.py
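As a rough picture of what a GloVe-style training step does, the sketch below fits two sets of vectors so that their dot product approximates the log of each co-occurrence count. It is a simplified variant without bias terms and is not taken from the repository's glove.py. The parameter names `dim`, `step` and `epochs` are assumed to correspond to the K, step and epochs values that appear in the embeddings filename; `n_max` and `alpha` are the usual GloVe weighting constants.

```python
import math
import random


def glove_loss(cooc, xs, ys, n_max=100, alpha=0.75):
    """Weighted squared error between x_i . y_j and log(count)."""
    total = 0.0
    for (i, j), n in cooc.items():
        weight = min(1.0, (n / n_max) ** alpha)
        dot = sum(a * b for a, b in zip(xs[i], ys[j]))
        total += weight * (math.log(n) - dot) ** 2
    return total


def train_glove(cooc, vocab_size, dim=5, step=0.001, epochs=10,
                n_max=100, alpha=0.75, seed=0):
    """Stochastic gradient descent over the nonzero co-occurrence counts."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    ys = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    for _ in range(epochs):
        for (i, j), n in cooc.items():
            weight = min(1.0, (n / n_max) ** alpha)
            dot = sum(a * b for a, b in zip(xs[i], ys[j]))
            # Gradient step on weight * (log(n) - x_i . y_j)^2.
            scale = 2 * step * weight * (math.log(n) - dot)
            x_old = xs[i][:]  # use the pre-update x_i when updating y_j
            for d in range(dim):
                xs[i][d] += scale * ys[j][d]
                ys[j][d] += scale * x_old[d]
    return xs, ys
```

The input `cooc` is assumed to map (row, column) index pairs to positive counts, so `math.log(n)` is always defined.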

And finally: python3 tweet_svm.py
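The whole sequence above can also be driven from one small script. The sketch below is only a convenience wrapper: the name `run_pipeline` is hypothetical, and invoking the two shell scripts through `sh` is an assumption, since the README does not say how they are meant to be executed.

```python
import subprocess

# Commands in the order the README lists them.
# Assumption: the shell scripts are run through `sh`.
PIPELINE = [
    ["sh", "build_vocab.sh"],
    ["sh", "cut_vocab.sh"],
    ["python3", "pickle_vocab.py"],
    ["python3", "cooc.py"],
    ["python3", "glove.py"],
    ["python3", "tweet_svm.py"],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root, after extracting the datasets, would run every step and raise `subprocess.CalledProcessError` as soon as one of them exits with a nonzero status.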

If you change glove.py, make sure to update the following line in tweet_svm.py

clf = TweetClassifier(embeddingsX='embeddingsX_K200_step0.001_epochs10.npy')

so that it loads the correct embeddings file.
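One way to keep the two files in sync is to build the filename from the hyperparameters instead of typing it by hand. The helper below is hypothetical and rests on an assumption: that the filename pattern is `embeddingsX_K<dimension>_step<step size>_epochs<epochs>.npy`, which is inferred only from the single filename shown above.

```python
def embeddings_filename(k, step, epochs):
    """Build the embeddings filename from the hyperparameters.

    Assumed pattern, inferred from 'embeddingsX_K200_step0.001_epochs10.npy'.
    Numbers are formatted with Python's default str() conversion.
    """
    return f"embeddingsX_K{k}_step{step}_epochs{epochs}.npy"


name = embeddings_filename(200, 0.001, 10)
# Possible use in tweet_svm.py:
#   clf = TweetClassifier(embeddingsX=embeddings_filename(200, 0.001, 10))
```

If glove.py formats its numbers differently (for example, a very small step size printed in scientific notation), the helper would need to match that formatting.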

report
.gitignore
CNN_BaselineClassifier.py
CNN_Classifier.py
Classifier.py
MatrixPlotter.py
NLTK_Classifier.py
PARAMS.py
README.md
SpatialPyramidPooling.py
TweetClassifier.py
TweetRepresenter.py
WordCountClassifier.py
build_vocab.sh
cooc.py
create_submission.py
cut_vocab.sh
glove.py
kmeans.py
pickle_vocab.py
plot_embeddings.py
predict.py
statusbar.py
tweet_cluster.py
tweet_cnn.py
tweet_kmean.py
tweet_svm.py
twitter-datasets.zip
