TwitterAnalytics

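Point HADOOP_HOME at the local Hadoop installation so the Mahout commands run on Hadoop (adjust the path to your setup):
export HADOOP_HOME=~/Software/hadoop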

Analysis of tweets stored in a MongoDB database using Hadoop and Apache Mahout.

Generate the TF-IDF sparse vectors from the SequenceFiles exported from MongoDB:
./mahout seq2sparse -i ~/dados/tweeter-analytics/sequence-files-mongodb/ -o ~/dados/tweeter-analytics/sparse-vector -wt tfidf -chunk 64 --minSupport 2 --minDF 1 --maxDFPercent 99
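To illustrate what `-wt tfidf` together with `--minDF` and `--maxDFPercent` does to each tweet, here is a minimal in-memory sketch. It assumes the tweets are already tokenized, and it uses the textbook weight `tf * log(N / df)`; Mahout's own TF-IDF weighting follows a Lucene-style formula, so the exact numbers will differ. The class `TfIdfSketch` is hypothetical and not part of this repository.

```java
import java.util.*;

/**
 * Hypothetical sketch of TF-IDF weighting with document-frequency pruning,
 * loosely mirroring seq2sparse's --minDF and --maxDFPercent.
 * Assumption: textbook weight tf * log(N / df), not Mahout's exact formula.
 */
public class TfIdfSketch {

    /** Returns one sparse vector (term -> weight) per tokenized document. */
    public static List<Map<String, Double>> tfidf(List<List<String>> docs,
                                                  int minDf, int maxDfPercent) {
        int n = docs.size();
        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term frequency within this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
            Map<String, Double> vec = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                int d = df.get(e.getKey());
                // Drop terms that are too rare or appear in too many documents.
                if (d < minDf || d * 100.0 / n > maxDfPercent) {
                    continue;
                }
                vec.put(e.getKey(), e.getValue() * Math.log((double) n / d));
            }
            vectors.add(vec);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> tweets = List.of(
                List.of("hadoop", "mongodb", "tweets"),
                List.of("hadoop", "mahout", "kmeans"),
                List.of("mongodb", "mongodb", "java"));
        System.out.println(tfidf(tweets, 1, 99));
    }
}
```

With `--maxDFPercent 99`, a term that appears in every tweet is discarded, which is why very common words do not dominate the vectors.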

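Generate 5 clusters with k-means, using cosine distance and at most 20 iterations. Because -k is given, Mahout picks 5 random vectors as the initial centroids and writes them to the --clusters path:
./mahout kmeans --input ~/dados/tweeter-analytics/sparse-vector/tfidf-vectors --output ~/dados/tweeter-analytics/kmeans-cluster -k 5 --maxIter 20 --distanceMeasure org.apache.mahout.common.distance.CosineDistanceMeasure --clustering --method mapreduce --clusters ~/dados/tweeter-analytics/kmeans-cluster/clusters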
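The following is a minimal in-memory sketch of the loop the `kmeans` command runs, using cosine distance (1 minus cosine similarity), the quantity `CosineDistanceMeasure` is based on. It assumes dense `double[]` vectors, caller-supplied initial centroids (Mahout samples them randomly when `-k` is set), and stops when assignments no longer change or `maxIter` is reached. It is an illustration only, not Mahout's MapReduce implementation; `CosineKMeansSketch` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical in-memory k-means with cosine distance (illustration only). */
public class CosineKMeansSketch {

    /** Cosine distance: 1 - (a . b) / (|a| |b|). */
    public static double cosineDistance(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 1.0; // assumption: treat zero vectors as maximally distant
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the cluster index of each point after at most maxIter iterations. */
    public static int[] cluster(double[][] points, double[][] initialCentroids, int maxIter) {
        int k = initialCentroids.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (cosineDistance(points[p], centroids[c])
                            < cosineDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable: converged
            }
            // Update step: each centroid becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int i = 0; i < dim; i++) {
                    sums[assignment[p]][i] += points[p][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // keep the old centroid if a cluster empties
                    for (int i = 0; i < dim; i++) {
                        centroids[c][i] = sums[c][i] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 0, 0}, {0.9, 0.1, 0}, {0, 1, 1}, {0, 0.8, 1}};
        double[][] seeds = {points[0], points[2]};
        System.out.println(Arrays.toString(cluster(points, seeds, 20)));
    }
}
```

Cosine distance ignores vector length, so a short tweet and a long tweet with the same term proportions end up close together, which suits TF-IDF text vectors.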

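Dump the final clusters to a text file, using the dictionary to print terms instead of term indices. The number in clusters-2-final is the iteration at which this particular run finished; use the clusters-*-final directory produced by your own k-means run:
./mahout clusterdump -dt sequencefile -d ~/dados/tweeter-analytics/sparse-vector/dictionary.file-0 -i ~/dados/tweeter-analytics/kmeans-cluster/clusters-2-final -o ~/dados/tweeter-analytics/dump/clusters.txt -b 10 -n 100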
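The most useful part of the dump is usually each cluster's list of top-weighted terms, which `-n` limits. The sketch below shows that idea: given a centroid (term index to weight) and the dictionary (term index to word), it returns the `n` highest-weighted words. It assumes both are already loaded into memory; reading Mahout's SequenceFiles (`clusters-*-final`, `dictionary.file-0`) is not shown, and `TopTermsSketch` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical sketch of a cluster's top-terms listing (illustration only). */
public class TopTermsSketch {

    public static List<String> topTerms(Map<Integer, Double> centroid,
                                        Map<Integer, String> dictionary, int n) {
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>(centroid.entrySet());
        // Highest weight first.
        entries.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        List<String> words = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : entries) {
            if (words.size() == n) {
                break;
            }
            // Fall back to the raw index if the dictionary has no entry.
            words.add(dictionary.getOrDefault(e.getKey(), "#" + e.getKey()));
        }
        return words;
    }

    public static void main(String[] args) {
        Map<Integer, String> dictionary = Map.of(0, "hadoop", 1, "mongodb", 2, "tweets");
        Map<Integer, Double> centroid = Map.of(0, 0.2, 1, 0.7, 2, 0.1);
        System.out.println(topTerms(centroid, dictionary, 2));
    }
}
```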

Repository: jdamasceno/TwitterAnalytics