Twitter user / directed graph analysis using the PageRank algorithm on Hadoop

Uses the PageRank algorithm to analyze Twitter users. The implementation is based on Hadoop Streaming.
The overall workflow closely follows http://salsahpc.indiana.edu/csci-b649-spring-2014/projects/project3.html
The original dataset is available at http://socialcomputing.asu.edu/datasets/Twitter

Raw data format

1 2
1 3
2 3
2 4
3 4
3 5
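
As a rough illustration of the raw format, the sketch below reads the sample lines into an adjacency list. It assumes each line is a directed edge written as `source target`, which this README does not state explicitly; `parse_edges` is a hypothetical helper and is not one of the scripts in this repo.

```python
def parse_edges(lines):
    """Build {source: [targets]} from whitespace-separated 'source target' lines."""
    graph = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, dst = int(parts[0]), int(parts[1])
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])  # make sure nodes with no out-links still appear
    return graph


# The six sample edges shown above.
sample_lines = ["1 2", "1 3", "2 3", "2 4", "3 4", "3 5"]
adjacency = parse_edges(sample_lines)
```

Nodes 4 and 5 end up with empty target lists here, since they never appear in the first column of the sample.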

1. Run job1.sh
2. Run gen_iter_val.py
3. Run job2.sh
4. Run combine_then_update_iter_val.py. If the output value (the sum of the value differences) is greater than the threshold, go back to step 3. If it is smaller than the threshold, continue to step 5.
5. Run sort.py
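
The iterate-until-converged loop around job2.sh and combine_then_update_iter_val.py can be sketched in a single process as below. This is only a minimal in-memory sketch, not the Hadoop Streaming code in this repo. The damping factor of 0.85, the threshold of 1e-6, the uniform initial values, and the even redistribution of rank from nodes without out-links are all assumptions, so it is not expected to reproduce the numbers shown in this README.

```python
def pagerank(graph, damping=0.85, threshold=1e-6, max_iter=100):
    """Iterate PageRank until the summed absolute change drops below threshold.

    graph maps each node to the list of nodes it links to.
    damping, threshold and max_iter are assumed values, not taken from the repo.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # assumed uniform starting values
    for _ in range(max_iter):
        # Rank held by nodes with no out-links is spread evenly (an assumption).
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for target in graph[v]:
                    new_rank[target] += share
        # Convergence check: sum of value differences compared with the threshold.
        diff = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if diff < threshold:
            break
    return rank


# Adjacency list for the sample edges in "Raw data format".
sample_graph = {1: [2, 3], 2: [3, 4], 3: [4, 5], 4: [], 5: []}
ranks = pagerank(sample_graph)
```

In the Hadoop version each pass of the inner loop corresponds to one run of job2.sh, with the convergence check done between runs.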

[4, 0.19241600000000003]
[3, 0.18625066666666665]
[5, 0.16974933333333334]
[2, 0.163584]
[0, 0.14400000000000002]
[1, 0.14400000000000002]

User/Node 4 ranks 1st, with a value of 0.192
User/Node 3 ranks 2nd, with a value of 0.186...
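
The ordering above amounts to sorting `[node, value]` pairs by value in descending order. A minimal sketch, using the values printed above as input (the actual input format read by sort.py is not shown in this README, so the in-memory list is an assumption):

```python
# [node, value] pairs copied from the output above, in arbitrary order.
values = [
    [0, 0.14400000000000002],
    [1, 0.14400000000000002],
    [2, 0.163584],
    [3, 0.18625066666666665],
    [4, 0.19241600000000003],
    [5, 0.16974933333333334],
]

# Highest PageRank value first; ties keep their input order (sorted is stable).
ranked = sorted(values, key=lambda pair: pair[1], reverse=True)

for pair in ranked:
    print(pair)
```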

Add a shell script to run everything. I know I'm a slacker, so I'll leave it for now ╮(╯_╰)╭
