This project is a playground for Spark Streaming. It reads from the Twitter Firehose via Spark Streaming and stores the data in a Kafka topic.
Lots of inspiration has been taken from the following projects:
- for collecting tweets and Utils class
- to write the DStream to Kafka
- for human description of the tweet collection
Build the project
mvn clean install
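Before uploading, it can help to confirm that the build actually produced the assembly jar. The following is a minimal sketch, not part of the project: `find_assembly_jar` is a hypothetical helper name, and it assumes the jar lands in `target/` with the `-jar-with-dependencies.jar` suffix, as in the upload command in this README. It relies on POSIX `sh`/`bash` globbing, where an unmatched pattern is left unexpanded.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): print the path of the
# first assembly jar found in the given directory (default: target).
find_assembly_jar() {
    dir="${1:-target}"
    for jar in "$dir"/*-jar-with-dependencies.jar; do
        # If the glob matched nothing, the pattern stays unexpanded
        # and this -f test fails, so we fall through to the error.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "no *-jar-with-dependencies.jar found in $dir" >&2
    return 1
}
```

A possible use is `mvn clean install && find_assembly_jar target`, which prints the jar path on success and returns a non-zero status if the jar is missing.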
Upload the produced jar to the DAPLAB infrastructure
scp target/twitter-to-kafka-1.0.0-SNAPSHOT-jar-with-dependencies.jar
Get the Twitter API credentials
Launch the job
spark-submit --master yarn --num-executors 4 \
--class ch.daplab.spark.streaming.TwitterToKafka \
twitter-to-kafka-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
public.tweets daplab-rt-11.fri.lan:6667 10 \