This repository has been archived by the owner on Jul 18, 2021. It is now read-only.

Twitter ingestion via Spark Streaming, pushing tweets into Kafka


daplab/twitter-to-kafka


Twitter to Kafka

This project is a playground for Spark Streaming. It reads from the Twitter Firehose via Spark Streaming and stores the data in a Kafka topic.

Lots of inspiration has been taken from the following projects:

Run it

Build the project

mvn clean install

Upload the produced jar to the DAPLAB infrastructure

scp target/twitter-to-kafka-1.0.0-SNAPSHOT-jar-with-dependencies.jar pubgw1.daplab.ch:

Get the Twitter API credentials
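The launch command reads the four credentials from shell variables, so it can help to confirm they are all set before submitting the job. The following is only a sketch: the variable names are the placeholders used in the launch command, while the function name `check_twitter_credentials` is hypothetical and not part of this project.

```shell
# Hypothetical helper, not part of the project: verifies that the four
# credential variables used as placeholders in the launch command are set.
check_twitter_credentials() {
  missing=0
  for name in YOUR_TWITTER_CONSUMER_KEY YOUR_TWITTER_CONSUMER_SECRET \
              YOUR_TWITTER_ACCESS_TOKEN YOUR_TWITTER_ACCESS_SECRET; do
    # Look up the value of the variable whose name is stored in $name.
    # The names come from the fixed list above, so eval is safe here.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  # Exit status 0 when all four are set, 1 otherwise.
  return "${missing}"
}
```

After exporting the four variables, running `check_twitter_credentials` before the launch command reports any that are still empty on standard error and returns a non-zero status.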

Launch the job

spark-submit --master yarn --num-executors 4 \
  --class ch.daplab.spark.streaming.TwitterToKafka \
  twitter-to-kafka-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
  public.tweets daplab-rt-11.fri.lan:6667 10 \
  --consumerKey ${YOUR_TWITTER_CONSUMER_KEY} \
  --consumerSecret ${YOUR_TWITTER_CONSUMER_SECRET} \
  --accessToken ${YOUR_TWITTER_ACCESS_TOKEN} \
  --accessTokenSecret ${YOUR_TWITTER_ACCESS_SECRET}
