NewsScraper Dashboard

Required Libraries

The necessary libraries can be installed with the following commands:

pip3 install scrapy spacy geopy schedule
python3 -m spacy download en_core_web_sm
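To confirm that the installation succeeded before starting any spiders, a small stdlib-only check can look each package up on the import path. This is a sketch, not part of the repository. It assumes the importable module names match the package names above. spaCy models such as en_core_web_sm are installed as ordinary importable packages.

```python
import importlib.util

# Assumed module names for the packages installed above; spaCy models
# such as en_core_web_sm are installed as importable packages.
REQUIRED_MODULES = ["scrapy", "spacy", "geopy", "schedule", "en_core_web_sm"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```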

Usage

A Selenium ChromeDriver is required to run the spiders. The driver must be stored as newsscraper/newsscraper/chromedriver.exe.
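A missing driver is easy to overlook, so a pre-flight check can confirm it is present. The following is a minimal sketch that only tests whether a file exists at the location given above, relative to the repository root. The helper names `chromedriver_path` and `chromedriver_available` are hypothetical and do not come from the repository.

```python
from pathlib import Path

# Driver location from this README, relative to the repository root.
CHROMEDRIVER_RELATIVE = Path("newsscraper") / "newsscraper" / "chromedriver.exe"


def chromedriver_path(repo_root="."):
    """Return the expected ChromeDriver path for a checkout at `repo_root` (hypothetical helper)."""
    return Path(repo_root) / CHROMEDRIVER_RELATIVE


def chromedriver_available(repo_root="."):
    """Return True if a file exists where the spiders expect the driver (hypothetical helper)."""
    return chromedriver_path(repo_root).is_file()
```

With Selenium 4, a driver path like this is typically passed as `webdriver.Chrome(service=Service(executable_path=str(path)))`, where `Service` comes from `selenium.webdriver.chrome.service`. This README does not say whether the spiders hand the path over in that way.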

To run the spiders continuously, run python3 run_continuous.py. This starts the spiders at 10:00 AM and 10:00 PM every day and lets each run last 90 minutes.
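This README does not show how run_continuous.py implements the schedule. Because `schedule` is one of the required libraries, the script may use calls such as `schedule.every().day.at("10:00").do(job)`, but that is a guess. The sketch below uses only the standard library and covers just the timing rule: starts at 10:00 and 22:00, with each run lasting 90 minutes. The `crawl_command` helper makes a further assumption. It supposes each run is a `scrapy crawl <name>` call bounded by Scrapy's `CLOSESPIDER_TIMEOUT` setting (in seconds). The spider name passed to it is hypothetical.

```python
from datetime import datetime, time, timedelta

# Start times and run length as described in this README.
START_TIMES = (time(10, 0), time(22, 0))
RUN_DURATION = timedelta(minutes=90)


def next_start(now):
    """Return the first scheduled start strictly after `now`."""
    candidates = []
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for t in START_TIMES:
            start = datetime.combine(day, t)
            if start > now:
                candidates.append(start)
    return min(candidates)


def is_running(now):
    """Return True if `now` falls inside one of the 90-minute run windows."""
    for day_offset in (-1, 0):
        day = now.date() + timedelta(days=day_offset)
        for t in START_TIMES:
            start = datetime.combine(day, t)
            if start <= now < start + RUN_DURATION:
                return True
    return False


def crawl_command(spider_name):
    """Build a hypothetical `scrapy crawl` command that stops after RUN_DURATION."""
    seconds = int(RUN_DURATION.total_seconds())
    return ["scrapy", "crawl", spider_name, "-s", f"CLOSESPIDER_TIMEOUT={seconds}"]
```

For example, `next_start(datetime(2024, 1, 1, 23, 0))` gives 10:00 on the following day. `crawl_command("npr")` bounds that crawl to 5400 seconds.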

Spiders

The following spiders are currently implemented, each scraping the site listed next to it:

ABCSpider -> ABC Go News

AfroSpider -> afro.com

BaltimoreFishbowlSpider -> baltimorefishbowl.com

BaltimoreJewishTimesSpider -> www.jewishtimes.com

FoxBaltimoreSpider -> www.foxbaltimore.com

CNNSpider -> cnn.com

NBCPGSpider -> www.nbcwashington.com/news/local/prince-georges-county/

NJSpider -> nj.com

NPRSpider -> npr.org

PGPDSpider -> pgpolice.blogspot.com

WBALTVSpider -> www.wbaltv.com

WJLASpider -> wjla.com

WJZSpider -> baltimore.cbslocal.com

WKYTSpider -> www.wkyt.com

WMARSpider -> www.wmar2news.com/news/
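The table above maps each spider to one site. A Scrapy spider usually enforces this by listing its domain in `allowed_domains`. As an illustration, the following stdlib-only sketch routes an article URL back to the spider responsible for it. The host/path matching rule and the `spider_for` helper are assumptions, not code from the repository. ABCSpider is left out because its site is given by name ("ABC Go News") rather than by URL.

```python
from urllib.parse import urlparse

# Site prefixes copied from the spider table; a leading "www." is ignored.
SPIDER_SITES = {
    "AfroSpider": "afro.com",
    "BaltimoreFishbowlSpider": "baltimorefishbowl.com",
    "BaltimoreJewishTimesSpider": "www.jewishtimes.com",
    "FoxBaltimoreSpider": "www.foxbaltimore.com",
    "CNNSpider": "cnn.com",
    "NBCPGSpider": "www.nbcwashington.com/news/local/prince-georges-county/",
    "NJSpider": "nj.com",
    "NPRSpider": "npr.org",
    "PGPDSpider": "pgpolice.blogspot.com",
    "WBALTVSpider": "www.wbaltv.com",
    "WJLASpider": "wjla.com",
    "WJZSpider": "baltimore.cbslocal.com",
    "WKYTSpider": "www.wkyt.com",
    "WMARSpider": "www.wmar2news.com/news/",
}


def _split(url):
    """Return (host without 'www.', path) for a URL with or without a scheme."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parsed.path or "/"


def spider_for(url):
    """Return the spider whose site matches `url` (longest path prefix wins), or None."""
    host, path = _split(url)
    best_name, best_len = None, -1
    for name, site in SPIDER_SITES.items():
        site_host, site_path = _split(site)
        # Match the exact host or a subdomain of it, never a lookalike host.
        host_ok = host == site_host or host.endswith("." + site_host)
        if host_ok and path.startswith(site_path) and len(site_path) > best_len:
            best_name, best_len = name, len(site_path)
    return best_name
```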

MarioJayakumar/NewsScraper

Folders and files

Latest commit

History

Repository files navigation

NewsScraper Dashboard

Required Libraries

Usage

Spiders

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages