Recommend using virtualenv and virtualenvwrapper to manage your environment
- mkvirtualenv HackerNewsInstapaper
- pip install -r pip-requirements.txt
- Edit and rename scrapyhacker/instapaper.ini.sample to scrapyhacker/instapaper.ini
- Edit the helper scripts to add your environment in
scrapyhacker/cron_hackernews_scrape.sh
andscrapyhacker/cron_hackernews_instapaper.sh
Use scrapy to crawl Hacker News (http://news.ycombinator.com/best) and store
into articles.db, then run submitinstapaper.py
:
scrapy crawl hackernews
python submitinstapaper.py
Two example scripts are provided, but needs editing depending on your virtualenv location:
scrapyhacker/cron_hackernews_scrape.sh
: Sets the environment and runs the crawlerscrapyhacker/cron_hackernews_instapaper.sh
: Submits the unsent articles in articles.db to Instapaper
Example lines to add to your crontab:
0 3 * * * cron_hackernews_scrape.sh > /dev/null 2>&1
15 3 * * 1 cron_hackernews_instapaper.sh