Deployment Instructions

Check the following sections for deployment instructions for Scrapinghub and Scrapydweb.

Scrapinghub Deployment

Create a free account and create a project: Screen Shot 2019-08-19 at 11 27 48 AM

We will use the shub command line tool to deploy. You can find your API key and deploy number on your project's Code & Deploys page: Screen Shot 2019-08-19 at 11 33 05 AM

Go back to the root of scrapy-tutorial (the root of the Scrapy project) and use the following commands to deploy your project to Scrapinghub.

(venv) dami:scrapy-tutorial harrywang$ shub login
Enter your API key from
API key: xxxxx
Validating API key...
API key is OK, you are logged in now.
(venv) dami:scrapy-tutorial harrywang$ shub deploy 404937
Messagepack is not available, please ensure that msgpack-python library is properly installed.
Saving project 404937 as default target. You can deploy to it via 'shub deploy' from now on
Saved to /Users/harrywang/xxx/scrapy-tutorial/scrapinghub.yml.
Packing version b6ac860-master
Created at /Users/harrywang/xxx/scrapy-tutorial
Deploying to Scrapy Cloud project "404937"
{"status": "ok", "project": 4xxx, "version": "b6ac860-master", "spiders": 3}
Run your spiders at:

A Scrapinghub configuration file, scrapinghub.yml, is created. Edit it to specify:

  • Scrapy 1.7 running on Python 3
  • a requirements file for the other packages

project: 404937
stacks:
  default: scrapy:1.7-py3
requirements:
  file: requirements.txt

Run shub deploy to deploy again.

We have three spiders in the project:

  • is the main spider
  • is the version 1 of the spider that writes to files, etc.
  • is the spider to get author page from the official tutorial

You can see your current deployment on Screen Shot 2019-08-19 at 11 44 31 AM

Then, you can run your spider:

Screen Shot 2019-08-19 at 12 47 48 PM

Screen Shot 2019-08-19 at 12 48 51 PM

Once the job is complete, you can check the results and download the items: Screen Shot 2019-08-19 at 1 57 49 PM

Screen Shot 2019-08-19 at 1 58 22 PM

You can schedule periodic jobs if you upgrade your free plan.

Scrapydweb Deployment

I found this repo and followed its instructions to set up the server.

We need a custom deployment because our Scrapy project has specific package requirements, e.g., SQLAlchemy, MySQL, etc. If no special package is needed, you can follow the easy setup below.

Custom Setup

Set up repo and Heroku account

fork a copy of to your account, e.g.,

create a free account at and install Heroku CLI: brew tap heroku/brew && brew install heroku

clone the repo:

git clone
cd scrapyd-cluster-on-heroku/

login to Heroku

scrapyd-cluster-on-heroku harrywang$ heroku login
heroku: Press any key to open up the browser to login or q to exit:
Opening browser to
Logging in... done
Logged in as [email protected]

Set up Scrapyd server/app

In this step, you should update the runtime.txt to specify the Python version and requirements.txt to include all packages your spider needs.

After changes, runtime.txt is:


requirements.txt is:



Set up the repo and commit the changes we just made:

cd scrapyd
git init
git status
git add .
git commit -a -m "first commit"
git status

Deploy Scrapyd app

heroku apps:create scrapy-server1
heroku git:remote -a scrapy-server1
git remote -v
git push heroku master
heroku logs --tail
# Press ctrl+c to stop logs outputting
# Visit

Add environment variables


# python -c "import tzlocal; print(tzlocal.get_localzone())"
heroku config:set TZ=US/Eastern
# heroku config:get TZ

Redis (optional - not in this tutorial) Redis account (optional, see in the

heroku config:set REDIS_HOST=your-redis-host
heroku config:set REDIS_PORT=your-redis-port
heroku config:set REDIS_PASSWORD=your-redis-password

Repeat this step if multiple Scrapyd servers are needed.

Set up ScrapydWeb server/app

go to scrapydweb subfolder and update runtime.txt, requirements.txt, and if needed.

Let's enable authentication, edit the following section of

# The default is False; set it to True to enable basic auth for the web UI.
ENABLE_AUTH = True
# The ENABLE_AUTH environment variable can also turn it on.
if os.environ.get('ENABLE_AUTH', 'False') == 'True':
    ENABLE_AUTH = True
# To enable basic auth, both USERNAME and PASSWORD must be non-empty strings.
# Environment variables override the defaults 'admin' and 'scrapydweb'.
USERNAME = os.environ.get('USERNAME', 'admin')
PASSWORD = os.environ.get('PASSWORD', 'scrapydweb')
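
To make the precedence of these settings concrete, here is a minimal sketch of the environment lookups in the excerpt above. The helper name resolve_auth is hypothetical and is not part of ScrapydWeb; it only mirrors the pattern of reading ENABLE_AUTH, USERNAME, and PASSWORD from the environment with fallbacks.

```python
import os


def resolve_auth(env=None):
    """Return (enable_auth, username, password).

    Hypothetical helper for illustration: environment values win over
    the fallback defaults, as in the settings excerpt.
    """
    env = os.environ if env is None else env
    enable_auth = env.get('ENABLE_AUTH', 'False') == 'True'
    username = env.get('USERNAME', 'admin')
    password = env.get('PASSWORD', 'scrapydweb')
    # Basic auth needs both credentials to be non-empty strings.
    if enable_auth and not (username and password):
        raise ValueError('USERNAME and PASSWORD must be non-empty')
    return enable_auth, username, password


if __name__ == '__main__':
    # Passing a plain dict keeps the example independent of the real environment.
    print(resolve_auth({'ENABLE_AUTH': 'True', 'USERNAME': 'harry', 'PASSWORD': 's3cret'}))
```

Passing a dictionary instead of os.environ makes it easy to try different combinations before setting the real variables on the server.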

Then set up the repo and commit the changes:

cd ..
cd scrapydweb
git init
git status
git add .
git commit -a -m "first commit"
git status

Deploy ScrapydWeb app

heroku apps:create scrapyd-web
heroku git:remote -a scrapyd-web
git remote -v
git push heroku master

Add environment variables


heroku config:set TZ=US/Eastern

Scrapyd servers - you have to use the address of the Scrapyd server you just set up above (see in the scrapydweb directory)

heroku config:set
# heroku config:set
# heroku config:set
# heroku config:set

Deploy the scrapy project

We need to package the project and upload it to the server.

First, install scrapyd-client using pip install git+ (note: pip does not work as of writing this document see:

Change the deploy settings in scrapy.cfg:

url =
username = admin
password = scrapydweb
project = scrapy-tutorial
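
Before deploying, it can help to confirm that scrapy.cfg contains every field listed above. The following is a minimal sketch, assuming the settings sit under a [deploy] section as in a default Scrapy project. The URL in EXAMPLE_CFG is a placeholder, not the tutorial's server, and read_deploy_target is a hypothetical helper.

```python
import configparser

# Hypothetical scrapy.cfg content; the URL is a placeholder for your own Scrapyd server.
EXAMPLE_CFG = """
[deploy]
url = http://localhost:6800/
username = admin
password = scrapydweb
project = scrapy-tutorial
"""


def read_deploy_target(cfg_text):
    """Parse scrapy.cfg text and return the [deploy] section as a dict.

    Raises ValueError if one of the fields used in this tutorial is missing.
    """
    # Disable interpolation so a '%' in the password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    target = dict(parser['deploy'])
    required = ('url', 'username', 'password', 'project')
    missing = [key for key in required if not target.get(key)]
    if missing:
        raise ValueError('missing deploy settings: ' + ', '.join(missing))
    return target


if __name__ == '__main__':
    print(read_deploy_target(EXAMPLE_CFG))
```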

Then, use scrapyd-deploy to package the project and deploy it to the Scrapyd server:

(venv) dami:scrapy-tutorial harrywang$ scrapyd-deploy
/Users/harrywang/sandbox/scrapy-tutorial/venv/lib/python3.6/site-packages/scrapyd_client/ ScrapyDeprecationWarning: Module `scrapy.utils.http` is deprecated, Please import from `w3lib.http` instead.
  from scrapy.utils.http import basic_auth_header
Packing version 1566253506
Deploying to project "scrapy-tutorial" in
Server response (200):
{"node_name": "9177f699-b645-4656-82d1-beef2898fdc1", "status": "ok", "project": "scrapy-tutorial", "version": "1566253506", "spiders": 3}
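
The last line of the output is a JSON document, so a script can check it instead of reading it by eye. This is a small sketch using the response shown above; deploy_succeeded is a hypothetical helper, not something scrapyd-client provides.

```python
import json

# Server response copied from the scrapyd-deploy output above.
RESPONSE = (
    '{"node_name": "9177f699-b645-4656-82d1-beef2898fdc1", "status": "ok", '
    '"project": "scrapy-tutorial", "version": "1566253506", "spiders": 3}'
)


def deploy_succeeded(response_text, expected_spiders):
    """Return True when the response reports status "ok" and the expected spider count."""
    data = json.loads(response_text)
    return data.get('status') == 'ok' and data.get('spiders') == expected_spiders


if __name__ == '__main__':
    # The project in this tutorial has three spiders.
    print(deploy_succeeded(RESPONSE, 3))
```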

go to, you should see your project deployed: Screen Shot 2019-08-19 at 6 27 32 PM

go to the following page to run the spider:

Screen Shot 2019-08-19 at 8 56 23 PM

Once the spider finishes, you can check the items in the Files menu.
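
As an alternative to the web page, a run can also be requested through Scrapyd's schedule.json endpoint. The sketch below only builds the request and does not send it. It assumes the server uses HTTP basic auth with the credentials from scrapy.cfg. The base URL and the spider name example_spider are placeholders, and build_schedule_request is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request


def build_schedule_request(base_url, project, spider, username, password):
    """Build (but do not send) a POST request for Scrapyd's schedule.json endpoint."""
    # schedule.json expects form-encoded 'project' and 'spider' fields.
    body = urllib.parse.urlencode({'project': project, 'spider': spider}).encode('ascii')
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')
    request = urllib.request.Request(
        urllib.parse.urljoin(base_url, 'schedule.json'),
        data=body,
        method='POST',
    )
    request.add_header('Authorization', 'Basic ' + token)
    return request


if __name__ == '__main__':
    req = build_schedule_request(
        'http://localhost:6800/',  # placeholder server address
        'scrapy-tutorial',
        'example_spider',          # placeholder spider name
        'admin',
        'scrapydweb',
    )
    # Sending it would be: urllib.request.urlopen(req)
    print(req.get_method(), req.full_url, req.data)
```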

You can specify Timer Tasks. The following shows a task that runs every 10 minutes. This part is based on APScheduler; see its documentation to figure out how to set the values (this can be confusing). Screen Shot 2019-08-19 at 10 28 04 PM
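
The step syntax is the part that tends to confuse. In APScheduler's cron fields, an expression of the form */n matches every n-th value starting from the field's minimum. The sketch below expands such an expression for illustration. It assumes the task uses */10 in the minute field, and expand_step is a hypothetical helper, not an APScheduler function.

```python
def expand_step(expression, first, last):
    """Expand a cron-style '*/n' expression into the values it matches
    within the inclusive range [first, last]."""
    if not expression.startswith('*/'):
        raise ValueError('only "*/n" expressions are handled in this sketch')
    step = int(expression[2:])
    if step <= 0:
        raise ValueError('step must be positive')
    return list(range(first, last + 1, step))


if __name__ == '__main__':
    # Minute field '*/10': minutes 0, 10, 20, 30, 40 and 50 of each matching hour.
    print(expand_step('*/10', 0, 59))
```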

Easy Setup

Use the following settings (No redis setting) and the app is at Screen Shot 2019-08-19 at 5 19 26 PM

Use the following settings (No redis setting) and the app is at Screen Shot 2019-08-19 at 5 31 15 PM

Screen Shot 2019-08-19 at 5 37 25 PM

Then package the project and deploy it to the server, following the same steps as in "Deploy the scrapy project" above.