Skip to content

Latest commit

 

History

History
221 lines (155 loc) · 7.04 KB

deployment.md

File metadata and controls

221 lines (155 loc) · 7.04 KB

Deployment

There are a number of tools and libraries included in this repository. One is a web server wm-diffing-server, which can be deployed to a remote machine to be used in conjuction with the other Web Monitoring projects.

Set up your machine (first time only)

Set up Python environment

First, SSH into your server, update the package manager, and install nginx, git, build-essential, and libxml2-dev. On Debian or Ubuntu Linux, you should run:

$ sudo apt-get update
$ sudo apt-get install git nginx build-essential libxml2-dev

Next, install conda, which we’ll use to manage Python versions and environments. You can install either Anaconda (the full-featured version with extra packages and tools) or Miniconda (the minimal, light-weight version). Minconda is recommended to keep the server as simple as possible.

# Find the URL of the installer you want to use.
# For Anaconda, see: https://www.continuum.io/downloads for download URLs
# For Miniconda, see: https://conda.io/miniconda.html for download URLs
$ curl <conda_url> > conda_installer.sh
# e.g: curl https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh > conda_installer.sh

Then run the installer. (These parameters get us a system-wide install so the web server can use it.)

$ sudo bash conda_installer.sh -b -p /opt/conda

Finally, ensure that you can access conda by creating a new conda group and adding yourself to it.

# Create a group for conda and add yourself to it
$ sudo groupadd conda
$ sudo usermod -a -G conda <your_username>

# Ensure users in that group have access to conda
$ sudo chgrp -R conda /opt/conda
$ sudo chmod 770 -R /opt/conda

# Add setup script to your shell
$ echo -e '\nsource /opt/conda/etc/profile.d/conda.sh' >> ~/.bashrc

You’ll need to log out and log back in to update your user groups and to load Conda’s tools in your shell. Once you’ve logged back in, check to make sure Conda is working by printing its version:

$ conda --version

Install the application

Next, we need to actually install web-monitoring-processing in the /var/www/web-monitoring-processing directory:

$ cd /var/www

# Make the directory first to ensure it does not get installed for the root user
$ sudo mkdir web-monitoring-processing
$ sudo chown <your_user> web-monitoring-processing

# Clone the git repo for the project
$ git clone https://github.com/edgi-govdata-archiving/web-monitoring-processing.git

Run the actual installer:

$ cd web-monitoring-processing

# Create a new conda environment.
$ conda create -n web-monitoring-processing
$ conda activate web-monitoring-processing

# Install packages
$ while read requirement; do conda install --yes ${requirement/\ [~=]/=}; done < requirements.txt
$ pip install -r requirements.txt
$ python setup.py install

Now, test that your installation actually works by running the diffing server on port 8000:

$ conda activate web-monitoring-processing
$ wm-diffing-server --port 8000

Open a web browser and try browsing to: http://[IP address for your server]:8000/html_text_diff

You should get a 500 error because you didn't provide the right arguments :P

Press ctrl+c to stop the server.

Set up Supervisor

Next, we’ll set up Supervisor, a tool that will automatically start several copies of the diffing server and restart them if they crash. First, install it and get it running:

$ sudo apt-get install supervisor
$ sudo service supervisor start

After that, create a configuration file for our server:

$ sudo vim /etc/supervisor/conf.d/wm-diffing-server.conf

The content of this file should look like:

; We run four server instances; one per processor core.
; If you're looking to minimize cpu load, run fewer processes.
; BTW, Tornado processes are single threaded.
; To take advantage of multiple cores, you'll need multiple processes.

[program:wm-diffing-server]
numprocs=4
process_name=%(program_name)s-80%(process_num)02d
command=/opt/conda/envs/web-monitoring-processing/bin/wm-diffing-server --port 80%(process_num)02d
stderr_logfile = /var/log/supervisor/tornado-stderr.log
stdout_logfile = /var/log/supervisor/tornado-stdout.log
stopasgroup=true

You can add or remove as many copies of the program as you like, but note that each should be on a separate port.

Then, reload Supervisor’s configuration:

$ sudo supervisorctl reread
$ sudo supervisorctl update

You can check that your servers are now running with:

$ sudo supervisorctl status
> wm-diffing-server-8000           RUNNING   pid 29929, uptime 0:33:05
> wm-diffing-server-8001           RUNNING   pid 29930, uptime 0:33:05
> wm-diffing-server-8002           RUNNING   pid 29931, uptime 0:33:05
> wm-diffing-server-8003           RUNNING   pid 29932, uptime 0:33:05

Try pointing your web browser to the server again without manually running the server this time.

Use Nginx to route HTTP connections to the diffing server instances

Finally, we’ll use Nginx to proxy HTTP connections through to the diffing servers. This way, Nginx can act as a load balancer across the four services. It can also handle things like SSL, static file serving and the like if we add them in the future.

Create a new web site configuration:

$ sudo vim /etc/nginx/sites-available/web-monitoring-processing

The content of this file should look like:

upstream differs {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;

    # Allow file uploads
    client_max_body_size 50M;

    # ...other standard URLs here, like robots.txt or static files; not necessary now...

    # Send all paths to the diffing server
    location / {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_pass http://differs;
    }
}

Then disable the existing default site and enable the one you just created.

$ sudo rm /etc/nginx/sites-enabled/default
$ sudo ln -s /etc/nginx/sites-available/web-monitoring-processing /etc/nginx/sites-enabled/web-monitoring-processing
$ sudo systemctl restart nginx

This time, you should be able to browse directly to your server’s IP without using a special port and get the same response as before:

http://[IP address for your server]/html_text_diff

Now you’ve got a working deployment!

Deploy new releases

When new versions of web-monitoring-processing are ready to deploy, use git to checkout the correct code, install it, and restart your servers:

$ cd /var/www/web-monitoring-processing
$ git pull
$ conda activate web-monitoring-processing
$ python setup.py install
$ sudo supervisorctl restart wm-diffing-server-8000
$ sudo supervisorctl restart wm-diffing-server-8001
$ sudo supervisorctl restart wm-diffing-server-8002
$ sudo supervisorctl restart wm-diffing-server-8003