- Create a VPS on horizon.wikimedia.org
- 24 core, 122GB RAM machine
- Debian 11
- webservice security group
- Configure a Cinder volume on horizon.wikimedia.org
- 5000 GB
- attach to the VPS
- Add a Web Proxy on Horizon for wikiwho.wmcloud.org (or another host from the settings_wmcloud.py hosts entry)
- Also do the same for wikiwho-flower.wmcloud.org
- SSH into the VPS
- Prepare the Cinder volume.
sudo wmcs-prepare-cinder-volume
- Mount it to /pickles
sudo mkdir -p /pickles/{en,eu,es,de,tr}
sudo chown -R wikiwho /pickles
- Clone the repo and make wikiwho own it:
git clone https://github.com/wikimedia/wikiwho_api.git /home/wikiwho/wikiwho_api
chown -R wikiwho:wikiwho /home/wikiwho/wikiwho_api
- Run the customization script:
sudo sh wikimedia_cloud_customization_script.sh
- Add the password for the Flower UI:
sudo htpasswd -c /etc/apache2/.htpasswd wikiwho
- Create the wikiwho postgres user and database:
sudo su postgres
psql
create user wikiwho with password 'wikiwho';
create database wikiwho;
grant all privileges on database wikiwho to wikiwho;
exit;
exit
- Become the wikiwho user:
sudo su wikiwho
cd /home/wikiwho/wikiwho_api
- Set up a virtualenv:
python3 -m venv env
. env/bin/activate
- Install the Python dependencies:
pip install -r requirements.txt -r requirements_local.txt -r requirements_test.txt
- Create wikiwho_api/settings.py (in the wikiwho_api subdirectory, not the top git directory), with an import from settings_wmcloud plus SECRET_KEY, WP_CONSUMER_TOKEN, WP_CONSUMER_SECRET, WP_ACCESS_TOKEN, WP_ACCESS_SECRET, and DATABASES.
- Generate a secret key:
python manage.py generate_secret_key
- Apply the database migrations and collect the static files:
python manage.py migrate
python manage.py collectstatic --noinput -c
- As a user with sudo, start the Gunicorn webserver:
sudo systemctl enable ww_gunicorn
sudo systemctl start ww_gunicorn
sudo systemctl status ww_gunicorn
to check if it's running
- The API and homepage should be working now.
- Start Celery:
sudo systemctl enable ww_celery
sudo systemctl start ww_celery
sudo systemctl status ww_celery
to check if it's running
- Import dumps (as user wikiwho):
sudo su wikiwho
mkdir -p /pickles/{en,eu,es,de,tr}
- Download the latest dumps for each of the languages to import, e.g.:
cd /pickles/dumps/en
wget -r -np -nd -c -A 7z https://dumps.wikimedia.your.org/enwiki/20211201/
- For each language, generate pickles from the XML dumps, e.g.:
cd ~/wikiwho_api
. env/bin/activate
nohup python manage.py generate_articles_from_wp_xmls -p '/pickles/dumps/en/' -t 30 -m 24 -lang en -c
- Start the Flower and event_stream services:
sudo systemctl enable ww_flower.service
sudo systemctl start ww_flower.service
sudo systemctl status ww_flower.service
to check if it's running
sudo systemctl enable ww_events_stream.service
sudo systemctl start ww_events_stream.service
sudo systemctl status ww_events_stream.service
to check if it's running
sudo systemctl start ww_events_stream_deletion.service
sudo systemctl status ww_events_stream_deletion.service
to check if it's running
- Add a cron job to restart services daily (see T344936 for more information):
sudo su root
crontab -e
- Add the entry
0 0 * * * /home/wikiwho/wikiwho_api/cron/restart_services.sh
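The individual status checks in the setup steps above can also be run in one pass. The following is a minimal sketch and not part of the wikiwho_api repository: check_services is a hypothetical helper name, and it relies only on systemctl is-active together with the unit names used in the steps above.

```shell
# Hypothetical helper: report whether each named systemd unit is active.
# Prints one line per unit and returns non-zero if any unit is not active.
check_services() {
    failed=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, after setup or after the daily restart: check_services ww_gunicorn ww_celery ww_flower ww_events_stream ww_events_stream_deletion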
- Download the dumps into a volume (new languages should most likely go in the new pickle_storage02, mounted to /pickles-02):
mkdir /pickles-02/{lang}
mkdir /pickles-02/dumps/{lang}
cd /pickles-02/dumps/{lang}
screen
wget -r -np -nd -c -A 7z https://dumps.wikimedia.org/{lang}wiki/{datestamp}/
- Use the latest complete dump. Newer versions may be available at https://dumps.wikimedia.your.org
- If you get an error or otherwise no files were downloaded, the dump may be incomplete. Try using an older dump.
- Then press Ctrl+A followed by the d key to detach from screen and keep the dump download running in the background.
- When you think it may be finished, verify by re-entering the screen session with screen -r, then type exit if it's finished, or press Ctrl+A followed by d to detach again.
- Create a pull request to add the new language to the app, except for EventStreams (example PR).
- The migrations can be created with
python manage.py makemigrations rest_framework_tracking api --empty
and then filled in accordingly, using previous migrations as a guide. These migrations may eventually not be necessary, pending the outcome of T335322.
- Start the import process on the VPS instance:
sudo su wikiwho
cd ~/wikiwho_api
git pull origin main
. env/bin/activate
python manage.py migrate
nohup python manage.py generate_articles_from_wp_xmls -p '/pickles/dumps/{lang}/' -t 30 -m 24 -lang {lang} -c
Then press Ctrl+Z and enter bg to move the process to the background.
- After typing top, you should see ~24 python processes running. You can monitor progress with
ls -al /pickles-02/{lang}/ | wc -l
and that number should eventually roughly equal the total number of articles on the wiki. Note that this command runs very slowly once there are hundreds of thousands or millions of pickle files.
- Once complete, create a PR to add the wiki to EventStreams (example PR).
- Deploy and restart services (using your account and not wikiwho):
- Pull in the latest changes
- Restart the Flower and EventStreams services:
sudo systemctl restart ww_flower.service
sudo systemctl restart ww_events_stream.service
- Restart Celery:
sudo systemctl restart ww_celery.service
- Update clients accordingly (XTools, Who Wrote That?, Programs & Events Dashboard, etc.)
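For the new-language steps above, two small helpers may be convenient. This is only a sketch: dump_url and count_pickles are hypothetical names, the URL layout is copied from the wget command above, and the assumption is that pickle files sit directly inside the language directory. Counting with find skips the sorting and long listing that ls -al performs, which may help once the directory is very large. It also leaves out the "total", "." and ".." lines that ls -al adds to the count.

```shell
# Hypothetical helper: build the dump directory URL for a wiki.
# $1: language code (e.g. en), $2: dump datestamp (e.g. 20211201).
dump_url() {
    echo "https://dumps.wikimedia.org/${1}wiki/${2}/"
}

# Hypothetical helper: count regular files directly inside a directory,
# without sorting them or printing a long listing.
# $1: directory to inspect (e.g. /pickles-02/{lang}).
count_pickles() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

A download could then be started with wget -r -np -nd -c -A 7z "$(dump_url {lang} {datestamp})", and import progress checked with count_pickles /pickles-02/{lang}.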
Some tips to help troubleshoot issues in production:
- Check https://wikiwho-flower.wmcloud.org to monitor Celery tasks.
- Use sudo journalctl -u ww_events_stream to view the logs for the ww_events_stream service, or replace it with another service name such as nginx.
- See also the Celery logs at /var/log/celery/*.log.
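When a service has been running for a while, the full journal can be long. A minimal sketch for viewing only the latest entries follows. recent_logs is a hypothetical helper name and the default of 50 lines is an arbitrary choice. It uses the standard journalctl options -u, -n and --no-pager.

```shell
# Hypothetical helper: print the most recent journal entries for a unit.
# $1: systemd unit name, $2: number of lines to show (defaults to 50).
recent_logs() {
    sudo journalctl -u "$1" -n "${2:-50}" --no-pager
}
```

For example, recent_logs ww_events_stream shows the last 50 lines, and recent_logs nginx 200 shows the last 200 lines for nginx.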