eLife have handed over stewardship of ScienceBeam to The Coko Foundation. You can now find the updated code repository at https://gitlab.coko.foundation/sciencebeam/sciencebeam-airflow and continue the conversation on Coko's Mattermost chat server: https://mattermost.coko.foundation/
For more information on why we're doing this, read our latest update on our new technology direction: https://elifesciences.org/inside-elife/daf1b699/elife-latest-announcing-a-new-technology-direction
Airflow pipeline for ScienceBeam-related training and evaluation.
Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. ... Airflow is not a data streaming solution.
We are using the official Airflow Image.
- Docker and Docker Compose
- Google Cloud SDK for gcloud
gcloud auth application-default login
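This is not specific to this project, but if you want to confirm that application-default credentials are now in place, asking `gcloud` to print an access token is a quick check:

```bash
# Prints an access token if application-default credentials exist;
# fails with an error asking you to log in otherwise.
gcloud auth application-default print-access-token
```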
Airflow, using the official Airflow Image, is mainly configured in the following ways:

- Environment variables interpreted by Airflow, e.g. `AIRFLOW__CORE__SQL_ALCHEMY_CONN`
- The default configuration provided by the Airflow project in `default_airflow.cfg`

(Since we are using Docker Compose, environment variables would be passed in via `docker-compose.yml`; an illustrative example is sketched below.)
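As an illustration of the naming convention only (the value below is a placeholder, not what this project uses), Airflow maps environment variables of the form `AIRFLOW__{SECTION}__{KEY}` onto the corresponding entry in its config file:

```bash
# Illustrative only: overrides sql_alchemy_conn in the [core] section of airflow.cfg.
# Host, credentials and database name are placeholders.
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@postgres:5432/airflow"
```

With Docker Compose, the same variable would go into the `environment` section of the relevant service.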
The Dockerfile is used to build the image that gets deployed to the cluster.
The Docker Compose configuration is only used for development purposes (in the future it could in part be used to build the image).
For development, it makes the local gcloud config available to the Airflow container.
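The Compose file itself is not reproduced here; as a rough sketch (the image name and in-container path are assumptions, and the image may use a different home directory), making the local gcloud config available essentially amounts to a bind mount like this:

```bash
# Sketch only: mount the host's gcloud configuration read-only into the container,
# so credentials from `gcloud auth application-default login` are visible to Airflow.
docker run --rm -it \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" \
  sciencebeam-airflow-dev bash
```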
Build the image and start the containers:
make start
The Airflow admin UI will be available on port 8080 and the Celery Flower UI on port 5555.
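To check that both UIs came up, you can point a browser at the ports or use `curl`; the exact health endpoints differ between Airflow and Flower versions, so a plain request against each port is the simplest check:

```bash
# Expect an HTTP status code (e.g. 200 or a redirect) from each service.
curl -sS -o /dev/null -w "airflow: %{http_code}\n" http://localhost:8080/
curl -sS -o /dev/null -w "flower:  %{http_code}\n" http://localhost:5555/
```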
Build and run the tests:
make test
When you are done, stop and clean up:

make stop
make clean