Dockerfile and script for running crawl jobs in Google Cloud Run with captcha support #208
Conversation
Add additional python script for gcloud run
Updated dependencies
Commits 93f0907 to 966ce83
@@ -161,15 +161,35 @@ First build the image inside the project's root directory:
 $ docker build -t flathunter .
 ```

-**When running a container using the image, a config file needs to be mounted on the container at ```/config.yaml```.** The example below provides the file ```config.yaml``` off the current working directory:
+**When running a container using the image, a config file needs to be mounted on the container at ```/config.yaml``` or configuration has to be supplied using environment variables.** The example below provides the file ```config.yaml``` off the current working directory:
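For illustration, a run command matching that description could look something like the sketch below. The exact flags and paths are those of the README's own example, which is not shown in this hunk, so treat this as an assumption rather than the documented invocation:

```
# Sketch: mount a local config.yaml into the container at /config.yaml.
# Flags and paths are illustrative; the README's own example may differ.
$ docker run --rm -v "$(pwd)/config.yaml:/config.yaml" flathunter
```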
We should think about splitting the README into separate .md files and using the main file only as an index, because it is already so large.
Yeah. That would make sense. If you have a proposal for how you would like to see that split, I would welcome that. At least the basic getting-started instructions should maybe be at the top of the main README, and people who want to know more can dig a little.
This PR adds support for Google Cloud Run, and adds a Docker image that can be launched to trigger a one-time crawl (which can then be set up to run on a schedule).
I have refactored the config handling so that it is possible to pass the relevant config via environment variables.
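Roughly, something like this should then be possible. The variable names below are placeholders for illustration, not necessarily the names this PR introduces:

```
# Sketch: configure via environment variables instead of mounting config.yaml.
# FLATHUNTER_TARGET_URLS and FLATHUNTER_TELEGRAM_BOT_TOKEN are hypothetical
# placeholder names, not confirmed variable names from this PR.
$ docker run --rm \
    -e FLATHUNTER_TARGET_URLS="https://www.example-portal.de/search..." \
    -e FLATHUNTER_TELEGRAM_BOT_TOKEN="123456:ABC..." \
    flathunter
```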
This still needs a bunch of documentation - opening the PR for discussion.
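For the scheduling piece, one common pattern (an assumption here, not necessarily what this PR ends up implementing) is to deploy the image to Cloud Run and trigger it periodically with Cloud Scheduler. Service name, region, schedule, and URL below are placeholders:

```
# Sketch only: deploy the image as a Cloud Run service and trigger it on a
# schedule via Cloud Scheduler. All names, the region, the schedule, and the
# URL are placeholders; the PR's actual entry point and trigger may differ.
$ gcloud run deploy flathunter-crawl \
    --image gcr.io/MY_PROJECT/flathunter \
    --region europe-west1 \
    --no-allow-unauthenticated

$ gcloud scheduler jobs create http flathunter-crawl-job \
    --schedule "*/30 * * * *" \
    --uri "https://flathunter-crawl-<hash>-ew.a.run.app/" \
    --http-method GET \
    --oidc-service-account-email scheduler@MY_PROJECT.iam.gserviceaccount.com
```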