Success status code on failure #207
@rgaudin in v0.8.0 beta, browsertrix-crawler will return 0 unless a fatal error is encountered. Examples of that would include no WARC files being created, WACZ generation failing, being unable to connect to Redis, or giving invalid arguments to the crawler. In any of those cases, the crawler will exit 1. A failed page is not considered a fatal error; instead an error message is logged (now in JSON as of the beta), as long as some data is captured and written to a WARC during the crawl. (Edited for clarity)
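For concreteness, here is a minimal sketch of how a wrapper (such as zimit) might consume that contract. It assumes the crawler is invoked via its `crawl` entrypoint with `--url` and `--generateWACZ`, and that the JSON log lines carry a `logLevel` field; treat the invocation details and the field name as assumptions rather than a confirmed interface:

```python
import json
import subprocess

def run_crawl(url: str) -> bool:
    """Invoke browsertrix-crawler and interpret its exit status.

    Exit 0 means the crawl finished, possibly with non-fatal page
    errors reported as JSON log lines; a non-zero exit means a fatal
    error (no WARCs written, WACZ generation failed, Redis
    unreachable, or invalid arguments).
    """
    proc = subprocess.run(
        ["crawl", "--url", url, "--generateWACZ"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        print(f"fatal crawl error (exit code {proc.returncode})")
        return False
    # Non-fatal page failures still exit 0; scan the JSON log lines if
    # the caller wants stricter behavior. The "logLevel" key is assumed.
    page_errors = 0
    for line in proc.stdout.splitlines():
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip non-JSON output lines
        if entry.get("logLevel") == "error":
            page_errors += 1
    if page_errors:
        print(f"crawl succeeded with {page_errors} page error(s) logged")
    return True
```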
OK, I understand, although I'd suggest raising a fatal error in case the initial URL itself fails. In this particular case, 8 entries were added to the WARC (resources from the page) before that initial URL crashed, so that's why it returned 0.
@rgaudin that is a good suggestion, thank you. I'll look into how best to implement it.
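A sketch of what the suggested behavior could look like in a crawler's page handler. This is hypothetical logic written in Python for illustration (the crawler itself is a Node.js project), and the function and parameter names are invented, not the project's actual API:

```python
import sys

def on_page_finished(url: str, seed_urls: set[str], loaded: bool) -> None:
    """Hypothetical handler illustrating the suggestion above."""
    if loaded:
        return
    if url in seed_urls:
        # The initial (seed) URL itself crashed: even if a few
        # resources were already written to the WARC, the crawl has
        # nothing usable, so treat this as fatal and exit non-zero.
        print(f"fatal: seed page failed to load: {url}", file=sys.stderr)
        sys.exit(1)
    # Any other page failure stays a recoverable, logged error.
    print(f"error: page failed to load, continuing: {url}", file=sys.stderr)
```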
In a zimit run, we started a crawl of the page https://journals.openedition.org/bibnum/, which failed after about 15 seconds, but the crawler reported a success status code (0). The crawler logged:

Page Load Failed: https://journals.openedition.org/bibnum/, Reason: Error: Page crashed!
I am not sure about the exact behavior of the crawler on errors.
Full log follows.
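For reference, the behavior can be observed with a small driver along these lines; the `crawl` invocation is illustrative and the exact flags depend on the local setup:

```python
import subprocess

# Run the crawler against the problem page and print its exit status.
# Per the report above, this printed 0 even though the page crashed.
proc = subprocess.run(
    ["crawl", "--url", "https://journals.openedition.org/bibnum/"]
)
print("exit code:", proc.returncode)
```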