Expand use of failOnFailedSeed option #360
Comments
Perhaps we could have the following behavior for seeds that aren't parsed successfully:
|
In simpler terms, if |
That logic makes sense to me. :)
I was looking at where --failOnFailedSeed was added in #300 and am wondering if it would generally make sense (it does in my mind, for my use case) for that option to also apply when a seed isn't parsed successfully. Right now, if I crawl with a seed list (generated automatically from URLs extracted from a document) containing this particular URL, https://doi.org10.1901/jaba.2012.45-85, which is missing a slash after the .org and is presumably read as having a TLD of .1901, the crawl aborts. Perhaps if failOnFailedSeed is false, such a problem could be logged at a level lower than fatal, so the crawl doesn't abort?
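To make the request concrete, here is a minimal, language-agnostic sketch of the behavior I have in mind, written in Python purely for illustration. It is not the crawler's actual code: `is_valid_seed`, `parse_seeds`, `FatalSeedError`, and the `fail_on_failed_seed` parameter are all hypothetical names, and the validity check is only a stand-in that happens to reject the URL above (a host whose last label is numeric but which is not a valid IPv4 address).

```python
import ipaddress
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("seed-sketch")


class FatalSeedError(Exception):
    """Hypothetical error: an invalid seed should abort the whole crawl."""


def is_valid_seed(url: str) -> bool:
    # Stand-in check (an assumption, not the crawler's real parser):
    # require http(s) and a host, and reject hosts whose last label is
    # all digits unless the whole host is a valid IPv4 address.
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    last_label = host.rsplit(".", 1)[-1]
    if last_label.isascii() and last_label.isdigit():
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            return False
    return True


def parse_seeds(urls, fail_on_failed_seed):
    """Return the valid seeds; abort or warn on invalid ones per the flag."""
    seeds = []
    for url in urls:
        if is_valid_seed(url):
            seeds.append(url)
        elif fail_on_failed_seed:
            # Strict mode: a bad seed is fatal.
            raise FatalSeedError(f"Invalid seed: {url}")
        else:
            # Lenient mode: log below fatal and keep going.
            logger.warning("Skipping invalid seed: %s", url)
    return seeds
```

With the flag off, a list containing the malformed DOI URL would yield only the valid seeds plus a warning; with the flag on, the same list would abort as it does today.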