[Request] Add option for sleep interval between page crawls to avoid captchas/rate limits #131
Yeah, that makes sense and is easy to add. Are you thinking it would sleep after every page, or after every N pages?

I think that sleeping after every page should be good enough.
ikreymer pushed a commit that referenced this issue on Mar 22, 2023:

* Add --pageExtraDelay option to add extra delay/wait time after every page (fixes #131)
* Store total page time in 'maxPageTime', include pageExtraDelay
* Rename timeout -> pageLoadTimeout
* Cleanup:
  - store seconds for most interval checks, convert to ms only for API calls, remove most sec<->ms conversions
  - add secondsElapsed() utility function to help check time elapsed
  - cleanup comments

Co-authored-by: Ilya Kreymer <[email protected]>
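The commit above describes a delay applied after every page. A minimal sketch of that shape follows; apart from the names pageExtraDelay and secondsElapsed, which appear in the commit message, everything here (crawlAll, crawlPage, sleepSeconds) is a hypothetical illustration, not the project's actual implementation:

```javascript
// Sketch of a per-page extra delay in the spirit of --pageExtraDelay.
// crawlAll, crawlPage, and sleepSeconds are hypothetical names.

const sleepSeconds = (secs) =>
  new Promise((resolve) => setTimeout(resolve, secs * 1000));

// Utility similar to the secondsElapsed() helper mentioned in the commit:
// returns the time elapsed since startTime, in seconds.
function secondsElapsed(startTime) {
  return (Date.now() - startTime) / 1000;
}

// Crawl pages sequentially, pausing pageExtraDelay seconds after each one
// so the target site is not hit in rapid succession.
async function crawlAll(pages, crawlPage, { pageExtraDelay = 0 } = {}) {
  for (const page of pages) {
    await crawlPage(page);
    if (pageExtraDelay > 0) {
      await sleepSeconds(pageExtraDelay); // extra wait before the next page
    }
  }
}
```

Sleeping after every page (rather than every N pages) matches the resolution agreed on in the comments above, and keeps the pacing uniform from the target server's point of view.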
Hello!
I'm trying to crawl a huge website that starts asking for captchas after a few hundred pages have been fetched in a short amount of time.
Since setting workers=1 is not enough to avoid hitting the captcha "rate limit", I'd like to request an option to specify a custom sleep interval (e.g. 5 seconds), which makes the crawler do nothing for the specified amount of time before crawling the next page.
youtube-dl has a similar option, and in my experience it has been useful in comparable circumstances.
Thanks!