Parameter sizeLimit is not ending the crawl correctly #256
Comments
Can you try the current 0.9.0 beta on main? I think there may have been a bug related to this, but it should have been fixed in the latest main.
Specifically, it was fixed via #241. We could also do a 0.8.2 release if this is urgent.
Thank you, and no, it is not urgent.
Just pushed a
It's now working.
Great, thanks for testing!
Using Browsertrix Crawler in the terminal, I was testing all of the limit parameters that end a crawl, but sizeLimit was not ending my crawl correctly.
Here is the starting command for sizeLimit:
docker run -p 9037:9037 -i -v $PWD/crawls:/crawls webrecorder/browsertrix-crawler crawl --url "http://falter.at" --profile /crawls/profiles/profile_falter.tar.gz --sizeLimit 2000 --text --depth 3 --scopeType domain --screencastPort 9037
I also removed the profile with the command:
docker run -i -v $PWD/crawls:/crawls webrecorder/browsertrix-crawler crawl --url "http://falter.at" --sizeLimit 2000 --text --depth 3 --scopeType domain
Removing the profile did not change anything about the sizeLimit quitting behavior.
I think the error is coming from crawler.js Line 515:
const size = await getDirSize(dir);
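For context, a helper like getDirSize would typically walk the crawl output directory recursively and sum the sizes of all files in it. Here is a minimal Node.js sketch of such a helper; the details are assumed for illustration and are not taken from the actual browsertrix-crawler source:

import fsp from "fs/promises";
import path from "path";

// Sketch of a recursive directory-size helper: sums the byte size of
// every file under `dir`, descending into subdirectories.
async function getDirSize(dir) {
  let size = 0;
  const entries = await fsp.readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      size += await getDirSize(full);
    } else if (entry.isFile()) {
      size += (await fsp.stat(full)).size;
    }
  }
  return size;
}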
I was expecting to see the log entry "Size threshold reached ..." in the console, but I never saw it.
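Presumably the code around that line compares the computed directory size against the configured limit and, once it is exceeded, logs that message and winds the crawl down, roughly along these lines. This is only a sketch: the params.sizeLimit field, the log wording beyond the quoted prefix, and the gracefulFinish() method are assumptions, not the actual crawler.js code:

// Hypothetical size-limit check, assuming params.sizeLimit holds the
// --sizeLimit value in bytes and gracefulFinish() is a stand-in for
// whatever mechanism stops queuing new pages and ends the crawl.
async checkSizeLimit(dir) {
  if (this.params.sizeLimit) {
    const size = await getDirSize(dir);
    if (size >= this.params.sizeLimit) {
      console.log(`Size threshold reached ${size} >= ${this.params.sizeLimit}, stopping crawl`);
      await this.gracefulFinish();
    }
  }
}

If this check is only run at certain points (for example, between pages), a missing or skipped call would explain why the message never appears and the crawl keeps going.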
See also the log files:
crawl-20230320113927564.log
crawl-20230320122513349.log