Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler 0.4.0 Beta 1
- Support for a screencasting mode for debugging, enabled with the --screencastPort option.
- Support for YAML-based config of all options, including specifying multiple seeds via --seeds or the seeds key.
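As a rough illustration of the YAML config support, the sketch below renders a minimal config that lists several seeds under the seeds key. The helper name render_config and the example URLs are hypothetical, and the assumption that each seed can be given as a plain URL string should be checked against the README for the version in use.

```python
import json


def render_config(seeds):
    """Return YAML text that lists each seed URL under a top-level `seeds` key.

    Assumption: seeds may be written as plain URL strings. Each URL is
    emitted as a double-quoted scalar via json.dumps so that special
    characters are escaped.
    """
    if not seeds:
        raise ValueError("at least one seed URL is required")
    lines = ["seeds:"]
    for url in seeds:
        lines.append("  - " + json.dumps(url))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical seed URLs, for illustration only.
    print(render_config(["https://example.com/", "https://example.org/"]), end="")
```

The rendered text can be saved to a file and supplied to the crawler as its config; how the file is passed in and made visible inside the container is not covered by these notes.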
Browsertrix Crawler 0.3.2
Changes for this version:
- Added a --urlFile option: allows users to specify a text file containing a list of exact URLs to crawl (one URL per line).
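The one-URL-per-line format expected by --urlFile is simple to produce. The sketch below is a minimal, hypothetical helper (write_url_file is not part of the crawler) that writes such a file, dropping blank entries and duplicates; the example URLs and file name are assumptions.

```python
from pathlib import Path


def write_url_file(urls, path):
    """Write URLs to `path`, one per line, and return how many were written.

    Surrounding whitespace is stripped; blank entries and repeated URLs
    are skipped, and the original order is preserved.
    """
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in unique:
            unique.append(url)
    Path(path).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(unique)


if __name__ == "__main__":
    # Hypothetical URLs and output file name, for illustration only.
    count = write_url_file(
        ["https://example.com/page-1", "https://example.com/page-2"],
        "urls.txt",
    )
    print(f"wrote {count} URLs")
```

The resulting file could then be passed to the crawler with --urlFile; how the file is made available inside the container (for example, through a volume mount) depends on the deployment.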
Released image published to DockerHub at webrecorder/browsertrix-crawler:0.3.2
Browsertrix Crawler 0.3.1
Features include:
- Improved shutdown wait: Instead of waiting for 5 secs, wait until all pending requests are written to WARCs (#47, #44)
- Link extraction includes links in all frames (#48, #45)
- Bug fix: Use async APIs for combine WARC to avoid spurious issues with multiple crawls (#49, #50)
- Behaviors: updated to Browsertrix Behaviors 0.2.1, with support for Facebook pages (#46)
Released image published to DockerHub at webrecorder/browsertrix-crawler:0.3.1
Browsertrix Crawler 0.3.0
New features include:
- --combineWARC and --rolloverSize for generating a single combined WARC
- Support for creating and running a crawl with a login profile tarball (see README for more info)
- Support for using Browsertrix Behaviors v0.1.1 for in-page behaviors
- Customizable logging options via --logging, including behavior log, behavior debug log, pywb log and crawl stats (default)
Published to DockerHub at webrecorder/browsertrix-crawler:0.3.0