Releases: webrecorder/browsertrix-crawler

Browsertrix Crawler 0.4.0 Beta 1

24 Jun 22:23
Pre-release

Support for a screencasting mode for debugging, enabled via the --screencastPort option.
Support for YAML-based configuration of all options, including specifying multiple seeds via --seeds or the seeds key.

Browsertrix Crawler 0.3.2

13 May 15:13
63376ab

Changes for this version:

  • Added a --urlFile option: Allows users to specify a text file which contains a list of exact URLs to crawl (one URL per line).
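The one-URL-per-line format described above is simple to generate from a script. The following is a minimal sketch, not part of the crawler itself: the helper name write_url_file and the validation rules (absolute http/https URLs only, blanks and duplicates skipped) are assumptions made for illustration, and the crawler may accept or reject inputs differently.

```python
from pathlib import Path
from urllib.parse import urlsplit


def write_url_file(urls, path):
    """Write one absolute http(s) URL per line to `path`.

    Hypothetical helper: blank entries and exact duplicates are skipped,
    and anything that is not an absolute http(s) URL raises ValueError.
    Returns the list of URLs actually written, in order.
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        seen.add(url)
        lines.append(url)
    # One URL per line, each terminated by a newline.
    Path(path).write_text("".join(f"{u}\n" for u in lines), encoding="utf-8")
    return lines


if __name__ == "__main__":
    written = write_url_file(
        ["https://example.com/", "https://example.com/about"],
        "urls.txt",
    )
    print(f"wrote {len(written)} URLs to urls.txt")
```

The resulting file is what would be passed to --urlFile; how the file is made visible to the crawler (for example, inside a container) is not covered by these notes.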

Released image published to DockerHub at webrecorder/browsertrix-crawler:0.3.2

Browsertrix Crawler 0.3.1

04 May 20:43

Features Include:

  • Improved shutdown wait: Instead of waiting a fixed 5 seconds, wait until all pending requests are written to WARCs (#47, #44)
  • Link extraction includes links in all frames (#48, #45)
  • Bug fix: Use async APIs when combining WARCs to avoid spurious issues with multiple crawls (#49, #50)
  • Behaviors: Update to Behaviors 0.2.1, with support for Facebook pages (#46)

Released image published to DockerHub at webrecorder/browsertrix-crawler:0.3.1

Browsertrix Crawler 0.3.0

14 Apr 22:58

New features include:

  • --combineWARC and --rolloverSize for generating a single combined WARC
  • Support for creating and running a crawl with a login profile tarball (see README for more info)
  • Support for using Browsertrix Behaviors v0.1.1 for in-page behaviors
  • Customizable logging options via --logging, including behavior log, behavior debug log, pywb log and crawl stats (default)

Published to DockerHub at webrecorder/browsertrix-crawler:0.3.0