
Browsertrix Crawler 0.10.0 Beta 0

Pre-release
@ikreymer released this 27 Apr 00:41 (commit d4bc9e8)

Breaking Changes

  • Switch back to Puppeteer from Playwright due to memory issues (#298)
  • Internal: the Redis key {crawl_id}:d now stores the number of pages done instead of a list of the pages done
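External tools that read crawl progress directly from Redis need to handle this change. The following is a minimal sketch of how a reader could get the done count. It assumes a redis-py style client (any object with type(), get() and llen() methods). It also assumes that crawlers before this release stored the done pages as a Redis list. pages_done is a hypothetical helper, not part of Browsertrix Crawler.

```python
from typing import Protocol, Union


class RedisLike(Protocol):
    """Subset of the redis-py client interface used below (redis.Redis provides these)."""

    def type(self, name: str) -> Union[bytes, str]: ...
    def get(self, name: str) -> Union[bytes, str, None]: ...
    def llen(self, name: str) -> int: ...


def pages_done(client: RedisLike, crawl_id: str) -> int:
    """Hypothetical helper: return the number of pages done for a crawl.

    From 0.10.0 the "{crawl_id}:d" key holds an integer count.
    Assumption: older crawlers stored a Redis list of done pages instead,
    so fall back to the list length for those.
    """
    key = f"{crawl_id}:d"
    key_type = client.type(key)
    if isinstance(key_type, bytes):  # redis-py returns bytes unless decode_responses=True
        key_type = key_type.decode()
    if key_type == "none":
        return 0  # key not created yet: no pages done
    if key_type == "list":
        return client.llen(key)  # assumed pre-0.10.0 layout
    value = client.get(key)  # 0.10.0+ layout: integer stored as a string
    return int(value) if value is not None else 0
```

For example, pages_done(redis.Redis(decode_responses=True), "my-crawl"), where "my-crawl" is a placeholder crawl ID.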

What's Changed

  • Add option to log errors to redis by @tw4l in #279
  • Store done in redis as integer and only save full json in redis for failed pages by @tw4l in #284
  • worker: lower wait time, in case where no additional pages remain and… by @ikreymer in #289
  • Store archive dir size in Redis by @tw4l in #291
  • origin override: add --originOverride source=dest to allow routing wh… by @ikreymer in #281
  • Quick exit on redis connection error after interrupt by @ikreymer in #292
  • Fixes from 0.9.1 by @ikreymer in #297
  • Switch back to Puppeteer from Playwright by @ikreymer in #301
  • Catch 4xx and 5xx page.goto() responses to mark invalid URLs as failed by @tw4l in #300
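The --originOverride option takes a source=dest pair. The following is a rough illustration only. The hypothetical Python helpers parse such a pair and rewrite a URL whose origin matches source so that it points at dest instead. The crawler's actual matching and routing rules are not described in these notes and may differ.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def parse_origin_override(arg: str) -> Tuple[str, str]:
    """Hypothetical: split a 'source=dest' value into (source, dest) origins."""
    source, sep, dest = arg.partition("=")
    if not sep or not source or not dest:
        raise ValueError(f"expected source=dest, got {arg!r}")
    # Drop trailing slashes so both sides are bare origins (scheme://host[:port]).
    return source.rstrip("/"), dest.rstrip("/")


def apply_origin_overrides(url: str, overrides: List[Tuple[str, str]]) -> str:
    """Hypothetical: swap url's origin for dest when it equals a source origin."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    for source, dest in overrides:
        if origin == source:
            # Keep the path, query and fragment; replace only the origin.
            return dest + url[len(origin):]
    return url  # no override matched: leave the URL unchanged
```

With overrides = [parse_origin_override("https://example.com=http://localhost:8080")], a request for https://example.com/page?x=1 would be routed to http://localhost:8080/page?x=1, and URLs on any other origin would pass through unchanged.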

Full Changelog: 0.9.0...0.10.0-beta.0