Browsertrix Crawler 0.10.0 Beta 0

Pre-release

ikreymer released this 27 Apr 00:41

d4bc9e8

Breaking Changes

Switch back to Puppeteer from Playwright due to memory issues (#298)
Internal: the redis key {crawl_id}:d is now a count of pages done instead of a list of pages done
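
Tools that read crawl progress from Redis may need to handle the {crawl_id}:d key changing type. The sketch below is a hypothetical helper, not part of Browsertrix Crawler. It assumes a redis-py style client with the standard `type`, `get` and `llen` commands. It also assumes the new counter is stored as a plain string value (for example, maintained with INCR) and that the older list of pages done was a Redis list. `FakeRedis` is a dict-backed stand-in so the example runs without a server.

```python
from typing import Dict, List, Optional, Union


def pages_done(client, crawl_id: str) -> int:
    """Hypothetical helper: number of pages done, read from the `{crawl_id}:d` key."""
    key = f"{crawl_id}:d"
    key_type = client.type(key)
    if isinstance(key_type, bytes):
        key_type = key_type.decode()
    if key_type == "none":
        return 0  # key not created yet: no pages finished
    if key_type == "list":
        return client.llen(key)  # older layout (assumed): one list entry per page
    return int(client.get(key))  # 0.10.0 layout (assumed): integer stored as a string


class FakeRedis:
    """Hypothetical in-memory stand-in exposing the few commands used above."""

    def __init__(self) -> None:
        self._data: Dict[str, Union[bytes, List[bytes]]] = {}

    def set(self, key: str, value: Union[str, int]) -> None:
        self._data[key] = str(value).encode()

    def rpush(self, key: str, *values: str) -> None:
        self._data.setdefault(key, []).extend(v.encode() for v in values)

    def type(self, key: str) -> bytes:
        value = self._data.get(key)
        if value is None:
            return b"none"
        return b"list" if isinstance(value, list) else b"string"

    def get(self, key: str) -> Optional[bytes]:
        value = self._data.get(key)
        return value if isinstance(value, bytes) else None

    def llen(self, key: str) -> int:
        value = self._data.get(key)
        return len(value) if isinstance(value, list) else 0
```

Checking the key type first lets one reader work against crawls written before and after this release. With a real server, a `redis.Redis` client can be passed in place of `FakeRedis`.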

What's Changed

Add option to log errors to redis by @tw4l in #279
Store done in redis as integer and only save full json in redis for failed pages by @tw4l in #284
worker: lower wait time, in case where no additional pages remain and… by @ikreymer in #289
Store archive dir size in Redis by @tw4l in #291
origin override: add --originOverride source=dest to allow routing wh… by @ikreymer in #281
Quick exit on redis connection error after interrupt by @ikreymer in #292
Fixes from 0.9.1 by @ikreymer in #297
Switch back to Puppeteer from Playwright by @ikreymer in #301
Catch 4xx and 5xx page.goto() responses to mark invalid URLs as failed by @tw4l in #300
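
The new --originOverride option takes source=dest pairs. The following is a hypothetical Python illustration of that idea, not the crawler's own implementation. It assumes that a request whose origin (scheme plus host and port) matches source is sent to dest, while the path, query and fragment are kept. The function names are made up for this sketch.

```python
from typing import Dict, Iterable
from urllib.parse import urlsplit, urlunsplit


def _origin(url: str) -> str:
    """Reduce an absolute URL to scheme://netloc (netloc is compared as written)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute origin: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


def parse_origin_overrides(values: Iterable[str]) -> Dict[str, str]:
    """Parse 'source=dest' strings into a {source_origin: dest_origin} mapping."""
    overrides: Dict[str, str] = {}
    for value in values:
        source, sep, dest = value.partition("=")  # split on the first '='
        if not sep or not source or not dest:
            raise ValueError(f"expected source=dest, got {value!r}")
        overrides[_origin(source)] = _origin(dest)
    return overrides


def apply_origin_override(url: str, overrides: Dict[str, str]) -> str:
    """Swap the origin of url if it has an override; otherwise return it unchanged."""
    parts = urlsplit(url)
    dest = overrides.get(f"{parts.scheme}://{parts.netloc}")
    if dest is None:
        return url
    dest_parts = urlsplit(dest)
    return urlunsplit(
        (dest_parts.scheme, dest_parts.netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, with the pair https://example.com=http://localhost:8080, a request for https://example.com/a/b?x=1 would be routed to http://localhost:8080/a/b?x=1. URLs on other origins pass through untouched.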

Full Changelog: 0.9.0...0.10.0-beta.0

Contributors

ikreymer and tw4l