Browsertrix Crawler 0.10.0 Beta 0
Pre-release
Breaking Changes
- Switch back to Puppeteer from Playwright due to memory issues (#298)
- Internal: the Redis key `{crawl_id}:d` is now a number of pages done instead of a list of pages done
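
For tooling that reads the `{crawl_id}:d` key directly, the change might be absorbed with something like the sketch below. `donePageCount` and `RawDone` are hypothetical names, not part of Browsertrix Crawler, and the sketch assumes the caller has already fetched the raw value (a string for the new integer form, an array of strings for the old list form, or `null` if the key is unset).

```typescript
// Hypothetical helper, not part of Browsertrix Crawler.
// Assumption: the raw value of `{crawl_id}:d` was fetched by the caller.
type RawDone = string | string[] | null;

function donePageCount(raw: RawDone): number {
  // Key not set yet: treat as zero pages done.
  if (raw === null) {
    return 0;
  }
  // Older format: a list with one entry per finished page.
  if (Array.isArray(raw)) {
    return raw.length;
  }
  // Newer format: a single integer stored as a string.
  const count = Number.parseInt(raw, 10);
  if (Number.isNaN(count)) {
    throw new Error(`unexpected value for done key: ${raw}`);
  }
  return count;
}
```

Accepting both shapes lets the same reader work against crawls started before and after the upgrade.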
What's Changed
- Add option to log errors to redis by @tw4l in #279
- Store done in redis as integer and only save full json in redis for failed pages by @tw4l in #284
- worker: lower wait time, in case where no additional pages remain and… by @ikreymer in #289
- Store archive dir size in Redis by @tw4l in #291
- origin override: add --originOverride source=dest to allow routing wh… by @ikreymer in #281
- Quick exit on redis connection error after interrupt by @ikreymer in #292
- Fixes from 0.9.1 by @ikreymer in #297
- Switch back to Puppeteer from Playwright by @ikreymer in #301
- Catch 4xx and 5xx page.goto() responses to mark invalid URLs as failed by @tw4l in #300
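
The idea behind the last item (#300) can be illustrated roughly as follows. This is only a sketch and not the crawler's actual code: `ResponseLike`, `PageState`, and `classifyNavigation` are hypothetical names, `ResponseLike` stands in for the response object that `page.goto()` resolves to, and treating a `null` response as not failed is an assumption made here.

```typescript
// Hypothetical sketch, not the crawler's implementation.
// ResponseLike stands in for the response returned by page.goto().
interface ResponseLike {
  status(): number;
}

type PageState = "done" | "failed";

function classifyNavigation(resp: ResponseLike | null): PageState {
  // Assumption: a missing response is not treated as a failure here.
  if (resp === null) {
    return "done";
  }
  const status = resp.status();
  // 4xx and 5xx responses mark the URL as failed.
  return status >= 400 && status < 600 ? "failed" : "done";
}
```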
Full Changelog: 0.9.0...0.10.0-beta.0