Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler 0.10.0 Beta 4
What's Changed
- Improve thumbnails with sharp by @tw4l in #304
- Chrome 112 + new headless mode + consistent viewport tweaks by @ikreymer in #316
Full Changelog: 0.10.0-beta.3...0.10.0-beta.4
Browsertrix Crawler 0.10.0 Beta 3
What's Changed
- Disable Chrome optimization logic by @malemburg in #312
- stopping: if crawl is marked as stopping, and no warcs found, mark st… by @ikreymer in #314
New Contributors
- @malemburg made their first contribution in #312
Full Changelog: 0.10.0-beta.2...0.10.0-beta.3
Browsertrix Crawler 0.10.0 Beta 2
What's Changed
- Log fatal messages to redis errors by @tw4l in #305
- Consolidate wacz error loglines by @tw4l in #306
- state: adjust redis keys to be more consistent by @ikreymer in #309
- pywb: don't convert bounded range requests to unbounded (pywb 2.7.4 dev)
Full Changelog: 0.10.0-beta.1...0.10.0-beta.2
Browsertrix Crawler 0.10.0 Beta 1
What's Changed
Full Changelog: 0.10.0-beta.0...0.10.0-beta.1
Browsertrix Crawler 0.10.0 Beta 0
Breaking Changes
- Switch back to Puppeteer from Playwright due to memory issues (#298)
- Internal: redis key {crawl_id}:d now a number of pages done instead of a list of pages done
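For anything that reads this key directly, the change means a list-length lookup becomes a plain integer read. A minimal sketch, assuming the old value was a Redis list and the new one is a string-encoded integer; `pages_done` and the `FakeRedis` stub are hypothetical names used only for illustration, and only the `TYPE`, `LLEN`, and `GET` commands they mimic are standard Redis:

```python
class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self, data):
        self._data = data

    def type(self, key):
        value = self._data.get(key)
        if value is None:
            return "none"
        return "list" if isinstance(value, list) else "string"

    def llen(self, key):
        return len(self._data[key])

    def get(self, key):
        return self._data.get(key)


def pages_done(client, crawl_id):
    """Return the number of finished pages for either key layout.

    Assumes the client returns str values (for redis-py, that would
    mean constructing it with decode_responses=True).
    """
    key = f"{crawl_id}:d"
    kind = client.type(key)
    if kind == "list":    # assumed older layout: one list entry per page
        return client.llen(key)
    if kind == "string":  # assumed 0.10.0 layout: a plain counter
        return int(client.get(key))
    return 0              # key not set yet
```

A reader written this way keeps working across the upgrade, since it checks the key type before choosing how to count.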
What's Changed
- Add option to log errors to redis by @tw4l in #279
- Store done in redis as integer and only save full json in redis for failed pages by @tw4l in #284
- worker: lower wait time, in case where no additional pages remain and… by @ikreymer in #289
- Store archive dir size in Redis by @tw4l in #291
- origin override: add --originOverride source=dest to allow routing wh… by @ikreymer in #281
- Quick exit on redis connection error after interrupt by @ikreymer in #292
- Fixes from 0.9.1 by @ikreymer in #297
- Switch back to Puppeteer from Playwright by @ikreymer in #301
- Catch 4xx and 5xx page.goto() responses to mark invalid URLs as failed by @tw4l in #300
Full Changelog: 0.9.0...0.10.0-beta.0
Browsertrix Crawler 0.9.1
Bug fix release for screenshots and service workers.
What's Changed
- Fix full page screenshot by @tw4l in #296
- Fix Service Workers being blocked after the switch to Playwright (enabled by default, disabled when profiles are used, to match 0.8.x functionality)
Full Changelog: 0.9.0...0.9.1
Browsertrix Crawler 0.9.0
Major Changes
- BREAKING: Switched from Puppeteer to Playwright. Custom drivers would need to be migrated, see: https://playwright.dev/docs/puppeteer or https://github.com/checkly/puppeteer-to-playwright tool
- Removed puppeteer cluster
- Always using Redis-based crawl state
- Use priority based crawl queue, with URLs of lower depth crawled first and extra hops always crawled last.
- Store 'loadState' in each page, indicating level of loading; behaviors are skipped if the initial load fails
- Improved timeouts for each page (page load time + behavior time + extra delay)
- New options, including: --pageExtraDelay, --diskUtilization, --maxPageLimit, --title, --description, --logLevel, --context
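The priority-based queue described above can be pictured as a heap ordered by link depth, with extra-hop URLs pushed behind everything else. A minimal sketch in Python, not the crawler's actual Redis-backed implementation; the class name `CrawlQueue`, the `extra_hops` field, and the tuple-based priority are assumptions made for illustration:

```python
import heapq
import itertools


class CrawlQueue:
    """In-memory priority queue: shallow URLs first, extra hops last."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, url, depth, extra_hops=0):
        # Tuples compare left to right: URLs with no extra hops sort first,
        # then lower depth, then earlier insertion.
        heapq.heappush(self._heap, (extra_hops, depth, next(self._counter), url))

    def pop(self):
        # The URL is the last element of the stored tuple.
        return heapq.heappop(self._heap)[-1]

    def __len__(self):
        return len(self._heap)


queue = CrawlQueue()
queue.add("https://example.com/a/b", depth=2)
queue.add("https://other.example/", depth=1, extra_hops=1)
queue.add("https://example.com/", depth=0)
queue.add("https://example.com/a", depth=1)

# Drain the queue to see the crawl order.
order = [queue.pop() for _ in range(len(queue))]
```

Here the extra-hop URL comes out last even though its depth is lower than that of `/a/b`, which is the ordering the release note describes.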
What's Changed
- logging: serialize regex as string to avoid empty '{}' when logging s… by @ikreymer in #235
- Remove puppeteer-cluster by @tw4l in #219
- Fix size check by @ikreymer in #241
- Add timedRun to prevent async operations from hanging by @tw4l in #243
- Add total timeout + limit redis queue retries by @ikreymer in #248
- Minor crawler fixes after puppeteer-cluster removal refactoring by @tw4l in #250
- Dev 0.9.0 Beta 1 Work - Playwright Removal + Worker Refactor + Redis State by @ikreymer in #253
- Logger cleanup by @ikreymer in #254
- Catch loading issues by @ikreymer in #255
- Add option for sleep interval after behaviors run by @tw4l in #257
- worker index: set worker index automatically to work with k8s naming by @ikreymer in #266
- Reset locked pending URLs when crawler restarts. by @ikreymer in #267
- Ensure crawler can't run out of space with --diskUtilization param by @tw4l in #264
- Add options to filter logs by --logLevel and --context by @tw4l in #271
- Update README for 0.9.0 by @tw4l in #272
- blockrules/logger: use global logger var by @ikreymer in #274
- Add --maxPageLimit override by @ikreymer in #275
- Add --title and --description CLI args to write metadata into datapackage.json by @tw4l in #276
- Don't set viewport for full page screenshots by @tw4l in #221
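The timeout items above (a per-page budget of page load time plus behavior time plus extra delay, and `timedRun` to keep async operations from hanging) follow a common pattern. A minimal sketch in Python with `asyncio`, not the crawler's own code; `page_timeout`, `timed_run`, and the parameter names are hypothetical, and the example durations are arbitrary:

```python
import asyncio


def page_timeout(load_secs, behavior_secs, extra_delay_secs=0):
    """Total per-page budget: load time + behavior time + extra delay."""
    return load_secs + behavior_secs + extra_delay_secs


async def timed_run(coro, seconds, label="operation"):
    """Await coro, returning None instead of hanging past the limit."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising.
        print(f"{label} timed out after {seconds}s")
        return None


async def demo():
    # Finishes well inside its limit, so the result is passed through.
    fast = await timed_run(asyncio.sleep(0.01, result="done"), 1, "fast step")
    # Exceeds its limit, so it is cancelled and None is returned.
    slow = await timed_run(asyncio.sleep(5, result="done"), 0.05, "slow step")
    return fast, slow
```

Running `asyncio.run(demo())` exercises both the completed and the timed-out path. Returning `None` on timeout is one design choice among several; re-raising would suit callers that need to tell a timeout apart from an empty result.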
Full Changelog: 0.8.1...0.9.0
Browsertrix Crawler 0.9.0 Beta 2
What's Changed
- worker index: set worker index automatically to work with k8s naming by @ikreymer in #266
- Reset locked pending URLs when crawler restarts. by @ikreymer in #267
- Ensure crawler can't run out of space with --diskUtilization param by @tw4l in #264
- Add options to filter logs by --logLevel and --context by @tw4l in #271
- Update README for 0.9.0 by @tw4l in #272
- blockrules/logger: use global logger var by @ikreymer in #274
- Add --maxPageLimit override by @ikreymer in #275
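The `--diskUtilization` item above amounts to comparing used disk space against a percentage threshold before writing more data. A minimal sketch using Python's `shutil.disk_usage`; the function names, the 90 percent default, and the idea of also counting bytes about to be written are assumptions for illustration, not the crawler's documented behavior:

```python
import shutil


def utilization_percent(used_bytes, total_bytes):
    """Share of the disk in use, as a percentage."""
    return 100.0 * used_bytes / total_bytes


def should_stop(path, threshold_percent=90, pending_bytes=0):
    """True if the disk holding path is at or above the threshold.

    pending_bytes is a hypothetical allowance for data that is about
    to be written (for example, the current archive directory size).
    """
    usage = shutil.disk_usage(path)
    projected = utilization_percent(usage.used + pending_bytes, usage.total)
    return projected >= threshold_percent
```

A crawl loop could call `should_stop(".")` between pages and shut down cleanly once it returns `True`.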
Full Changelog: 0.9.0-beta.1...0.9.0-beta.2
Browsertrix Crawler 0.9.0 Beta 1
Major Changes
- Removed puppeteer cluster
- BREAKING: Switched from Puppeteer to Playwright. Custom drivers would need to be migrated, see: https://playwright.dev/docs/puppeteer or https://github.com/checkly/puppeteer-to-playwright tool
- Always using Redis-based priority queue for crawl state
What's Changed
- logging: serialize regex as string to avoid empty '{}' when logging s… by @ikreymer in #235
- Remove puppeteer-cluster by @tw4l in #219
- Fix size check by @ikreymer in #241
- Add timedRun to prevent async operations from hanging by @tw4l in #243
- Add total timeout + limit redis queue retries by @ikreymer in #248
- Minor crawler fixes after puppeteer-cluster removal refactoring by @tw4l in #250
- Dev 0.9.0 Beta 1 Work - Playwright Removal + Worker Refactor + Redis State by @ikreymer in #253
- Logger cleanup by @ikreymer in #254
- Catch loading issues by @ikreymer in #255
- Add option for sleep interval after behaviors run by @tw4l in #257
Full Changelog: 0.8.1...0.9.0-beta.1
Browsertrix Crawler 0.8.1
What's Changed
- Logging and Behavior Tweaks by @ikreymer in #229
- Fix typos by @stavares843 in #232
- Add crawl log to WACZ by @ikreymer in #231
New Contributors
- @stavares843 made their first contribution in #232
Full Changelog: 0.8.0...0.8.1