v0.3.0
New Release: Version 0.3.0
We're excited to announce a significant overhaul of the Fundus crawling core logic in this release! We've transitioned from using asyncio to a ThreadPool-based solution, resulting in a more robust and performant system. Now, each publisher operates on its own thread, synchronized seamlessly through a queue.
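The thread-per-publisher design described above can be illustrated with a minimal, standard-library-only sketch. This is not Fundus's actual implementation: `publisher_worker`, `crawl_all`, and the `_DONE` sentinel are hypothetical names, and the "crawling" is simulated by building strings. It only shows the general pattern of a pool of worker threads feeding one shared queue that a single consumer drains.

```python
import queue
from multiprocessing.pool import ThreadPool
from typing import Dict, Iterator, List

# Sentinel a worker puts on the queue once it has nothing more to deliver.
_DONE = object()


def publisher_worker(name: str, urls: List[str], out: "queue.Queue") -> None:
    """Simulate one publisher: handle each URL and push the result to the queue."""
    try:
        for url in urls:
            # A real crawler would download and parse here (assumption);
            # this sketch only builds a label.
            out.put(f"{name}: {url}")
    finally:
        # Always signal completion, so the consumer never waits forever.
        out.put(_DONE)


def crawl_all(publishers: Dict[str, List[str]]) -> Iterator[str]:
    """Run one worker thread per publisher and yield results as they arrive."""
    if not publishers:
        return
    out: "queue.Queue" = queue.Queue()
    with ThreadPool(processes=len(publishers)) as pool:
        for name, urls in publishers.items():
            pool.apply_async(publisher_worker, (name, urls, out))
        remaining = len(publishers)
        while remaining:
            item = out.get()  # blocks until some worker delivers something
            if item is _DONE:
                remaining -= 1
            else:
                yield item


if __name__ == "__main__":
    demo = {"publisher_a": ["/news/1", "/news/2"], "publisher_b": ["/news/3"]}
    for article in crawl_all(demo):
        print(article)
```

Results from different publishers may interleave in any order, while items from a single publisher keep their relative order because each worker writes to the queue sequentially.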
Breaking changes
To provide a more streamlined experience, we've relocated every crawler-type specific parameter to its respective constructor. As a result, these parameters are no longer accessible through the crawl method:
- delay -> Crawler
- start, end -> CCNewsCrawler
Furthermore, since we removed asyncio, the crawl_async method is no longer available.
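As a rough illustration of what this change means for calling code, the sketch below uses stand-in classes rather than the real Fundus ones. `SketchCrawler` and `SketchCCNewsCrawler` are hypothetical, as are their return values and publisher names; only the parameter names `delay`, `start`, and `end` come from the notes above. The point is the call pattern: configure the crawler when constructing it, then call `crawl()` without those arguments.

```python
from datetime import datetime
from typing import Iterator, Tuple


class SketchCrawler:
    """Hypothetical stand-in: `delay` is set once, in the constructor."""

    def __init__(self, *publishers: str, delay: float = 1.0) -> None:
        self.publishers: Tuple[str, ...] = publishers
        self.delay = delay

    def crawl(self) -> Iterator[str]:
        # crawl() no longer accepts crawler-specific keyword arguments.
        for publisher in self.publishers:
            yield f"{publisher} (delay={self.delay}s)"


class SketchCCNewsCrawler:
    """Hypothetical stand-in: the `start`/`end` window is set in the constructor."""

    def __init__(self, *publishers: str, start: datetime, end: datetime) -> None:
        if start > end:
            raise ValueError("start must not be after end")
        self.publishers: Tuple[str, ...] = publishers
        self.start = start
        self.end = end

    def crawl(self) -> Iterator[str]:
        for publisher in self.publishers:
            yield f"{publisher} ({self.start:%Y-%m-%d} to {self.end:%Y-%m-%d})"


# Old style, per the notes above: crawler.crawl(delay=2.0)
# New style: pass the parameter to the constructor instead.
crawler = SketchCrawler("publisher_a", "publisher_b", delay=2.0)
articles = list(crawler.crawl())

cc_crawler = SketchCCNewsCrawler(
    "publisher_a", start=datetime(2024, 1, 1), end=datetime(2024, 2, 1)
)
cc_articles = list(cc_crawler.crawl())
```

Code that previously awaited crawl_async would, under this pattern, iterate over the synchronous crawl method instead.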
What's new
- Unbatch Fundus by @MaxDall in #357
- Add free_access as attribute to Article by @MaxDall in #421
- Add query parameter [Based on #357] by @addie9800 in #403
- Rework ExtractionFilter to adapt to boolean values by @MaxDall in #423
New publisher
- Add Lithuanian News Source by @addie9800 in #393
- Add US version of business insider by @MaxDall in #356
- Adding a swiss publisher (SRF) by @addie9800 in #410
- Add Rheinische Post as publisher by @MaxDall in #416
Updating existing publisher
- This is a renewed PR for BI Germany, that keeps the mostly Test files unmodified by @addie9800 in #402
- Bump WAZ to version V1_1 by @MaxDall in #388
- Update FAZ parser by @MaxDall in #419
- bi authentication bug workaround by @addie9800 in #406
Bug fixes
- Fix domains for several publishers by @MaxDall in #398
- Restrict typing-extensions version to >= 4.6 by @MaxDall in #405
- Bump mypy to version 1.9.0 by @MaxDall in #412
- Fixed a bug in documentation.yaml by @MaxDall in #415
- Fix a bug in generate_parser_test_files.py by @MaxDall in #418
- Fix a bug in bf_search regarding boolean values by @MaxDall in #422
QoL
- Adds Pretty Print for PublisherCollection and PublisherSpec by @addie9800 in #399
- Add custom filter for publisher_coverage to skip boolean values by @MaxDall in #408
- Documentation Update: Explain Addition of New Countries by @addie9800 in #413
- Attributes Parameter in Test Generation Script by @addie9800 in #411
- Add body to unit tests by @MaxDall in #338
- Adds a part about generate_tables script to the documentation by @MaxDall in #424
Maintenance
- Update relevant actions to versions utilizing node 20 by @MaxDall in #417
- Disable strict_query parsing for URL validation. by @MaxDall in #407
Full Changelog: v0.2.2...v0.3.0