From 05383e604d02c029d42d50d50240cfcad83fa3b0 Mon Sep 17 00:00:00 2001
From: Adrian Breiding
Date: Sun, 18 Feb 2024 18:57:01 +0100
Subject: [PATCH 01/12] Add free_access attribute

---
 docs/attribute_guidelines.md                |  8 ++++++++
 docs/how_to_add_a_publisher.md              | 18 ++++++++++++++++++
 src/fundus/parser/base_parser.py            | 11 ++++++++++-
 src/fundus/publishers/de/bild.py            |  8 ++++++++
 .../publishers/de/braunschweiger_zeitung.py |  4 ----
 5 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/docs/attribute_guidelines.md b/docs/attribute_guidelines.md
index 01b906cbc..304ecab25 100644
--- a/docs/attribute_guidelines.md
+++ b/docs/attribute_guidelines.md
@@ -58,4 +58,12 @@ Those attributes will be validated with unit tests when used.
 List[str]
 generic_topic_parsing
+
+ free_access
+ A boolean which is set to be True, if the article is restricted to users with a subscription. This usually indicates
+ that the article cannot be crawled completely.
+ This attribute is implemented by default
+ bool
+
+
diff --git a/docs/how_to_add_a_publisher.md b/docs/how_to_add_a_publisher.md
index 0c9cf14cb..91a00b4fd 100644
--- a/docs/how_to_add_a_publisher.md
+++ b/docs/how_to_add_a_publisher.md
@@ -469,6 +469,24 @@ Instead, we recommend referring to [this](https://devhints.io/xpath) documentati
 Make sure to examine other parsers and consult the [attribute guidelines](attribute_guidelines.md) for specifics on attribute implementation.
 We strongly encourage utilizing these utility functions, especially when parsing the `ArticleBody`.
+### Checking the free_access attribute
+
+In case your new publisher does not have a subscription model, you can go ahead and skip this step. If it does,
+please verify that there is a tag `isAccessibleForFree` within the `