Fix article serialization for images #703

Merged 3 commits on Feb 12, 2025


MaxDall
Collaborator

@MaxDall MaxDall commented Feb 11, 2025

There was a bug when serializing the images attribute described in #702. This PR provides a quick fix so that the to_json method also works for images.

@MaxDall MaxDall added the "bug fix" label Feb 11, 2025
@MaxDall MaxDall requested a review from addie9800 February 11, 2025 18:02
@addie9800
Collaborator

I am having some issues with this.

from fundus import PublisherCollection, Crawler
from tests.utility import get_test_articles

publisher = PublisherCollection.de.Tagesschau
crawler = Crawler(publisher)

if __name__ == "__main__":
    for article in crawler.crawl(max_articles=1, only_complete=False, error_handling='suppress', save_to_file="E:\\Temp\\test.json"):
        print(article)

Running this code gives me an error: TypeError: Attribute 'topics' of type <class 'lxml.etree._ElementUnicodeResult'> is not JSON serializable. This can be easily fixed by modifying your code slightly:

        def serialize(v: Any) -> JSONVal:
            if hasattr(v, "serialize"):
                return v.serialize()  # type: ignore[no-any-return]
            elif isinstance(v, datetime):
                return str(v)
            elif isinstance(v, str):
                return v
            raise TypeError(f"Attribute {attribute!r} of type {type(v)!r} is not JSON serializable")
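
To make the proposed change easier to try in isolation, here is a minimal self-contained sketch of the same helper. It rests on a few assumptions: `attribute` is passed explicitly here (in the snippet above it comes from the enclosing scope), `JSONVal` is replaced by `Any`, and `StrSubclass` is a hypothetical stand-in for a `str` subclass such as lxml's `_ElementUnicodeResult`. The attribute names and values are made up for illustration.

```python
import json
from datetime import datetime
from typing import Any


class StrSubclass(str):
    """Hypothetical stand-in for a str subclass like lxml's _ElementUnicodeResult."""


def serialize(attribute: str, v: Any) -> Any:
    # Objects that know how to serialize themselves take priority.
    if hasattr(v, "serialize"):
        return v.serialize()
    # datetime is not JSON serializable, so convert it to a string.
    elif isinstance(v, datetime):
        return str(v)
    # isinstance (rather than an exact type check) also accepts str subclasses.
    elif isinstance(v, str):
        return v
    raise TypeError(f"Attribute {attribute!r} of type {type(v)!r} is not JSON serializable")


topic = serialize("topics", StrSubclass("Politik"))
published = serialize("publishing_date", datetime(2025, 2, 11, 18, 2))
encoded = json.dumps({"topics": [topic], "publishing_date": published})
```

The `isinstance` check is what lets the `str` subclass through; `json.dumps` itself encodes `str` subclasses as ordinary strings.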

However, this then raises a different error: line 342, in <lambda> json.dumps(crawled_articles, default=lambda o: o.to_json(), ensure_ascii=False, indent=4) AttributeError: 'datetime.datetime' object has no attribute 'to_json', which has me scratching my head. I have played around with it for some time now but can't put my finger on the issue. Perhaps you can see it.
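
One thing that may help narrow this down: `json.dumps` calls the `default` hook for every object it cannot encode natively, including values nested inside whatever `default` returned earlier. The sketch below reproduces the same kind of `AttributeError` under that assumption. `FakeArticle` and `dump` are hypothetical names for illustration, not fundus code, and whether this is the actual cause here is unverified.

```python
import json
from datetime import datetime


class FakeArticle:
    """Hypothetical article whose to_json leaves a raw datetime in its output."""

    def __init__(self, published: datetime) -> None:
        self.published = published

    def to_json(self) -> dict:
        # The datetime is deliberately not converted to a string here.
        return {"published": self.published}


def dump(articles: list) -> str:
    # Same call shape as in the traceback: default delegates to o.to_json().
    return json.dumps(articles, default=lambda o: o.to_json(), ensure_ascii=False, indent=4)


try:
    dump([FakeArticle(datetime(2025, 2, 11))])
    error_message = ""
except AttributeError as exc:
    # default is invoked a second time, now with the datetime itself.
    error_message = str(exc)

print(error_message)
```

If something similar happens in the real code path, the place to look would be wherever a `datetime` can reach the encoder without having passed through `serialize` first.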

@MaxDall MaxDall merged commit a55bd0e into master Feb 12, 2025
4 checks passed
@MaxDall MaxDall deleted the gh702-fix-serialization branch February 27, 2025 14:41