Some pages have JSON-LD with control characters. One example is: https://www.johnlewis.com/sony-xperia-x-smartphone-android-5-4g-lte-sim-free-32gb/p3210080

When you try to extract JSON-LD data from this page, you'll get:

```
Invalid control character at: line 8 column 353 (char 625)
```

Maybe `JsonLdExtractor._extract_items()` in `extruct/extruct/jsonld.py` needs to change as below:

```python
import json
from json import JSONDecodeError

# HTML_OR_JS_COMMENTLINE is the comment-stripping regex that jsonld.py already uses.

def _extract_items(self, node):
    script = node.xpath('string()')
    try:
        data = json.loads(script)
    except ValueError:
        # sometimes JSON-decoding errors are due to leading HTML or JavaScript comments
        try:
            data = json.loads(HTML_OR_JS_COMMENTLINE.sub('', script))
        except JSONDecodeError:
            # strict=False accepts raw control characters (e.g. tabs, newlines) inside strings
            data = json.loads(script, strict=False)
    if isinstance(data, list):
        return data
    elif isinstance(data, dict):
        return [data]
```
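A minimal, self-contained sketch of why the error occurs and what `strict=False` changes, using only the standard-library `json` module. The payload is a made-up example, not the actual markup from the John Lewis page, and `load_strict` / `load_lenient` are hypothetical helper names, not part of extruct's API.

```python
import json

# Hypothetical JSON-LD payload (not taken from the John Lewis page): the
# "description" value contains a raw tab and a raw newline, i.e. unescaped
# control characters, which strict JSON forbids inside strings.
RAW_JSONLD = '{"@type": "Product", "description": "Line one\tLine two\nLine three"}'


def load_strict(text: str):
    """json.loads with its default strict=True; raises on raw control characters."""
    return json.loads(text)


def load_lenient(text: str):
    """Try strict parsing first, then fall back to strict=False."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # strict=False lets control characters appear inside JSON strings;
        # they are kept as-is in the decoded value.
        return json.loads(text, strict=False)


if __name__ == "__main__":
    try:
        load_strict(RAW_JSONLD)
    except json.JSONDecodeError as exc:
        print("strict:", exc)  # reports an invalid control character
    print("lenient:", repr(load_lenient(RAW_JSONLD)["description"]))
```

Note that `strict=False` only relaxes the control-character check inside strings; otherwise malformed JSON still raises `JSONDecodeError`.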