
Commit

Merge pull request #64 from johnb30/utf
Handle UTF errors with invalid bytes.
johnb30 committed Jul 18, 2014
2 parents: dc500ca + 1f250c6 · commit 272df2e
Showing 1 changed file with 5 additions and 1 deletion.
pages_scrape.py: 6 changes (5 additions, 1 deletion)
@@ -32,7 +32,11 @@ def scrape(url, extractor):
 
     page = requests.get(url, headers=headers)
     try:
-        article = extractor.extract(raw_html=page.content)
+        try:
+            article = extractor.extract(raw_html=page.content)
+        except UnicodeDecodeError:
+            article = extractor.extract(raw_html=page.content.decode('utf-8',
+                                                                     errors='replace'))
         text = article.cleaned_text
         meta = article.meta_description
         return text, meta
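The fallback in this diff can be tried out in isolation. The snippet below is a minimal Python 3 sketch, not the project's code: `StrictExtractor` and `extract_with_fallback` are hypothetical names, and the stub extractor simply decodes bytes as strict UTF-8 to imitate an extractor that fails on invalid bytes.

```python
# Hypothetical sketch of the try/except fallback added in this commit.
# StrictExtractor is an assumed stand-in for the real extractor; it only
# decodes its input as strict UTF-8 and returns the resulting text.


class StrictExtractor:
    """Hypothetical extractor that rejects bytes that are not valid UTF-8."""

    def extract(self, raw_html):
        if isinstance(raw_html, bytes):
            # Strict decoding raises UnicodeDecodeError on invalid sequences.
            raw_html = raw_html.decode("utf-8")
        return raw_html


def extract_with_fallback(extractor, content):
    """Try the raw bytes first; on a decode error, retry with replacement chars."""
    try:
        return extractor.extract(raw_html=content)
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for each undecodable sequence.
        return extractor.extract(raw_html=content.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    bad = b"caf\xe9"  # Latin-1 byte for 'e-acute', not valid UTF-8 on its own
    fixed = extract_with_fallback(StrictExtractor(), bad)
    print(fixed == "caf\ufffd")  # True
```

Decoding with `errors='replace'` keeps extraction going rather than failing the whole page, but the offending bytes are replaced with U+FFFD, so the page's actual encoding is not recovered.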