This repository has been archived by the owner on Sep 10, 2020. It is now read-only.

UnicodeDecodeError: 'charmap' codec can't decode... #29

Open
VanessaVanG opened this issue Jul 30, 2018 · 7 comments

Comments

@VanessaVanG

I followed @PandaWhoCodes's suggestion and ran pip install git+https://github.com/reach2ashish/geograpy.git, plus
nltk.downloader.download('maxent_ne_chunker')
nltk.downloader.download('words')
nltk.downloader.download('treebank')
nltk.downloader.download('maxent_treebank_pos_tagger')
nltk.downloader.download('punkt')
nltk.download('averaged_perceptron_tagger')

and it seemed to be going well until I tried the example
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)

I get
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 274: character maps to <undefined>

Python 3.6 Windows
Any thoughts, or alternatives? I need to pull out city names. I've used GeoText for the country names (not positive it's working right yet), but GeoText's city detection doesn't work very well.

@sergeiGKS

Same issue.

@sergeiGKS

@VanessaVanG,

In line 25 of the places.py file:

instead of
with open(cur_dir+"/data/GeoLite2-City-Locations.csv", "rb") as info:

put this
with open(cur_dir+"/data/GeoLite2-City-Locations.csv", "rt", encoding="utf-8") as info:
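
A minimal sketch of why the explicit encoding matters, assuming the CSV is stored as UTF-8 and that the failing machine decodes text files as cp1252 by default (the usual case on Western-locale Windows). The helper read_rows, the column names, and the sample city are made up for illustration and are not part of geograpy. The sample file contains the byte 0x8d, the same byte named in the error message, which cp1252 cannot map to any character.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of rows, decoding with an explicit encoding."""
    with open(path, "rt", encoding=encoding, newline="") as info:
        return list(csv.reader(info))


# Hypothetical stand-in for the data file: a city name whose UTF-8 bytes
# include 0x8d (U+010D encodes as 0xC4 0x8D).
city = "\u010ca\u010dak"
sample = "geoname_id,city_name\n1," + city + "\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    sample_path = tmp.name

# Decoding as cp1252 simulates what open() does without encoding= on such a machine.
try:
    read_rows(sample_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print("cp1252 failed:", err)

# Decoding as UTF-8 reads the same bytes without error.
rows = read_rows(sample_path)
print(rows)
os.remove(sample_path)
```

If the UTF-8 read also fails on the real file, the file on disk is probably not valid UTF-8 any more, for example after being re-saved by another program.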

@srinisc

srinisc commented Nov 14, 2018

Will this issue be fixed in an upcoming release?

@ghost

ghost commented Jan 8, 2019

Unfortunately that fix still doesn't work for me @sergeiGKS

@yougha54

@VanessaVanG @sergeiGKS
if you delete the 4th line of data/GeoLite2-City-Locations.csv, it should work.

keineahnung2345 added a commit to keineahnung2345/geograpy that referenced this issue Feb 16, 2019
@SamDean332

I am still getting this even after trying both fixes. I know it is because the Excel file contains quite a few odd characters, but the encoding does not seem to work. I can remove characters one by one to change the error position, but I do not know how to catch them all.

Python 3.7 on Windows 10

urls = hits['link'].values
for url in urls:
    place = geograpy.get_place_context(url=url)
    print(place)
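
Rather than removing characters one at a time, one option is to scan the file as raw bytes and list every line that does not decode. This is only a sketch: find_undecodable_lines is a hypothetical helper, not part of geograpy, and it assumes the file is meant to be UTF-8. The demo runs on a small made-up file, not the real GeoLite2 data.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, byte_offset_in_line, reason) for each line that fails to decode."""
    problems = []
    with open(path, "rb") as raw:
        for line_number, raw_line in enumerate(raw, start=1):
            try:
                raw_line.decode(encoding)
            except UnicodeDecodeError as err:
                # err.start is the index of the first offending byte within this line
                problems.append((line_number, err.start, err.reason))
    return problems


# Made-up sample: the third line contains a lone 0xE9 byte (Latin-1 style),
# which is not valid UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,city\n1,Paris\n2,Caf\xe9\n")
    sample_path = tmp.name

problems = find_undecodable_lines(sample_path)
for line_number, offset, reason in problems:
    print("line", line_number, "byte offset", offset, ":", reason)
os.remove(sample_path)
```

Pointing the helper at data/GeoLite2-City-Locations.csv would show whether the copy on disk is still valid UTF-8; an empty result means the bytes are fine and the wrong encoding is being used to read them.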

@SamDean332

After some investigation, this appears to be a Windows vs. Linux difference in some cases. Even using

with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:
I could not resolve the error on my Windows computer. However, the exact same code ran fine on a Linux computer I also use. I looked at the City-Locations.csv file on Linux, and it appeared that LibreOffice had automatically re-encoded and/or resolved all the characters, whereas the same file opened in Excel still showed all the funky characters causing the error. Excel for some reason insists on keeping the odd characters.
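
One likely reason for the platform difference: when open() is called without encoding=, Python versions of that era (3.6 and 3.7) fall back to the locale's preferred encoding, which is typically cp1252 on Western-locale Windows and UTF-8 on most Linux systems. A quick check, offered as a sketch and not as a diagnosis of this particular machine:

```python
import locale
import sys

# The encoding open() uses for text files when no encoding= argument is given
# (outside of Python's UTF-8 mode).
default_encoding = locale.getpreferredencoding(False)

print("platform:", sys.platform)
print("default text encoding:", default_encoding)
```

If this prints cp1252 (or another legacy code page), any open() call in the library that omits encoding= will decode the UTF-8 data file with that code page.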
