Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode support #135

Merged
merged 5 commits into from
Jan 22, 2019
Merged

Unicode support #135

merged 5 commits into from
Jan 22, 2019

Conversation

pelson
Copy link
Member

@pelson pelson commented Jan 21, 2019

Follows up from #134 (so that needs merging first) to allow unicode units, such as π m², as is supported by udunits2.

In addition to this, I've extended the coding standard test to handle an encoding preamble.

Closes #133.

@coveralls
Copy link

coveralls commented Jan 21, 2019

Coverage Status

Coverage decreased (-1.4%) to 88.511% when pulling 5df834a on pelson:unicode_support into 18b72bc on SciTools:master.

@pelson
Copy link
Member Author

pelson commented Jan 21, 2019

Interestingly the failing test doesn't fail for me locally on py27... Will push a commit shortly.

@@ -809,6 +809,10 @@ def __init__(self, unit, calendar=None):
else:
unit = str(unit).strip()

# For the sake of python 2, ensure that the string is a unicode.
if not isinstance(unit, six.text_type):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tempting to put a six.PY2 test here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have done that @bjlittle

Also, allow a file encoding in the coding standards test, so that we can have some literal unicode characters for testing with.
…o see that the code can be deleted when the codebase becomes py3 only.
# Not all unicode characters are allowed.
msg = '[UT_UNKNOWN] Failed to parse unit "ø"'
with self.assertRaises(ValueError, msg=msg):
Unit('ø')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice...

@bjlittle bjlittle self-assigned this Jan 22, 2019
@bjlittle
Copy link
Member

@pelson Looks good to me... I'll let travis do it's thing, then merge 👍

@pelson
Copy link
Member Author

pelson commented Jan 22, 2019

@bjlittle - thanks for merging #134. Have rebased (sorry if it has moved under your feet 😉)

@bjlittle
Copy link
Member

@pelson No worries... I was expecting the rebase 😉

@pelson
Copy link
Member Author

pelson commented Jan 22, 2019

OK, so something isn't quite right yet for py2k...

>>> import cf_units
>>> cf_units.Unit("m²")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 7: ordinal not in range(128)
>>> cf_units.Unit(u"m²")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cf_units/__init__.py", line 810, in __init__
    unit = str(unit).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 1: ordinal not in range(128)

LICENSE_RE_PATTERN = r'(\#\!.*\n)?' + LICENSE_RE_PATTERN
LICENSE_RE = re.compile(LICENSE_RE_PATTERN, re.MULTILINE)
SHEBANG = r'(\#\!.*\n)?'
ENCODING = r'(\# \-\*\- coding\: .* \-\*\-\n)?'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pelson You might want to account for white space after the trailing -*- ...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh. Happy if you want me to, but why be generous on this? Do it right, or fix it is my opinion 😉

@bjlittle
Copy link
Member

@pelson I'll let you chase down the py2k unicode barf... ping me when you want me to re-review 😉

@bjlittle
Copy link
Member

Nice, thanks @pelson 👍

@bjlittle bjlittle merged commit 1e9af2a into SciTools:master Jan 22, 2019
pelson added a commit to pelson/cf_units that referenced this pull request Jan 22, 2019
@pelson
Copy link
Member Author

pelson commented Jan 22, 2019

What a right old shambles py2's unicode handling is. I think I've now got this right. From the start, this was a trivial change for py3k - all of the work (and there has now been several hours worth) has been dealing with the py2 fallout.

With the current setup, the following will occur for py2 users:

>>> import cf_units
>>> cf_units.Unit(u'ø')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cf_units/__init__.py", line 851, in __init__
    self._propogate_error('Failed to parse unit "%s"' % str_unit, e)
  File "cf_units/__init__.py", line 886, in _propogate_error
    raise ValueError('[%s] %s%s' % (ud_err.status_msg(), msg, error_msg))
ValueError: [UT_UNKNOWN] Failed to parse unit "?"

>>> u = cf_units.Unit(u'π')
>>> u
Unit('?')
>>> u.symbol
'3.14159265358979 1'
>>> u.origin
u'\u03c0'

I think I can do better than that though, so there is another PR incoming.

@pelson pelson deleted the unicode_support branch January 22, 2019 15:47
pelson added a commit to pelson/cf_units that referenced this pull request Jan 22, 2019
@pelson
Copy link
Member Author

pelson commented Jan 23, 2019

As promised #137.

bjlittle pushed a commit that referenced this pull request Jan 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants