Unicode support #135

pelson · 2019-01-21T16:12:27Z

Follows up from #134 (so that needs merging first) to allow unicode units, such as π m², as is supported by udunits2.

In addition to this, I've extended the coding standard test to handle an encoding preamble.

Closes #133.

coveralls · 2019-01-21T16:18:28Z

Coverage decreased (-1.4%) to 88.511% when pulling 5df834a on pelson:unicode_support into 18b72bc on SciTools:master.

pelson · 2019-01-21T17:03:07Z

Interestingly the failing test doesn't fail for me locally on py27... Will push a commit shortly.

pelson · 2019-01-22T11:14:38Z

cf_units/__init__.py

@@ -809,6 +809,10 @@ def __init__(self, unit, calendar=None):
        else:
            unit = str(unit).strip()

+        # For the sake of python 2, ensure that the string is a unicode.
+        if not isinstance(unit, six.text_type):


Tempting to put a six.PY2 test here.

Have done that @bjlittle
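
For readers unfamiliar with the pattern under review: the hunk above is truncated, so what follows is only a minimal sketch of coercing a unit argument to a text string on both py2 and py3. The helper name `ensure_text` and the UTF-8 default are assumptions for illustration, not the code in this PR, and `six` is avoided so the snippet stands alone.

```python
# Sketch only: `ensure_text` is a hypothetical helper, not part of cf_units.
text_type = type(u"")  # `unicode` on py2, `str` on py3


def ensure_text(unit, encoding="utf-8"):
    """Return `unit` as a stripped text (unicode) string."""
    if isinstance(unit, text_type):
        return unit.strip()
    if isinstance(unit, bytes):
        # On py2 a plain `str` is bytes, so it is decoded explicitly here
        # rather than left to the implicit ASCII codec.
        return unit.decode(encoding).strip()
    # Any other object (numbers, etc.) goes through its text representation.
    return text_type(unit).strip()
```

The explicit `decode` is the point of the sketch: it avoids relying on the interpreter's default codec when a byte string contains non-ASCII characters.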

bjlittle · 2019-01-22T11:47:00Z

cf_units/tests/test_unit.py

+        # Not all unicode characters are allowed.
+        msg = '[UT_UNKNOWN] Failed to parse unit "ø"'
+        with self.assertRaises(ValueError, msg=msg):
+            Unit('ø')
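
A note for anyone reading this hunk: in Python 3 the `msg` keyword of `assertRaises` only customises the failure message; it does not check the text of the raised exception. A minimal py3 sketch of asserting on the message itself is below. `parse_unit` is a hypothetical stand-in, not the real `Unit` constructor, and the class and function names are illustrative.

```python
import re
import unittest


def parse_unit(text):
    # Hypothetical stand-in for a parser that rejects unknown symbols.
    raise ValueError('[UT_UNKNOWN] Failed to parse unit "%s"' % text)


class TestMessage(unittest.TestCase):
    def test_message_is_checked(self):
        # re.escape is needed because "[" and "]" are regex metacharacters.
        expected = re.escape('[UT_UNKNOWN] Failed to parse unit "ø"')
        with self.assertRaisesRegex(ValueError, expected):
            parse_unit('ø')


def run_tests():
    # Run the single test case programmatically and return the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMessage)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```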


bjlittle · 2019-01-22T11:48:04Z

@pelson Looks good to me... I'll let Travis do its thing, then merge 👍

pelson · 2019-01-22T11:48:17Z

@bjlittle - thanks for merging #134. Have rebased (sorry if it has moved under your feet 😉)

bjlittle · 2019-01-22T11:50:13Z

@pelson No worries... I was expecting the rebase 😉

pelson · 2019-01-22T11:51:47Z

OK, so something isn't quite right yet for py2k...

>>> import cf_units
>>> cf_units.Unit("m²")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 7: ordinal not in range(128)
>>> cf_units.Unit(u"m²")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cf_units/__init__.py", line 810, in __init__
    unit = str(unit).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 1: ordinal not in range(128)
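
Both tracebacks come down to a conversion through the ASCII codec, which cannot represent `²`. As a rough illustration (written for py3, where the conversion has to be requested explicitly; the variable names are illustrative):

```python
text = u"m\u00b2"  # "m²"

try:
    # py2's str(unicode_obj) does this implicitly with the default codec.
    text.encode("ascii")
except UnicodeEncodeError as err:
    message = str(err)

# Choosing a codec that can represent the character avoids the error.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```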

bjlittle · 2019-01-22T11:54:08Z

cf_units/tests/test_coding_standards.py

-LICENSE_RE_PATTERN = r'(\#\!.*\n)?' + LICENSE_RE_PATTERN
-LICENSE_RE = re.compile(LICENSE_RE_PATTERN, re.MULTILINE)
+SHEBANG = r'(\#\!.*\n)?'
+ENCODING = r'(\# \-\*\- coding\: .* \-\*\-\n)?'


@pelson You might want to account for white space after the trailing -*- ...?

Meh. Happy if you want me to, but why be generous on this? Do it right, or fix it is my opinion 😉
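
To make the trade-off concrete, here is a small sketch comparing the `ENCODING` pattern from the hunk above with a more lenient variant that tolerates trailing spaces or tabs. `ENCODING_LENIENT` and `has_encoding_line` are hypothetical names introduced for illustration; they are not part of the PR.

```python
import re

# Pattern as written in the diff: no whitespace allowed after the final -*-.
ENCODING = r'(\# \-\*\- coding\: .* \-\*\-\n)?'
# Hypothetical lenient variant: optional spaces/tabs before the newline.
ENCODING_LENIENT = r'(\# \-\*\- coding\: .* \-\*\-[ \t]*\n)?'


def has_encoding_line(pattern, header):
    """Return True if the optional encoding group actually matched."""
    # The group is optional, so re.match always succeeds; what matters is
    # whether group 1 took part in the match.
    match = re.match(pattern, header)
    return match.group(1) is not None


clean = "# -*- coding: utf-8 -*-\n"
trailing = "# -*- coding: utf-8 -*- \n"
```

With the strict pattern, a preamble with trailing whitespace is simply not recognised as an encoding line, which is the "do it right, or fix it" behaviour argued for above.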

bjlittle · 2019-01-22T11:56:37Z

@pelson I'll let you chase down the py2k unicode barf... ping me when you want me to re-review 😉

bjlittle · 2019-01-22T14:50:57Z

Nice, thanks @pelson 👍

pelson · 2019-01-22T15:47:05Z

What a right old shambles py2's unicode handling is. I think I've now got this right. From the start, this was a trivial change for py3k - all of the work (and there have now been several hours' worth) has been dealing with the py2 fallout.

With the current setup, the following will occur for py2 users:

>>> import cf_units
>>> cf_units.Unit(u'ø')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cf_units/__init__.py", line 851, in __init__
    self._propogate_error('Failed to parse unit "%s"' % str_unit, e)
  File "cf_units/__init__.py", line 886, in _propogate_error
    raise ValueError('[%s] %s%s' % (ud_err.status_msg(), msg, error_msg))
ValueError: [UT_UNKNOWN] Failed to parse unit "?"

>>> u = cf_units.Unit(u'π')
>>> u
Unit('?')
>>> u.symbol
'3.14159265358979 1'
>>> u.origin
u'\u03c0'

I think I can do better than that though, so there is another PR incoming.
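
The `?` in the output above is consistent with the text being encoded by a codec that cannot represent the character, with replacement on failure. That mechanism is an assumption, not something this thread confirms, and the helper below is hypothetical:

```python
import sys


def to_native_bytes(text, encoding=None):
    """Encode `text`, substituting "?" for anything the codec cannot represent."""
    # Falls back to the interpreter default: "ascii" on a stock py2,
    # "utf-8" on py3.
    encoding = encoding or sys.getdefaultencoding()
    return text.encode(encoding, "replace")


pi_ascii = to_native_bytes(u"\u03c0", "ascii")
pi_utf8 = to_native_bytes(u"\u03c0", "utf-8")
```

Under an ASCII default the original character is lost in the byte string, while the unicode value (compare `u.origin` above) is unaffected.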

pelson · 2019-01-23T11:26:28Z

As promised #137.

pelson added the Type: Enhancement label Jan 21, 2019

pelson mentioned this pull request Jan 22, 2019

Tidy up the propagation of exceptions from UDUNITS2 #136

Merged

pelson added 3 commits January 22, 2019 11:46

Always treat units as unicode. Closes SciTools#133.

9c497ce

Also, allow a file encoding in the coding standards test, so that we can have some literal unicode characters for testing with.

Fix date2num test which is incorrectly using repr when str was intended.

13174bc

Tidy up the Unit constructor for the py2 case, so that it is easier to see that the code can be deleted when the codebase becomes py3 only.

1645170

bjlittle reviewed Jan 22, 2019

pelson force-pushed the unicode_support branch from ba5fbcd to 1645170 Compare January 22, 2019 11:47

bjlittle self-assigned this Jan 22, 2019

bjlittle reviewed Jan 22, 2019

pelson added 2 commits January 22, 2019 14:07

Handle unicode object in py2 specially, and always ensure that py2k returns a non-unicode for __str__ (unless sys.getdefaultencoding says otherwise).

5df834a

Ensure that the error raised in Unit constructor handles unicode too.

f7c7ef3

bjlittle merged commit 1e9af2a into SciTools:master Jan 22, 2019

pelson added a commit to pelson/cf_units that referenced this pull request Jan 22, 2019

Improvements to the changes in SciTools#135.

c5bd824

pelson deleted the unicode_support branch January 22, 2019 15:47

pelson mentioned this pull request Jan 22, 2019

Improvements to the recent unicode changes #137

Merged

pelson added a commit to pelson/cf_units that referenced this pull request Jan 22, 2019

Improvements to the changes in SciTools#135.

3767bdd

bjlittle pushed a commit that referenced this pull request Jan 24, 2019

Improvements to the changes in #135. (#137)

659e994
