Skip to content
This repository was archived by the owner on Jan 28, 2019. It is now read-only.

copy lxml-based plistlib module from fonttools/ufoLib2 #160

Merged
merged 16 commits into from
Jul 12, 2018

Conversation

anthrotype
Copy link
Member

@anthrotype anthrotype commented Jul 11, 2018

This PR replaces the ufoLib.plistlib module with the one from https://github.com/fonttools/ufoLib2/blob/master/src/ufoLib2/plistlib.py

The module provides the same interface for both py2 and py3 but this time using the python3's plistlib module API, not the python2's as the old ufoLib.plistlib.

It's also faster because it's written on top of lxml (which is now required by ufoLib), so we don't need to first serialise using the built-in plistlib, then parse as generic XML and re-serialise when we write the GLIF lib element.

For python 2.7, this module has an additional requirement on singledispatch, a backport of
https://docs.python.org/3/library/functools.html#functools.singledispatch
On python 3 this is built into the standard library functools module. This is used to define generic functions whose implementation depends on the type of the first argument (see _make_element and how it's used in plistlib.totree). It does make things a bit faster (about 4% faster to build an element tree from plist data when compared to a more straightforward tree builder with a big if statement --fonttools/ufoLib2@7d5fee5), besides making the code look cleaner and more idiomatic.

The module includes a full test suite borrowed from the cpython's test_plistlib.py, but rewritten to use pytest-style tests (we already use pytest as test runner here).

https://github.com/fonttools/ufoLib2/blob/master/src/ufoLib2/plistlib.py

replaces the old ufoLib.plistlib shim, privides a single interface
similar to python3's stdlib `plistlib` module, but built on top of
lxml.

On python < 3, it requires the singledispatch backport.
This will be added as conditional installation requirement.
and also set maximum version limit to the next major version of each install_requires,
to prevent breakages when major version bumps
this is basically the same cpython plistlib test suite,
rewritten as pytest-style tests
@benkiel
Copy link
Collaborator

benkiel commented Jul 11, 2018

@anthrotype I’m out this week, and think @typesupply is too, if this looks good to you, feel free to merge and cut a 3.0 release as discussed. Thanks for this!

@anthrotype
Copy link
Member Author

enjoy you holidays! 🌞

we right-justifiy binary data text to max 76 chars (min 16) per line
similarly to the way the stdlib plistlib module does (although they
use tabs for indentation, equivalent to 8-spaces).

We write the plist doctype ourselves, otherwise lxml always adds a
newline to it (even wen pretty_print is False).
since 'lib' element is itself contained in 'glyph' root element
@anthrotype
Copy link
Member Author

I think this is ready to be merged, but another pair of eyes won't hurt.

@anthrotype anthrotype requested a review from typemytype July 12, 2018 10:20
@typemytype
Copy link
Collaborator

I would not change the api of ufoLib.plistlib
as this module is also used outside ufoLib fe in defcon or in RF extension like glifViewer

Im missing readPlist, writePlist, readPlistFromString, writePlistToString

@anthrotype
Copy link
Member Author

no prob, I'll make backward compatible functions for those to avoid breaking backward compat.

@anthrotype
Copy link
Member Author

anthrotype commented Jul 12, 2018

but can I throw a deprecation warning and suggest to use load/dump/loads/dumps for future code? That's better than keeping alive the old python2 API forever, I think.

@typemytype
Copy link
Collaborator

deprecated warning are fine, if not it will break lots of scripts outside ufoLib

@anthrotype
Copy link
Member Author

by the way, in the link to that glifViewer extension I don't see any usage of ufoLib.plistlib.readPlist or writePlist, but only of ufoLib.glifLib.readGlyphFromString and writeGlyphToString, which I didn't modify here (they keep working as before).

@anthrotype
Copy link
Member Author

and defcon itself doesn't seem to be importing its readPlist or writePlist from ufoLib.plistlib but it's doing the aliasing itself from the stdlib's plistlib module, e.g.:
https://github.com/typesupply/defcon/search?q=readPlist&unscoped_q=readPlist

@anthrotype
Copy link
Member Author

I just wonder, was the ufoLib.plistlib ever intended to be a public API or just a convenience module meant to be used internally only by ufoLib? In the latter case, dropping readPlist etc. would not be considered strictly as a backward compatibility break.

Having said that, I'm going to add aliases with deprecation warnings anyway

@typemytype
Copy link
Collaborator

by the way, in the link to that glifViewer extension I don't see any usage of ufoLib.plistlib.readPlist

oh true, my mistake... (just scared with those kinds of changes)

in defcon Im finding this: https://github.com/typesupply/defcon/blob/master/Lib/defcon/objects/layerSet.py#L500

I can change that...

@anthrotype
Copy link
Member Author

I can change that...

too late, I've implemeted them :)

@anthrotype
Copy link
Member Author

the old readPlist, writePlist, readPlistFromString and writePlistToString are back, with this commit:
d664f31

Another difference that's worth mentioning here between this new ufoLib.plistlib and the old one (which was simply a shim to the standard library's plistlib module with a python2 façade) is the following:

In the old python2 plistlib from the stdlib, both the unicode and bytes types (which are the same thing as str in python2) were serialised as <string> elements in the generated property list. For serialising binary <data> there there was a plistlib.Data class wrapping the bytes and explicitly mark them as non-textual binary data. Similarly, the readPlist function would return a Data instance when <data> elements are present.

In the python3's plistlib module from the stdlib, the latter Data class was deprecated, and the load and dump functions got a new use_builtin_types=True argument to control this behaviour. When it's True (by default), then binary data is deserialised as bytes object, otherwise as the old plistlib.Data. Note that even in our previous ufoLib.plistlib shim module, we did not set that argument to False for python3, so we were already inconsistent in the treatment of bytes vs Data between python 2 (where Data was default, and python3 where bytes is default).

I chose not implement this in the new ufoLib.plistlib. The behaviour of the new module is the same as the python3's plistlib with use_builtin_data=True. There is no option to make it behave like the python2 plistlib as far as the treatment of Data.

This means that, if you are using python2 and you do not import unicode_literals or you don't use u"" prefix in your string literals and try to dump these to plist, they will be serialised as <data>, not as <string> (because that's what they are, they are really bytes string, encoded with the encoding of the source file, usually "ascii" in py2).

I think there won't be issues at least in ufoLib or defcon becuase we ported these to py3 since long time and we have gotten accustomed to treating textual data as unicode (str in py3) and binary data as bytes, doing the decoding/encoding as needed.

@anthrotype anthrotype merged commit 9735cdc into unified-font-object:master Jul 12, 2018
@anthrotype anthrotype deleted the plistlib branch July 12, 2018 13:25
@miguelsousa
Copy link
Member

miguelsousa commented Jul 13, 2018

@anthrotype I've tried integrating up to 9735cdc into afdko and got two changes:

  1. Space indents in plist files instead of tabs
  2. <data> elements instead of <string>
-		<string>Source Serif Pro</string>
+		<data>
+			U291cmNlIFNlcmlmIFBybw==
+		</data>
-		<string>SourceSerifPro-Light</string>
+		<data>
+			U291cmNlU2VyaWZQcm8tTGlnaHQ=
+		</data>
-		<string>Light</string>
+		<data>
+			TGlnaHQ=
+		</data>

@anthrotype
Copy link
Member Author

Space indents in plist files instead of tabs

that's a feature, right? Finally we get consistent indentation between the GLIF xml files (which have always used 2 spaces) and the plist files (which the standard library's plistlib writer always writes with tabs). With lxml we don't have a choice, so we use 2 spaces everywhere :)

elements instead of

I've explained this in the comment above #160 (comment)

you need to make sure you use unicode (alias of str in python 3) type for textual data and bytes (alias of str in python 2) for binary data, just like if you are using python 3. Import unicode_literals if you can, or use u"" for string literals. Separate text from binary data, do the decoding and encoding at the I/O boundaries (when opening or writing text files), manipulate text as text and raw bytes as raw bytes.

@anthrotype
Copy link
Member Author

if you are concerned about diff in whitespace and similar stuff, I'd recommend you use the ufonormalizer.

@anthrotype
Copy link
Member Author

anyway, thanks Miguel for checking! 👍

@miguelsousa
Copy link
Member

Finally we get consistent indentation between the GLIF xml files [...] and the plist files

Fair enough.

you need to make sure you use unicode (alias of str in python 3) type for textual data and bytes (alias of str in python 2) for binary data, just like if you are using python 3

Thanks for the details. Have to figure out which of our scripts need to be updated.

@miguelsousa
Copy link
Member

@anthrotype do you think you'll release a new version before Monday, when pyup comes knocking on our door again?

@miguelsousa
Copy link
Member

@anthrotype I've tracked down the data vs. string issue to mutatorMath. Is this something you can fix, since you're more familiar with what's amiss? Here's a simple test case:

from mutatorMath.ufo import build
dsPath = 'adobe-type-tools/afdko/Tests/makeinstancesufo_data/input/font.designspace'
build(documentPath=dsPath)

Open the fontinfo.plist file of one of the UFOs that you'll find in the temp_output directory, and you'll see it contains <data> elements. Thanks in advance.

@miguelsousa
Copy link
Member

There's one in metainfo.plist as well,

     <key>creator</key>
-    <string>org.robofab.ufoLib</string>
+    <data>
+    b3JnLnJvYm9mYWIudWZvTGli
+    </data>

@miguelsousa
Copy link
Member

Adding from __future__ import unicode_literals to ufoLib's __init__.py seems to fix the diff above.
Is ufoLib itself lacking a few unicode_literals imports?

@anthrotype
Copy link
Member Author

I added unicode_literals to all modules in the other PR I’m working, it fixed the issue you mentioned
7aeffbb

I’ll take a look at the mutatormath issue tomorrow, and yes I’ll to make a release before then. Have a nice weekend

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants