Skip to content
This repository was archived by the owner on Jan 28, 2019. It is now read-only.

Validation optional #153

Merged
merged 29 commits into from
Jul 3, 2018
Merged

Validation optional #153

merged 29 commits into from
Jul 3, 2018

Conversation

benkiel
Copy link
Collaborator

@benkiel benkiel commented Jul 3, 2018

Add the option to turn on/off validation. Switch to lxml.

benkiel and others added 29 commits June 11, 2018 11:07
Update tests.
Fix errors.
Add documenation.
Make validation optional
Get encoding right for py2 and py3
@benkiel benkiel requested a review from anthrotype July 3, 2018 14:32
@anthrotype
Copy link
Member

can you remind me again why in the end we decided to have lxml as hard dependency as opposed to use the ElementTree API (supported by both lxml and built-in ElementTree library) and try to use lxml when it's available else fall back to the built-in ElementTree library if not?

I have mixed feelings because, on the one had, I'm happy when we add good third-party dependencies (usually better than our artisanal "reinvented" wheel); on the other hand, this shouldn't be done too lightly, especially when this is compiled extension, and one which, unlike the others (compreffor, pyclipper, etc.) we don't control directly.

@anthrotype
Copy link
Member

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 3, 2018

@anthrotype We could do that, not opposed. We wanted to make lxml the thing for speed, but understand wanting a graceful fallback.

@anthrotype
Copy link
Member

but maybe you or @typesupply decided to require lxml because you're depending on some lxml-specific features that are not available with the built-in elementtree library?
we need to test this it not the case

@anthrotype
Copy link
Member

or if it is, then fine, we can add lxml as required dependency.

the problem is, we need to decide now, otherwise everybody who does pip install fontmake or fontParts or anything will get ImportError

@typesupply
Copy link
Contributor

It's been a really long time since I wrote this... but, I didn't think I was relying on any LXML specific features. Falling back to ET when LXML isn't available is fine with me.

@anthrotype
Copy link
Member

we are using pretty_print=True option in etree.to_string function, which only works for lxml, not for elementtree library, whose writer can't automatically indent..
I guess we need to use something like this as fallback:
https://github.com/fonttools/fonttools/blob/8d7774a3e81df1afc6dedb1272e43c7c3aea100e/Lib/fontTools/designspaceLib/__init__.py#L91-L105

@typesupply
Copy link
Contributor

Ah, yeah. I forgot about that. It's so weird that ET doesn't have this.

@anthrotype
Copy link
Member

and also, the encoding="unicode" argument to etree.tostring is a lxml thing.
Actually it's a python3 thing
https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.tostring

@anthrotype
Copy link
Member

i'm going to encode as a UTF-8 bytes string and decode it as Unicode string before returning it from writeGlyphToString

@anthrotype
Copy link
Member

two more problems

  1. ElementTree's tostring always seems to sort attributes alphabetically, even if we use OrderedDict; lxml uses the order we give it. (in the diff below, the minus lines are lxml's, plus are ElementTree's)
 - <glyph name="a" format="2">
 + <glyph format="2" name="a">
  1. When it's a simple element (without children), ElementTree always adds an extra space before the ending />, lxml doesn't:
 - <unicode hex="0062"/>
 + <unicode hex="0062" />

@anthrotype
Copy link
Member

yeah, ElementTree's _serialize_xml functoin calls sorted on the attributes items:
https://github.com/python/cpython/blob/831c29721dcb1b768c6315a4b8a4059c4c97ee8b/Lib/xml/etree/ElementTree.py#L928

😢

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 3, 2018

Perhaps that's enough to say "Alright, sorry, lxml is a requirement"?

@anthrotype
Copy link
Member

it was worth a try. I really wouldn't like to implement our own elementtree xml serializer..
Let's require lxml then.

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 3, 2018

@anthrotype Agree. If there is pushback on the lxml requirement, we can revisit. Thanks for all the work, really appreciated!

@anthrotype
Copy link
Member

just pushed v2.2.1.

Oh, when tagging a new release, I like to "edit" the github release notes so that they will be displayed as formatted markdown, instead of as a monospaced code block that is used by default when pushing a git tag. I basically copy and paste the same text of the git tag. Probably there are tools to automate this, but I'm lazy.

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 3, 2018

@anthrotype got it. and, thanks again; know we just did that one quick — it had been sitting for a while with no comments on the other thread so before the code got too stale it was merged. Thanks again for your running down ETree and everything — it is really appreciated!

@chrissimpkins
Copy link
Contributor

Thoughts about a major version increment for these changes? It sounds like the mandatory use of lxml writes may have the potential for large source text diffs c/w the last release for some users (if I understood Cosimo's comment above correctly) and the elimination of default source validation on UFO reads through the main i/o reading class is a significant change in expected behavior. This has the potential to lead to breakage in projects that depend on the previous default behaviors.

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 4, 2018

@chrissimpkins Fair point. There shouldn't (fingers crossed) be too many diff changes, we didn't change the test cases and made lxml conform to the tests, though I suppose there will still be some. Also open to changing read to validate by default if that seems better.

@anthrotype
Copy link
Member

I also think it would be better to bump major version for this. I'll add the plistlib module from ufoLib2 and then go to 3.0, if you guys are also OK with it.

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 4, 2018

@anthrotype Agree. What do you think about the default being to validate on read?

@anthrotype
Copy link
Member

maybe that could be safer, as long as it's easy to opt-out.
@typemytype seems to prefer validate=False here
https://github.com/typesupply/defcon/pull/191/files

@anthrotype
Copy link
Member

in the defcon PR I linked, the defaults are rather the opposite of what @benkiel was suggesting:

https://github.com/typesupply/defcon/pull/191/files#diff-c6a63107dc3c355ab30047131285dc06R452

read is False, write is True.. Not sure what's best.

@typesupply
Copy link
Contributor

My 2¢ is that validation should be on by default in all cases. That preserves backwards compatibility, for what that is worth. I also didn't engineer defcon against malicious input so I prefer that turning validation off be something that one has to knowingly do.

@anthrotype
Copy link
Member

makes sense. Ok, let's flip the default to be validate=True in ufoLib, and probably defcon too.
@benkiel would you work on that? I'll work on integrating the new plist module.

@typemytype
Copy link
Collaborator

I was just following ufoLib in defcon, will change it to read and write to be validated by default in defcon.

@benkiel
Copy link
Collaborator Author

benkiel commented Jul 4, 2018

@anthrotype done in #155. This now means that validation is on by default for both read and write (it was on for write before)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants