allow usage of non-ascii bytestring literals in templates #11

sqlalchemy-bot · 2007-01-19T10:04:23Z

Migrated issue, originally created by Anonymous

The Mako template parser has a problem, or a quirk, depending on your view: it cannot compile any template that contains non-ASCII characters inside the ${} code. The problem traces back to the inability of Python's built-in compiler to compile non-ASCII unicode source. Fixing it would require some kind of encoding juggling inside ast.py (the 'parse' function?) as well as adding a #-*- prefix to the code being compiled there. Alas, I haven't been able to fix this myself (mysterious body-snatcher exceptions pop up), nor do I have enough time to work on it, but I'm sure you get the idea.

To replicate the problem, just compile "${f('\u0142')}" as a mako template.

I should add that the problem is serious, at least for us, and is a showstopper for Mako adoption in our project.

Attachments: alternate_unicode.patch

sqlalchemy-bot · 2007-01-19T14:29:57Z

Michael Bayer (@zzzeek) wrote:

I've added backslash-replacing of non-ASCII characters in code sent for AST parsing (expressions, Python code blocks, and control lines) in [changeset:189]. Check the unit tests added in that changeset to get the idea. Note that using non-ASCII characters anywhere in a template requires that the template's encoding be specified at the top via a "magic encoding comment".
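For readers following along, the backslash-replacing idea can be sketched roughly as below. This is an illustration written for Python 3, not the code from the changeset; the helper name `escape_non_ascii` is hypothetical.

```python
import ast


def escape_non_ascii(expr: str) -> str:
    """Replace every non-ASCII character with a backslash escape (hypothetical helper)."""
    return expr.encode("ascii", "backslashreplace").decode("ascii")


expr = "f('\u0142')"  # expression text containing the single character U+0142
escaped = escape_non_ascii(expr)

# The escaped text is pure ASCII: the character is now spelled backslash-u-0-1-4-2.
assert escaped == "f('\\u0142')"

# An ASCII-only expression can be handed to the parser without encoding trouble.
tree = ast.parse(escaped, mode="eval")
assert isinstance(tree.body, ast.Call)
```

Note that under Python 2 a `\u` escape is only interpreted inside a u'' literal, so whether the escaped text evaluates back to the original character depends on how the literal was written in the template.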

sqlalchemy-bot · 2007-01-19T14:29:57Z

Changes by Michael Bayer (@zzzeek):

changed status to closed

sqlalchemy-bot · 2007-01-22T07:00:36Z

Anonymous wrote:

I'm afraid it's still wrong. Test case:

import mako.template
t = u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8')
te = mako.template.Template(t)
te.render_unicode(f=lambda x:x)

returns u'\\u0142', should return u'\u0142' (tested on svn rev 190).
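The first line of the test template is the magic encoding comment. A minimal sketch of how such a comment can be recognised, using the pattern given in PEP 263, is shown here; Mako's own detection may differ, and the names `CODING_RE` and `declared_encoding` are hypothetical. PEP 263 allows the comment on either of the first two lines, while this sketch only looks at the first.

```python
import re

# Pattern given in PEP 263 for the encoding declaration, applied to bytes.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes, default: str = "ascii") -> str:
    """Return the encoding named on the first line of source, else default."""
    first_line = source.split(b"\n", 1)[0]
    match = CODING_RE.match(first_line)
    return match.group(1).decode("ascii") if match else default


template = "#-*- encoding:utf-8\n${f('\u0142')}".encode("utf-8")

# "encoding:utf-8" matches because the pattern only looks for "coding:".
assert declared_encoding(template) == "utf-8"
```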

sqlalchemy-bot · 2007-01-22T07:00:36Z

Changes by Anonymous:

changed status to reopened

sqlalchemy-bot · 2007-01-22T12:42:13Z

Michael Bayer (@zzzeek) wrote:

I'm sorry, I don't understand at this point. Test case:

import mako.template


t = u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8')
te = mako.template.Template(t)
print te.code
f = lambda x:x

assert f('\u0142') == te.render_unicode(f=f)
print repr(unicode(f('\u0142')))
print repr(te.render_unicode(f=lambda x:x))

Generated code (if you believe this is incorrect, tell me what it should say. Note that all expressions are expected to be str()-able or unicode expressions, since they get passed to unicode() unconditionally; use context.write() to bypass this):

from mako import runtime, filters, cache
UNDEFINED = runtime.UNDEFINED
_magic_number = 1
_modified_time = 1169479868.3539629
_template_filename=None
_template_uri='memory:0x63f30'
_template_cache=cache.Cache(__name__, _modified_time)
_exports = []


def render_body(context,**pageargs):
    __locals = dict(pageargs=pageargs)
    f = context.get('f', UNDEFINED)
    # SOURCE LINE 2
    context.write(unicode(f('\u0142')))
    return ''

program output - assertion case passes:

u'\\u0142'
u'\\u0142'

Also observe the unit tests added in the changeset, which embed literal multibyte expressions that come out identical to the original.

sqlalchemy-bot · 2007-01-22T14:02:28Z

Michael Bayer (@zzzeek) wrote:

Also, try out the attached patch. It breaks all the current unit tests, but I think it's what you are looking for: it basically passes the string straight through and adds the "coding" comment to the top of the generated file. It seems I would essentially have to throw out the whole way Mako does unicode and rewrite it to take this approach.

sqlalchemy-bot · 2007-01-22T14:09:34Z

Michael Bayer (@zzzeek) wrote:

OK, it was using cStringIO. This one passes most tests. Again, the basic idea is just to emit the generated module in the same encoding as the template it was given. Not sure if it's working all the way, though. I know what you're looking for: total "straight through" without using u"" at all. Not sure if I can get this working completely.

sqlalchemy-bot · 2007-01-22T14:18:14Z

Michael Bayer (@zzzeek) wrote:

Also, I'm being told that Genshi requires non-ASCII strings to be sent as u'' as well, so I'm not sure this issue is limited to Mako.

sqlalchemy-bot · 2007-01-23T07:22:47Z

Anonymous wrote:

I guess I introduced confusion with '\u0142' which should actually be u'\u0142' - a subtle but important difference :)

Now, this assertion should hold, but doesn't:

assert f(u'\u0142') == te.render_unicode(f=f)

where te = Template(u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8'))
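The difference between the two spellings can be illustrated with a small sketch. It is written for Python 3, where every string literal interprets `\u`, so the Python 2 plain literal '\u0142' is modelled here by doubling the backslash. The variable names are hypothetical.

```python
# What Python 2 stores for '\u0142': six characters (backslash, u, 0, 1, 4, 2).
py2_plain = "\\u0142"

# What Python 2 stores for u'\u0142': the single character U+0142.
py2_unicode = "\u0142"

assert len(py2_plain) == 6
assert len(py2_unicode) == 1
assert py2_plain != py2_unicode

# Decoding the escape sequence turns the first form into the second.
assert py2_plain.encode("ascii").decode("unicode_escape") == py2_unicode
```

This is why the earlier assertion with f('\u0142') passed: both sides were the six-character form.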

I'm currently reviewing your code and the patch attached and looking for a way to implement what I want. Will keep you updated.

sqlalchemy-bot · 2007-05-01T20:34:38Z

Michael Bayer (@zzzeek) wrote:

Ultimately, so that nobody has to notice the difference between u'foo' and 'foo', we have to make generated modules use the same encoding as the source file. A lot of weird problems arise when you do this, including that the AST parsing needs to be passed bytestrings instead of unicode objects, which then breaks other stuff, and so on. I don't think it's high priority now, since I'd prefer people to just use unicode objects.
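As a rough illustration of the "same encoding as the source file" approach, the sketch below builds a tiny module as bytes, puts a coding comment on its first line, and compiles it without escaping anything. It is written for Python 3 and is not Mako's generated code; the module contents and variable names are assumptions made for the example.

```python
text = "\u0142"          # the character U+0142
encoding = "iso-8859-2"  # any encoding that can represent that character

# Generated module: a coding comment followed by code holding the raw character.
generated = (
    "# -*- coding: %s -*-\nvalue = '%s'\n" % (encoding, text)
).encode(encoding)

# No backslash escapes were introduced; the character is a single raw byte.
assert b"\\u" not in generated
assert b"\xb3" in generated

# compile() accepts bytes and honours the coding comment when decoding them.
namespace = {}
exec(compile(generated, "<generated>", "exec"), namespace)
assert namespace["value"] == text
```

In Python 3 the plain literal comes back as decoded text. Under Python 2 the same plain literal would be a bytestring in the module's encoding, which is the "straight through" behaviour requested in this issue.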

sqlalchemy-bot · 2007-05-01T20:34:38Z

Changes by Michael Bayer (@zzzeek):

removed labels: bug
added labels: easy, feature
changed title from "non-ASCII code problem" to "allow usage of non-ascii bytestring literals in te"

sqlalchemy-bot · 2008-03-07T13:46:50Z

Michael Bayer (@zzzeek) wrote:

Someone has posted a working patch for this in #77, so let's move over to there.

sqlalchemy-bot · 2008-03-07T13:46:50Z

Changes by Michael Bayer (@zzzeek):

changed status to closed

sqlalchemy-bot · 2008-03-21T20:15:42Z

Michael Bayer (@zzzeek) wrote:

in d5f83e6:

from mako.template import Template

f = lambda x:x
te = Template(u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8'), disable_unicode=True)
assert f(u'\u0142'.encode('utf-8')) == te.render(f=f)

passes.

sqlalchemy-bot closed this as completed Mar 7, 2008

sqlalchemy-bot added compiler low priority feature labels Nov 26, 2018

sqlalchemy-bot mentioned this issue Nov 26, 2018

turn off unicode #77

Closed
