Tag value (#text) must be a string when tag has additional parameters - unparse #366

Open
Behoston opened this issue Dec 9, 2024 · 1 comment

Behoston commented Dec 9, 2024

Simple case - no params

data = {"a": 1}

works as expected, producing:

<a>1</a>

One param

However, if tag a has a parameter (an XML attribute), its #text value must be a string, which is unexpected given the previous example.

data = {"a": {"@param": "test", "#text": 1}}

This throws an exception: TypeError: decoding to str: need a bytes-like object, int found.

Param is integer, value is string

This works fine, just like the simple case where only the tag's value is provided:

data = {"a": {"@param": 1, "#text": "test"}}

produces:

<a param="1">test</a>
@Behoston Behoston changed the title Tag value (#text) must be a string when tag has additional parameters Tag value (#text) must be a string when tag has additional parameters - unparse Dec 9, 2024
@ajslater

This happens because xml.sax.saxutils.XMLGenerator.characters() is not resilient to non-string data.
It tries to decode the content like this:

            if not isinstance(content, str):
                content = str(content, self._encoding)

This raises a TypeError for ints, floats, or Decimals with any of the usual encodings (utf-8, latin-1, ascii, etc.), because str(obj, encoding) only accepts bytes-like objects.
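The failure can be reproduced with the standard library alone, without xmltodict. The sketch below is only an illustration: try_characters is a hypothetical helper name, and it simply feeds one value to XMLGenerator.characters() and reports what happens.

```python
import io
from xml.sax.saxutils import XMLGenerator


def try_characters(value):
    """Feed `value` to XMLGenerator.characters() and report the outcome."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("a", {})
    try:
        # For non-str input, characters() calls str(value, encoding),
        # which only works for bytes-like objects.
        gen.characters(value)
    except TypeError as exc:
        return f"TypeError: {exc}"
    gen.endElement("a")
    gen.endDocument()
    return out.getvalue()


result_str = try_characters("1")  # string content is written normally
result_int = try_characters(1)    # integer content hits the TypeError
print(result_str)
print(result_int)
```

The string case yields a document ending in `<a>1</a>`, while the integer case reports the same TypeError seen in the issue.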

What might help is for xmltodict to convert the data to a string beforehand in the _emit() function:
xmltodict.py:486

        if cdata is not None:
            # BEGIN PATCH
            if not isinstance(cdata, str):
                cdata = str(cdata).encode(encoding, 'ignore').decode(encoding)
            # END PATCH
            content_handler.characters(cdata)

This would require the _emit() function to accept an encoding parameter, passed in from unparse().
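To see what the patched lines would do to a value in isolation, here is a minimal sketch that lifts the conversion into a standalone function. The names coerce_cdata and Label are hypothetical and exist only for this illustration; they are not part of xmltodict.

```python
from decimal import Decimal


def coerce_cdata(cdata, encoding="utf-8"):
    """Mirror the proposed patch: stringify non-str data, then round-trip
    it through `encoding`, dropping characters that cannot be encoded."""
    if not isinstance(cdata, str):
        cdata = str(cdata).encode(encoding, "ignore").decode(encoding)
    return cdata


class Label:
    """Hypothetical object whose string form contains a non-ASCII character."""

    def __str__(self):
        return "café"


print(coerce_cdata(1))                       # numeric values become plain strings
print(coerce_cdata(Decimal("1.5"), "ascii"))
print(coerce_cdata(Label(), "ascii"))        # 'ignore' silently drops the "é"
```

One hazard is visible here: with the 'ignore' error handler, characters that the target encoding cannot represent are dropped without any warning, so the output can differ from str(value).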

But I haven't thought deeply about this solution and the maintainer of this project would be better suited to understand the hazards of this approach.

In the meantime, we'll have to preprocess data with numeric #text values.
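One possible shape for that preprocessing step is sketched below. stringify_text is a hypothetical helper, not an xmltodict API; it assumes the default "#text" key and walks nested dicts and lists, converting only non-string #text values with str().

```python
def stringify_text(node, cdata_key="#text"):
    """Return a copy of `node` in which every non-string `#text` value
    has been converted with str(). Other values are left untouched."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key == cdata_key and value is not None and not isinstance(value, str):
                # Note: booleans become "True"/"False" here.
                result[key] = str(value)
            else:
                result[key] = stringify_text(value, cdata_key)
        return result
    if isinstance(node, list):
        return [stringify_text(item, cdata_key) for item in node]
    return node


data = {"a": {"@param": "test", "#text": 1}}
cleaned = stringify_text(data)
print(cleaned)
```

The cleaned dict can then be passed to xmltodict.unparse() in place of the original data. The input is not mutated, so the numeric values remain available to the caller.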
