gh-111089: Add PyUnicode_AsUTF8Unsafe() function #111672
Moreover, PyUnicode_AsUTF8AndSize(str, NULL) now raises an exception if the string contains embedded null characters.
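To illustrate why an embedded null character matters when the caller does not get the size back, here is a minimal sketch in plain C. It does not use the CPython API; `equal_as_c_strings` and `equal_as_buffers` are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compares the way a caller holding only a char*
   would, so the comparison stops at the first null byte. */
static int
equal_as_c_strings(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: compares the full buffers using explicit sizes,
   so bytes after an embedded null are taken into account. */
static int
equal_as_buffers(const char *a, size_t a_size,
                 const char *b, size_t b_size)
{
    assert(a != NULL && b != NULL);
    return a_size == b_size && memcmp(a, b, a_size) == 0;
}
```

With the buffers `"ab\0cd"` and `"ab\0xy"` (5 bytes each), the first helper reports them as equal while the second does not: without the size, everything after the first null byte is silently ignored.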
@serhiy-storchaka suggested in private that if The change adds
Apparently, this is a disagreement on the PyUnicode_AsUTF8() change which rejects null characters: #111091 (comment)
It is more consistent with PyUnicode_AsWideCharString() and PyBytes_AsStringAndSize().
In general LGTM (besides some nitpicks), but I would wait until the ongoing discussion has been finished.
An alternative is to restore the PyUnicode_AsUTF8() behavior and introduce PyUnicode_AsUTF8Safe(). Then PyUnicode_AsUTF8() can be removed from the Limited C API and deprecated, as was initially planned.
@@ -971,6 +971,12 @@ These are the UTF-8 codec APIs:
returned buffer always has an extra null byte appended (not included in
*size*), regardless of whether there are any other null code points.

If *size* is NULL and the *unicode* string contains embedded null
The wording differs from the one for PyUnicode_AsWideCharString(). It would be better to use the same wording for the same behavior, so that users do not need to search for non-existent differences.
If *size* is NULL and the *unicode* string contains embedded null
characters, raise an exception. To accept embedded null characters and
truncate on purpose at the first null byte, :c:func:`PyUnicode_AsUTF8Unsafe`
and :c:func:`PyUnicode_AsUTF8AndSize(unicode, &size)
This is a self-reference. It is unlikely to be useful.
Similar to :c:func:`PyUnicode_AsUTF8AndSize(unicode, NULL)
<PyUnicode_AsUTF8AndSize>`, but does not store the size.
PyUnicode_AsUTF8AndSize(unicode, NULL) does not store the size either.
Maybe just say that it is equivalent to PyUnicode_AsUTF8AndSize(unicode, NULL)? Then no further explanation will be needed.
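For readers following the thread, the relationship between the three entry points discussed here can be sketched with a toy model in plain C. This is not the CPython implementation: `toy_str` and the `toy_*` functions are hypothetical stand-ins, and returning NULL models "raise an exception".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a str object's cached UTF-8 buffer. */
typedef struct {
    const char *utf8;   /* null-terminated; may contain embedded nulls */
    size_t size;        /* length in bytes, excluding the terminator */
} toy_str;

/* Models PyUnicode_AsUTF8AndSize() as described in this PR: when size is
   NULL the caller cannot detect truncation, so embedded nulls are an
   error (modelled here by returning NULL). */
static const char *
toy_as_utf8_and_size(const toy_str *s, size_t *size)
{
    assert(s != NULL && s->utf8 != NULL);
    if (size != NULL) {
        *size = s->size;
        return s->utf8;
    }
    /* strlen() stops at the first null byte, so a mismatch with the
       stored size means a null byte sits inside the data. */
    if (strlen(s->utf8) != s->size) {
        return NULL;
    }
    return s->utf8;
}

/* Models PyUnicode_AsUTF8(): the same as passing size == NULL. */
static const char *
toy_as_utf8(const toy_str *s)
{
    return toy_as_utf8_and_size(s, NULL);
}

/* Models the proposed PyUnicode_AsUTF8Unsafe(): no check at all. */
static const char *
toy_as_utf8_unsafe(const toy_str *s)
{
    assert(s != NULL && s->utf8 != NULL);
    return s->utf8;
}
```

In this model, a `toy_str` holding `"ab\0cd"` with size 5 is rejected by `toy_as_utf8()`, returned in full with its size by `toy_as_utf8_and_size(&s, &n)`, and returned without any check by `toy_as_utf8_unsafe()`, where `strlen()` on the result sees only the first 2 bytes.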
#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030D0000
PyAPI_FUNC(const char*) PyUnicode_AsUTF8Unsafe(PyObject *unicode);
#endif
Maybe do not add it to the Limited C API? PyUnicode_AsUTF8() was not in the Limited C API before 3.13.
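As a side note on how the version guard above behaves, here is a small standalone sketch. It assumes an extension that targets the 3.12 Limited C API; `DEMO_HAS_ASUTF8UNSAFE` and `demo_has_asutf8unsafe()` are hypothetical names used only for this illustration and do not exist in CPython.

```c
#include <assert.h>

/* Assumption for this sketch: the extension targets the 3.12 Limited C API.
   0x030C0000 encodes version 3.12, and 0x030D0000 encodes 3.13. */
#define Py_LIMITED_API 0x030C0000

/* Same condition as the guard in the header. The "+0" keeps the expression
   valid when Py_LIMITED_API is defined with an empty value. */
#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030D0000
#  define DEMO_HAS_ASUTF8UNSAFE 1   /* hypothetical macro */
#else
#  define DEMO_HAS_ASUTF8UNSAFE 0   /* hypothetical macro */
#endif

/* Hypothetical helper: reports whether the guarded declaration would be
   visible to this translation unit. */
static int
demo_has_asutf8unsafe(void)
{
    assert(DEMO_HAS_ASUTF8UNSAFE == 0 || DEMO_HAS_ASUTF8UNSAFE == 1);
    return DEMO_HAS_ASUTF8UNSAFE;
}
```

With the 3.12 target assumed here, the declaration is hidden. Leaving `Py_LIMITED_API` undefined, or setting it to `0x030D0000` or later, would make it visible.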
@@ -451,7 +451,13 @@ PyAPI_FUNC(PyObject*) PyUnicode_AsUTF8String(
// This function caches the UTF-8 encoded string in the Unicode object
// and subsequent calls will return the same string. The memory is released
// when the Unicode object is deallocated.
PyAPI_FUNC(const char *) PyUnicode_AsUTF8(PyObject *unicode);
PyAPI_FUNC(const char*) PyUnicode_AsUTF8(PyObject *unicode);
BTW, this function should only be available in the Limited C API 3.13.
I abandon this PR in favor of the opposite approach: add PyUnicode_AsUTF8Safe(), PR #111688.
📚 Documentation preview 📚: https://cpython-previews--111672.org.readthedocs.build/