[format.string.escaped] does not specify boundary conditions for sequences of ill-formed code units #80

Open
tahonermann opened this issue Oct 10, 2022 · 0 comments
Labels
clarification Something isn't clear help wanted Extra attention is needed

Comments

@tahonermann
Member

[format.string.escaped]p2.2 states:

For each code unit sequence X in S that either encodes a single character, is a shift sequence, or is a sequence of ill-formed code units, processing is in order as follows:
The standard does not specify what constitutes a "sequence of ill-formed code units". That is acceptable for implementation-defined encodings, but a precise definition could be given for UTF-8, UTF-16, and UTF-32.

Unicode PR-121 provides a definition for "entire ill-formed subsequence" that is a good candidate for how a "sequence of ill-formed code units" might be defined:

In these policy statements, "entire ill-formed subsequence" refers to all code units in the ill-formed subsequence up to but not including the start of the next well-formed code unit sequence.
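
To make the boundary question concrete, here is a minimal sketch of how a UTF-8 string could be split under that PR-121 definition. It is not taken from the standard or from any implementation. The names `well_formed_length`, `Segment`, and `segment_utf8` are hypothetical, and well-formedness is assumed to follow the UTF-8 byte ranges in the Unicode Standard (Table 3-7).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length of the well-formed UTF-8 code unit sequence
// that starts at s[i], or 0 if no well-formed sequence starts there.
std::size_t well_formed_length(const std::string& s, std::size_t i) {
    // Returns the code unit at k as 0..255, or -1 past the end of s.
    auto byte = [&](std::size_t k) -> int {
        return k < s.size() ? static_cast<unsigned char>(s[k]) : -1;
    };
    auto in_range = [](int b, int lo, int hi) { return b >= lo && b <= hi; };
    auto cont = [&](std::size_t k) { return in_range(byte(k), 0x80, 0xBF); };

    const int b0 = byte(i);
    if (in_range(b0, 0x00, 0x7F)) return 1;
    if (in_range(b0, 0xC2, 0xDF)) return cont(i + 1) ? 2 : 0;
    if (in_range(b0, 0xE0, 0xEF)) {
        // E0 excludes overlong forms; ED excludes surrogates.
        const int lo = (b0 == 0xE0) ? 0xA0 : 0x80;
        const int hi = (b0 == 0xED) ? 0x9F : 0xBF;
        return (in_range(byte(i + 1), lo, hi) && cont(i + 2)) ? 3 : 0;
    }
    if (in_range(b0, 0xF0, 0xF4)) {
        // F0 excludes overlong forms; F4 excludes values above U+10FFFF.
        const int lo = (b0 == 0xF0) ? 0x90 : 0x80;
        const int hi = (b0 == 0xF4) ? 0x8F : 0xBF;
        return (in_range(byte(i + 1), lo, hi) && cont(i + 2) && cont(i + 3)) ? 4 : 0;
    }
    return 0;  // C0, C1, F5..FF, or a stray continuation byte
}

// Hypothetical result type: one well-formed sequence or one ill-formed run.
struct Segment {
    bool well_formed;
    std::string code_units;
};

// Splits s so that each ill-formed segment holds all code units up to, but
// not including, the start of the next well-formed code unit sequence.
std::vector<Segment> segment_utf8(const std::string& s) {
    std::vector<Segment> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n = well_formed_length(s, i);
        if (n > 0) {
            out.push_back({true, s.substr(i, n)});
            i += n;
        } else {
            const std::size_t start = i;
            while (i < s.size() && well_formed_length(s, i) == 0) ++i;
            out.push_back({false, s.substr(start, i - start)});
        }
    }
    return out;
}
```

Under this sketch, the input `"a\xE2\x82" "b"` yields three segments: `a`, the truncated two-byte run `E2 82` as one ill-formed segment, and `b`. Other boundary choices are possible (for example, treating each ill-formed code unit as its own sequence), which is the ambiguity this issue is about.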
