[format.string.escaped] does not specify boundary conditions for sequences of ill-formed code units #80

tahonermann · 2022-10-10T22:40:27Z

[format.string.escaped] states:

For each code unit sequence X in S that either encodes a single character, is a shift sequence, or is a sequence of ill-formed code units, processing is in order as follows:

The wording does not specify what constitutes a "sequence of ill-formed code units". Leaving that open is fine for implementation-defined encodings, but for UTF-8, UTF-16, and UTF-32 a precise definition could be specified.
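To see why the boundary matters, consider the UTF-8 input `"\xF0\x9F\x41"`. It contains a four-byte lead `F0`, one valid continuation `9F`, and then the well-formed character `A` (`0x41`). The sketch below compares two ways of deciding where the ill-formed sequence that starts at `F0` ends. The first is a naive rule that trusts the length declared by the lead byte. The second is the Unicode Standard's "maximal subpart" rule, which stops at the first code unit that cannot continue a well-formed sequence. The names `naive_end`, `maximal_subpart_end`, `Lead`, and `classify` are hypothetical and do not come from the standard or any library. Byte ranges follow Table 3-7 of the Unicode Standard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration; not from the standard or any library.
// len is the sequence length a UTF-8 lead byte declares, and [lo, hi] is the
// allowed range for the second byte (Unicode Table 3-7). len == 0 means the
// byte cannot start any well-formed sequence.
struct Lead {
    std::size_t len;
    unsigned char lo, hi;
};

Lead classify(unsigned char b) {
    if (b <= 0x7F) return {1, 0x00, 0x00};
    if (b >= 0xC2 && b <= 0xDF) return {2, 0x80, 0xBF};
    if (b == 0xE0) return {3, 0xA0, 0xBF};
    if ((b >= 0xE1 && b <= 0xEC) || b == 0xEE || b == 0xEF) return {3, 0x80, 0xBF};
    if (b == 0xED) return {3, 0x80, 0x9F};
    if (b == 0xF0) return {4, 0x90, 0xBF};
    if (b >= 0xF1 && b <= 0xF3) return {4, 0x80, 0xBF};
    if (b == 0xF4) return {4, 0x80, 0x8F};
    return {0, 0x00, 0x00};  // 80..C1 and F5..FF
}

// Hypothetical naive rule: the sequence starting at s[i] spans as many code
// units as the lead byte declares (at least 1), clamped to the end of input.
std::size_t naive_end(std::string_view s, std::size_t i) {
    assert(i < s.size());
    std::size_t n = classify(static_cast<unsigned char>(s[i])).len;
    if (n == 0) n = 1;
    return std::min(i + n, s.size());
}

// Maximal subpart rule: extend from s[i] only while the code units remain a
// prefix of some well-formed sequence; always consume at least one code unit.
std::size_t maximal_subpart_end(std::string_view s, std::size_t i) {
    assert(i < s.size());
    Lead l = classify(static_cast<unsigned char>(s[i]));
    std::size_t n = 1;
    while (n < l.len && i + n < s.size()) {
        auto c = static_cast<unsigned char>(s[i + n]);
        unsigned char lo = (n == 1) ? l.lo : 0x80;
        unsigned char hi = (n == 1) ? l.hi : 0xBF;
        if (c < lo || c > hi) break;  // s[i + n] cannot continue the sequence
        ++n;
    }
    return i + n;
}
```

For `s = std::string_view("\xF0\x9F\x41", 3)`, `naive_end(s, 0)` returns 3, so the naive rule folds `A` into the ill-formed sequence. `maximal_subpart_end(s, 0)` returns 2, which leaves `A` to be processed as a character. Without a stated boundary rule, both readings seem possible.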

Unicode PR-121 defines an "entire ill-formed subsequence", which is a good candidate for how a "sequence of ill-formed code units" might be defined:

In these policy statements, "entire ill-formed subsequence" refers to all code units in the ill-formed subsequence up to but not including the start of the next well-formed code unit sequence.
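The following is a minimal sketch of how the PR-121 definition could be applied to UTF-8. It assumes byte-oriented input in a `std::string_view`. The names `segment`, `Segment`, `well_formed_len`, `Lead`, and `classify` are hypothetical. They are not proposed wording or an existing API. Well-formedness follows the byte ranges in Table 3-7 of the Unicode Standard. An ill-formed run extends from its first code unit up to, but not including, the next position at which a well-formed sequence begins.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helpers for illustration; not proposed wording or a library API.
// len is the length a UTF-8 lead byte declares (0 = invalid lead byte), and
// [lo, hi] is the allowed range for the second byte (Unicode Table 3-7).
struct Lead {
    std::size_t len;
    unsigned char lo, hi;
};

Lead classify(unsigned char b) {
    if (b <= 0x7F) return {1, 0x00, 0x00};
    if (b >= 0xC2 && b <= 0xDF) return {2, 0x80, 0xBF};
    if (b == 0xE0) return {3, 0xA0, 0xBF};
    if ((b >= 0xE1 && b <= 0xEC) || b == 0xEE || b == 0xEF) return {3, 0x80, 0xBF};
    if (b == 0xED) return {3, 0x80, 0x9F};
    if (b == 0xF0) return {4, 0x90, 0xBF};
    if (b >= 0xF1 && b <= 0xF3) return {4, 0x80, 0xBF};
    if (b == 0xF4) return {4, 0x80, 0x8F};
    return {0, 0x00, 0x00};
}

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if none does.
std::size_t well_formed_len(std::string_view s, std::size_t i) {
    assert(i < s.size());
    Lead l = classify(static_cast<unsigned char>(s[i]));
    if (l.len == 0 || i + l.len > s.size()) return 0;
    for (std::size_t n = 1; n < l.len; ++n) {
        auto c = static_cast<unsigned char>(s[i + n]);
        unsigned char lo = (n == 1) ? l.lo : 0x80;
        unsigned char hi = (n == 1) ? l.hi : 0xBF;
        if (c < lo || c > hi) return 0;
    }
    return l.len;
}

struct Segment {
    std::size_t begin, end;  // half-open range [begin, end) of code units
    bool ill_formed;
};

// Splits s into well-formed sequences and PR-121 "entire ill-formed
// subsequences": each ill-formed run ends just before the next position
// where a well-formed sequence starts (or at the end of input).
std::vector<Segment> segment(std::string_view s) {
    std::vector<Segment> out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (std::size_t n = well_formed_len(s, i)) {
            out.push_back({i, i + n, false});
            i += n;
            continue;
        }
        std::size_t j = i + 1;
        while (j < s.size() && well_formed_len(s, j) == 0) ++j;
        out.push_back({i, j, true});
        i = j;
    }
    return out;
}
```

For `"\xE0\x80\x41"`, this yields one ill-formed subsequence `[0, 2)` (`E0 80`) followed by the well-formed `A` at `[2, 3)`. The maximal-subpart practice would instead report two ill-formed sequences, `E0` and `80`. Either way `A` starts at the same position, but the number of ill-formed sequences that [format.string.escaped] would iterate over differs, which is the kind of boundary condition a precise definition would settle.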

tahonermann added the "help wanted" and "clarification" labels on Oct 10, 2022.

tahonermann mentioned this issue on Oct 10, 2022 in cplusplus/draft#5890, "[format.string.escaped] Fix invalid example" (merged).
