add RegExp/Unicode topic (fun!) #1084

michaelficarra · 2021-11-29T23:14:28Z

No description provided.

mathiasbynens · 2021-11-29T23:38:22Z

Some background in case it’s helpful:

> Option 1: Approve #2515 via consensus and require all PRs affecting any of these tables to achieve consensus.

> Option 2: Allow loose matching of Unicode property values and possibly property names. Remove tables 70/71 and normatively reference Unicode.

There doesn’t seem to be any demand for this. Doing this just to remove some spec maintenance churn is not a great motivation IMHO.
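For context, "loose matching" most plausibly refers to Unicode's UAX #44 rule UAX44-LM3, under which symbolic values are compared after ignoring case, whitespace, underscores, hyphens, and an initial "is" prefix. ECMAScript does no such normalization today: `\p{Script=greek}` is a SyntaxError. The TypeScript below is a minimal sketch of that comparison under that assumption; `looseKey` and `looselyEquals` are hypothetical names, not part of any spec or engine.

```typescript
// Hypothetical sketch of UAX44-LM3-style loose matching for symbolic
// property values. ECMAScript does NOT do this: \p{…} names and values
// must match the spec tables exactly.
function looseKey(value: string): string {
  // Ignore case, whitespace, underscores, and hyphens.
  let key = value.toLowerCase().replace(/[\s_-]/g, "");
  // Ignore an initial "is" prefix (e.g. "IsGreek" -> "greek").
  if (key.startsWith("is")) {
    key = key.slice(2);
  }
  return key;
}

// Two spellings are loosely equal if they normalize to the same key.
function looselyEquals(a: string, b: string): boolean {
  return looseKey(a) === looseKey(b);
}
```

Under this rule `Old_Uyghur`, `old uyghur` and `OldUyghur` would all be the same value, which is exactly the flexibility the spec tables currently avoid.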

> Option 3: Defer spelling of property values to that used in Unicode spec, even though it's explicitly non-canonical. Remove tables 70/71 and normatively reference Unicode.

This might be painful to do precisely, since the Unicode spec mixes spelling/casing throughout different documents. What we’re using is generally the first spelling that’s used in the Unicode data files (but there are exceptions such as “Any” which is technically not a “character property”).

> Option 4: Ask Unicode Consortium to provide canonical spellings for property values and possibly property names. Remove tables 70/71 and normatively reference Unicode.

FWIW, I inquired about this while proposing \p{…} in ECMAScript: https://corp.unicode.org/pipermail/unicode/2016-May/thread.html#3648 See the “Canonical block names: spaces vs. underscores” thread. (I asked about Blocks specifically but it applies generally.)

> Option 5: Updates to tables 70/71 use spelling from Unicode spec by convention, and do not require consensus.

IMHO it’s worth pointing out explicitly in the slides that Option 5 is what we’ve been doing so far: https://github.com/tc39/ecma262/issues?q=label%3Aunicode+is%3Aclosed+Normative and that this agenda item is effectively re-litigating this. (I’m still hoping things don’t change, since any of the other options seem strictly worse to me.)
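Under Option 5, a PR like tc39/ecma262#2515 simply adds the new Unicode v14 script values using their Unicode spellings. Whether a given engine accepts them depends on the Unicode version it ships, so code that relies on them typically feature-detects by constructing the RegExp and catching the SyntaxError. A minimal sketch; `supportsPropertyEscape` is a hypothetical helper, and the value list assumes the five scripts Unicode 14 added (long name plus short alias).

```typescript
// Returns true if the running engine accepts the given \p{…} body under
// the `u` flag. Unknown property names or values throw a SyntaxError.
function supportsPropertyEscape(body: string): boolean {
  try {
    new RegExp(`\\p{${body}}`, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected; don't swallow it
  }
}

// Script values added in Unicode 14 (assumed list: long name, short alias).
const unicode14Scripts = [
  "Cypro_Minoan", "Cpmn",
  "Old_Uyghur", "Ougr",
  "Tangsa", "Tnsa",
  "Toto",
  "Vithkuqi", "Vith",
];

// Report which of them this engine recognizes.
for (const value of unicode14Scripts) {
  console.log(value, supportsPropertyEscape(`Script=${value}`));
}
```

Because matching is exact, a misspelled or differently cased value (`Script=greek`) fails the same way an unsupported one does.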

michaelficarra · 2021-11-29T23:54:29Z

Confirming consensus on option 5 is fine with me, but previous discussion on the topic was not sufficiently clear to the editor group for us to take action.

bakkot · 2021-11-29T23:56:54Z

> This might be painful to do precisely, since the Unicode spec mixes spelling/casing throughout different documents.

So, given that, what is the strategy we actually use for picking a spelling right now? Is it just "Mathias looks at the various options and picks a reasonable one, consistent with what we're already doing"? That seems like it's working fine, but certainly I at least was not aware in previous discussions that this is what we were agreeing to.

mathiasbynens · 2021-11-30T00:00:18Z

> > This might be painful to do precisely, since the Unicode spec mixes spelling/casing throughout different documents.

> So, given that, what is the strategy we actually use for picking a spelling right now? Is it just "Mathias looks at the various options and picks a reasonable one, consistent with what we're already doing"?

No, it’s this:

> What we’re using is generally the first spelling that’s used in the Unicode data files (but there are exceptions such as “Any” which is technically not a “character property”).

These exceptions don’t apply for this specific case of new values for existing properties, but they would exist for the spec as a whole (which includes things like Any/ASCII/Assigned), which is why I believe becoming less explicit about this by referring to Unicode would make things more confusing / easier to misinterpret.
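The tables list each accepted spelling explicitly. As a rough sketch of what that means at runtime (assuming an engine with ES2018 `u`-flag property escapes), both the long name and the listed short alias of a value work, and so do special entries such as Any, ASCII and Assigned, which the comment above cites as not being ordinary character properties from the data files.

```typescript
// Long name and listed short alias are both accepted spellings.
const greekLong = /^\p{Script=Greek}+$/u;
const greekAlias = /^\p{sc=Grek}+$/u;

// Special entries that ECMAScript's binary property table also lists.
const any = /^\p{Any}$/u;           // every code point
const ascii = /^\p{ASCII}$/u;       // U+0000..U+007F
const assigned = /^\p{Assigned}$/u; // code points with an assigned character

console.log(greekLong.test("αβγ"), greekAlias.test("αβγ")); // true true
console.log(any.test("\u{10FFFF}"));                         // true
console.log(ascii.test("A"), ascii.test("é"));               // true false
console.log(assigned.test("\u0378"));                        // false (unassigned)
```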

Commit 356994e: add RegExp/Unicode topic (fun!)

michaelficarra requested review from mathiasbynens and bakkot November 29, 2021 23:14

michaelficarra mentioned this pull request Nov 29, 2021

Normative: List new Unicode v14 Script/Script_Extensions values tc39/ecma262#2515 (merged)

mathiasbynens approved these changes Nov 29, 2021


michaelficarra merged commit 29a9d0e into master Nov 29, 2021

michaelficarra deleted the michaelficarra-patch-1 branch November 29, 2021 23:54
