This repository has been archived by the owner on Feb 16, 2024. It is now read-only.

Consider removing .. and __ from class-set reservations #60

Closed
gibson042 opened this issue Mar 31, 2022 · 11 comments

Comments

@gibson042

It is surprising to see `..` and especially `__` reserved in tc39/ecma262#2418. Both `.` and `_` are interpreted literally anywhere in any current regular expression character class, the former because square brackets specifically strip its special meaning and the latter because it is a completely nonspecial character that is valid anywhere inside the name of an identifier (i.e., it's not even treated as punctuation in the context of ECMAScript).

A case could be made for `.`, although in practice it almost always gets special mention with respect to character classes (e.g., https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes includes "Inside a character class, the dot loses its special meaning and matches a literal dot.") and `[.]` is frequently used to match a literal `.`. But I strongly recommend not introducing such special treatment for `_`.
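For readers who want to see the current behavior concretely, here is a minimal sketch that uses only regular expressions without the `v` flag. The variable names and sample strings are illustrative and not taken from the proposal.

```javascript
// Without the v flag, "." and "_" are ordinary literals inside a character class.
const literalDot = /^[.]$/;          // matches only a literal dot
const doubledDot = /^[..]$/u;        // repeated member: redundant but legal today
const doubledUnderscore = /^[__]$/u; // likewise for the underscore

const results = {
  dotMatchesDot: literalDot.test("."),               // true
  dotMatchesLetter: literalDot.test("a"),            // false: no wildcard meaning
  doubledDotMatchesDot: doubledDot.test("."),        // true
  doubledDotMatchesLetter: doubledDot.test("a"),     // false
  doubledUnderscoreMatches: doubledUnderscore.test("_"), // true
};

console.log(results);
```

The doubled forms behave exactly like the single ones here, which is the status quo that the reservation under discussion would change for patterns opting into the new flag.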

@markusicu
Collaborator

About the double-punctuation operators: They allow the use of single characters as literals as before, and they visually stand out. (Plus, there is precedent from some other regex engines.) Note that in a character class (=a set) there is no good reason for why someone would want to use any literal character more than once.

More generally, we want to reserve syntax for future extensions, so that hopefully we don't need to add another flag for future regex features.

We could quibble about allowing a literal .. or __ in a /v character class, but I don't know what the use case would be for these as literals, when they are equivalent to single . and _.

We should lean towards reserving more than what seems likely useful now. Later, when a real desire for certain literals emerges, we could un-reserve and re-allow certain literals.

@gibson042
Author

gibson042 commented Apr 4, 2022

> We could quibble about allowing a literal .. or __ in a /v character class, but I don't know what the use case would be for these as literals, when they are equivalent to single . and _.

That quibbling is an accurate description of this issue. The case for not reserving __ is that _ is a word character throughout ECMAScript source, and should not be subject to special treatment anywhere. The case for not reserving .. is that educational materials almost universally describe . specifically as having no special meaning inside a regular expression character class.

Bottom line: to subject either to special treatment inside a character class, even if that special treatment is limited to requiring escaping when repeated (which probably would never be written by hand, but could conceivably be the product of automated code generation), would be so surprising that it should be explicitly rejected by removing .. and __ from class-set reservations.
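As a rough sketch of the code-generation scenario, the hypothetical helper below builds a character class from an arbitrary list of characters. It de-duplicates the members first, so a doubled `..` or `__` never reaches the pattern, and it escapes a conservative set of characters whose backslash escapes are accepted with or without the `u` flag. The names `buildClassSource` and `NEEDS_ESCAPE` are assumptions made for illustration; nothing here is part of the proposal.

```javascript
// Characters escaped by this sketch. Each of these backslash escapes is valid
// inside a class in legacy mode and with the u flag.
const NEEDS_ESCAPE = new Set(["\\", "]", "[", "^", "-", "(", ")", "{", "}", "/", "|"]);

// Build the source text of a character class from a string or array of
// single characters, dropping repeats so no member appears twice.
function buildClassSource(chars) {
  const unique = [...new Set(chars)];
  const body = unique
    .map((ch) => (NEEDS_ESCAPE.has(ch) ? "\\" + ch : ch))
    .join("");
  return "[" + body + "]";
}

// A naive generator might emit "[a..b__]"; the de-duplicated form is "[a.b_]".
const source = buildClassSource("a..b__");
const pattern = new RegExp("^" + source + "$", "u");

console.log(source);            // [a.b_]
console.log(pattern.test("."));  // true
console.log(pattern.test("_"));  // true
console.log(pattern.test("c"));  // false
```

A generator written this way is unaffected by whichever doubled characters end up reserved, at the cost of one extra pass over its input.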

@markusicu
Collaborator

I am not convinced that allowing literal . and _ means that we shouldn't reserve .. and/or __.

However, I could live with not treating these two as special.

Note that Unicode Character Database files use .. as a notation indicating a code point range (0041..005A). It works quite nicely, visually. Someone might yet come up with something like that for regex syntax.
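To make the comparison with the UCD notation concrete, the sketch below converts a UCD-style field such as `0041..005A` into the equivalent character class written with today's `-` range syntax. The function name `ucdRangeToClass` is hypothetical, and the accepted input format (4 to 6 hex digits, optionally followed by `..` and a second code point) is a simplifying assumption about how UCD files write code points.

```javascript
// Convert "0041..005A" (or a single code point such as "0041") into the
// source text of a character class usable with the u flag.
function ucdRangeToClass(range) {
  const match = /^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?$/i.exec(range);
  if (match === null) {
    throw new SyntaxError(`Not a UCD code point or range: ${range}`);
  }
  const start = parseInt(match[1], 16);
  const end = match[2] === undefined ? start : parseInt(match[2], 16);
  if (end < start || end > 0x10ffff) {
    throw new RangeError(`Invalid code point range: ${range}`);
  }
  // \u{...} escapes require the u (or v) flag on the resulting RegExp.
  const esc = (cp) => `\\u{${cp.toString(16).toUpperCase()}}`;
  return start === end ? `[${esc(start)}]` : `[${esc(start)}-${esc(end)}]`;
}

const classSource = ucdRangeToClass("0041..005A");
const asciiUpper = new RegExp(`^${classSource}$`, "u");

console.log(classSource);          // [\u{41}-\u{5A}]
console.log(asciiUpper.test("A")); // true
console.log(asciiUpper.test("a")); // false
```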

@mathiasbynens
Member

I personally don’t feel strongly about .. and __ in particular. However, in general I still stand behind our original thinking when discussing this, which went something like this:

It’s better to be strict and reserve more things since we can always loosen the restrictions later on. OTOH if we’re more lenient now, we can never go back.

@markusicu
Collaborator

Meeting discussion: Richard feels strongest about the _ because it's an identifier character (especially in ECMAScript). The dot does have special meaning, but it would still require updating educational materials. Also concerned about differential and conditional escaping requirements. Markus could be swayed by the identifier argument. Mathias could also be swayed by the educational-materials argument.

@waldemarhorwat What do you think about this issue? Keep reserving double .. and __ as proposed, reserve only .., or reserve neither of these pairs?

@macchiati
Collaborator

macchiati commented Apr 28, 2022 via email

@waldemarhorwat

I don't see any reason to reserve __. I'm on the fence with .. and am fine with unreserving it (or not). Is there a good use case for .. that's not already covered by -?

@markusicu
Collaborator

markusicu commented Apr 29, 2022

> I don't see any reason to reserve __. I'm on the fence with .. and am fine with unreserving it (or not). Is there a good use case for .. that's not already covered by -?

We simply reserved every ASCII punctuation/symbol character when doubled.

There does not seem to be any reason to use or support doubled characters as literals, except for backward compatibility, and the new flag gives us the luxury to reserve these now in case someone wants to extend the character class syntax in the future.

As discussed, if people feel strongly, we could un-reserve `__` and maybe `..`. I would prefer to reserve as many of these as people can agree with.

@waldemarhorwat

It's traditional within the family of programming languages that includes ECMAScript to treat _ as an identifier character, not as punctuation. That's even done within regular expression literals: property names and group names can contain _. I don't want to reserve __ unless we also reserve other identifier characters such as qq.

As I mentioned, I have no opinion on `..`.
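A short illustration of the identifier-character point, using existing regular expression features: a named group whose name contains `_`, a Unicode property name containing `_`, and `\w`, which counts `_` as a word character. The group names and sample strings are made up for this sketch.

```javascript
// Group names are identifiers, so "_" is allowed inside them.
const nameMatch = /^(?<first_name>[a-z]+)_(?<last_name>[a-z]+)$/u.exec("ada_lovelace");

// Unicode property names such as Script_Extensions contain "_" as well.
const isGreek = /^\p{Script_Extensions=Greek}$/u.test("α");

// \w treats "_" as a word character, alongside letters and digits.
const underscoreIsWord = /^\w$/.test("_");

console.log(nameMatch.groups.first_name); // ada
console.log(nameMatch.groups.last_name);  // lovelace
console.log(isGreek);                     // true
console.log(underscoreIsWord);            // true
```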

@mathiasbynens
Member

Given the above discussion, how about resolving this as follows:

  • Remove __ from class-set reservations (since it’s an identifier character)
  • Preserve the .. reservation (i.e. no change here)
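The practical effect of this resolution can be probed with a small feature-detecting sketch. It assumes an engine that implements the `v` flag as resolved here (with `..` reserved and `__` not); on engines without the flag it reports nothing rather than guessing. The helper name `compiles` is hypothetical.

```javascript
// Return true if the pattern compiles with the given flags, false if it is
// rejected with a SyntaxError. Other errors are rethrown.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Engines without the v flag reject it as an invalid flag.
const vSupported = compiles("[a]", "v");

const report = vSupported
  ? {
      doubledUnderscore: compiles("[__]", "v"), // expected to compile: not reserved
      doubledDot: compiles("[..]", "v"),        // expected to be rejected: reserved
      escapedDots: compiles("[\\.\\.]", "v"),   // escaping each dot sidesteps the reservation
    }
  : null;

console.log(vSupported ? report : "v flag not supported by this engine");
```

With the `u` flag or no flag, both `[__]` and `[..]` keep compiling as before, since the reservation applies only to patterns that opt into `v`.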

@mathiasbynens
Member

Resolved by merging mathiasbynens/ecma262#12.
