[SUGGESTION] Disambiguate postfix unary and binary operators #152

msadeqhe · 2022-11-30T12:41:39Z

Currently in Cpp2:

Expression x&&y is equivalent to (x&) & (y), because there is no whitespace before the first symbol &.
Expression x && y is equivalent to (x) && (y), because there is a whitespace before the first symbol &.

Although x&&y and x && y look nearly identical, they produce completely different results. A typical programmer expects both x&&y and x && y to be equivalent to (x) && (y).

Cpp2 could be simpler and less surprising, and programmers would have less syntax to worry about, if it used the following rule to disambiguate postfix unary and binary operators:

If there is a run of operator symbols between two identifiers, literals, or parenthesized expressions, Cpp2 should check whether the last symbols form a valid binary operator, taking the longest possible match. For example, the binary operator in x&&&y would be the logical && operator (because && is a valid binary operator and has more symbols than the binary & operator).
After Cpp2 has found the binary operator, it treats the remaining symbols as postfix unary operators.

For example:

Expression x&&y is equivalent to (x) && (y).
Expression x && y is equivalent to (x) && (y).
Expression x & & y is equivalent to (x&) & (y).
Expression x& & y is equivalent to (x&) & (y).
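
A minimal C++ sketch of the proposed rule, for illustration only. `OperatorSplit` and `split_operator_run` are hypothetical names, not cppfront code, and only `&`, `&&`, and `*` are modeled as binary operators. The input is assumed to be the text found between two operands.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Result of splitting the operator symbols found between two operands.
struct OperatorSplit {
    std::vector<std::string> postfix_ops; // postfix unary operators, left to right
    std::string binary_op;                // binary operator that follows them ("" if none matched)
};

// Hypothetical helper: finds the longest binary operator at the END of the
// run, then treats every remaining symbol as a postfix unary operator.
inline OperatorSplit split_operator_run(std::string_view run) {
    // Ignore whitespace between the last operator symbol and the right operand.
    while (!run.empty() && run.back() == ' ') run.remove_suffix(1);
    assert(!run.empty() && "expected at least one operator symbol");

    // Candidates are ordered longest first, so the biggest match wins.
    constexpr std::array<std::string_view, 3> binary_ops{"&&", "&", "*"};

    OperatorSplit result;
    for (std::string_view op : binary_ops) {
        if (run.size() >= op.size() && run.substr(run.size() - op.size()) == op) {
            result.binary_op = std::string(op);
            run.remove_suffix(op.size());
            break;
        }
    }
    // Whatever is left (skipping whitespace) is postfix unary.
    for (char c : run) {
        if (c != ' ') result.postfix_ops.push_back(std::string(1, c));
    }
    return result;
}
```

With this sketch, `split_operator_run("&&")` yields the binary operator `&&` and no postfix operators, while `split_operator_run("& &")` yields one postfix `&` followed by binary `&`, matching the examples above.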


SebastianTroy · 2022-11-30T12:51:09Z

A better simplification is to ban && in cpp2 and instead require the "and" keyword. Your suggestion is good, but still requires some teaching and can lead to code that requires detailed knowledge to parse (e.g. "x&&&y&" is much harder to parse than "x& and y&").

mhermier · 2022-12-01T10:02:30Z

This sounds more like a bug in the lexer, which is not being eager. But it is true that it makes the syntax harder to parse/read than necessary.

switch-blade-stuff · 2022-12-07T19:59:45Z

> A better simplification is to ban && in cpp2 and instead require the "and" keyword. Your suggestion is good, but still requires some teaching and can lead to code that requires detailed knowledge to parse (e.g. "x&&&y&" is much harder to parse than "x& and y&")

I personally am against using keywords such as and, or and not for logic operators. They create discontinuity with the rest of the operators and the rest of the language, unless we also want to use add, sub, etc., they make the code harder to read since symbolic operators stand out easier in a block of text (neg (x add y) vs -(x + y)), and they take longer to type as well (which isn't that big of an issue but still).

Besides, what would you do for bitwise &, | and ~? Something like band, bor and bnot?

The issue of postfix & could be solved with lexer eagerness, as mhermier said, or with operator precedence. IMHO we should not go completely overboard in the simplification direction: operator precedence is not so context-dependent that it requires a drastic change, and while aiming for an O(1) lexer/parser is good, it isn't exactly something you would want to absolutely restrict yourself to for a complex language like C++.

hsutter · 2022-12-27T00:14:48Z

Thanks, everyone.

> Expression x&&y is equivalent to (x&) & (y), because there is no whitespace before the first symbol &.

I can't reproduce this... maybe I changed something since you opened this issue? But I can't think what it would be because that && should always have max-munched to a single LogicalAnd token, and I think that's the right answer and it's what it's doing now.
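
To illustrate what "max-munch" means here, the following is a small C++ sketch of a greedy lexer. It is not cppfront's lexer: `lex_max_munch` is a hypothetical function, tokens are returned as plain strings, and only identifiers/numbers plus the operators `&&`, `&`, and `*` are modeled.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical greedy ("max-munch") lexer: at each position it takes the
// longest operator that matches, scanning left to right.
inline std::vector<std::string> lex_max_munch(std::string_view src) {
    // Longest operators first, so "&&" is preferred over "&".
    constexpr std::array<std::string_view, 3> ops{"&&", "&", "*"};

    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };

    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == ' ') { ++i; continue; }          // whitespace only separates tokens
        if (is_word(src[i])) {                         // identifier or number
            std::size_t start = i;
            while (i < src.size() && is_word(src[i])) ++i;
            assert(i > start);
            tokens.push_back(std::string(src.substr(start, i - start)));
            continue;
        }
        bool matched = false;
        for (std::string_view op : ops) {
            if (src.substr(i, op.size()) == op) {      // substr clamps at the end of src
                tokens.push_back(std::string(op));
                i += op.size();
                matched = true;
                break;
            }
        }
        if (!matched) {                                // unknown symbol: emit it as-is
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

In this sketch `x&&y` and `x && y` both lex to `x`, `&&`, `y`, while `x & & y` lexes to two separate `&` tokens. Note that a left-to-right greedy scan splits `x&&&y` as `&&` then `&`, which differs from the "longest match at the end" rule proposed at the top of the thread.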

Maybe you were thinking of *... it is true that I have been trying out a way to make simple expressions like a*b legal without whitespace, by treating the last * of a postfix-expression as binary if followed by something that couldn't make sense if it was unary. (And similarly for &, and this should also include ~.) So a**b was (until today's checkin) equivalent to (a*) * (b), which I agree is a bit subtle. I haven't been satisfied with how that experiment has worked out, so today I've switched what I do in that case from "treat that last operator as binary" to "give a nicer diagnostic that the programmer needs to add whitespace before it in this case if they meant binary."

Here's what this means for your examples:

Expression x&&y is still equivalent to (x) && (y).
Expression x && y is equivalent to (x) && (y).
Expression x & & y is still not legal.
Expression x& & y is equivalent to (x&) & (y). (But FWIW I'm inclined to disallow that by default later by a semantic rule for non-grammar reasons, because xoring pointers is a dangerous and rarely useful thing (e.g., Steve Dewhurst's two-way pointers)... if you want to write that you should have to write "opt into this unsafe thing" explicitly in some way.)

I like this because I think it gets rid of the visually ambiguous cases, and the error message is clear.

And for * (and similarly ~):

Expression x**y is no longer legal. Diagnostic is: error: postfix unary * (dereference) cannot be immediately followed by a (, identifier, or literal - add whitespace before * here if you meant binary * (multiplication) [Edit: Fixed error message in 8386329]
Expression x ** y is still not legal.
Expression x * * y is still not legal.
Expression x* * y is equivalent to (x*) * (y).
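
The four `*` cases above are consistent with a simple model, sketched below in C++ under a loud assumption: a `*` with no whitespace before it is read as postfix, and a `*` preceded by whitespace is read as binary. This model is inferred from the examples and the diagnostic text, not taken from cppfront's source. `StarAnalysis` and `analyze_stars` are hypothetical names, and the input is the text between the two operands (for `x* * y` that is `"* * "`).

```cpp
#include <cassert>
#include <string_view>

struct StarAnalysis {
    bool legal = false;     // true if the run parses as (x*...*) * (y)
    int postfix_count = 0;  // postfix dereferences applied to the left operand
};

// Hypothetical checker for the whitespace-sensitive rule (assumed model):
//   no whitespace before '*'  -> postfix unary
//   whitespace before '*'     -> binary
// The run is legal only if it is zero or more postfix '*' followed by
// exactly one binary '*'.
inline StarAnalysis analyze_stars(std::string_view between) {
    StarAnalysis out;
    bool prev_was_space = false;      // the left operand directly precedes the run
    int binary_count = 0;
    bool postfix_after_binary = false;

    for (char c : between) {
        if (c == ' ') { prev_was_space = true; continue; }
        assert(c == '*' && "only spaces and '*' are modeled");
        if (prev_was_space) {
            ++binary_count;               // "x *": binary multiplication
        } else if (binary_count > 0) {
            postfix_after_binary = true;  // "x **": postfix with no operand to its left
        } else {
            ++out.postfix_count;          // "x*": dereference
        }
        prev_was_space = false;
    }
    out.legal = (binary_count == 1) && !postfix_after_binary;
    return out;
}
```

Under this model `"**"`, `" ** "`, and `" * * "` are all rejected, and `"* * "` is accepted with one postfix dereference, matching the list above. A bare `"*"` (as in `a*b`) is also rejected, which is the case the diagnostic asks the programmer to fix by adding whitespace.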

Re and et al.: I considered this, also for other reasons, but I think the familiarity of && and || is too strong to break without stronger justification. And even if we used and, that still wouldn't be a full solution for this tension since it wouldn't resolve the a**b or a~~b cases.

I'm open to further improving this. There's a tradeoff here that I cover in the Design note, which I've updated with this updated current resolution.

Thanks!

msadeqhe · 2022-12-28T09:17:55Z

Thank you. Yes, I was thinking of *, and I mistakenly thought & behaved the same way.

Your solution gets rid of the visually ambiguous cases very well (now both x**y and x ** y behave the same), and it also leaves room to introduce new binary operators in the future (for example ** as an exponentiation operator) without breaking source compatibility.

Before this I had been trying out making `a*b` work without whitespace by making `a****b` interpret the final `*` as binary if followed by something that wouldn't make sense if it was unary, but that's a bit subtle. (And similarly for `&` and should also include `~`) See additional discussion in hsutter#152.


hsutter · 2024-03-21T06:47:07Z

Update: I decided to revert the weird * and & whitespace sensitivity, and x**y, x ** y, x * * y, and x* * y are all legal and equivalent to ((x*) * (y)).

See longer note in #989.

msadeqhe added the suggestion label Nov 30, 2022

hsutter closed this as completed in 30c79a0 Dec 27, 2022

msadeqhe mentioned this issue Apr 4, 2023

[BUG] Arithmetic operation fails #319

Closed

This was referenced Apr 4, 2023

[SUGGESTION] Support (some) C++ alternative tokens #304

Closed

[BUG] Diagnose use of C++1 alternative tokens #328

Closed

hsutter added a commit that referenced this issue Mar 21, 2024

Remove whitespace sensitivity for * and &

5663493

Undo commit 30c79a0 Updates comment on partly-related #152 Closes #319 Closes #989
