[SUGGESTION] Disambiguate postfix unary and binary operators #152

Closed
msadeqhe opened this issue Nov 30, 2022 · 6 comments

@msadeqhe

msadeqhe commented Nov 30, 2022

Currently in Cpp2:

  1. Expression x&&y is equivalent to (x&) & (y), because there is no whitespace before the first symbol &.
  2. Expression x && y is equivalent to (x) && (y), because there is a whitespace before the first symbol &.

Although x&&y and x && y look almost identical, they have completely different meanings. A typical programmer expects both x&&y and x && y to be equivalent to (x) && (y).

Cpp2 could be simpler, with fewer surprising results and less burden on programmers to watch their whitespace, if it used the following rule to disambiguate postfix unary and binary operators:

  1. If a sequence of operator symbols appears between two identifiers, literals, or parenthesized expressions, Cpp2 should find the longest run of trailing symbols that forms a valid binary operator. For example, the binary operator in x&&&y would be the logical && operator, because && is a valid binary operator and is longer than the binary & operator.
  2. Once the binary operator has been found, the remaining symbols are treated as postfix unary operators.

For example:

  1. Expression x&&y is equivalent to (x) && (y).
  2. Expression x && y is equivalent to (x) && (y).
  3. Expression x & & y is equivalent to (x&) & (y).
  4. Expression x& & y is equivalent to (x&) & (y).
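
The proposed rule can be sketched in C++ as follows. This is a hypothetical illustration, not cppfront code: it assumes whitespace splits the symbols into separate runs (which is how examples 3 and 4 read), treats every symbol before the chosen binary operator as a single-character postfix operator, and uses an illustrative, incomplete set of binary operator spellings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Illustrative, incomplete set of binary operator spellings (an assumption).
bool is_binary_op(std::string_view s) {
    static const std::vector<std::string_view> ops = {
        "<=>", "&&", "||", "<<", ">>", "<=", ">=", "==", "!=",
        "&",   "|",  "^",  "*",  "/",  "%",  "+",  "-",  "<",  ">"};
    for (std::string_view op : ops)
        if (op == s) return true;
    return false;
}

struct OperatorSplit {
    std::vector<std::string> postfix;  // postfix unary operators, left to right
    std::string binary;                // empty if no binary operator was found
};

// `between` is the raw text between two operands, e.g. "&&&" or " & & ".
OperatorSplit split_operators(const std::string& between) {
    // Whitespace splits the symbols into separate runs.
    std::istringstream in(between);
    std::vector<std::string> runs;
    for (std::string run; in >> run;) runs.push_back(run);
    assert(!runs.empty() && "expected at least one operator symbol");
    if (runs.empty()) return {};  // nothing to split

    OperatorSplit result;
    // Symbols in every run except the last are postfix unary operators
    // (simplification: one single-character operator per symbol).
    for (std::size_t i = 0; i + 1 < runs.size(); ++i)
        for (char ch : runs[i]) result.postfix.emplace_back(1, ch);

    // In the last run, the longest valid binary operator at the end wins;
    // every symbol before it is postfix unary.
    const std::string& last = runs.back();
    std::size_t binary_len = 0;
    for (std::size_t len = std::min<std::size_t>(3, last.size()); len > 0; --len) {
        if (is_binary_op(std::string_view(last).substr(last.size() - len))) {
            binary_len = len;
            break;
        }
    }
    for (std::size_t i = 0; i < last.size() - binary_len; ++i)
        result.postfix.emplace_back(1, last[i]);
    result.binary = last.substr(last.size() - binary_len);
    return result;
}
```

Applied to the examples above, "&&" and " && " yield no postfix operators and a binary &&; " & & " and "& & " yield a postfix & and a binary &; and "&&&" yields a postfix & followed by a binary &&.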
@SebastianTroy

SebastianTroy commented Nov 30, 2022 via email

@mhermier

mhermier commented Dec 1, 2022

This sounds more like a bug in the lexer: it is not being eager (maximal munch). But it is true that this makes the syntax harder to parse and read than necessary.

@switch-blade-stuff

switch-blade-stuff commented Dec 7, 2022

A better simplification is to ban && in cpp2 and instead require the "and" keyword. Your suggestion is good, but still requires some teaching and can lead to code that requires detailed knowledge to parse (e.g. "x&&&y&" is much harder to parse than "x& and y&")

I am personally against using keywords such as and, or and not for logical operators. They create a discontinuity with the rest of the operators and the rest of the language, unless we also want to use add, sub, etc. They make the code harder to read, since symbolic operators stand out more in a block of text (compare neg (x add y) with -(x + y)). And they take longer to type (which isn't a big issue, but still).

Besides, what would you do for the bitwise &, | and ~ operators? Something like band, bor and bnot?

The postfix & issue could be solved with lexer eagerness, as mhermier said, or with operator precedence. IMHO we should not go completely overboard in the direction of simplification: operator precedence is not so context-dependent that it requires a drastic change, and while aiming for an O(1) lexer/parser is good, it isn't a constraint you would want to bind yourself to absolutely for a language as complex as C++.

@hsutter
Owner

hsutter commented Dec 27, 2022

Thanks, everyone.

Expression x&&y is equivalent to (x&) & (y), because there is no whitespace before the first symbol &.

I can't reproduce this... maybe I changed something since you opened this issue? But I can't think what it would be because that && should always have max-munched to a single LogicalAnd token, and I think that's the right answer and it's what it's doing now.
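
For illustration, maximal munch over a run of operator characters can be sketched as below. The spelling table is a hypothetical subset, not cppfront's actual lexeme list; the point is only that at each position the longest matching spelling is taken, so && becomes one token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical subset of operator spellings (not cppfront's real lexeme table).
const std::vector<std::string_view> kOperatorSpellings = {
    "<=>", "<<=", ">>=", "&&", "||", "&=", "|=", "*=", "==", "!=", "<=", ">=",
    "<<",  ">>",  "++",  "--", "->", "::", "&",  "|",  "*",  "~",  "=",  "!",
    "<",   ">",   "+",   "-",  "^",  "%",  "/",  "$"};

// Split a run of operator characters into tokens using maximal munch:
// at each position, take the longest spelling that matches there.
std::vector<std::string> lex_operators(std::string_view src) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < src.size()) {
        if (src[pos] == ' ') {  // whitespace separates tokens
            ++pos;
            continue;
        }
        std::string_view best;
        for (std::string_view spelling : kOperatorSpellings) {
            if (spelling.size() > best.size() &&
                src.substr(pos, spelling.size()) == spelling) {
                best = spelling;
            }
        }
        assert(!best.empty() && "character is not part of any known operator");
        if (best.empty()) best = src.substr(pos, 1);  // keep going if asserts are off
        tokens.emplace_back(best);
        pos += best.size();
    }
    return tokens;
}
```

Because maximal munch scans left to right, "&&&" lexes as && followed by &, whereas the rule proposed in the issue picks the binary && from the right end.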

Maybe you were thinking of *... it is true that I have been trying out a way to make simple expressions like a*b legal without whitespace, by treating the last * of a postfix-expression as binary if it is followed by something that couldn't make sense if it were unary. (The same applies to &, and this should also include ~.) So a**b was (until today's checkin) equivalent to (a*) * (b), which I agree is a bit subtle. I haven't been satisfied with how that experiment has worked out, so today I've changed what I do in that case from "treat that last operator as binary" to "give a nicer diagnostic saying that the programmer needs to add whitespace before it if they meant binary."

Here's what this means for your examples:

  • Expression x&&y is still equivalent to (x) && (y).
  • Expression x && y is equivalent to (x) && (y).
  • Expression x & & y is still not legal.
  • Expression x& & y is equivalent to (x&) & (y). (But FWIW I'm inclined to disallow that by default later via a semantic rule, for non-grammar reasons, because bitwise operations on pointers, such as xoring them (e.g., Steve Dewhurst's two-way pointers), are dangerous and rarely useful... if you want to write that, you should have to explicitly "opt into this unsafe thing" in some way.)

I like this because I think it gets rid of the visually ambiguous cases, and the error message is clear.

And for * (and similarly ~):

  • Expression x**y is no longer legal. Diagnostic is: error: postfix unary * (dereference) cannot be immediately followed by a (, identifier, or literal - add whitespace before * here if you meant binary * (multiplication) [Edit: Fixed error message in 8386329]
  • Expression x ** y is still not legal.
  • Expression x * * y is still not legal.
  • Expression x* * y is equivalent to (x*) * (y).
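
The December 2022 behavior can be modeled roughly as follows. This is a hypothetical sketch consistent with the examples above, not cppfront's parser: it assumes operands are plain alphanumeric names, treats each *, & or ~ as its own one-character token (multi-character tokens such as && are out of scope), and paraphrases the diagnostic. An operator written with no whitespace after an operand is postfix; an operator preceded by whitespace is binary.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: operands are alphanumeric names; every other
// non-space character (*, &, ~) is a one-character operator token.
struct Tok {
    std::string text;
    bool space_before;  // whitespace immediately precedes this token
};

bool is_operand(const Tok& t) {
    return std::isalnum(static_cast<unsigned char>(t.text[0])) != 0;
}

std::vector<Tok> tokenize(const std::string& s) {
    std::vector<Tok> out;
    bool space = false;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) { space = true; ++i; continue; }
        std::size_t start = i++;
        if (std::isalnum(c))
            while (i < s.size() && std::isalnum(static_cast<unsigned char>(s[i]))) ++i;
        out.push_back({s.substr(start, i - start), space});
        space = false;
    }
    return out;
}

// Parses an operand plus its postfix operators, e.g. "(x*)".
// Unspaced operators after an operand are postfix unary; a postfix operator
// immediately followed by an operand is diagnosed instead of reinterpreted.
std::string parse_postfix(const std::vector<Tok>& t, std::size_t& i, std::string& err) {
    if (i >= t.size() || !is_operand(t[i])) { err = "expected an operand"; return ""; }
    std::string expr = t[i++].text;
    while (i < t.size() && !t[i].space_before && !is_operand(t[i])) {
        std::string op = t[i++].text;
        expr += op;
        if (i < t.size() && !t[i].space_before && is_operand(t[i])) {
            err = "postfix unary " + op + " cannot be immediately followed by an "
                  "identifier; add whitespace before " + op + " if you meant binary " + op;
            return "";
        }
    }
    return "(" + expr + ")";
}

// Returns a grouping such as "(x*) * (y)", or a message starting with "error:".
std::string group(const std::string& src) {
    std::vector<Tok> t = tokenize(src);
    std::size_t i = 0;
    std::string err;
    std::string lhs = parse_postfix(t, i, err);
    if (!err.empty()) return "error: " + err;
    if (i == t.size()) return lhs;
    assert(t[i].space_before);  // unspaced operators were consumed as postfix
    if (is_operand(t[i])) return "error: expected a binary operator";
    std::string op = t[i++].text;  // operator preceded by whitespace: binary
    std::string rhs = parse_postfix(t, i, err);
    if (!err.empty()) return "error: " + err;
    if (i != t.size()) return "error: unexpected trailing tokens";
    return lhs + " " + op + " " + rhs;
}
```

Under these assumptions, group("x* * y") returns "(x*) * (y)", while x**y, x ** y and x * * y are all rejected, matching the list above.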

Re and et al.: I considered this, also for other reasons, but I think the familiarity of && and || is too strong to break without stronger justification. And even if we used and, that still wouldn't be a full solution for this tension since it wouldn't resolve the a**b or a~~b cases.

I'm open to further improving this. There's a tradeoff here that I cover in the Design note, which I've updated to reflect this current resolution.

Thanks!

@msadeqhe
Author

msadeqhe commented Dec 28, 2022

Thank you. Yes, I was thinking of *, and mistakenly thought & behaved the same way.

Your solution eliminates the visually ambiguous cases nicely (now x**y and x ** y behave the same way), and it also leaves room to introduce new binary operators in the future (for example, ** as an exponentiation operator) without breaking source compatibility.

Azmah-Bad pushed a commit to Azmah-Bad/cppfront that referenced this issue Feb 24, 2023
Before this I had been trying out making `a*b` work without whitespace by making `a****b` interpret the final `*` as binary if followed by something that wouldn't make sense if it was unary, but that's a bit subtle. (And similarly for `&` and should also include `~`)

See additional discussion in hsutter#152.
hsutter added a commit that referenced this issue Mar 21, 2024
Undo commit 30c79a0

Updates comment on partly-related #152
Closes #319
Closes #989
@hsutter
Owner

hsutter commented Mar 21, 2024

Update: I decided to revert the weird * and & whitespace sensitivity, and x**y, x ** y, x * * y, and x* * y are all legal and equivalent to ((x*) * (y)).

See longer note in #989.
