Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Diagnose use of C++1 alternative tokens #328

Closed
JohelEGP opened this issue Apr 7, 2023 · 16 comments
Closed

[BUG] Diagnose use of C++1 alternative tokens #328

JohelEGP opened this issue Apr 7, 2023 · 16 comments
Labels
bug Something isn't working

Comments

@JohelEGP
Copy link
Contributor

JohelEGP commented Apr 7, 2023

The use of C++1 alternative tokens lead to invalid Cpp2 or generated C++1 depending on use.

Currently, they're treated as identifiers. So cppfront emits an error if used C++1-correctly (x = y and z;, 3 identifiers in a row is ill-formed), or emits bad C++1 code (x = and;).
-- #304.


Minimal reproducer (https://godbolt.org/z/4dT5hre58):

and : int = 0;

Commands:

cppfront x.cpp2
clang++17 -std=c++2b -I $CPPFRONT_INCLUDE_DIR x.cpp

Expected result: Diagnose and as reserved.

Actual result and error:

Generated C++1
//=== Cpp2 type declarations ====================================================


#include "cpp2util.h"


//=== Cpp2 type definitions and function declarations ===========================

#line 1 "x.cpp2"
extern int and;
//=== Cpp2 function definitions =================================================

#line 1 "x.cpp2"
int and {0}; 
x.cpp2:1:15: error: expected unqualified-id
extern int and;
              ^
x.cpp2:1:9: error: expected unqualified-id
int and {0}; 
        ^
2 errors generated.

Minimal reproducer (https://godbolt.org/z/WMzzjhGWM):

x : int = y and z;

Commands:

cppfront x.cpp2 

Expected result: Diagnose and as reserved, suggest &&.

Actual result and error:

x.cpp2...
x.cpp2(1,13): error: ill-formed initializer (at 'and')
x.cpp2(1,1): error: unexpected text at end of Cpp2 code section (at 'x')
x.cpp2(1,0): error: parse failed for section starting here

@JohelEGP JohelEGP added the bug Something isn't working label Apr 7, 2023
@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 7, 2023

For the first reproducer, I noticed that for GCC, the error message points to the Cpp2 source, so the marker is off.
Clang:

x.cpp2:1:15: error: expected unqualified-id
extern int and;
              ^
x.cpp2:1:9: error: expected unqualified-id
int and {0}; 
        ^
2 errors generated.

GCC:

x.cpp2:1:15: error: expected unqualified-id before ‘;’ token
    1 | and : int = 0;
      |               ^
x.cpp2:1:9: error: expected unqualified-id before ‘{’ token
    1 | and : int = 0;
      |         ^

@gregmarr
Copy link
Contributor

gregmarr commented Apr 7, 2023

It doesn't diagnose most uses of reserved words as variable names. For example, all of these are passed through to be diagnosed by the C++1 compiler: nullptr, enum, try, catch, explicit, template, signed, short, long, int.

int : int = 0;
//=== Cpp2 type definitions and function declarations ===========================

extern int int;
//=== Cpp2 function definitions =================================================

int int {0}; 

@gregmarr
Copy link
Contributor

gregmarr commented Apr 7, 2023

FYI, both of those Compiler Explorer links contain the Cpp2 code in a C++ environment, and there are a bunch of other files loaded in various panes.

@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 7, 2023

It'd be OK to just diagnose alternative tokens where their primary token would be valid. Like the second reproducer.

FYI, both of those Compiler Explorer links contain the Cpp2 code in a C++ environment, and there are a bunch of other files loaded in various panes.

Yes, those bring Cpp2 and C++ modules support.

@gregmarr
Copy link
Contributor

gregmarr commented Apr 7, 2023

It'd be OK to just diagnose alternative tokens where their primary token would be valid. Like the second reproducer.

That would be interesting. I wonder if it would be more work to just support them or to diagnose them. Have you seen anything from Herb that says that he's not planning to support them?

Yes, those bring Cpp2 and C++ modules support.

Okay, but the main file is still marked as C++ instead of CppFront, so I have to copy the contents, change the type to CppFront, and paste the text. I don't know if this is a CE bug with the links, or if something else is weird.

@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 7, 2023

Have you seen anything from Herb that says that he's not planning to support them?

Not quite. #152 (comment) mentions

Re and et al.: I considered this, also for other reasons, but I think the familiarity of && and || is too strong to break without stronger justification. And even if we used and, that still wouldn't be a full solution for this tension since it wouldn't resolve the a**b or a~~b cases.

And the design note linked below that, as #304 puts it:

https://github.com/hsutter/cppfront/wiki/Design-note%3A-Postfix-unary-operators-vs-binary-operators talks about "alternative tokens". It seems they were supported at some point before this repository was published.

Okay, but the main file is still marked as C++ instead of CppFront, so I have to copy the contents, change the type to CppFront, and paste the text. I don't know if this is a CE bug with the links, or if something else is weird.

CE's not bugged. CE's Cpp2-cppfront language option only compiles with cppfront. It has no option to compile the generated C++1. That's why I use C++ and a CMake project to compile both.

@gregmarr
Copy link
Contributor

gregmarr commented Apr 7, 2023

Not quite.

It would be good to hear from him on whether he thinks they should still be supported.

That's why I use C++ and a CMake project to compile both.

Interesting. BTW, that transpile to C++1 and compile the converted file in CE was one of the things that Herb called out at the end of his keynote as something he'd like someone to help him with. (That was before he knew that the transpile stage had been put up on CE during his talk.)

@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 7, 2023

The CMake module used was added to the wiki: #4 (comment). But unlike CE's support for Cpp2, I have to manually update cpp2util.h, and print the generated C++1 to the build output. See compiler-explorer/compiler-explorer#4074. Still, that wouldn't let me mix Cpp2 with C++ modules or CMake projects (a.k.a. support for multiple source files). What I want is to be able to output to a read-only editor too.

@hsutter
Copy link
Owner

hsutter commented Apr 8, 2023

Re Godbolt Compiler Explorer: Yes, I'd love for there to be a way in CE to compile Cpp2 using cppfront + a choice of C++ compiler as the second compiler (or for now even just one hardwired compiler).

Re alternative spellings for operators:

  • Option 1: I could emit an error on them. The downside is that a few of them, such as and, could be a useful name for a local variable.
  • Option 2: I could also just rename them. For example, I could automatically rename and to and_ (and similarly for the other 10) to make the names available as regular names in Cpp2 code.

I'm more inclined to Option 2... it's similar to how I already rename new to cpp2_new and this to (*this) so I can make those words mean what I want in Cpp2... from emit(token:

    if (n == "new") {
        printer.print_cpp2("cpp2_new", pos);
    }
    else if (n == "this") {
        printer.print_cpp2("(*this)", pos);
    }

A non-option is to support the alternative spellings with their current meaning, because they violate "don't have two ways to say the same thing"... 😁🦆🏃 (trying an emoji version of the old <gdr> -- but I'm also serious about this) After all, the alternative spellings in <iso646.h> exist for a real reason, but one that AFAIK has not been valid for a long time, namely: keyboards that don't have &, |, !, ^, or ~. Do those keyboards even exist anymore outside a museum? Even antiquated/obsolete US missile silo keyboards seem to have them, though hopefully the folks in those locations aren't doing a lot of programming there. IIRC, IBM systems using EBCDIC were the original motivators for these (as well as for digraphs and trigraphs, for keyboards that don't have { } [ ] # \... and note that trigraphs have already been removed, first from C++ and then from C).

@hsutter hsutter closed this as completed in cb8c1c5 Apr 8, 2023
@hsutter
Copy link
Owner

hsutter commented Apr 8, 2023

Option 2 FTW! Thanks.

@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 9, 2023

Thank you.

I see that cb8c1c5 actually appends _ to them. It's common practice to append _ to keywords when one wants an identifier that represents them in some way (i.e., see https://github.com/search?l=C%2B%2B&q=and_+not_&type=code). This could lead to unexpected results (shadowing, accidental use) and friction when adopting Cpp2. How about prefixing cpp2_ instead?

@hsutter hsutter reopened this Apr 9, 2023
@msadeqhe
Copy link

msadeqhe commented Apr 9, 2023

IMO and, or, not and other alternative spellings should be reserved instead of being renamed.

Option 2 increases complexity for language introp with C, C++ and other programming languages, because the programmer has to know what will be the renamed and. C++2 may reserve name and for external linkage, but it would be more natural to just reserve it everywhere even for local variable.

Option 1 is a more straight solution. Programmers will choose their own variable name and_ or and__ or whatever they want other than name and as they did in C, C++, and other programming languages. In this way, C++2 won't do a sneaky rename.

@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 9, 2023

cppfront renames and to and_. That means an usual convention of appending _ to data members wouldn't work:

algo: () -> (and: T) = {
  // Step calls.
}
lazy_algo: type = {
  and_: T;
  // Step members.
  and: () -> T = and_; // error: `and_` redefined.
}

That's likely to break reflection. What about this alternative?

int x = algo().__cpp2_and; // C++1. OK, renaming guaranteed to work.

That could be guaranteed by Cpp2 to work. So it'd work using a non-transpiler Cpp2 compiler.

@hsutter
Copy link
Owner

hsutter commented Apr 9, 2023

Thanks! I agree, using the cpp2_ prefix I already use for new is better, I'll switch to that for now.

@JohelEGP
Copy link
Contributor Author

JohelEGP commented Apr 30, 2023

FYI, both of those Compiler Explorer links contain the Cpp2 code in a C++ environment, and there are a bunch of other files loaded in various panes.

I didn't realize I could close them. Now it looks much cleaner: https://cpp2.godbolt.org/z/KdW89o475.

Okay, but the main file is still marked as C++ instead of CppFront, so I have to copy the contents, change the type to CppFront, and paste the text. I don't know if this is a CE bug with the links, or if something else is weird.

Now main.cpp2 is marked "Cpp2-cppfront" rather than "C++".

What I want is to be able to output to a read-only editor too.

After the above, we have this.

zaucy pushed a commit to zaucy/cppfront that referenced this issue Dec 5, 2023
Another thing there's no need to support anymore and that we shouldn't have to teach... these haven't been needed since the 1980s.

Normally we don't have to teach things that compilers treat as errors, because they don't compile, but this is a case where the error is surprising and so we still have to teach it (i.e., when users trip across trying to use a common word like `and` as a variable name and wonder why it's an error and then it's a whole thing again)
zaucy pushed a commit to zaucy/cppfront that referenced this issue Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants