Peggy 3.0.0 reserves lots of keywords which aren't actually reserved in JavaScript #359

nene · 2023-02-28T09:54:06Z

Peggy 3.0.0 added a bunch of new reserved keywords which can no more be used as label names:

abstract
boolean
byte
char
double
final
float
goto
int
long
native
short
synchronized
throws
transient
volatile

These are keywords that were reserved in ECMAScript 1..3, but are no more reserved in modern ECMAScript. Does Peggy plan to target these old ECMAScript versions when generating code? It doesn't seem so, because at the same list of reserved keywords it also contains keywords that are only reserved in ECMAScript strict mode - but strict mode was introduced in ECMAScript 5. This makes no sense to me. Peggy should either target modern ECMAScript or the old one... but it can't really do both. And even when Peggy wants to target old ECMAScript, there's no actual problem that the use of these names would cause.

On more practical side. When I read Peggy documentation, I get the impression that I can use as label names all the names that I could normally use as JavaScript variable names. That's easy to remember, as I already know JavaScript very well. But turns out that Peggy also blocks me from using some names that I wouldn't have guessed as being reserved keywords in JavaScript. This behavior breaks my intuitive assumptions.

Additionally this new reserved keywords list contains names that aren't reserved keywords at all:

arguments
as
async
eval
from
get
of
set

For example: one might think that of is reserved because it's used in for (.. of ...) statement, but in fact one can freely have a variable named of. The following is perfectly valid code:

for (const of of [1,2,3]) {
  console.log(of);
}

The only names out of this list that really shouldn't be allowed are:

arguments - Referring to arguments inside a function won't work for variable arguments declared outside of a function.
eval - declaring variable named eval is prohibited in strict mode.

But I might be mistaken. Perhaps all these extra keywords have been added to solve some actual problem. I would be interested in learning about that problem.

The text was updated successfully, but these errors were encountered:

hildjj · 2023-02-28T15:16:07Z

You are correct. I added all of those last-minute as I was wrapping up the release, and we didn't have any conversation about it first. If you have time, I would welcome a pull request that modified the list here by commenting out the words that should still work, leaving notes as to why they should not be added back in (so that I don't make the same mistake in the future), and adding tests roughly here to ensure that the code we generate continues to work with those non-reserved words as rule names, rule references, and labels.

hildjj · 2023-02-28T15:18:39Z

Note: I'll do a 3.0.1 release with just this and whatever else is ready when it's done. Please do not forget to add yourself to the AUTHORS file.

If you don't have time or inclination to send a pull request, I'll get to it eventually.

reverofevil · 2023-02-28T15:50:57Z

Let me add 2 cents here: peggy doesn't even need any ES keywords. With plugins it's supposed to generate code in other languages. For example, in PHP, that has its own different set of keywords. I think correct approach would be to allow any identifiers that do not conflict with peggy's own grammar, and rename keywords at codegen phase wherever it's actually required.

Mingun · 2023-02-28T15:53:15Z

For that codegen plugins can (and should) change the list of reserved words. By default it's contain JavaScript words.

We do not rename them, because peggy does not analyses the content of code blocks where that identifiers are used. So with renaming we will force users to remember that some identifiers should be written differently in the code block. Ok, now when we have the ability to emit warnings and info messages it would not be such bad UX as before, but from the other hand you always can write a plugin that will do that.

reverofevil · 2023-02-28T15:56:33Z

If there is a single grammar without any code inserts, it can be compiled into several languages with different peggy plugins. Why would a user be required to rewrite the grammar to suit another target language? Which goal justifies this increase in compiler complexity?

This makes no more sense than to ban certain arithmetic operations in C based on a choice of target platform. Luckily, compilers don't usually do that: the whole idea of compiled language is to give a layer of abstraction around target platform quirks.

hildjj · 2023-02-28T16:02:50Z

Think of the default list of reserved words coming from the generate-js pass. If you replace that pass, you're likely to need to at least modify the list of reserved words to match whatever you're generating to.

Now, if you're arguing that the generated code should never use the tokens that the grammar writer chose in the generated output, that's interesting. It might make debugging a little more difficult depending on how you mangled those tokens into usable identifiers in the output.

reverofevil · 2023-02-28T16:07:20Z

I don't think there is even any need for that. Rule a is compiled into peg$parsea. Then it's exported as an object with {a: ...} field, and any keyword can be used as a field name. If there is actually any place in JS codegen that fails due to keywords as rule names, I'd rather consider it a codegen bug.

reverofevil · 2023-02-28T16:10:22Z

For keywords as names of partial matches the exception should rather be thrown from generate-js, because it's the only logical place where it can be done. There is no way to rename a variable to salvage generation: rule = let:foo { return let; } will never become valid anyway.

Mingun · 2023-02-28T16:27:49Z

I don't think there is even any need for that.

Rename is required for labels which is used as names of formal parameters in functions (which used to represent actions and semantic predicates). Of course, we can auto-apply some consistent rename rules, so the user would use the other name in action than the label name, but I think this is not obvious, even if the warning message would be printed during such replace. It would be better just rename label in that case:

rule = let:foo {
  // We could automatically rename all invalid identifiers
  // but it would be much simpler and clearly,
  // if the user just name label `let_`
  return let_;
}

The only way, where check for invalid identifiers could be annoying -- when you writes grammar with placeholder action blocks (that do nothing, just {}) and replace that empty functions with the real code on some ahead unknown language. In that case the plugin that does that also can disable the reserved identifiers check.

hildjj · 2023-02-28T16:31:27Z

We could loosen the reserved word restriction to not make it apply to rule names or rule references, but only to labels. We could also make it more explicit that this list is owned by the generator.

It is a requirement that the user get a good error message at generation time for labels that are reserved words in the generator's target language. The current approach is only one way for that to happen, and I'm open to other ways.

reverofevil · 2023-02-28T16:36:00Z

The example shows peggy with semantic actions in JS. It implies the only valid target is JS. So in case JS keyword is used as a label, it means we actually can just consider it a bug. There won't be another target, where let label is fine, because there is only one possible target. There is no need to rename anything here.

Parsers that ask code generators for extra data is an architectural abomination, some kind of colonoscopic dentistry that should never be put in practice. The only place in the code that owns code generation is code generator, and if the chosen code generator is unhappy with an identifier, it should throw an error on its own.

hildjj · 2023-02-28T16:37:33Z

There's no need to use that sort of rhetoric when we're engaging with your issue constructively. Neither of us wrote this code in the first place.

reverofevil · 2023-02-28T16:38:40Z

It didn't look that way to me. I'm sorry.

hildjj · 2023-02-28T16:48:19Z

Anyway, if you want to put together a patch, I'll take a look at it.

Mingun · 2023-02-28T17:21:25Z

Parsers that ask code generators for extra data is an architectural abomination, some kind of colonoscopic dentistry that should never be put in practice.

Unfortunately, we still not able to report errors in code blocks together with the line numbers. And even if we do that, that will not help us to point user the correct place where a mistake was done. So the black-list is a forced measure that allows us to clearly say what and where the error is.

hildjj · 2023-02-28T22:34:05Z

I'm going to leave this open until I change the docs. I think we may need a few other issues as a result of this discussion however:

Ability to generate good errors from the generation pass. I think this may be part of a larger discussion about how we do generation that I'd like to have.
Moving reserved word detection to to the generation pass. This will depend upon the first issue. I'm also worried about existing plugins needing to change how they do reserved word detection, so at least making a good error message when the plugin hasn't been updated yet will be a part of the solution to this issue.

hildjj · 2023-02-28T23:42:44Z

Everyone on this thread, please review the changes in #361 to make sure I've adequately captured the conversation.

reverofevil · 2023-03-01T00:02:40Z

I didn't realize AST doesn't store any information about location. There's only 4 references to IdentifierName in the grammar, and it would probably be enough to capture location data in every of those 4 AST nodes in a backwards compatible way. Something like

Location = "" {
	const {start, source} = location();
	return console.log({source, ...start});
}

Rule = ... from:Location name:IdentifierName to:Location ...

should capture all the required information, even if somewhat inconsistently.

hildjj · 2023-03-01T00:31:00Z

The other issue is that we're currently generating from bytecode, not directly from the AST.

Mingun · 2023-03-01T15:36:32Z

Something like
Location = "" {
	const {start, source} = location();
	return console.log({source, ...start});
}

Rule = ... from:Location name:IdentifierName to:Location ...
should capture all the required information, even if somewhat inconsistently.

That is not required, because IdentifierName is itself the identifier + its location.

peggy/src/parser.pegjs

Lines 319 to 322 in 6344542

    
           IdentifierName "identifier" 
        
             = head:IdentifierStart tail:IdentifierPart* { 
        
                 return [head + tail.join(""), location()]; 
        
               }

Yes, the problem with moving check to the code generation pass is the absence of that information in the AST. At least that was true at time of writing that check. But since #240 we have locations of all rules in the AST, so it would be easy to also capture labels' locations and move that check.

Some hints for implementing such a check: add an utility method to the asts:

const asts = {
  renameInvalidLabels(ast: peggy.ast.Grammar, reservedWords: string[], session: peggy.Session) { ... }
}

This function may follow the following algorithm for each label:

if label not in the reservedWords, then return
add into reservedWords all label names in the current scope
let i = 1
let newName = label + i
if newName in the reservedWords increase i and go to step 4
add newName to the reservedWords
replace label with newName and emit an info message with information about rename

hildjj · 2023-03-01T15:41:53Z

I don't want to rename labels. If I'm understanding the approach correctly, there's no way for the person writing the grammar to predict what the label name will turn out to be in their action or predicate code.

nene · 2023-03-01T16:01:09Z

As I understand this whole issue:

Currently Peggy doesn't run any static analysis of the code blocks. To perform any sort of renaming of labels, one would need to also analyze the code blocks and transform the code inside them. Doing such a thing would be a major undertaking. It would also need to be implemented for each language supported in the code blocks, which would become a major hurdle for anyone wanting to implement another target language. A much simpler solution is to just restrict the names of labels.

reverofevil · 2023-03-01T16:02:53Z

I think we all agree that renaming labels is a bad idea.

hildjj · 2023-03-01T16:03:54Z

To be clear: I think it's an interesting idea, I just don't want to do it. :)

reverofevil · 2023-03-01T16:04:50Z

To be clear: I don't think it's interesting, because it's neither required, nor leads to any kind of desired behavior.

* main: (21 commits) Update CHANGELOG.md Update version number & rebuild Update dependencies Update test/unit/compiler/passes/report-infinite-repetition.spec.js Fixes peggyjs#357. Do not allow infinite recursion in repetition delimiter. Update changelog Allow extra semicolons between rules. Fix an error in the code generator for "repeated" node Update changelog Fixes peggyjs#329 Update changelog Fixes peggyjs#359. Clarifies documentation about reserved words. Fix more HTML indentation. Test that the generated parser also works without errors Remove use of expect.to.not.throw() Add Rene Saarsoo to AUTHORS Typo in test description Add test to ensure special non-reserved keywords are allowed Comment out unnecessary reserved words Fixes peggyjs#347. Makes $ invalid as an identifier start character. ...

hildjj · 2023-03-08T23:49:58Z

The other issue is that we're currently generating from bytecode, not directly from the AST.

See #378

hildjj added the blocks-release Blocks an imminent release. High Priorty. label Feb 28, 2023

nene mentioned this issue Feb 28, 2023

Remove unnecessary reserved words introduced in 3.0.0 #360

Merged

hildjj closed this as completed in #360 Feb 28, 2023

hildjj reopened this Feb 28, 2023

hildjj added a commit to hildjj/peggy that referenced this issue Feb 28, 2023

Fixes peggyjs#359. Clarifies documentation about reserved words.

de00860

hildjj mentioned this issue Feb 28, 2023

Doc identifiers #361

Merged

hildjj closed this as completed in #361 Mar 1, 2023

hildjj mentioned this issue Mar 1, 2023

Release 3.0.1 #363

Closed

22 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Peggy 3.0.0 reserves lots of keywords which aren't actually reserved in JavaScript #359

Peggy 3.0.0 reserves lots of keywords which aren't actually reserved in JavaScript #359

nene commented Feb 28, 2023

hildjj commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023 •

edited

Loading

Mingun commented Feb 28, 2023 •

edited

Loading

reverofevil commented Feb 28, 2023 •

edited

Loading

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023

reverofevil commented Feb 28, 2023 •

edited

Loading

Mingun commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023

hildjj commented Feb 28, 2023

Mingun commented Feb 28, 2023

hildjj commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Mar 1, 2023

hildjj commented Mar 1, 2023

Mingun commented Mar 1, 2023

hildjj commented Mar 1, 2023

nene commented Mar 1, 2023

reverofevil commented Mar 1, 2023

hildjj commented Mar 1, 2023

reverofevil commented Mar 1, 2023

hildjj commented Mar 8, 2023

Peggy 3.0.0 reserves lots of keywords which aren't actually reserved in JavaScript #359

Peggy 3.0.0 reserves lots of keywords which aren't actually reserved in JavaScript #359

Comments

nene commented Feb 28, 2023

hildjj commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023 • edited Loading

Mingun commented Feb 28, 2023 • edited Loading

reverofevil commented Feb 28, 2023 • edited Loading

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023

reverofevil commented Feb 28, 2023 • edited Loading

Mingun commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Feb 28, 2023

hildjj commented Feb 28, 2023

Mingun commented Feb 28, 2023

hildjj commented Feb 28, 2023

hildjj commented Feb 28, 2023

reverofevil commented Mar 1, 2023

hildjj commented Mar 1, 2023

Mingun commented Mar 1, 2023

hildjj commented Mar 1, 2023

nene commented Mar 1, 2023

reverofevil commented Mar 1, 2023

hildjj commented Mar 1, 2023

reverofevil commented Mar 1, 2023

hildjj commented Mar 8, 2023

reverofevil commented Feb 28, 2023 •

edited

Loading

Mingun commented Feb 28, 2023 •

edited

Loading

reverofevil commented Feb 28, 2023 •

edited

Loading

reverofevil commented Feb 28, 2023 •

edited

Loading