Peggy 3.0.0 reserves lots of keywords which aren't actually reserved in JavaScript #359
You are correct. I added all of those at the last minute as I was wrapping up the release, and we didn't discuss it first. If you have time, I would welcome a pull request that modifies the reserved-word list by commenting out the words that should still work, leaving notes about why they should not be added back (so that I don't make the same mistake in the future), and adding tests to ensure that the code we generate keeps working with those non-reserved words as rule names, rule references, and labels.
Note: I'll do a 3.0.1 release with just this and whatever else is ready when it's done. Please don't forget to add yourself to the AUTHORS file. If you don't have the time or inclination to send a pull request, I'll get to it eventually.
Let me add my 2 cents: Peggy doesn't need any ES keywords at all. With plugins, it's supposed to generate code in other languages, for example PHP, which has its own, different set of keywords. I think the correct approach would be to allow any identifier that doesn't conflict with Peggy's own grammar, and to rename keywords in the codegen phase wherever that's actually required.
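A minimal sketch of what renaming at codegen time could look like for identifiers that Peggy itself emits. `mangleIdentifiers` is a hypothetical helper, not a Peggy API, and the trailing-underscore scheme is an assumed convention. It deliberately does not touch user code blocks, which Peggy does not analyze.

```javascript
// Hypothetical helper: map grammar identifiers to target-safe names.
// Names that are not reserved keep their spelling; reserved ones get "_"
// appended until they collide with neither a reserved word nor another name.
function mangleIdentifiers(names, reservedWords) {
  const reserved = new Set(reservedWords);
  const used = new Set();
  const mapping = new Map();

  // First pass: already-safe names claim their own spelling.
  for (const name of names) {
    if (!reserved.has(name)) {
      used.add(name);
      mapping.set(name, name);
    }
  }

  // Second pass: rename reserved names deterministically.
  for (const name of names) {
    if (mapping.has(name)) continue;
    let safe = `${name}_`;
    while (reserved.has(safe) || used.has(safe)) safe += "_";
    used.add(safe);
    mapping.set(name, safe);
  }
  return mapping;
}

const mapping = mangleIdentifiers(["let", "let_", "of"], ["let", "yield"]);
console.log(mapping.get("let")); // let__ ("let_" is already taken by another name)
```

A deterministic scheme like this keeps generated names predictable, which matters when debugging the generated parser.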
For that, codegen plugins can (and should) change the list of reserved words; by default it contains JavaScript's. We don't rename them because Peggy doesn't analyze the contents of the code blocks where those identifiers are used, so renaming would force users to remember that some identifiers have to be written differently inside code blocks. OK, now that we can emit warnings and info messages, that would not be as bad for UX as it used to be, but on the other hand, you can always write a plugin that does it.
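A sketch of a code-generation plugin swapping out the reserved-word list. Peggy plugins expose a `use(config, options)` method; this sketch assumes that `config` carries a mutable `reservedWords` array holding the JavaScript defaults. The plugin name and the PHP keyword subset are illustrative, and the config object here is a stand-in rather than one produced by Peggy.

```javascript
// Hypothetical PHP back-end plugin. ASSUMES config.reservedWords is a
// mutable array that defaults to JavaScript's reserved words.
const PHP_RESERVED_SAMPLE = [
  // A small illustrative subset of PHP keywords, not the full list.
  "abstract", "array", "echo", "function", "isset", "list", "print", "unset",
];

const phpReservedWordsPlugin = {
  use(config, options) {
    // Mutate in place so code holding a reference to the same array sees
    // the PHP list instead of the JavaScript defaults.
    config.reservedWords.splice(0, config.reservedWords.length, ...PHP_RESERVED_SAMPLE);
  },
};

// Stand-in for the config object Peggy would pass to the plugin.
const fakeConfig = { reservedWords: ["let", "of", "function"] };
phpReservedWordsPlugin.use(fakeConfig, {});
console.log(fakeConfig.reservedWords.includes("let")); // false
console.log(fakeConfig.reservedWords.includes("echo")); // true
```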
If there is a single grammar without any code inserts, it can be compiled into several languages with different Peggy plugins. Why should a user be required to rewrite the grammar to suit another target language? Which goal justifies this increase in compiler complexity? This makes no more sense than banning certain arithmetic operations in C based on the choice of target platform. Luckily, compilers don't usually do that: the whole idea of a compiled language is to provide a layer of abstraction over target-platform quirks.
Think of the default list of reserved words coming from the
Now, if you're arguing that the generated code should never use the tokens the grammar writer chose as identifiers, that's interesting. It might make debugging a little more difficult, depending on how you mangled those tokens into usable identifiers in the output.
I don't think there is even any need for that. Rule
For keywords used as names of partial matches, the exception should rather be thrown from
Renaming is required for labels, which are used as the names of formal parameters in the functions that represent actions and semantic predicates. Of course, we could auto-apply some consistent renaming rules, so that the user would use a different name in the action than the label name, but I think that is not obvious, even if a warning message were printed during the replacement. It would be better to just rename the label in that case:

    rule = let:foo {
      // We could automatically rename all invalid identifiers,
      // but it would be much simpler and clearer
      // if the user just named the label `let_`
      return let_;
    }

The only case where the check for invalid identifiers could be annoying is when you write a grammar with placeholder action blocks (that do nothing, just
We could loosen the reserved-word restriction so that it applies only to labels, not to rule names or rule references. We could also make it more explicit that this list is owned by the generator. It is a requirement that the user gets a good error message at generation time for labels that are reserved words in the generator's target language. The current approach is only one way to make that happen, and I'm open to others.
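One way to apply the check to labels only, sketched over a simplified AST. The node shapes below (`rules`, `expression`, `alternatives`, `elements`, and `labeled` nodes with `label` and `labelLocation`) are loosely modeled on Peggy's AST but are an assumption, not its exact node set, and `findReservedLabels` is hypothetical. Rule names are skipped because Peggy's JavaScript output prefixes rule functions (`peg$parse` + rule name), so a reserved word there never becomes a bare identifier.

```javascript
// Hypothetical check: report labels that are reserved words, ignoring rule
// names and rule references. The AST shape is a simplified stand-in.
function findReservedLabels(grammar, reservedWords) {
  const reserved = new Set(reservedWords);
  const problems = [];

  function visit(node) {
    if (!node || typeof node !== "object") return;
    // Plucked elements (`@expr`) may have no label, so guard against null.
    if (node.type === "labeled" && node.label && reserved.has(node.label)) {
      problems.push({ label: node.label, location: node.labelLocation });
    }
    // Recurse into whichever child slots this node kind has.
    visit(node.expression);
    for (const child of node.alternatives ?? []) visit(child);
    for (const child of node.elements ?? []) visit(child);
  }

  // rule.name is deliberately not checked; rule_ref nodes are visited but
  // never reported, since only `labeled` nodes can produce a problem.
  for (const rule of grammar.rules) visit(rule.expression);
  return problems;
}

const grammar = {
  rules: [{
    type: "rule",
    name: "let", // a reserved word as a rule name: allowed
    expression: {
      type: "sequence",
      elements: [
        { type: "labeled", label: "of", labelLocation: { line: 1, column: 7 },
          expression: { type: "rule_ref", name: "eval" } },
        { type: "labeled", label: "let", labelLocation: { line: 1, column: 15 },
          expression: { type: "literal", value: "x" } },
      ],
    },
  }],
};

// Only the `let` label is reported, together with its location.
console.log(findReservedLabels(grammar, ["let", "eval"]));
```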
The example shows Peggy with semantic actions in JS. That implies the only valid target is JS, so if a JS keyword is used as a label, we can actually just consider it a bug. There won't be another target, where
Parsers that ask code generators for extra data are an architectural abomination, some kind of colonoscopic dentistry that should never be put into practice. The only place in the code that owns code generation is the code generator, and if the chosen code generator is unhappy with an identifier, it should throw an error on its own.
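A minimal sketch of the alternative proposed here, where the code generator validates the identifiers it emits and throws its own error. `emitActionFunction` and `CodegenError` are hypothetical, not Peggy APIs, and the function name `peg$f0` is only illustrative. Asking the running engine to parse a strict-mode parameter list is one assumed way to get the target's rules exactly; it assumes labels were already checked as syntactic identifiers by the grammar parser.

```javascript
class CodegenError extends Error {
  constructor(message, location) {
    super(message);
    this.name = "CodegenError";
    this.location = location;
  }
}

// Let the JavaScript engine decide whether `name` works as a parameter of a
// strict-mode function. Assumes `name` is already a syntactic identifier.
function isValidStrictParam(name) {
  try {
    new Function(`"use strict"; return function (${name}) {};`);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Labels become the formal parameters of the function wrapping an action.
function emitActionFunction(fnName, labels, code) {
  for (const { name, location } of labels) {
    if (!isValidStrictParam(name)) {
      throw new CodegenError(
        `Label "${name}" cannot be used as a parameter name in strict-mode JavaScript`,
        location,
      );
    }
  }
  const params = labels.map((label) => label.name).join(", ");
  return `function ${fnName}(${params}) {${code}}`;
}

console.log(emitActionFunction("peg$f0", [{ name: "of", location: { line: 1, column: 9 } }], " return of; "));
// function peg$f0(of) { return of; }

try {
  emitActionFunction("peg$f1", [{ name: "let", location: { line: 2, column: 9 } }], " return let; ");
} catch (e) {
  console.log(e.name, e.location); // CodegenError { line: 2, column: 9 }
}
```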
There's no need to use that sort of rhetoric when we're engaging with your issue constructively. Neither of us wrote this code in the first place.
It didn't look that way to me. I'm sorry.
Anyway, if you want to put together a patch, I'll take a look at it.
Unfortunately, we are still not able to report errors in code blocks together with line numbers. And even if we could, that would not help us point the user to the place where the mistake was made. So the blacklist is a necessary measure that lets us say clearly what the error is and where it is.
I'm going to leave this open until I change the docs. However, I think we may need a few other issues as a result of this discussion:
Everyone on this thread, please review the changes in #361 to make sure I've adequately captured the conversation.
I didn't realize the AST doesn't store any location information. There are only 4 references to
should capture all the required information, even if somewhat inconsistently.
The other issue is that we're currently generating from bytecode, not directly from the AST.
That is not required, because Lines 319 to 322 in 6344542
Yes, the problem with moving the check to the code generation pass is the absence of that information in the AST. At least that was true at the time the check was written. But since #240 we have the locations of all rules in the AST, so it would be easy to also capture the labels' locations and move the check. Some hints for implementing such a check: add a utility method to `asts`:

    const asts = {
      renameInvalidLabels(ast: peggy.ast.Grammar, reservedWords: string[], session: peggy.Session) { ... }
    }

This function may follow this algorithm for each label:
I don't want to rename labels. If I'm understanding the approach correctly, there's no way for the person writing the grammar to predict what the label name will turn out to be in their action or predicate code.
As I understand this whole issue: Peggy currently doesn't run any static analysis on the code blocks. To perform any sort of label renaming, one would also need to analyze the code blocks and transform the code inside them. That would be a major undertaking. It would also have to be implemented for each language supported in code blocks, which would become a major hurdle for anyone wanting to implement another target language. A much simpler solution is to just restrict the names of labels.
I think we all agree that renaming labels is a bad idea.
To be clear: I think it's an interesting idea, I just don't want to do it. :)
To be clear: I don't think it's interesting, because it's neither required nor leads to any kind of desired behavior.
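To illustrate why renaming needs real analysis of the code blocks: without a parser for the target language, the only renaming Peggy could do is textual. `naiveRename` below is a hypothetical helper showing how a word-boundary regex over-matches in a JavaScript action.

```javascript
// Hypothetical helper: purely textual renaming of an identifier.
// Assumes `from` consists of word characters only.
function naiveRename(code, from, to) {
  return code.replace(new RegExp(`\\b${from}\\b`, "g"), to);
}

// Action code that refers to a label named `let`.
const action = 'return { let: let, label: "let" };';
console.log(naiveRename(action, "let", "let_"));
// return { let_: let_, label: "let_" };
```

The label reference was renamed, but so were the object key and the string literal, which changes what the action returns. Telling these apart requires parsing the action in its own language, once per target language.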
* main: (21 commits)
  - Update CHANGELOG.md
  - Update version number & rebuild
  - Update dependencies
  - Update test/unit/compiler/passes/report-infinite-repetition.spec.js
  - Fixes peggyjs#357. Do not allow infinite recursion in repetition delimiter.
  - Update changelog
  - Allow extra semicolons between rules.
  - Fix an error in the code generator for "repeated" node
  - Update changelog
  - Fixes peggyjs#329
  - Update changelog
  - Fixes peggyjs#359. Clarifies documentation about reserved words.
  - Fix more HTML indentation.
  - Test that the generated parser also works without errors
  - Remove use of expect.to.not.throw()
  - Add Rene Saarsoo to AUTHORS
  - Typo in test description
  - Add test to ensure special non-reserved keywords are allowed
  - Comment out unnecessary reserved words
  - Fixes peggyjs#347. Makes $ invalid as an identifier start character.
  - ...
See #378
Peggy 3.0.0 added a bunch of new reserved keywords which can no longer be used as label names:
These are keywords that were reserved in ECMAScript 1 to 3 but are no longer reserved in modern ECMAScript. Does Peggy plan to target these old ECMAScript versions when generating code? It doesn't seem so, because the same list of reserved keywords also contains keywords that are only reserved in ECMAScript strict mode, and strict mode was introduced in ECMAScript 5. This makes no sense to me. Peggy should target either modern ECMAScript or the old one, but it can't really do both. And even if Peggy wants to target old ECMAScript, there's no actual problem that using these names would cause.
On the more practical side: when I read the Peggy documentation, I get the impression that I can use as label names all the names that I could normally use as JavaScript variable names. That's easy to remember, as I already know JavaScript very well. But it turns out that Peggy also blocks me from using some names that I wouldn't have guessed were reserved keywords in JavaScript. This behavior breaks my intuitive assumptions.
Additionally, this new reserved-keyword list contains names that aren't reserved keywords at all:
- `arguments`
- `as`
- `async`
- `eval`
- `from`
- `get`
- `of`
- `set`
For example, one might think that `of` is reserved because it's used in the `for (... of ...)` statement, but in fact one can freely have a variable named `of`. The following is perfectly valid code:

The only names on this list that really shouldn't be allowed are:
- `arguments` - referring to `arguments` inside a function won't work for a variable `arguments` declared outside of that function.
- `eval` - declaring a variable named `eval` is prohibited in strict mode.

But I might be mistaken. Perhaps all these extra keywords were added to solve some actual problem. I would be interested in learning about that problem.
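The claims above can be checked against a JavaScript engine directly. This sketch assumes the generated parser runs as strict-mode code (ES modules always do) and uses the standard `Function` constructor, which parses its body immediately, so an illegal declaration throws a `SyntaxError` on the spot. `canDeclareInStrictMode` is a name chosen for this example.

```javascript
// Returns true if `name` can be declared with `var` in strict-mode code.
function canDeclareInStrictMode(name) {
  try {
    new Function(`"use strict"; var ${name};`);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const listed = ["arguments", "as", "async", "eval", "from", "get", "of", "set"];
for (const name of listed) {
  console.log(name.padEnd(10), canDeclareInStrictMode(name));
}
// Only `arguments` and `eval` print false.
```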