PHP 8.3: "yield /*comment*/ from" is no longer a parse error ? #14926
Comments
This was changed by @iluuu1994 during f291d37 in response to #10083... Tip: for grammar questions, one of the first places to check for answers is zend_language_scanner. |
Damnit, beat me on commenting literally by seconds 😂 |
I do kind of consider it a bug as it is an undocumented change in behaviour - the changelog entry really doesn't cover the potential/actual impact of the change. Knowing where this is coming from, I'm surprised I haven't seen more breakage due to this change in PHPCS. Then again, that would require the PHPCS tests to hit the changed tokenization behaviour and as this kind of thing was previously a parse error, I can understand why this is not covered by the existing tests. If this change stays, in my opinion, the changelog entry needs to be updated to more comprehensively state what has actually changed and what the impact of this is. I also think it should be mentioned in the CHANGELOG, not just in NEWS. Additionally, I'd recommend for tests to be added with the above code sample for |
P.S.: Thanks @damianwadley and @nielsdos for your fast response to my query! |
Since this affects PHP 8.3, I'm not sure whether updating CHANGELOG makes much sense now, but it's certainly worth documenting it in the migration guide of the PHP documentation. |
@jrfnl I indeed didn't think about linters, this should have gotten an UPGRADING entry. Sorry about that. |
IMO it is not a bug, as comments are (should be) allowed between any code token in general. This is probably already documented or at least it is "generally expected". So f291d37 is a fix. |
@mvorisek What Juliette references is the patch changing the output of |
That is indeed not good, and such a token should be broken into separate tokens further. repro: https://3v4l.org/7VmX1 |
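To check locally how a given PHP version tokenizes the construct, a minimal sketch along these lines can be used. It is not the code behind the 3v4l links; the snippet in `$code` and the helper name `describeTokens()` are assumptions made for illustration. Only `token_get_all()`, `token_name()` and `json_encode()` are real PHP functions. Per the discussion here, PHP 8.3 is expected to report the whole construct, comment included, as one token, while earlier versions split it up.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return [name, text] pairs for every token in $code.
 * Single-character tokens (which token_get_all() returns as plain strings)
 * are listed with the character itself as their name.
 */
function describeTokens(string $code): array
{
    $rows = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens are [id, text, line].
            $rows[] = [token_name($token[0]), $token[1]];
        } else {
            $rows[] = [$token, $token];
        }
    }
    return $rows;
}

// Illustrative input only, modelled on the issue title.
$code = "<?php\nfunction gen() {\n    yield /* comment */ from [1, 2];\n}\n";

foreach (describeTokens($code) as [$name, $text]) {
    printf("%-24s %s\n", $name, json_encode($text));
}
```

Running the same script under different PHP versions and comparing the printed rows shows whether the comment ends up inside the `T_YIELD_FROM` token text or as a separate `T_COMMENT` token.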
Forgive me if I don't use the correct terminology for some things, I'm hoping that even if the terminology is not 100% correct for C, you can still understand what I'm trying to say.
... which is exactly the reason why I was testing the Tokenizer behaviour with comments between the words. Having said that, it's also not true.... In nearly all cases, tokens consist of a "single entity", so no comments can exist within a token. A typical case where the tokenization changed from multiple tokens to single token is the PHP 8.0 "namespaced names" change. Another one which comes to mind are cast tokens, like So, in practice, Retrospectively, it would have been better if I consider the change in f291d37 a bug as it is an undocumented (breaking) change in behaviour for the Tokenizer for projects consuming the token output. So, what remains then is the discussion about how I'm not sure about this and you can also see some thoughts about this in the linked PHPCS ticket (as PHPCS needs to ensure the token stream sniffs receive is the same on all PHP versions, so I need to make a choice on how to get round the changed tokenization). On the one hand I agree that comments between the keywords should be ignored - as they mostly are elsewhere -, on the other hand, comments cause a parse error in other "multi entity" tokens, so this change introduces an inconsistency. |
That's true, but what qualifies as a token is usually very intuitive.
I think that's correct. The other cases only use lookahead for comments.
Unfortunately, for It might be possible for you to artificially split |
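On the consumer side, an artificial split of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not what PHP_CodeSniffer or php-src actually does: `splitYieldFrom()` is a hypothetical helper, and the regular expression only approximates PHP's comment rules (for example, it does not treat `?>` as ending a line comment).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split the text of a single "yield ... from" token
 * into its keyword, whitespace and comment pieces, preserving order.
 */
function splitYieldFrom(string $text): array
{
    // Delimiters: block comments, line comments (// or #), or whitespace runs.
    $pattern = '~(/\*.*?\*/|(?://|\#)[^\r\n]*|\s+)~s';

    $parts = preg_split(
        $pattern,
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // preg_split() returns false on failure; fall back to the unsplit text.
    return $parts === false ? [$text] : $parts;
}

foreach (splitYieldFrom("yield /*comment*/ from") as $piece) {
    echo json_encode($piece), "\n";
}
```

A tool that needs a version-independent token stream could then re-label each piece itself (keyword, whitespace, comment); which token types to assign is exactly the open question in this thread, so the sketch leaves that out.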
And that's what I don't necessarily agree with - and nor did PHP 7.0 - 8.2 in which this was a parse error.
And again, I don't agree with this. I only discovered this bug now as the change wasn't documented. That doesn't make it any less of a bug/inconsistency with every single other token, as none of them allow for comments within the token. If it had been included in the changelog, I would probably have questioned the change/introduction of this inconsistency before the release was out. Note: I'm only challenging the change to the |
Well, we can't go back in time, so the question is what would break more code: reverting the change for |
I would agree with Christoph. From what I understand, PHP_CodeSniffer would either not correctly format the comment, or mistakenly remove it. While that's not great, it's not breaking. Removing parsing at this point would fatal error on a PHP patch update, and I think that's generally a no-go. Please correct me if my understanding is incorrect. |
The other side of that argument is that the chances of people having discovered that they can place a comment between |
Maybe we should ask for opinions on the internals mailing list? |
But what is the "solution" anyway? Complicating the lexer to deal with "yield from" specially? |
I think this is about reverting
to the previous
or not. |
I think the proper way of fixing this would be introducing T_FROM, in a special lexer mode. I.e.:
And then add Sort of related issue is #14961. Basically needs the same handling as that one. |
@bwoebi Wouldn't that be a much larger BC-break as it would turn |
@jrfnl It would be a keyword only if preceded by |
I've posted to Internals now: https://externals.io/message/124462 Let's see if we get a response. |
Hmm, the discussion on the internals mailing list ended without consensus, not even agreement that the current behavior is a bug; as such we can at most keep that as feature request, but I guess that would not help @jrfnl in any way, and may cause even more work(-arounds). So, what to do, @jrfnl? |
@cmb69 Ha, you beat me to it. Interesting to see your take on it. The way I read it, is that there is consensus that there is a bug, just not whether the bug is the change in the tokenizer or the missing documentation ;-) I also see consensus that the current tokenization is not ideal, but no consensus on how it should be changed and whether there should be consistency rules for whitespace/comments in tokens.
As I also shared on the list:
|
Oh, right. In this case I was referring to an implementation bug, since documenting the issue wouldn't really help you. Regarding PR #15041: I feel that this is curing the symptoms, but not the disease, namely that |
Well, until there is clarity where this is going, I can't fix things for PHPCS, so PHPCS will stay broken until a solution is found. |
I don't think changing that in 8.3 is acceptable. It's been out for some time, and making such a break in a bug-fix release is not a good idea IMHO. We allow some breaks in minor versions, so this is just a break in a minor version, even though it is a bit unfortunate. |
Okay, if we don't change the behavior, we should document the issue. Thus, I've submitted PR #15276. |
Thank you! That makes sense to me. Let's not merge #15041 without a further discussion or RFC then. |
PR #15276 has been applied; I'm not transferring this ticket to doc-en to avoid confusion (it's a rather long discussion), but will update the PHP manual soonish. |
Description
The following code:
https://3v4l.org/2SI2Q#veol
Resulted in this output:
But I expected this output instead:
For a full analysis of the differences in tokenization, see: PHPCSStandards/PHP_CodeSniffer#529 (comment)
While I'm not necessarily challenging this change, I'd like to know if this was a deliberate/intentional change.
I have not been able to find anything in the PHP 8.3 CHANGELOG about this change, nor in the NEWS file. I can't even seem to find the commit which caused this change.
If this change was intentional, this is probably a documentation issue and the change should be annotated in the PHP 8.3 CHANGELOG.
If this change was unintentional, I believe it may be prudent to revert the change (or at least revert the side-effects of the commit which incidentally caused this change).
I'd like to get some clarity about this as it will inform how the PHP_CodeSniffer issue linked above should be fixed.
PHP Version
PHP 8.3.8
Operating System
Not relevant