-
-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHP 8.3 | Tokenizer/PHP: support "yield from" with comments #647
Merged
jrfnl
merged 8 commits into
master
from
feature/529-tokenizer-yield-from-tokenization-review
Nov 2, 2024
Merged
PHP 8.3 | Tokenizer/PHP: support "yield from" with comments #647
jrfnl
merged 8 commits into
master
from
feature/529-tokenizer-yield-from-tokenization-review
Nov 2, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Oct 25, 2024
As discussed in and discovered via issue 529: * Prior to PHP 8.3, only whitespace was allowed between the `yield` and `from` keywords. A comment between the `yield` and `from` keywords in a `yield from` expression would result in a parse error. * As of PHP 8.3, this is no longer a parse error and both whitespace as well as comments are allowed between the `yield` and `from` and the complete expression is tokenized in PHP itself as one `T_YIELD_FROM` token. See: https://3v4l.org/2SI2Q#veol In the context of PHPCS this is problematic as comments should always have their own token to allow sniffs to examine them. Additionally, such comments may contain PHPCS ignore annotations, which, when not tokenized as a separate token, will not be respected. This commit adds support for this change in PHP 8.3 to PHP_CodeSniffer. It does contain an, albeit small, BC-break, due to the BC-break created by PHP. Previously in PHPCS: * A single line `yield from` expression would always tokenize as `T_YIELD_FROM`, independently of the type and amount of whitespace between the keywords. * A multi-line `yield` [new line]+ `from` expression would tokenize as multiple `T_YIELD_FROM` tokens, one for each line. * A `yield from` expression with a comment between the keywords was not supported. In PHP < 8.3, this meant that this would tokenize as `T_YIELD`, [`T_WHITESPACE`|T_COMMENT`]+, `T_STRING` (`from`). As of PHP 8.3, this was tokenized as one or more `T_YIELD_FROM` tokens (depending on single/multi-line) with the comment being tokenized as `T_YIELD_FROM` as well. This commit changes this as follows: * Single line `yield from` expression with only whitespace between the keywords: **no change**, this will still tokenize as a single `T_YIELD_FROM` token. * Multi-line `yield` [new line]+ `from` expressions and `yield from` expressions with a comment (both single line as well as multi-line) will now consistently be tokenized as `T_YIELD_FROM` (`yield`), [`T_WHITESPACE`|T_COMMENT`]+, `T_YIELD_FROM` (`from`). In practice, this means that: * Whitespace and comments between the keywords can now be examined and handled by relevant sniffs, which are likely to give more accurate results (fewer false negatives, like for tab indentation of a `from` keyword on a separate line). * The tokenization used by PHPCS is now consistent again for all supported PHP versions. * The PHP 8.3 change is now supported. It does mean that sniffs which explicitly handle multi-token `yield from` expressions, will need to be updated. In my opinion, adding this change in a minor is justified as: 1. The PHP 8.3 change can not be supported otherwise. 2. The impact is expected to be minimal anyhow as there are not many sniffs which specifically look for and handle `T_YIELD_FROM` tokens and those sniffs within PHPCS itself will be updated/adjusted in the same release. Also, the (negative) impact on _end-users_ of this BC-break is also expected to be minimal as a scan of the top 2000 projects listed on Packagist shows that in those project no multi-line/multi-token `yield from` expressions are used in the source code, which means that even when sniff code is not updated (yet) for the change in tokenization, the chances of an end-user getting incorrect results because of this are very slim as the code affected is just not written as multi-line/with comment that often. Includes tests. Fixes 529 Refs: * squizlabs/PHP_CodeSniffer 1524 (original polyfill code) * php/php-src 10125 * php/php-src 14926 * https://externals.io/message/124462 --- Information for standards maintainers The "yield from" _keyword_ could previously already consist of multiple T_YIELD_FROM tokens if the "keyword" was spread over multiple lines. Now, the tokens between the actual keywords will be tokenized as `T_WHITESPACE` and comment tokens. To find the last token for a `T_YIELD_FROM` "keyword", change old code like this: ```php $yieldFromEnd = $stackPtr; if (preg_match('`yield\s+from`', $tokens[$stackPtr]['content']) !== 1) { for ($yieldFromEnd = ($stackPtr + 1); $tokens[$yieldFromEnd]['code'] === T_YIELD_FROM; $yieldFromEnd++); --$yieldFromEnd; } ``` to ```php $yieldFromEnd = $stackPtr; if (strtolower(trim($tokens[$stackPtr]['content'])) === 'yield') { for ($i = ($stackPtr + 1); $i < $phpcsFile->numTokens; $i++) { if ($tokens[$i]['code'] === T_YIELD_FROM && strtolower(trim($tokens[$i]['content'])) === 'from') { $yieldFromEnd = $i; break; } if (isset(Tokens::$emptyTokens[$tokens[$i]['code']]) === false && $tokens[$i]['code'] !== T_YIELD_FROM) { // Shouldn't be possible. Just to be on the safe side. break; } } } ``` The above presumes that `$stackPtr` is set to a `T_YIELD_FROM` token. Also note that the second code snippet is largely cross-version compatible. It will work with older PHPCS versions with code compatible with PHP < 8.3 and will work on PHPCS 3.11.0+ for code compatible with all supported PHP versions.
The `yield from` keyword(s) can be separated by spaces, tabs, new lines and since PHP 8.3, even comments. The PHPCS `Tokenizer` previously did not execute tab replacement on this token leading to unexpected `'content'` and incorrect `'length'` values in the `File::$tokens` array, which in turn could lead to incorrect sniff results and incorrect fixes. Previously, this affected all `T_YIELD_FROM` tokens. After the change to support PHP 8.3 "yield ... comment... from" syntax, this now only affects single line `yield from` expressions where the keywords are separated by a tab or a mix of tabs and spaces. All the same, consistency is key, so this commit adds the `T_YIELD_FROM` token to the array of tokens for which to do tab replacement, which should make them more consistent with the rest of PHPCS. Includes unit tests safeguarding this change.
…enization This commit updates the `Generic.WhiteSpace.LanguageConstructSpacing` sniff to recognize that a multi-token `yield from` may now have whitespace and comment tokens between the keywords.
…mment If there is a comment between the `yield` and the `from` keywords in a `yield from` expression, the sniff should report this, but should not auto-fix it, as it cannot be determined where the comment should be moved to. Thic commit updates the sniff to throw a non-autofixable error with error code `IncorrectYieldFromWithComment` for that situation. Includes unit tests.
... to safeguard that a "yield from" keyword split into multiple tokens is still handled correctly.
This confirms that the Tokenizer changes related to `yield from` tokenization which were made in a previous commit also fix issue squizlabs/PHP_CodeSniffer 3808, as reported upstream. Closes squizlabs/PHP_CodeSniffer 3808 Includes moving the `phpcs:set` directives in test case file 3 to inside the PHP tags to make the tests reproducable when doing a plain `phpcs` run against the file.
This commit on the one hand confirms (via a test) that indentation for a multi-line `yield from` expression is now handled correctly by the sniff. On the other hand, this commit enables checking for tabs being used for inline "alignment" between the `yield` and the `from` keyword. This will now also be handled (and auto-fixed) by this sniff. Includes test.
As the indentation whitespace in a multi-line `yield from` expression is now tokenized as `T_WHITESPACE`, instead of being tokenized as part of the (second) `T_YIELD_FROM` token, space indentation of such a line will now be handled correctly by the sniff. This commit just adds a test to confirm and safeguard this.
Rebasing without changes. Merging once the build passes. |
f3985c7
to
fc6d98a
Compare
wmfgerrit
pushed a commit
to wikimedia/mediawiki-tools-codesniffer
that referenced
this pull request
Jan 27, 2025
Update AssignmentInReturnSniff for the BC break in PHPCSStandards/PHP_CodeSniffer#647. Bug: T379674 Change-Id: I01009b858466b7c678c799d8c99207f3395b7a5d
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
PHP 8.3 | Tokenizer/PHP: support "yield from" with comments
As discussed in and discovered via issue #529:
yield
andfrom
keywords. A comment between theyield
andfrom
keywords in ayield from
expression would result in a parse error.yield
andfrom
and the complete expression is tokenized in PHP itself as oneT_YIELD_FROM
token. See: https://3v4l.org/2SI2Q#veolIn the context of PHPCS this is problematic as comments should always have their own token to allow sniffs to examine them.
Additionally, such comments may contain PHPCS ignore annotations, which, when not tokenized as a separate token, will not be respected.
This commit adds support for this change in PHP 8.3 to PHP_CodeSniffer. It does contain an, albeit small, BC-break, due to the BC-break created by PHP.
Previously in PHPCS:
yield from
expression would always tokenize asT_YIELD_FROM
, independently of the type and amount of whitespace between the keywords.yield
[new line]+from
expression would tokenize as multipleT_YIELD_FROM
tokens, one for each line.yield from
expression with a comment between the keywords was not supported.In PHP < 8.3, this meant that this would tokenize as
T_YIELD
, [T_WHITESPACE
|T_COMMENT
]+,T_STRING
(from
).As of PHP 8.3, this was tokenized as one or more
T_YIELD_FROM
tokens (depending on single/multi-line) with the comment being tokenized asT_YIELD_FROM
as well.This commit changes this as follows:
yield from
expression with only whitespace between the keywords: no change, this will still tokenize as a singleT_YIELD_FROM
token.yield
[new line]+from
expressions andyield from
expressions with a comment (both single line as well as multi-line) will now consistently be tokenized asT_YIELD_FROM
(yield
), [T_WHITESPACE
|T_COMMENT
]+,T_YIELD_FROM
(from
).In practice, this means that:
from
keyword on a separate line).It does mean that sniffs which explicitly handle multi-token
yield from
expressions, will need to be updated.In my opinion, adding this change in a minor is justified as:
T_YIELD_FROM
tokens and those sniffs within PHPCS itself will be updated/adjusted in the same release.Also, the (negative) impact on end-users of this BC-break is also expected to be minimal as a scan of the top 2000 projects listed on Packagist shows that in those project no multi-line/multi-token
yield from
expressions are used in the source code, which means that even when sniff code is not updated (yet) for the change in tokenization, the chances of an end-user getting incorrect results because of this are very slim as the code affected is just not written as multi-line/with comment that often.Includes tests.
Fixes #529
Refs:
yield
keyword asT_YIELD
on PHP 5.4, andyield from
asT_YIELD_FROM
on PHP < 7.0 squizlabs/PHP_CodeSniffer#1524 (original polyfill code)Information for standards maintainers
The "yield from" keyword could previously already consist of multiple T_YIELD_FROM tokens if the "keyword" was spread over multiple lines.
Now, the tokens between the actual keywords will be tokenized as
T_WHITESPACE
and comment tokens.To find the last token for a
T_YIELD_FROM
"keyword", change old code like this:to
The above presumes that
$stackPtr
is set to aT_YIELD_FROM
token.Also note that the second code snippet is largely cross-version compatible. It will work with older PHPCS versions with code compatible with PHP < 8.3 and will work on PHPCS 3.11.0+ for code compatible with all supported PHP versions.
Tokenizer: apply tab replacement to "yield from"
The
yield from
keyword(s) can be separated by spaces, tabs, new lines and since PHP 8.3, even comments.The PHPCS
Tokenizer
previously did not execute tab replacement on this token leading to unexpected'content'
and incorrect'length'
values in theFile::$tokens
array, which in turn could lead to incorrect sniff results and incorrect fixes.Previously, this affected all
T_YIELD_FROM
tokens. After the change to support PHP 8.3 "yield ... comment... from" syntax, this now only affects single lineyield from
expressions where the keywords are separated by a tab or a mix of tabs and spaces.All the same, consistency is key, so this commit adds the
T_YIELD_FROM
token to the array of tokens for which to do tab replacement, which should make them more consistent with the rest of PHPCS.Includes unit tests safeguarding this change.
Generic/LanguageConstructSpacing: update for changed "yield from" tokenization
This commit updates the
Generic.WhiteSpace.LanguageConstructSpacing
sniff to recognize that a multi-tokenyield from
may now have whitespace and comment tokens between the keywords.Generic/LanguageConstructSpacing: don't auto-fix 'yield from' with comment
If there is a comment between the
yield
and thefrom
keywords in ayield from
expression, the sniff should report this, but should not auto-fix it, as it cannot be determined where the comment should be moved to.Thic commit updates the sniff to throw a non-autofixable error with error code
IncorrectYieldFromWithComment
for that situation.Includes unit tests.
Generic/LowerCaseKeyword: add extra tests with "yield from"
... to safeguard that a "yield from" keyword split into multiple tokens is still handled correctly.
Generic/ScopeIndent: add extra test with multi-token "yield from"
This confirms that the Tokenizer changes related to
yield from
tokenization which were made in a previous commit also fix issue squizlabs/PHP_CodeSniffer#3808, as reported upstream.Closes squizlabs/PHP_CodeSniffer#3808
Includes moving the
phpcs:set
directives in test case file 3 to inside the PHP tags to make the tests reproducable when doing a plainphpcs
run against the file.Generic/DisallowTabIndent: improve handling of yield from
This commit on the one hand confirms (via a test) that indentation for a multi-line
yield from
expression is now handled correctly by the sniff.On the other hand, this commit enables checking for tabs being used for inline "alignment" between the
yield
and thefrom
keyword.This will now also be handled (and auto-fixed) by this sniff. Includes test.
Generic/DisallowSpaceIndent: add test with multi-line yield from
As the indentation whitespace in a multi-line
yield from
expression is now tokenized asT_WHITESPACE
, instead of being tokenized as part of the (second)T_YIELD_FROM
token, space indentation of such a line will now be handled correctly by the sniff.This commit just adds a test to confirm and safeguard this.
Suggested changelog entry
yield from
expressions with a comment between the keywords.T_YIELD_FROM
tokens may need updating. See the PR description for guidance.yield from
expressions with comments:yield from
keywords.yield from
expression.yield from