Some maintenance work #77
lyrixx commented on May 30, 2022:
- Allow org_heigl/hyphenator in ^3.0
- Simplify composer.json 'scripts' section
- Drop support for Symfony < 4.4 + Add support for ^6.0
- Run tests on PHP 8.1
There is a failure on PHP 8.1, but I don't know how to fix it:
The line causing the bug is this one: For an unknown reason (to me, at least), the behaviour of this function changed in PHP 8.1.

$tofix = "/ˈdʒɪf/";
echo(mb_detect_encoding($tofix) . PHP_EOL);
mb_detect_order('ASCII,UTF-8,ISO-8859-1,windows-1252,iso-8859-15');
echo(mb_detect_encoding($tofix));

In PHP 8.0 I get:
UTF-8
UTF-8

In PHP 8.1 I get:
UTF-8
Windows-1252

Sandbox showing the issue: https://onlinephp.io/c/5f836
So, after doing some more research, I found out that the function that has changed is not
Question:
Answer:
So now
This is not what we want to do in this library. We do not want the most likely encoding; we want to validate that the text we are given is valid in one of several encodings, checked in order of preference.
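A minimal sketch of that "validate in order of preference" behaviour, assuming the caller passes a preference list like the one given to mb_detect_order() in the reproduction. The helper name detectPreferredEncoding is hypothetical and this is not the fix adopted in the PR; it only relies on mb_check_encoding(), which reports whether a string is valid in a given encoding without guessing.

```php
<?php
// Hypothetical helper (not JoliTypo's code): return the first encoding in
// $preferred for which $text is valid, or null if none of them matches.
function detectPreferredEncoding(string $text, array $preferred): ?string
{
    foreach ($preferred as $encoding) {
        // mb_check_encoding() validates only; it does not score candidates.
        if (mb_check_encoding($text, $encoding)) {
            return $encoding;
        }
    }

    return null;
}
```

Because the first valid match wins, the order of the list carries the preference. A single-byte encoding such as ISO-8859-1 typically accepts any byte sequence, so it only makes sense near the end of the list, as a fallback.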
I tried fixing it:
Awesome work! I'll take care of the deprecation if you want.
@@ -46,7 +46,7 @@ public function testFullPageMarkup()
 HTML;

 $fixed = <<<'STRING'
-“Who Let the Dogs Out?” is a song written and originally recorded by Anslem Douglas (titled “Doggie”).
+“Who Let the Dogs Out?” is a song written and originally recorded by Anslem Douglas (titled “Doggie”).
Why did the behaviour change here?! I don't get it :/
I don't know; encoding handling seems to have changed in numerous ways in PHP 8.1.
However, this is what we expected to get in similar tests:
https://github.com/jolicode/JoliTypo/blob/master/tests/JoliTypo/Tests/Fixer/EnglishQuotesTest.php#L42-L45
https://github.com/jolicode/JoliTypo/blob/master/tests/JoliTypo/Tests/JoliTypoTest.php#L175
The new version looks good to me 👍🏼
@@ -356,7 +361,9 @@ private function fixContentEncoding($content)
     mb_substr($content, $headPos);
 }

-$content = mb_convert_encoding($content, 'HTML-ENTITIES', $encoding);
+if ('UTF-8' !== $encoding) {
+    $content = mb_convert_encoding($content, 'UTF-8', $encoding);
Actually, I'm not sure about this.
Yes, it should be HTML-ENTITIES.
And there is no need to test if it's UTF-8.
Using mb_convert_encoding to convert to HTML entities is deprecated in PHP 8.2, and it did not work well before that either: php/php-src@9308974
And we don't want to use html_entity_decode, because it would break the fixer if the user passes something like 1 > 3 or <3.
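One possible non-deprecated alternative, offered as a sketch rather than what the PR ended up doing, is mb_encode_numericentity(): it rewrites only the code points listed in its map, so ASCII markup characters such as <, > and & stay untouched and input like 1 > 3 or <3 survives. The helper name encodeNonAsciiAsEntities is hypothetical, and the output uses numeric entities, which may differ from what the old 'HTML-ENTITIES' conversion produced.

```php
<?php
// Hypothetical helper (not the PR's code): escape every non-ASCII character
// as a numeric HTML entity instead of using the deprecated 'HTML-ENTITIES'.
function encodeNonAsciiAsEntities(string $content, string $encoding): string
{
    // Work in UTF-8 so the map below applies to Unicode code points.
    if ('UTF-8' !== $encoding) {
        $content = mb_convert_encoding($content, 'UTF-8', $encoding);
    }

    // [start, end, offset, mask]: code points U+0080..U+10FFFF become &#N;.
    // ASCII, including <, > and &, is left exactly as it was.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($content, $map, 'UTF-8');
}
```

For example, '1 > 3 “ok”' becomes '1 > 3 &#8220;ok&#8221;': the comparison stays literal while the curly quotes are escaped.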
My concern was more about the fact that we set the charset to $encoding but then convert the content to UTF-8. This is weird. IMO, either we don't set the charset or we don't convert.
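To illustrate the "declared charset must match the bytes" point, here is a sketch of one consistent option: convert to UTF-8 and declare UTF-8. It assumes, based on the mb_substr($content, $headPos) context in the diff hunk, that the charset is declared by inserting a meta tag right after <head>; the helper name convertAndDeclareUtf8 and the literal "<head>" lookup are simplifications, not JoliTypo's actual logic.

```php
<?php
// Hypothetical helper (not JoliTypo's code): convert the content to UTF-8
// and declare UTF-8, so the declared charset matches the actual bytes.
function convertAndDeclareUtf8(string $content, string $encoding): string
{
    if ('UTF-8' !== $encoding) {
        $content = mb_convert_encoding($content, 'UTF-8', $encoding);
    }

    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';

    // Simplification: only finds a literal, lowercase "<head>" tag.
    $headPos = strpos($content, '<head>');
    if (false === $headPos) {
        return $meta . $content;
    }

    // Byte offsets are safe here because "<head>" is pure ASCII.
    $headPos += strlen('<head>');

    return substr($content, 0, $headPos) . $meta . substr($content, $headPos);
}
```

The other consistent option mentioned above is the reverse: keep the original bytes and declare $encoding, without converting.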
Thank you both 👍