-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encoding issue with Html reader after upgrading to 1.24.0 #2942
Comments
Possibly due to #2894 (PHP8.2 will deprecate a function that was being used without an entirely clear path away from it). I will investigate. |
Yes, that's it alright. PHP8.2 deprecates mb_convert_encoding from translating to HTML_ENTITIES because "PHP already provides built-in functions that support encoding/decoding ... HTML Entities." Yeah, right. The function they suggest is not a complete equivalent; another possibility which they suggest is "not recommended". The problem is that the DOM loadHtml function assumes ISO-8859-1, and you cannot override it with code - only by adding a character set tag in the data. Wow, just wow. I should have a PR ready today or tomorrow. In the meantime, as a workaround you can always add to the start of your string:
|
I tried the <?xml encoding = "UTF-8"> and it did not work for me. |
Strange, it certainly works for me. Sorry it doesn't for you. Fix is on its way. $htmlString =
'<?xml encoding = "UTF-8">' .
'<table><tbody><tr><td>éàâèî</td></tr></tbody></table>'; |
Fix PHPOffice#2942. Code was changed by PHPOffice#2894 because PHP8.2 will deprecate how it was being done. See linked issue for more details. Dom loadhtml assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption (replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than).
* Html Reader Not Handling non-ASCII Data Correctly Fix #2942. Code was changed by #2894 because PHP8.2 will deprecate how it was being done. See linked issue for more details. Dom loadhtml assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption (replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than). * Better Implementation Use regexp to escape non-ASCII. Less kludgey, less reliant on the vagaries of the PHP maintainers. * Additional Tests Test non-ASCII outside of cell contents: sheet title, image alt attribute. * Apply Same Change in Second Location Forgot to change loadFromString. * Additional Test Confirm escaped ampersand is handled correctly.
### Added - Add Chart Axis Option textRotation [Issue #2705](#2705) [PR #2940](#2940) ### Changed - Nothing ### Deprecated - Nothing ### Removed - Nothing ### Fixed - Fix Encoding issue with Html reader (PHP 8.2 deprecation for mb_convert_encoding) [Issue #2942](#2942) [PR #2943](#2943) - Additional Chart fixes - Pie chart with part separated unwantedly [Issue #2506](#2506) [PR #2928](#2928) - Chart styling is lost on simple load / save process [Issue #1797](#1797) [Issue #2077](#2077) [PR #2930](#2930) - Can't create contour chart (surface 2d) [Issue #2931](#2931) [PR #2933](#2933) - VLOOKUP Breaks When Array Contains Null Cells [Issue #2934](#2934) [PR #2939](#2939)
This is:
What is the expected behavior?
Correct encoding of characters
What is the current behavior?
Accented characters do not display correctly
What are the steps to reproduce?
Create a sheet from html reader with accented characters
What features do you think are causing the issue
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
I didn't. test with other formats
Which versions of PhpSpreadsheet and PHP are affected?
Only happens with phpspreadsheet 1.24.0, reverting to 1.23.0 fix the issue
Attached is the generated file for the example
output.xlsx
(EDIT, attached the wrong file)
The text was updated successfully, but these errors were encountered: