forked from PHPOffice/PhpSpreadsheet
Html Reader Not Handling non-ASCII Data Correctly
Fix PHPOffice#2942. The code was changed by PHPOffice#2894 because PHP 8.2 will deprecate the way it was previously done; see the linked issue for more details. DOMDocument::loadHTML assumes ISO-8859-1 when the input has no charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are each unsuitable in one way or another. I think this approach will work with minimal disruption: replace ampersand, less-than, and greater-than with entities representing illegal characters, then run htmlentities, then restore ampersand, less-than, and greater-than.
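The three steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name encodeNonAsciiForDom and the choice of Unicode noncharacters U+FDD0 to U+FDD2 as placeholders are assumptions made for the example. The idea is that once non-ASCII characters are expressed as entities, the parser's ISO-8859-1 assumption no longer matters for them.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (name and placeholder values are assumptions for
 * this sketch): convert non-ASCII UTF-8 characters to named HTML
 * entities while leaving the markup characters &, < and > untouched.
 */
function encodeNonAsciiForDom(string $html): string
{
    $markup = ['&', '<', '>'];
    // Unicode noncharacters, assumed never to occur in real input.
    $placeholders = ["\u{FDD0}", "\u{FDD1}", "\u{FDD2}"];

    // 1. Hide the markup characters so htmlentities() will not escape them.
    $html = str_replace($markup, $placeholders, $html);

    // 2. Convert characters that have a named HTML 4.01 entity.
    //    ENT_NOQUOTES leaves single and double quotes as they are.
    $html = htmlentities($html, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

    // 3. Restore the original markup characters.
    return str_replace($placeholders, $markup, $html);
}

echo encodeNonAsciiForDom('<td>éàâèî</td>'), PHP_EOL;
```

In this sketch, characters that have no named HTML 4.01 entity (CJK text, for example) pass through step 2 unchanged, so it covers the accented Latin and Greek characters used in the tests but is not a general solution.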
Showing 3 changed files with 78 additions and 1 deletion.
@@ -0,0 +1,28 @@
<?php

namespace PhpOffice\PhpSpreadsheetTests\Reader\Html;

use PhpOffice\PhpSpreadsheet\Reader\Html;
use PHPUnit\Framework\TestCase;

class Issue2942Test extends TestCase
{
    public function testLoadFromString(): void
    {
        $content = '<table><tbody><tr><td>éàâèî</td></tr></tbody></table>';
        $reader = new Html();
        $spreadsheet = $reader->loadFromString($content);
        $sheet = $spreadsheet->getActiveSheet();
        self::assertSame('éàâèî', $sheet->getCell('A1')->getValue());
    }

    public function testLoadFromFile(): void
    {
        $file = 'tests/data/Reader/HTML/utf8chars.html';
        $reader = new Html();
        $spreadsheet = $reader->loadSpreadsheetFromFile($file);
        $sheet = $spreadsheet->getActiveSheet();
        self::assertSame('éàâèî', $sheet->getCell('A1')->getValue());
        self::assertSame('αβγδε', $sheet->getCell('B1')->getValue());
    }
}
@@ -0,0 +1,17 @@
<!DOCTYPE html>
<html>
<head>
<!-- deliberately do not identify charset for this test -->
<title>Test Utf-8 characters</title>
</head>
<body>
<table>
<tbody>
<tr>
<td>éàâèî</td>
<td>αβγδε</td>
</tr>
</tbody>
</table>
</body>
</html>