Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV Reader Enhancements #2103

Merged
merged 6 commits into from
May 30, 2021
Merged

CSV Reader Enhancements #2103

merged 6 commits into from
May 30, 2021

Conversation

oleibman
Copy link
Collaborator

This PR came about as I pondered how feasible it was to change the default escape character from backslash to null string, since the latter emulates Excel's own actions. Also, surveying issues relating to CSV, it seems that people are often in a situation where the current defaults aren't optimal for them (e.g. they are in a region where semicolon rather than comma is a better default delimiter). My case and that case can both be handled by methods after a reader is constructed. However, the issues also show that many use IOFactory::load rather than new Csv(), and the methods to affect the defaults are not available in that case.

Adding a static callback that can be invoked by the constructor addresses all these problems. This can be set as part of the user application's normal initialization, and no special attention needs to be paid to CSV loads thereafter, no matter how they are invoked.

This also makes it feasible to use 'guess' as inputEncoding, by providing a new setFallbackEncoding (default CP1252) method to use if none of the heuristic tests pass. There was already the ability to guess the encoding before $reader->load(), but not before IOFactory::load.

Almost all typehints in Reader/Csv and Reader/Csv/Delimiter are now part of the function signature rather than in the DocBlock. The exceptions are one method in Delimiter which uses a resource parameter, and the canRead and load methods, which must match the signature in IOFactory. I will look into changing those later.

The Csv Reader tests are moved into their own directory. All Phpstan baseline entries involving Csv Reader are eliminated.

This is:

- [ ] a bugfix
- [x] a new feature

Checklist:

Why this change is needed?

oleibman added 6 commits May 16, 2021 06:05
This PR came about as I pondered how feasible it was to change the default escape character from backslash to null string, since the latter emulates Excel's own actions. Also, surveying issues relating to CSV, it seems that people are often in a situation where the current defaults aren't optimal for them (e.g. they are in a region where semicolon rather than comma is a better default delimiter). My case and that case can both be handled by methods after a reader is constructed. However, the issues also show that many use `IOFactory::load` rather than `new Csv()`, and the methods to affect the defaults are not available in that case.

Adding a static callback that can be invoked by the constructor addresses all these problems. This can be set as part of the user application's normal initialization, and no special attention needs to be paid to CSV loads thereafter, no matter how they are invoked.

This also makes it feasible to use 'guess' as inputEncoding, by providing a new setFallbackEncoding (default CP1252) method to use if none of the heuristic tests pass. There was already the ability to guess the encoding before `$reader->load()`, but not before `IOFactory::load`.

Almost all typehints in Reader/Csv and Reader/Csv/Delimiter are now part of the function signature rather than in the DocBlock. The exceptions are one method in Delimiter which uses a `resource` parameter, and the `canRead` and `load` methods, which must match the signature in IOFactory. I will look into changing those later.

The Csv Reader tests are moved into their own directory. All Phpstan baseline entries involving Csv Reader are eliminated.
@oleibman oleibman merged commit 1d86840 into PHPOffice:master May 30, 2021
@oleibman oleibman deleted the csvdflts branch July 1, 2021 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant