Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Csv Handling of Booleans (and an 8.1 Deprecation) #2232

Merged
merged 3 commits into from
Aug 4, 2021

Conversation

oleibman
Copy link
Collaborator

PhpSpreadsheet writes boolean values to a Csv as null-string/1, and treats input values of 'true' and 'false' as if they were strings. On the other hand, Excel writes boolean values to a Csv as TRUE/FALSE, and case-insensitively treats a matching string as boolean on read. This PR changes PhpSpreadsheet to match Excel.

A side-effect of this change is that it fixes behavior incorrectly reported as a bug in PR #2048. That issue was closed, correctly, as user error. The user had altered Csv Writer, including adding declare(strict_types=1);; that declaration was the cause of the error. The "offending" statements, calls to strpbrk and str_replace, will now work correctly whether or not strict_types is in use.

And, just as I was getting ready to push this, the dailies for PHP 8.1 introduced a change deprecating auto_detect_line_endings. Csv Reader uses that setting; it allows it to process a Csv with Mac line endings, which happens to be something that Excel can do. As they say in https://wiki.php.net/rfc/deprecations_php_8_1, where the proposal passed without a single dissenting vote, "These newlines were used by “Classic” Mac OS, a system which has been discontinued in 2001, nearly two decades ago. Interoperability with such systems is no longer relevant." I tend to agree, but I don't know that we're ready to pull the plug yet. I don't see an easy way to emulate that functionality. For now, I have silenced the deprecation notices with at signs. I have also added a test case which will fail when support for that setting is pulled; this will give time to consider alternatives.

This is:

- [x] a bugfix
- [ ] a new feature

Checklist:

Why this change is needed?

oleibman added 3 commits July 22, 2021 21:14
PhpSpreadsheet writes boolean values to a Csv as null-string/1, and treats input values of 'true' and 'false' as if they were strings. On the other hand, Excel writes boolean values to a Csv as TRUE/FALSE, and case-insensitively treats a matching string as boolean on read. This PR changes PhpSpreadsheet to match Excel.

A side-effect of this change is that it fixes behavior incorrectly reported as a bug in PR PHPOffice#2048. That issue was closed, correctly, as user error. The user had altered Csv Writer, including adding ```declare(strict_types=1);```; that declaration was the cause of the error. The "offending" statements, calls to strpbrk and str_replace, will now work correctly whether or not strict_types is in use.

And, just as I was getting ready to push this, the dailies for PHP 8.1 introduced a change deprecating auto_detect_line_endings. Csv Reader uses that setting; it allows it to process a Csv with Mac line endings, which happens to be something that Excel can do. As they say in https://wiki.php.net/rfc/deprecations_php_8_1, where the proposal passed without a single dissenting vote, "These newlines were used by “Classic” Mac OS, a system which has been discontinued in 2001, nearly two decades ago. Interoperability with such systems is no longer relevant." I tend to agree, but I don't know that we're ready to pull the plug yet. I don't see an easy way to emulate that functionality. For now, I have silenced the deprecation notices with at signs. I have also added a test case which will fail when support for that setting is pulled; this will give time to consider alternatives.
This could be interesting. It doesn't like not handling an error condition for ini_set. Let's see if this satisfies it.
@oleibman oleibman merged commit 0cd20f3 into PHPOffice:master Aug 4, 2021
@oleibman oleibman deleted the csvnumbers branch August 27, 2021 15:31
@TobiasBg
Copy link
Contributor

Hi @oleibman,

I just stumbled upon this after seeing unit tests in my project fail after updating PHPSpreadsheet to 1.19.0.
Essentially, I'm expecting true and false (and actually everything) in a CSV to be imported as string and not as boolean.
Up to now, I've been using the StringValueBinder for this. However, this no longer works, as the conversion of the strings true and false to boolean happens before a value binder is executed.
Therefore, shouldn't the conversion be part of a value binder? Otherwise, there's no way to turn this conversion off, right?

Thanks for looking into this again!
Tobias

@oleibman
Copy link
Collaborator Author

oleibman commented Oct 31, 2021

My apologies for the inconvenience. PhpSpreadsheet CSV Reader had not been handling booleans correctly before. I think it is possible that you needed StringValueBinder for that reason, and that you can drop it now (using the default value binder, you get TRUE/FALSE as expected). It is also possible that we need to cover this situation as an extra option in the new StringValueBinder; I need to think about that. In the meantime, if default value binder won't do, the following workaround should work for both 18 and 19:

class StringValueBinder2 extends StringValueBinder {
    public function bindValue(Cell $cell, $value)
    {
        if (!method_exists($this, 'setBooleanConversion')) {
            return parent::bindValue($cell, $value);
        }
        if (is_object($value)) {
            return $this->bindObjectValue($cell, $value);
        }

        // sanitize UTF-8 strings
        if (is_string($value)) {
            $value = StringHelper::sanitizeUTF8($value);
        }

        if ($value === null && $this->convertNull === false) {
            $cell->setValueExplicit($value, DataType::TYPE_NULL);
        } elseif (is_bool($value) && $this->convertBoolean === false) {
            $cell->setValueExplicit($value, DataType::TYPE_BOOL);
        } elseif ((is_int($value) || is_float($value)) && $this->convertNumeric === false) {
            $cell->setValueExplicit($value, DataType::TYPE_NUMERIC);
        } elseif (is_string($value) && strlen($value) > 1 && $value[0] === '=' && $this->convertFormula === false) {
            $cell->setValueExplicit($value, DataType::TYPE_FORMULA);
        } else {
            if (is_bool($value)) {
                $value = $value ? 'TRUE' : 'FALSE';
            }
            if (is_string($value) && strlen($value) > 1 && $value[0] === '=') {
                $cell->getStyle()->setQuotePrefix(true);
            }
            $cell->setValueExplicit((string) $value, DataType::TYPE_STRING);
        }

        return true;
    }
}

Then use your own StringValueBinder2 rather than StringValueBinder. This overrides only method bindValue, and only by using the method_exists tests near the beginning, and the is_bool test near the end.

@TobiasBg
Copy link
Contributor

Hi @oleibman,

thanks for the quick reply! Using a custom value binder is what I've been thinking about now, too, but I fear that this approach (e.g. with your suggested StringValueBinder2) won't solve my problem entirely:
The information about upper/lowercase of the words true, TRUE, True, etc. is now lost (due to the use of strcasecmp()). In my case, where the CSV files (not coming from Excel) contain mostly text, retaining the upper/lowercase information is however desired.
A configuration option that allows retrieving the "raw"/unmodified CSV data would probably be better, or this conversion could maybe be moved to a place where it can be turned off somehow (e.g. into a CSV value binder or something like that?).

Thank a lot!
Tobias

@TobiasBg
Copy link
Contributor

TobiasBg commented Nov 2, 2021

Hi @oleibman,

I just realized that I posted my comment above to a closed/merged PR. Should I instead maybe post this as a new issue to get more eyes/potential ideas on this? Thanks for your guidance!

Best wishes,
Tobias

@oleibman
Copy link
Collaborator Author

oleibman commented Nov 2, 2021

This is good enough for now.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Nov 6, 2021
See the discussion in PR PHPOffice#2232 which came about 3 months after it was merged. It caused a problem in an unusual situation which did not come to light until the change was part of the new release version. The original PR changed PhpSpreadsheet's behavior to match Excel's for (not case sensitive) strings `TRUE` and `FALSE`. Excel treats the values as boolean, and now so does PhpSpreadsheet.

When StringValueBinder is used, this becomes a tricky situation. The user wants the original strings preserved, including the case of all the letters. This PR changes the behavior of CSV reader as follows:
- If StringValueBinder is not in effect, convert to boolean.
- If StringValueBinder (actually any binder with method getBooleanConversion) is in effect, and the result of getBooleanConversion is true (which is the default in StringValueBinder), leave the value coming out of Csv Reader as the unchanged string.
- Otherwise, convert to boolean.

This should mean that there are no regression problems with StringValueBinder, while allowing PhpSpreadsheet to continue to match Excel in the default situation. No new settings are required.
@oleibman oleibman mentioned this pull request Nov 6, 2021
5 tasks
TobiasBg added a commit to TablePress/PhpSpreadsheet that referenced this pull request Nov 10, 2021
oleibman added a commit that referenced this pull request Nov 12, 2021
See the discussion in PR #2232 which came about 3 months after it was merged. It caused a problem in an unusual situation which did not come to light until the change was part of the new release version. The original PR changed PhpSpreadsheet's behavior to match Excel's for (not case sensitive) strings `TRUE` and `FALSE`. Excel treats the values as boolean, and now so does PhpSpreadsheet.

When StringValueBinder is used, this becomes a tricky situation. The user wants the original strings preserved, including the case of all the letters. This PR changes the behavior of CSV reader as follows:
- If StringValueBinder is not in effect, convert to boolean.
- If StringValueBinder (actually any binder with method getBooleanConversion) is in effect, and the result of getBooleanConversion is true (which is the default in StringValueBinder), leave the value coming out of Csv Reader as the unchanged string.
- Otherwise, convert to boolean.

This should mean that there are no regression problems with StringValueBinder, while allowing PhpSpreadsheet to continue to match Excel in the default situation. No new settings are required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants