Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell shared formula does not exists when the xlsx is chunked with readFilter #1669

Closed
ghost opened this issue Oct 7, 2020 · 4 comments
Closed

Comments

@ghost
Copy link

ghost commented Oct 7, 2020

This is:

- [x] a bug report
- [x] a feature request
- [x] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

When using XSLX Reader with read filter, the shared formulas outside the filtered rows should be loaded.

In this way, when using getCalculatedValue on a formula cell with shared formula, the returned value is the formula result.

What is the current behavior?

When using XSLX Reader with read filter, the shared formulas outside the filtered rows are not loaded.

Therefore when calculating the formula for the cell, it will give '=' because $sharedFormulas ( if condition in Xlsx.php:271) is empty,

What are the steps to reproduce?

We have an excel xlsx file with a column C1=ROUND(A1 + B1, 2).
From the C1 we drag down, and excel autofills the C2,C3,C4,C5....,C999 cells with the formula (in this way the C2 ... C999 formulas will be shared).

That means the xml from the xslx zip ( you can extract the xslx by changing the extension to .zip) should look like:
C1 XML : <f t="shared" ref="C1:C999" si="0">ROUND(A1 + B1, 2)</f><v>22.22</v>
...
Cn XML : <f t="shared" si="0"/><v>99.99</v>
...
C999 XML : <f t="shared" si="0"/><v>11.11</v>

When importing this file with chunk filter, the getCalculatedValue in Cell.php line 63 comes as empty string.

I found that the cell type C2,C3,..C999 is shared, so it goes in castToFormula() in Xlsx.php:269 when loading and reading the file.

Because the shared formula C1 is not loaded yet in $sharedFormulas ( if condition in Xlsx.php:271), and the $value ( defined in Xlsx.php:264 ) is equal to '=', the return value will have no formula like "=ROUND(An + Bn, 2)" where N = { 2,3,4,5,6,7...,999 }

I think because the file is chunked, the initial formula is never loaded for some chunk, which will result in an empty value instead of a computed formula value.

private function castToFormula($c, $r, &$cellDataType, &$value, &$calculatedValue, &$sharedFormulas, $castBaseType): void
    {
        $cellDataType = 'f';
        $value = "={$c->f}";
        $calculatedValue = self::$castBaseType($c);

        // Shared formula?
        if (isset($c->f['t']) && strtolower((string) $c->f['t']) == 'shared') {
            $instance = (string) $c->f['si'];

            #here is the problem. when searching for the shared formula, the key 'si' is not set because the formula is in another chunk which was not loaded.
            if (!isset($sharedFormulas[(string) $c->f['si']])) {
                $sharedFormulas[$instance] = ['master' => $r, 'formula' => $value];
            } else {
                $master = Coordinate::coordinateFromString($sharedFormulas[$instance]['master']);
                $current = Coordinate::coordinateFromString($r);

                $difference = [0, 0];
                $difference[0] = Coordinate::columnIndexFromString($current[0]) - Coordinate::columnIndexFromString($master[0]);
                $difference[1] = $current[1] - $master[1];

                $value = $this->referenceHelper->updateFormulaReferences($sharedFormulas[$instance]['formula'], 'A1', $difference[0], $difference[1]);
            }
        }
    }

I found this which also explains the problem: https://github.com/nuovo/spreadsheet-reader

Notes about library performance
XLSX files use so called "shared strings" internally to optimize for cases where the same string is repeated multiple times. Internally XLSX is an XML text that is parsed sequentially to extract data from it, however, in some cases these shared strings are a problem - sometimes Excel may put all, or nearly all of the strings from the spreadsheet in the shared string file (which is a separate XML text), and not necessarily in the same order. Worst case scenario is when it is in reverse order - for each string we need to parse the shared string XML from the beginning, if we want to avoid keeping the data in memory. To that end, the XLSX parser has a cache for shared strings that is used if the total shared string count is not too high. In case you get out of memory errors, you can try adjusting the SHARED_STRING_CACHE_LIMIT constant in SpreadsheetReader_XLSX to a lower one.

Which versions of PhpSpreadsheet and PHP are affected?

PHP version: php-7.4.10-Win32-vc15-x64
Laravel version: Laravel Framework 8.5.0
phpoffice/phpspreadsheet version : * 1.14.1

@MarkBaker
Copy link
Member

Fix is WIP PR #1680

@MarkBaker
Copy link
Member

Merged to master, ready for the next release

@ghost
Copy link
Author

ghost commented Oct 16, 2020

Thanks!

~ETA for next release?

@ghost
Copy link
Author

ghost commented Oct 27, 2020

Can this fix be in 1.14 version?

I see that laravel excel uses "^1.14"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant