Initial work enabling Excel function implementations for handling arrays as arguments when used in "array formulae". #2562

Merged

@MarkBaker MarkBaker commented Feb 6, 2022

This is:

- [ ] a bugfix
- [X] a new feature

Checklist:

Why this change is needed?

See Issue #2551 for details

Currently, the PhpSpreadsheet function implementations ignore array arguments, simply extracting the first value from any array and using that value as the argument. This can give erroneous results, as in the array formula =MAX(ABS({-3, 2.5, -1; 0, -1, -12})), where the ABS() function takes only the -3 value from the array, giving a final result of 3 when the correct result should be 12.
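To make the difference concrete, here is a minimal sketch in Python (not the PhpSpreadsheet code; the function names are hypothetical and plain nested lists stand in for Excel array constants). It contrasts the "first value only" behaviour with element-wise handling for the formula above.

```python
# Illustrative sketch only: nested lists stand in for the Excel array constant
# {-3, 2.5, -1; 0, -1, -12}. All function names here are hypothetical.
matrix = [[-3, 2.5, -1], [0, -1, -12]]


def abs_first_value_only(arg):
    """Behaviour before this change: reduce any array to its first value."""
    while isinstance(arg, list):
        arg = arg[0]
    return abs(arg)


def abs_elementwise(arg):
    """Array-aware behaviour: apply ABS to every cell, keeping the shape."""
    return [[abs(value) for value in row] for row in arg]


def excel_max(arg):
    """MAX over either a scalar or a 2-D array."""
    if not isinstance(arg, list):
        return arg
    return max(value for row in arg for value in row)


old_result = excel_max(abs_first_value_only(matrix))  # 3
new_result = excel_max(abs_elementwise(matrix))       # 12
```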

This change modifies all of the relevant Excel function implementations so that they can work correctly with the MS Excel handling of array arguments passed to functions, and actually process the arrays rather than reducing them to a single value before processing that value.

How Excel handles array arguments passed to functions

Single array argument

When a single array argument is passed to a function, whether it is a row vector, a column vector, or a matrix, the function will return an array with the same dimensions.

So passing a 2x2 matrix argument for year to the DATE() function (=DATE({2020,2021;2022,2023}, 1, 1)) will result in a 2x2 matrix being returned ({43831, 44197; 44562, 44927}). Likewise, a row vector with three elements will return a row vector with three results, and a column vector with three elements will return a column vector with three results.
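As a rough illustration of the single-array case, the following Python sketch maps a scalar function over every cell of one array argument while keeping the shape. The helper names are hypothetical, and the serial-number calculation assumes day 0 is 1899-12-30, which matches Excel's 1900 date system only for dates after February 1900.

```python
from datetime import date

# Assumption: day 0 is 1899-12-30 (valid for dates after February 1900
# in Excel's 1900 date system).
EXCEL_EPOCH = date(1899, 12, 30)


def date_serial(year, month, day):
    """Scalar stand-in for DATE(): return the Excel serial number."""
    return (date(year, month, day) - EXCEL_EPOCH).days


def map_single_array(func, array, *scalars):
    """Apply func to each cell of a 2-D array; the result keeps the same shape."""
    return [[func(cell, *scalars) for cell in row] for row in array]


years = [[2020, 2021], [2022, 2023]]
serials = map_single_array(date_serial, years, 1, 1)
# serials == [[43831, 44197], [44562, 44927]]
```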

Two array arguments

It gets a bit more complicated when more than one array argument is passed to a function, and the size of the resulting array depends on the sizes and types of arguments passed in.

  • Row vector and Column vector
    Will return a matrix, with as many rows as the column vector, and as many columns as the row vector; so passing a 3 element row vector and a 4 element column vector will result in a 4 row x 3 column matrix.

  • Row vector and Row vector
    Will return a row vector containing as many columns as the smaller of the two vectors; so passing a 3 column row vector and a 5 column row vector will result in a 3 column row vector.

  • Column vector and Column vector
    Will return a column vector containing as many rows as the smaller of the two vectors; so passing a 2 row column vector and a 5 row column vector will result in a 2 row column vector.

  • Row vector and Matrix
    Will return a result with as many rows as the matrix; but only as many columns as the smaller of the columns in the matrix or the columns in the row vector. So passing a 4 row x 3 column matrix and a 5 column row vector will result in a 4 row by 3 column matrix, while passing a 5 row by 5 column matrix and a 2 column row vector will result in a 5 row by 2 column matrix.

  • Column vector and Matrix
    Will return a result with as many columns as the matrix; but only as many rows as the smaller of the rows in the matrix or the rows in the column vector. So passing a 4 row x 3 column matrix and a 5 row column vector will result in a 4 row by 3 column matrix, while passing a 5 row by 5 column matrix and a 3 row column vector will result in a 3 row by 5 column matrix.

  • Matrix and Matrix
    Will return a result with the smaller of each dimension from the two matrices; so a 5x2 matrix and a 3x8 matrix will result in a 3x2 matrix.
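One way to read the pairings above is as a single rule applied to each dimension: a dimension of size 1 is repeated to match the other argument, otherwise the result is cut down to the smaller size. The Python sketch below works under that assumption; `result_shape` and `apply_pairwise` are hypothetical names, not functions from this PR.

```python
def result_shape(shape_a, shape_b):
    """Return (rows, columns) of the result for two array arguments."""
    def resolve(size_a, size_b):
        # A dimension of size 1 is repeated to match the other argument;
        # otherwise the result is truncated to the smaller size.
        if size_a == 1:
            return size_b
        if size_b == 1:
            return size_a
        return min(size_a, size_b)

    return (resolve(shape_a[0], shape_b[0]), resolve(shape_a[1], shape_b[1]))


def apply_pairwise(func, a, b):
    """Apply a two-argument scalar function cell by cell over two 2-D arrays."""
    rows, cols = result_shape((len(a), len(a[0])), (len(b), len(b[0])))

    def cell(array, r, c):
        # Vectors reuse their single row or column for every result cell.
        return array[r if len(array) > 1 else 0][c if len(array[0]) > 1 else 0]

    return [[func(cell(a, r, c), cell(b, r, c)) for c in range(cols)]
            for r in range(rows)]


# Shapes are (rows, columns).
print(result_shape((1, 3), (1, 5)))  # row vector and row vector: (1, 3)
print(result_shape((4, 3), (1, 5)))  # matrix and row vector:     (4, 3)
print(result_shape((4, 3), (5, 1)))  # matrix and column vector:  (4, 3)
print(result_shape((5, 2), (3, 8)))  # matrix and matrix:         (3, 2)

# A 3x2 matrix added to a 2x3 matrix gives a 2x2 result.
total = apply_pairwise(lambda x, y: x + y,
                       [[1, 2], [3, 4], [5, 6]],
                       [[10, 20, 30], [40, 50, 60]])
# total == [[11, 22], [43, 54]]
```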

Three or more array arguments

With three or more array arguments, MS Excel gets even more complicated and starts dipping into the realms of non-Euclidean geometry (and there lies madness akin to trying to parse HTML markup using regular expressions) when returning n-dimensional arrays of results. In part, MS Excel supports 3-dimensional matrices across multiple worksheets; but that adds a lot of complexity in the Calculation Engine, and introduces a whole new series of inconsistencies.

One approach would have been to flatten the n-dimensional result down to a 2-dimensional result, which would be correct in some cases, but not in others.

So (for the moment) I've decided to throw an Exception if a function is called with more than two array arguments.

This has only one caveat: an array argument that only contains a single value (a 1x1 matrix) is not treated as an array, but as a simple scalar value; so it is not included in this count.
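The counting rule and its caveat could be sketched as follows in Python. This is an assumption-laden illustration: the exception class and helper names are hypothetical, and every array argument is assumed to arrive as a non-empty 2-D list of rows.

```python
class TooManyArrayArgumentsError(Exception):
    """Hypothetical exception type for more than two array arguments."""


def is_array_argument(arg):
    """True for a 2-D list, unless it is a 1x1 matrix (treated as a scalar)."""
    if not isinstance(arg, list):
        return False
    rows = len(arg)
    cols = len(arg[0]) if rows else 0
    return not (rows == 1 and cols == 1)


def count_array_arguments(*args):
    """Count the array arguments, raising if there are more than two."""
    count = sum(1 for arg in args if is_array_argument(arg))
    if count > 2:
        raise TooManyArrayArgumentsError(
            "Functions accept at most two array arguments"
        )
    return count


# The 1x1 matrix [[5]] and the scalar 6 are not counted, so this returns 2.
two_arrays = count_array_arguments([[1, 2]], [[3], [4]], [[5]], 6)
```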

@MarkBaker MarkBaker marked this pull request as draft February 6, 2022 17:25
@MarkBaker MarkBaker force-pushed the Issue-2551-Prepare-Excel-Functions-as-Array-Functions branch 12 times, most recently from 7917998 to 4d867c9 Compare February 7, 2022 17:34
…ays as arguments when used in "array formulae".

So far:
 - handling for single argument functions
 - for functions where only one of the arguments is an array (a matrix or a row/column vector)
 - for when there are two array arguments, and one is a row vector, the other a column vector
 - for when there are either 2 row vectors, or 2 column vectors

This will work OK as long as there are no more than two array arguments; I still need to identify the logic to apply when there are more than two arrays, or when there are two that aren't an already supported row vector/column vector pairing (i.e. two matrices).
@MarkBaker MarkBaker force-pushed the Issue-2551-Prepare-Excel-Functions-as-Array-Functions branch from 4d867c9 to 435bbd5 Compare February 7, 2022 19:42
@MarkBaker MarkBaker force-pushed the Issue-2551-Prepare-Excel-Functions-as-Array-Functions branch 3 times, most recently from 0e22939 to 30bfd9e Compare February 8, 2022 17:35
MarkBaker added 3 commits February 9, 2022 14:25
…s and column/column vectors to ensure that correctly dimensioned results are returned
…matrix/matrix arguments

Re-baseline phpstan
… that paired arrays/vectors work with functions that support more than 2 arguments
@MarkBaker MarkBaker force-pushed the Issue-2551-Prepare-Excel-Functions-as-Array-Functions branch from 9cec4fc to 34bb6f2 Compare February 9, 2022 13:25
MarkBaker added 2 commits February 9, 2022 14:49
…attening) until we identify the abstruse non-Euclidean logic behind how Excel handles building, using and presenting those n-dimensional result arrays
@MarkBaker MarkBaker force-pushed the Issue-2551-Prepare-Excel-Functions-as-Array-Functions branch from 34bb6f2 to 61763cf Compare February 9, 2022 13:55
@MarkBaker MarkBaker marked this pull request as ready for review February 9, 2022 13:58
@MarkBaker MarkBaker merged commit 291ea88 into master Feb 9, 2022
@MarkBaker MarkBaker deleted the Issue-2551-Prepare-Excel-Functions-as-Array-Functions branch February 9, 2022 14:19