-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Require the dtype of SparseArray.fill_value and sp_values.dtype to match #23124
Comments
It was indeed allowed in some places, but in previous releases, I think it was mostly explicitly not allowed: (examples are using 0.23.4)
One place where we did not check it, was the constructor. However, I think it is only possible to that way create an array with an incompatible fill_value if it is fully dense?
Because if there is already a On master we now allow this, but this gives rise to inconsistencies: (examples are using master now)
So you could of course say that And #23125 about |
Do you have an example of how to create such an integer sparse array? Even with constructing it explicitly with passing integer
So my suspicion is that it was actually not meant to be supported (but of course, only guessing here, as I never used it So for me the main questions is; what is the use case of allowing this? Why allowing the potential confusing, instead of adding a few checks? And then the "integer with NA" use case you mention is indeed one possible answer. But since (I think, see above) this is a new possibility, and not kept for backwards compatibility, I am personally not sure we should allow it for that reason. |
Hmm thanks for testing that out... I thought that additional tests broke, but I could be misremembering. I do agree that we shouldn't be adding new functionality. If the only thing that breaks is the constructor, i.e. In [13]: a = pd.SparseArray([1, 0, 0, 1], fill_value=np.nan)
In [14]: a
Out[14]:
[1, 0, 0, 1]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3], dtype=int32) then I'm OK with not allowing that, because it isn't actually saving any memory in that case. |
I might be wrong of course, I only tried out some obvious things. Might be good to assure that with the tests. |
FYI, I think this will require a change in So I think we need to change the meaning of We'll also I think need to update the API to allow |
Somewhat xref #39584 |
take |
Now we have situation when SparseDType keeps both the type of non zero elements and the VALUE of fill_value. The crucial thing of sparse array (as I understand, but I'm not a specialist in this topic) is only non zero values and indices. According to Julia sparsevector docs: In Julia, sparse vectors have the type SparseVector{Tv,Ti} where Tv is the type of the stored values and Ti the integer type for the indices. The internal representation is as follows:
Looks like more rational to go on this way. I mentioned this issues in #45126 and it looks like api issue. So I suggest to close this and concentrate on #45126. |
This has confused me for a while, but we apparently (sometimes?) allow a
fill_value
whose dtype is not the same assp_values.dtype
This can lead to confusing behavior when doing operations.
I suspect a primary motivation was supporting sparse integer values with
NaN
for a fill value. We should investigate what's tested, part of the API, and useful to users.The text was updated successfully, but these errors were encountered: