missing append!(::CategoricalArray, ::SubArray{<:CategoricalArray}) #170
Comments
I see that we have support for … Also, there is a question of whether we want to support …
That's probably just an oversight. PR welcome! Regarding …
Filed #171 to fix it. Agreed that …
Sorry, I always assume Base provides a generic fallback based on …
That was my problem in the first place :). A generic …
I think the fallback should literally be defined as calling …
OK. I will investigate this in a separate PR (and it will help me get to know the internals of CategoricalArrays.jl better 😄).
While working on #171 I noticed that appending unordered … Also, if we append an ordered categorical array to another categorical array, we keep the order from the LHS. Here is an example:
Proposal: we should keep the result ordered only if what we append is ordered AND that order does not conflict with the order of what we append to. Also, in #171 I would keep the current behavior (so only change the tests), and implement the target functionality in the next PR (if we decide something needs to be changed).
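A minimal sketch of the proposed rule, working on plain vectors of levels rather than real CategoricalArray objects. The function name keep_ordered is hypothetical and not part of CategoricalArrays.jl. We assume the rule only matters when the destination is ordered to begin with, and we read "does not conflict" as: the levels shared by both arrays appear in the same relative order.

```julia
# Hypothetical helper (not part of CategoricalArrays.jl) implementing the
# proposed rule on plain level vectors. Assumption: the rule only matters
# when the destination is ordered to begin with.
function keep_ordered(dest_levels::AbstractVector, dest_ordered::Bool,
                      src_levels::AbstractVector, src_ordered::Bool)
    # Appending an unordered source (or into an unordered destination)
    # yields an unordered result.
    (dest_ordered && src_ordered) || return false
    # Levels present in both arrays, each listed in its own array's order.
    common_in_dest = filter(x -> x in src_levels, dest_levels)
    common_in_src = filter(x -> x in dest_levels, src_levels)
    # No conflict means the shared levels appear in the same relative order.
    return common_in_dest == common_in_src
end

keep_ordered(["a", "b", "c"], true, ["b", "c"], true)   # true
keep_ordered(["a", "b", "c"], true, ["c", "b"], true)   # false: order conflict
keep_ordered(["a", "b", "c"], true, ["a", "b"], false)  # false: source unordered
```

Comparing only the shared levels means a source that introduces extra levels is not treated as an order conflict; whether such levels may be added at all is a separate question.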
Good catch. Yes, we should mark the array as unordered in that case. There's logic for that in …
OK. So I have the following design question before proceeding. This throws an error:
Should the following statements then throw an error too?
As for the case:
I see that … However, the design issue I have is that:
and I do not understand why the last operation succeeds while the first two fail (which is what I would expect).
Funny, I had forgotten about that strict behavior. Then we should throw an error in all cases where a level would have to be added.
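The strict rule can be sketched as a check over level vectors. check_no_new_levels is a hypothetical helper, not a CategoricalArrays.jl function, and we assume the strict behavior concerns ordered destinations (an unordered destination could simply gain the missing levels).

```julia
# Hypothetical helper (not CategoricalArrays.jl API). Assumption: the strict
# behavior applies to ordered destinations, which must never gain levels.
function check_no_new_levels(dest_levels::AbstractVector, dest_ordered::Bool,
                             src_levels::AbstractVector)
    # Source levels that the destination does not have yet.
    new_levels = setdiff(src_levels, dest_levels)
    if dest_ordered && !isempty(new_levels)
        throw(ArgumentError("cannot add levels $new_levels to an ordered array"))
    end
    # For an unordered destination these levels would simply be appended.
    return new_levels
end

check_no_new_levels(["a", "b"], false, ["b", "c"])  # ["c"]
check_no_new_levels(["a", "b"], true, ["b", "a"])   # String[]: nothing to add
# check_no_new_levels(["a"], true, ["b"])           # throws ArgumentError
```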
OK, so the rule (in general, for all functions) should be:
Right? This means, in particular, that both:
and
should be accepted and use the order and levels of … If we agree on this, I can go through all the relevant functions and make them follow these rules.
Sounds right. We could imagine checking that the order of levels of the source matches that of the destination, but that could be inconvenient and I'm not sure it matters much in practice.
I thought about it, but this is not how … And as you say, this is a corner case where such a check could actually be inconvenient.
An example of the corruption behavior:
You mean it leaves the destination half-modified in case of errors? That's also the case for …
But the same applies to … Here is a branch with the "safe" (but currently slow) approach: https://github.com/bkamins/CategoricalArrays.jl/tree/better_copyto. If we want to go the fast way, I think what we currently have can actually be sped up, since we can avoid some of the checks that are done now (also, there is currently a bug in the implementation with checking for …)
Also note that in the old …
but
Yes, … But simplifying …
Let us concentrate on … I am not 100% sure what contract you want this function to follow (in parentheses I write how I understand your intention):
If this is your intention, I will try to implement it.
Sounds right!
Slowly moving forward (the corner cases are a headache). Regarding point 5 above, I want the following:
to work instead of throwing an error. This will require dynamically determining which levels in …
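One possible reading of this requirement: only the levels actually used by the copied values need to exist in the destination, not every level the source declares. The sketch below assumes values are stored as integer codes into a levels vector, with 0 marking a missing value; this representation and the name used_levels are assumptions for illustration, not the package's internals or API.

```julia
# Hypothetical sketch: `refs` are integer codes into `pool_levels`, with 0
# standing for a missing value (an assumed representation).
# Returns the levels that actually occur, in the order of `pool_levels`.
function used_levels(pool_levels::AbstractVector, refs::AbstractVector{<:Integer})
    seen = falses(length(pool_levels))
    for r in refs
        r == 0 && continue  # missing value: no level needed
        seen[r] = true
    end
    return pool_levels[seen]
end

src_levels = ["a", "b", "c"]
src_refs = UInt32[1, 0, 1]                 # only "a" (and a missing) are used
used = used_levels(src_levels, src_refs)   # ["a"]
# Only the used levels would need to exist in the destination:
setdiff(used, ["a"])                       # String[], so no new level is required
```

Under this reading, a source that declares levels "b" and "c" without using them could still be copied into a destination whose only level is "a".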
Also, I find this "fast path" problematic:
as a user might expect that the old levels of … (we can discuss all this when I push a PR with the proposed changes).
Yes, that's not ideal. IIRC that's because I couldn't use …
Actually, I have accidentally written … As for …
Also …
I hadn't noticed your first comment. I don't think we can do anything about pool overflow during … Indeed, we can't drop levels for views, but does that imply we shouldn't drop levels for plain categorical arrays? Not sure. Anyway, …
I have made a PR so we can continue the discussion there (once it passes tests :)). It introduces a performance decrease of around 10%, due to:
The rule for unordered arrays means that we solve the problem:
if we actually do not need to add levels even if … Also, I think that we should make … Thank you for discussing this.
We do not have a way to append a SubArray of a CategoricalArray to a CategoricalArray.
@nalimilan I do not want to mess things up with a hasty PR, because a more carefully thought-out redesign of SubArray handling is needed, and you know the package best.
This request is pretty urgent: we need to decide what to do and tag a new release, as without it we will not be able to change getindex in DataFrames.jl to return views in cases like df[colind] (currently it returns a copy, so everything worked). This, in turn, has a major impact on groupby-related functions.