WIP - Bool Extension Array #22226
Is the idea to make this a bit-per-entry? If not, I'm not clear on what the benefit of this is.
No, this would use int8 underneath. I don't think bit-per-entry is possible, since a bit is not an addressable unit. The benefit would be to give users an easy way to cast to and store boolean data using the same masking technique that we use for integers to denote missing data, even though the underlying storage is int8. I figure that would be easier than completely reimplementing this with a dedicated bool subtype, though I'm also looking for feedback on that front.
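To make the masking idea concrete, here is a minimal sketch assuming plain NumPy: an int8 data array paired with a boolean mask that marks missing entries. The class name `MaskedBoolSketch` and its methods are hypothetical and are not the code in this PR; the `&` shown here simply propagates missing values rather than applying three-valued logic.

```python
import numpy as np


class MaskedBoolSketch:
    """Hypothetical illustration of int8 storage plus a missing-value mask."""

    def __init__(self, values):
        # mask is True wherever the entry is missing (None)
        self.mask = np.array([v is None for v in values], dtype=bool)
        # missing slots hold a placeholder 0; the mask says to ignore it
        self.data = np.array(
            [0 if v is None else int(bool(v)) for v in values], dtype=np.int8
        )

    def __and__(self, other):
        result = MaskedBoolSketch([])
        result.data = self.data & other.data
        # simplification: a missing value on either side gives a missing result
        result.mask = self.mask | other.mask
        return result

    def to_list(self):
        return [None if m else bool(d) for d, m in zip(self.data, self.mask)]


a = MaskedBoolSketch([True, False, None])
b = MaskedBoolSketch([True, True, True])
combined = (a & b).to_list()
```

The storage stays a fixed-width numeric array throughout, so no object-dtype fallback is needed to represent the missing entry.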
Haven't looked at the implementation, but the clear win for this is efficient boolean arrays with missing values. Right now you get nice boolean arrays, but as soon as you have a NaN you coerce to object (or worse, to float).
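The coercion described above can be seen at the NumPy level. This is a small sketch assuming only NumPy; it illustrates the dtype change that a missing value forces and does not reproduce any specific pandas code path.

```python
import numpy as np

# A clean boolean array keeps the compact bool dtype.
clean = np.array([True, False, True])

# Adding NaN forces a common numeric type, so the values become floats.
with_nan = np.array([True, False, np.nan])

# Adding None instead forces a generic object array.
with_none = np.array([True, False, None])

dtypes = (clean.dtype, with_nan.dtype, with_none.dtype)
```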
BoolNA makes sense to me, thanks for clarifying.
Yah, this would take some behind-the-scenes trickery. Something like a length-N bool array being backed by a length-N/8 int8 array.
I made an issue about using bitarray as an implementation detail for integer NA; it would obviously be useful here as well (so this would then be really cheap from a memory perspective).
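As a rough sketch of the bit-per-entry idea discussed above, NumPy's `packbits` and `unpackbits` can store N booleans in about N/8 bytes. This is only an illustration under that assumption: it uses uint8 rather than int8 storage, it is not the bitarray library mentioned above, and it is not part of this PR.

```python
import numpy as np

flags = np.array(
    [True, False, True, True, False, False, False, True, True, False]
)

# Pack 10 booleans into ceil(10 / 8) = 2 bytes; the tail is zero-padded.
packed = np.packbits(flags)

# Unpack and drop the padding bits to recover the original values.
restored = np.unpackbits(packed)[: flags.size].astype(bool)

sizes = (flags.nbytes, packed.nbytes)
```

The trade-off is that individual entries are no longer directly addressable, so every element access or update goes through shifting and masking.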
@cache_readonly
def is_unsigned_integer(self):
    return False
For what it's worth, you should be able to get is_signed_integer and is_unsigned_integer for free without needing to override, since np.dtype(np.bool).kind returns 'b'. This does save performing a single comparison and is more explicit, though.
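A quick check of the kind codes involved, as a sketch. It assumes the inherited properties compare the NumPy dtype kind against 'i' and 'u', which is not shown in this thread, and it uses `np.bool_` because the `np.bool` alias is not available in every NumPy release.

```python
import numpy as np

bool_kind = np.dtype(np.bool_).kind

# Assumed shape of the inherited checks: compare the kind character.
is_signed_integer = bool_kind == "i"
is_unsigned_integer = bool_kind == "u"

# For comparison, the kinds of the integer dtypes.
int_kinds = (np.dtype(np.int8).kind, np.dtype(np.uint8).kind)
```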
Agree this is a nice idea and we should do it, but closing as stale.
If this ends up being possible, you may want to also provide it as an answer to this StackOverflow question.
This is nowhere near complete, as I have a ton of broken tests that need to be resolved, but it is theoretically progress towards #21778 as I get my feet wet with EAs.
My thought here was to leverage the masking operations used by the Integer EAs to implement an easy Boolean EA on top of that. I've essentially copied over all of the integer tests as well, though someone may have thoughts on a better way to structure all of this.
Any and all direction greatly appreciated