[ArrowStringArray] API: StringDtype parameterized by storage (python or pyarrow) #39908
Merged: jorisvandenbossche merged 67 commits into pandas-dev:master from simonjayhawkins:arrow-string-array-dtype on Jun 8, 2021.
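A minimal sketch of the API named in the title, assuming pandas 1.3 or later. The storage is passed to `StringDtype`, the `"string[python]"` alias resolves to an equal dtype, and (per commit 2ec6de0) `str()` of the dtype is plain `"string"`. It sticks to `"python"` storage so it runs without extra dependencies; `"pyarrow"` storage additionally requires pyarrow to be installed.

```python
import pandas as pd

# Storage is a parameter of the dtype; "python" storage needs no extra packages.
dtype = pd.StringDtype(storage="python")

# The "string[python]" alias resolves to an equal, python-backed dtype.
arr = pd.array(["a", None, "c"], dtype="string[python]")

print(dtype.storage)        # python
print(str(dtype))           # string
print(arr.dtype == dtype)   # True
print(type(arr).__name__)   # StringArray
print(arr[1] is pd.NA)      # True (missing values become pd.NA)
```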
Commits (67):
4cb60e6 Implement BaseDtypeTests for ArrowStringDtype (xhochy)
d242f2d Refactor to use parametrized StringDtype (TomAugspurger)
d39ab2c Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
2367810 abs-imports (simonjayhawkins)
9166d3b post merge fixup (simonjayhawkins)
8760705 StringDtype[python] -> string[python] (simonjayhawkins)
d5b3fec Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
2c657df pre-commit fix for inconsistent use of pandas namespace (simonjayhawkins)
647a6c2 fix typo (simonjayhawkins)
0596fd7 pre-commit fixup - undefined name 'ArrowStringDtype' (simonjayhawkins)
c5a19c5 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
99680c9 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
69a6cc1 "StringDtype[storage]" -> "string[storage]" misc (simonjayhawkins)
bd147ba __from_arrow__ (simonjayhawkins)
830275f more testing (wip) (simonjayhawkins)
214e524 fix inference (simonjayhawkins)
c9ba03c Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
7425536 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
68ac391 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
5cfa97a post-merge fixup (simonjayhawkins)
74dbf96 remove changes to test_string_dtype - broken off in #40725 (simonjayhawkins)
3985943 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
3bda421 post merge fix-up (simonjayhawkins)
0c108a4 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
523e24c post merge fix-up (simonjayhawkins)
279624c revert some changes made for pre-commit checks. (simonjayhawkins)
80d231e Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
c5ced5a post merge fix-up (simonjayhawkins)
459812c undo unrelated changes (simonjayhawkins)
d707b6b undo changes to imports (simonjayhawkins)
71ccf24 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
daaac06 StringDtype.construct_array_type - add ref to issue (simonjayhawkins)
46626d1 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
3677bfa Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
42d382f post merge fixup (simonjayhawkins)
4fb1a0d add draft release note (simonjayhawkins)
5d4eac1 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
15efb2e post merge fix-up (simonjayhawkins)
b53cfe0 docstrings (simonjayhawkins)
b7db53f benchmarks (simonjayhawkins)
3399f08 pyarrow min (simonjayhawkins)
e365f01 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
71d1e6c post merge fixup (simonjayhawkins)
9e23c35 misc clean (simonjayhawkins)
c69a611 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
64b3206 update construct_from_string docstring (simonjayhawkins)
d83a4ff update whatsnew for dtype="string" (simonjayhawkins)
ef38660 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
aef1162 update release note (simonjayhawkins)
6247a5b paramertize test for df.convert_dtypes() (simonjayhawkins)
a6d066c fixup pd.array and more testing of string_storage option (simonjayhawkins)
8adb08d use string_storage fixture more (simonjayhawkins)
3ad0638 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
56714c9 post merge fixup (simonjayhawkins)
6a1cc2b Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
1761a84 remove accessor methods section from release note (simonjayhawkins)
3e26baa consistent dtype naming in benchmark (simonjayhawkins)
6b470b1 Apply suggestions from code review (simonjayhawkins)
2ec6de0 name and str() change to "string" (simonjayhawkins)
a0b7a70 remove testing of sting dtype without storage specified. (simonjayhawkins)
d9dcd20 update StringDtype docstring (simonjayhawkins)
4a37470 add ArrowStringArray to pd.arrays namespace (simonjayhawkins)
1d59c7a add common base class, BaseStringArray (simonjayhawkins)
e57c850 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
51f1b1d fixup roundtrip tests (simonjayhawkins)
fc95c06 Merge remote-tracking branch 'upstream/master' into arrow-string-arra… (simonjayhawkins)
ef02a43 remove link (simonjayhawkins)
I guess we should be consistent about using `string[python]` to be more explicit (rather than `'string'`). I think it's worth it in benchmarks, for example (and you do it in other benchmarks).
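Why the explicit alias matters can be shown with a minimal sketch, assuming pandas 1.3 or later and its `mode.string_storage` option (the option exercised in commit a6d066c): bare `"string"` takes its storage from the option when the dtype is constructed, while `"string[python]"` is pinned. The pyarrow case only runs if pyarrow is installed.

```python
import importlib.util

import pandas as pd

values = ["x", "y", None]

# "pyarrow" storage is only usable when pyarrow is installed, so only
# exercise it when the package can be found.
storages = ["python"]
if importlib.util.find_spec("pyarrow") is not None:
    storages.append("pyarrow")

results = {}
for storage in storages:
    with pd.option_context("mode.string_storage", storage):
        # Bare "string" takes its storage from the option at construction time.
        implicit = pd.array(values, dtype="string")
        # "string[python]" pins the storage no matter what the option says.
        explicit = pd.array(values, dtype="string[python]")
    results[storage] = (implicit.dtype.storage, explicit.dtype.storage)

for storage, (implicit_storage, explicit_storage) in results.items():
    print(f"option={storage}: 'string' -> {implicit_storage}, "
          f"'string[python]' -> {explicit_storage}")
```

In a benchmark, the bare alias would therefore measure whichever storage the environment's option selects, which is the ambiguity the comment is pointing at.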
The "string" used here denotes an object Index. These are not dtypes but dictionary keys. There is no benchmark for StringArray.factorize.
The last 3 are benchmarking arrays; all the others are Indexes.
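To make the distinction concrete, here is a hypothetical asv-style benchmark; the `Factorize` class, `N`, and the generated data are illustrative and not the pandas benchmark suite. asv reads `params`/`param_names`, calls `setup` untimed, and times each `time_*` method. Here the parameter values are dictionary keys that select either an object-dtype Index or a `StringArray`, with the object case labeled `"object"` rather than `"string"`. It assumes pandas 1.3 or later.

```python
import numpy as np
import pandas as pd

N = 10_000


class Factorize:
    # Hypothetical asv-style benchmark: the params are dictionary keys
    # naming the data under test, not dtype strings passed to pandas.
    params = ["object", "string[python]"]
    param_names = ["key"]

    def setup(self, key):
        # N Python strings with 100 distinct values.
        strings = np.array([f"s{i % 100}" for i in range(N)], dtype=object)
        data = {
            # An object-dtype Index of str, labeled "object" to avoid confusion.
            "object": pd.Index(strings, dtype=object),
            # A StringArray with python storage.
            "string[python]": pd.array(strings, dtype="string[python]"),
        }
        self.data = data[key]

    def time_factorize(self, key):
        self.data.factorize()
```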
I could maybe call it ArrowStringArray and rename the others for clarity. (does that affect the asv history?)
not sure about asv history, but nbd.
`string` -> `object` in 3e26baa