DataFrame and DataMatrix column ordering #12
Comments
You make a good point, and I think it might be time well invested to have a set ordering in DataFrame for the reason that you mention. Otherwise you have this inconsistent behavior.
I refactored DataFrame to have a set column ordering. A few more changes might be needed for 100% support, but the basic use cases (e.g. what you listed above) work and are consistent with DataMatrix now. All the unit tests pass, but there still might be some "bugs" (inconsistencies) that I will locate over the next few weeks or so.
First, thank you for the pandas package -- it's incredibly useful and well done.
I know that one of the fundamental concepts behind the data structures is that column ordering doesn't matter. And, as long as one only uses pandas' data access/manipulation functions (e.g., sum(), ewma(), etc.), this works fine. But often it's useful to access the underlying values in a numpy array for some more complicated data manipulation. Using the values attribute (or values() method for a series) does this, but it's not always obvious what order the values come back in.
For example:
DataMatrix seems to respect the passed-in ordering of columns, while DataFrame does not. I know this is documented, and it's not the biggest deal in the world, but it does seem to cause quite a bit of confusion for some. Is it possible to have both data types keep the ordering that's passed in? If a user passes in the same column name twice, could this just throw an exception? Something still needs to be done when an operation is performed on two DataFrames (e.g., combining them), but instead of reordering in alphabetical order, how about preserving the column ordering from left to right?
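To make the left-to-right proposal concrete, here is a minimal sketch in plain Python. The helper name `ordered_union` is hypothetical and is not part of pandas; it only illustrates the ordering rule suggested above (keep the left operand's columns in their order, append new columns from the right operand as they appear, and raise on a repeated name within one input).

```python
def ordered_union(left, right):
    """Hypothetical helper: combine two column lists, keeping left-to-right order."""
    # Reject a column list that names the same column twice.
    for cols in (left, right):
        if len(set(cols)) != len(cols):
            raise ValueError(f"duplicate column names: {cols}")

    seen = set(left)
    combined = list(left)
    # Append columns from the right operand only if not already present,
    # in the order they appear, instead of sorting alphabetically.
    for name in right:
        if name not in seen:
            combined.append(name)
            seen.add(name)
    return combined


print(ordered_union(["b", "a"], ["c", "a"]))  # ['b', 'a', 'c']
```

Under this rule, the result never depends on the alphabetical order of the names, only on the order in which the caller supplied them.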
Anyway, my bigger concern is actually the following:
Regardless of the ordering of the columns after creating a DataFrame/Matrix, a naive user (i.e., me) would expect that calling reindex and then values would return an ndarray with the columns in the same order as was requested. But it looks like this only happens for DataMatrix objects (and I'm not even sure that's always guaranteed).
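The expectation described above can be written down as a small check. This is only a sketch, assuming a recent pandas release in which `DataFrame` keeps the column order it is given; it uses only `DataFrame`, and the column names and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; the names "b", "a", "c" are arbitrary.
data = {"b": [1.0, 2.0], "a": [3.0, 4.0], "c": [5.0, 6.0]}
df = pd.DataFrame(data, columns=["b", "a", "c"])

# Request a subset of columns in a specific, non-alphabetical order.
wanted = ["c", "a"]
arr = df.reindex(columns=wanted).values

# The ndarray's columns should line up with the requested order.
assert arr.shape == (2, 2)
assert np.array_equal(arr[:, 0], df["c"].values)
assert np.array_equal(arr[:, 1], df["a"].values)
```

If the assertions hold, column `i` of the returned array corresponds to `wanted[i]`, which is the guarantee being asked for.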