Feature/write dataframe auto order df columns #1142

Kevin-Dekker · 2024-08-29T08:18:41Z

Automatically infer the column order if it deviates from the dimension order in the target cube. Column names are matched to dimension names case- and space-insensitively.
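As a rough illustration of the matching described above (not the code from this PR), the sketch below reorders a list of column names to follow a cube's dimension order. The helper names `normalize` and `order_columns_like_dimensions` are hypothetical, and so are the sample column and dimension names.

```python
from typing import Iterable, List


def normalize(name: str) -> str:
    """Lowercase a name and drop spaces, so 'Cost Center' matches 'costcenter'."""
    return name.replace(" ", "").lower()


def order_columns_like_dimensions(columns: Iterable[str], dimensions: Iterable[str]) -> List[str]:
    """Return the columns rearranged to follow the dimension order.

    Columns that match no dimension (for example a value column) keep
    their relative order and are placed last.
    """
    columns = list(columns)
    by_normalized_name = {normalize(col): col for col in columns}

    ordered = []
    for dim in dimensions:
        key = normalize(dim)
        if key not in by_normalized_name:
            raise ValueError(f"No column found for dimension '{dim}'")
        ordered.append(by_normalized_name[key])

    remaining = [col for col in columns if col not in ordered]
    return ordered + remaining


columns = ["version", "Cost Center", "PERIOD", "Value"]
dimensions = ["Period", "Version", "CostCenter"]
print(order_columns_like_dimensions(columns, dimensions))
# ['PERIOD', 'version', 'Cost Center', 'Value']
```

The sketch raises an error when a dimension has no matching column; whether the real function should do the same is a design choice this example does not settle.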

MariusWirtz · 2024-09-03T00:52:46Z

Nice work @Kevin-Dekker! Thank you.
The implementation looks good. Just two things.

I see that you suggest doing the column ordering by default.
I worry this might impact compatibility.
Maybe introduce an optional argument that defaults to False.
What do you think about calling it infer_column_order? That should be in line with pandas terminology.

Regarding the fixed_dimension_elements. Not sure about the name TBH.
What do you think about static_dimension_elements or context_dimension_elements? Any other ideas?
I think this might be a popular feature, so we better get the name right :)

Kevin-Dekker · 2024-09-03T12:35:09Z

Added the argument infer_column_order (default=False). Renamed fixed_dimension_elements to static_dimension_elements. If static_dimension_elements is passed, infer_column_order is set to True.

Let's double-check the naming of static_dimension_elements with the others.
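The interaction between the two arguments can be sketched as follows. This is only an illustration of the rule stated above: `resolve_options` is a hypothetical helper, and treating an empty dict like "not passed" is an assumption rather than confirmed behavior.

```python
from typing import Dict, Optional, Tuple


def resolve_options(
    infer_column_order: bool = False,
    static_dimension_elements: Optional[Dict[str, str]] = None,
) -> Tuple[bool, Dict[str, str]]:
    """Hypothetical helper: normalize the two options described above."""
    static = dict(static_dimension_elements or {})
    # Static elements become extra columns appended to the frame,
    # so the column order has to be inferred afterwards.
    if static:
        infer_column_order = True
    return infer_column_order, static


print(resolve_options())
# (False, {})
print(resolve_options(static_dimension_elements={"Version": "Actual"}))
# (True, {'Version': 'Actual'})
print(resolve_options(infer_column_order=True))
# (True, {})
```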

vmitsenko · 2024-09-04T08:37:32Z

Hi both, well done!

Just a minor suggestion - we could simplify the code slightly by lowercasing the column names in df. It might look something like this:

from TM1py.Utils import lower_and_drop_spaces
...
# First, add new columns
for dim, elem in static_dimension_elements.items():
    df[dim] = elem

# Then, lowercase all column names
df.columns = df.columns.map(lower_and_drop_spaces)

# Finally, simply reorder the columns
ordered_columns = list(map(lower_and_drop_spaces, dimensions))
columns_not_in_dimensions = df.columns.difference(ordered_columns).tolist()

df = df[ordered_columns + columns_not_in_dimensions]

MariusWirtz · 2024-09-30T09:35:45Z

@Kevin-Dekker, @vmitsenko, @rclapp

Is it acceptable for us to mutate the passed data frame?

I think we need to take the performance hit, make a copy of the data frame early in the function, and then work on the copy.
Otherwise we mutate the dataframe, which the users might not expect from the function!
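A minimal sketch of the copy-first approach, assuming pandas is available. `prepare_frame` is a hypothetical helper, `lower_and_drop_spaces` is a local stand-in for the TM1py utility of the same name, and the sample data is made up for illustration.

```python
import pandas as pd


def lower_and_drop_spaces(name: str) -> str:
    # Local stand-in for the TM1py utility of the same name.
    return name.replace(" ", "").lower()


def prepare_frame(df, dimensions, static_dimension_elements=None):
    """Hypothetical helper: return a reordered copy; the caller's frame is untouched."""
    work = df.copy()  # the performance cost discussed above

    # Add one constant column per static dimension element.
    for dim, elem in (static_dimension_elements or {}).items():
        work[dim] = elem

    # Normalize column names so they can be compared with dimension names.
    work.columns = work.columns.map(lower_and_drop_spaces)

    # Dimension columns first, in cube order; everything else afterwards.
    ordered = [lower_and_drop_spaces(d) for d in dimensions]
    rest = [c for c in work.columns if c not in ordered]
    return work[ordered + rest]


original = pd.DataFrame({"Version": ["Actual"], "Period": ["2024"], "Value": [1.0]})
prepared = prepare_frame(original, ["Period", "Version", "Region"], {"Region": "EMEA"})

print(list(original.columns))  # ['Version', 'Period', 'Value']
print(list(prepared.columns))  # ['period', 'version', 'region', 'value']
```

Because all changes happen on `work`, the caller's frame keeps its original column names and gains no extra columns.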

MariusWirtz · 2024-09-30T09:40:57Z

@vmitsenko
I implemented your suggestion with 960afe8

Kevin-Dekker · 2024-10-01T08:57:21Z

Good point. I think we shouldn't mutate the argument that was passed in. People may reuse a dataframe they've already sent to TM1 (for instance, by aggregating the data and sending it to another cube with slightly different dimensionality).

Kevin-Dekker added 4 commits August 29, 2024 10:08

add automatic reordering of dimensions in dataframe.

0b9b223

add test for automatic reordering of dimensions in dataframe with CaseAndSpaceInsensitive column-dimension recognition

47020c1

add fixed element values option instead of passing as df. Change CaseAndSpaceInsensitive logic check on dimension order in combination with the new column additions.

162d3ee

add extra test for fixed elements

7462627

MariusWirtz mentioned this pull request Sep 3, 2024

Updated the Write Dataframe function to change the order as per cube dimension Order #1141

Closed

Kevin-Dekker added 3 commits September 3, 2024 13:54

rename fixed to static

cfc4f5e

add condition argument infer_column_order

8a6fa04

force infer_column_order=True if static_dimension_elements are passed

e0d61e0

Optimize infer_column_order and avoid df mutation

960afe8

MariusWirtz merged commit 50a6c34 into cubewise-code:master Oct 2, 2024

MariusWirtz mentioned this pull request Oct 3, 2024

optional fixed_elements and determine_order arguments in write_dataframe function #1140

Closed
