Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/write dataframe auto order df columns #1142

Conversation

Kevin-Dekker
Copy link
Collaborator

Automatically infer the column order if it deviates from the dimension order in the target cube. Case and space insensitive.

@MariusWirtz
Copy link
Collaborator

Nice work @Kevin-Dekker! Thank you.
The implementation looks good. Just two things.

I see that you suggest doing the column ordering by default.
I worry this might impact compatibility.
Maybe introduce an optional argument that defaults to False.
What do you think about calling it infer_column_order? That should be in line with pandas terminology.

Regarding the fixed_dimension_elements. Not sure about the name TBH.
What do you think about static_dimension_elements or context_dimension_elements. Any other ideas?
I think this might be a popular feature, so we better get the name right :)

@Kevin-Dekker
Copy link
Collaborator Author

Added the argument to infer_column_order (default=False). Renamed fixed_dimension_elements to static_dimension_elements. If static_dimension_elements is passed infer_column_order is set to True.

Let's double check the others on the naming for static_dimension_elements.

@vmitsenko
Copy link
Collaborator

Hi both, well done!

Just a minor suggestion - we could simplify the code slightly by lowercasing the column names in df. It might look something like this:

from TM1py.Utils import lower_and_drop_spaces
...
# First, add new columns
for dim, elem in static_dimension_elements.items():
    df[dim] = elem

# Then, lowercase all column names
df.columns = df.columns.map(lower_and_drop_spaces)

# Finally, simply reorder the columns
ordered_columns = list(map(lower_and_drop_spaces, dimensions))
columns_not_in_dimensions = df.columns.difference(ordered_columns).tolist()

df = df[ordered_columns + columns_not_in_dimension ]

@MariusWirtz
Copy link
Collaborator

@Kevin-Dekker, @vmitsenko, @rclapp

Is it acceptable for us to mutate the passed data frame?

I think we need to take the performance hit, make a copy of the data frame early in the function, and then work on the copy.
Otherwise we mutate the dataframe, which the users might not expect from the function!

@MariusWirtz
Copy link
Collaborator

Hi both, well done!

Just a minor suggestion - we could simplify the code slightly by lowercasing the column names in df. It might look something like this:

from TM1py.Utils import lower_and_drop_spaces
...
# First, add new columns
for dim, elem in static_dimension_elements.items():
    df[dim] = elem

# Then, lowercase all column names
df.columns = df.columns.map(lower_and_drop_spaces)

# Finally, simply reorder the columns
ordered_columns = list(map(lower_and_drop_spaces, dimensions))
columns_not_in_dimensions = df.columns.difference(ordered_columns).tolist()

df = df[ordered_columns + columns_not_in_dimension ]

@vmitsenko
I implemented your suggestion with 960afe8

@Kevin-Dekker
Copy link
Collaborator Author

@Kevin-Dekker, @vmitsenko, @rclapp

Is it acceptable for us to mutate the passed data frame?

I think we need to take the performance hit, make a copy of the data frame early in the function, and then work on the copy. Otherwise we mutate the dataframe, which the users might not expect from the function!

Good point. I think we shouldn't mutate the inserted argument. People may reuse a dataframe they've sent to tm1 (for instance by aggregating the data and sending it to another cube with slightly different dimensionality).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants