Speed up columns slices: etna.datasets.utils.select_columns
#775
Comments
etna.datasets.utils.select_columns
And here …
I'll try to explain the core of the issue. Consider the selection:
res = df.loc[:, pd.IndexSlice[:, [column_1, column_2]]]
In pandas 1.1*: we get a dataframe where at the last index …
In pandas 1.1.* and >= 1.2: we get the columns in the order that we gave to …
If we make the selection like this:
res = df.loc[:, pd.IndexSlice[segments, [column_1, column_2]]]
then in both cases the columns come out in the order given to loc.
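The ordering difference described in this comment can be checked against the installed pandas version with a small script. This is a sketch under assumptions: the level names segment and feature, the segment names and the feature names are made up for illustration. It prints the column order produced by both forms of the slice instead of asserting one, because that order is exactly what varies between pandas versions.

```python
import pandas as pd


def make_wide_df(n_segments: int = 3) -> pd.DataFrame:
    """Build a toy wide dataframe with (segment, feature) MultiIndex columns."""
    segments = [f"segment_{i}" for i in range(n_segments)]
    features = ["feature_a", "feature_b", "target"]  # hypothetical feature names
    columns = pd.MultiIndex.from_product([segments, features], names=["segment", "feature"])
    data = [[float(i)] * len(columns) for i in range(4)]
    return pd.DataFrame(data, columns=columns)


df = make_wide_df()
segments = df.columns.get_level_values("segment").unique().tolist()
# Requested in an order that differs from the dataframe's own column order,
# so any reordering done by loc becomes visible.
column_1, column_2 = "target", "feature_a"

# Fast form: ':' on the segment level.
fast = df.loc[:, pd.IndexSlice[:, [column_1, column_2]]]
# Slow form: every segment listed explicitly.
slow = df.loc[:, pd.IndexSlice[segments, [column_1, column_2]]]

print("pandas", pd.__version__)
print("fast:", list(fast.columns))
print("slow:", list(slow.columns))
```

Running it under each pandas version the project supports shows which of the behaviours described above applies there.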
More detailed results. Imagine we have a …
Calling …
Calling …
🚀 Feature Request
In a lot of places we use
df.loc[:, pd.IndexSlice[segments, column]]
to select column from all the segments. It appears to be very slow when there are a lot of segments.
We should find the places where we use it and make sure that it can be replaced with
df.loc[:, pd.IndexSlice[:, column]]
without problems.
There was a problem with the second option: #188. We should investigate whether it still exists and under which conditions:
- … (… SklearnTransform selects many)
Proposal
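Before replacing call sites, a rough benchmark of the two slices discussed above can confirm the slowdown and show how it scales with the number of segments. This is a sketch under assumptions: the toy data, the segment counts and the repeat count are arbitrary, and absolute timings depend on the machine and the pandas version.

```python
import timeit
from typing import Dict

import numpy as np
import pandas as pd


def make_wide_df(n_segments: int, n_timestamps: int = 100) -> pd.DataFrame:
    """Toy wide dataframe with (segment, feature) MultiIndex columns."""
    # Zero-padded names keep the column MultiIndex lexsorted.
    segments = [f"segment_{i:05d}" for i in range(n_segments)]
    columns = pd.MultiIndex.from_product(
        [segments, ["feature_a", "target"]], names=["segment", "feature"]
    )
    return pd.DataFrame(np.zeros((n_timestamps, len(columns))), columns=columns)


def compare(n_segments: int, repeat: int = 5) -> Dict[str, float]:
    """Return the mean time in seconds of the slow and the fast slice."""
    df = make_wide_df(n_segments)
    segments = df.columns.get_level_values("segment").unique().tolist()

    def slow() -> pd.DataFrame:
        # Every segment is listed explicitly on the first level.
        return df.loc[:, pd.IndexSlice[segments, "target"]]

    def fast() -> pd.DataFrame:
        # ':' on the first level, no list of segments.
        return df.loc[:, pd.IndexSlice[:, "target"]]

    # For a scalar column on this lexsorted toy index both forms
    # should return the same columns in the same order.
    assert slow().columns.equals(fast().columns)
    return {
        "slow": timeit.timeit(slow, number=repeat) / repeat,
        "fast": timeit.timeit(fast, number=repeat) / repeat,
    }


if __name__ == "__main__":
    for n in (10, 100, 1000):
        times = compare(n)
        print(f"{n:>5} segments: slow {times['slow'] * 1e3:.2f} ms, fast {times['fast'] * 1e3:.2f} ms")
```

The equality check is there because a faster slice only helps if it selects the same columns; with several columns in the slice the order can differ, which is the reordering question raised in #188.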
- … df.loc[:, pd.IndexSlice[segments, column]] where column is a scalar. Replace them with a function (you can add it to etna.datasets.utils). Try to replace the slow slice in the function with the fast slice: df.loc[:, pd.IndexSlice[:, column]]. Make sure that in that case the columns are not reordered in different pandas versions.
- … column (e.g. SklearnTransform) and investigate the reordering issue during testing. We want to avoid it without putting all the segments into the slice.
Test cases
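Tests for the new function could pin the expected column order explicitly instead of relying on the order loc happens to return. In the sketch below, select_columns is hypothetical: only its name comes from the issue title, while the signature and the ordering convention (segment-major, features in the requested order) are assumptions. It keeps the fast : slice and then reorders the result with numpy.lexsort, so the final order should not depend on the pandas version, and no list of segments is passed to the slice.

```python
from typing import List, Union

import numpy as np
import pandas as pd


def select_columns(df: pd.DataFrame, columns: Union[str, List[str]]) -> pd.DataFrame:
    """Hypothetical helper: select `columns` from every segment of a wide dataframe.

    Result order is segment-major (segments as they first appear in `df`), and within
    each segment the features follow the order given in `columns` (must be unique).
    """
    if isinstance(columns, str):
        columns = [columns]
    # Fast slice: ':' on the segment level, no explicit list of segments.
    subset = df.loc[:, pd.IndexSlice[:, columns]]
    # Rank every selected column by its segment (order of first appearance in df)
    # and by the position of its feature in `columns`.
    segment_order = pd.Index(df.columns.get_level_values(0).unique())
    segment_rank = segment_order.get_indexer(subset.columns.get_level_values(0))
    feature_rank = pd.Index(columns).get_indexer(subset.columns.get_level_values(1))
    # np.lexsort uses the last key as the primary one: segment first, then feature.
    order = np.lexsort((feature_rank, segment_rank))
    return subset.iloc[:, order]


def make_wide_df() -> pd.DataFrame:
    columns = pd.MultiIndex.from_product(
        [["segment_a", "segment_b"], ["feature_x", "feature_y", "target"]],
        names=["segment", "feature"],
    )
    return pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)


def test_select_columns_keeps_segment_major_order():
    df = make_wide_df()
    result = select_columns(df, ["target", "feature_x"])
    assert list(result.columns) == [
        ("segment_a", "target"),
        ("segment_a", "feature_x"),
        ("segment_b", "target"),
        ("segment_b", "feature_x"),
    ]


def test_select_columns_scalar_matches_explicit_segments():
    df = make_wide_df()
    segments = ["segment_a", "segment_b"]
    expected = df.loc[:, pd.IndexSlice[segments, "target"]]
    pd.testing.assert_frame_equal(select_columns(df, "target"), expected)
```

The explicit reordering costs one sort over the selected columns; whether that keeps the helper fast enough on many segments would need to be measured.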
- … (… SklearnTransform we had some tests on reordering; they may be useful).
Additional context
No response