flatten_df is too slow #22
Comments
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
What if, instead of concatenating each time, we keep the new columns somewhere (in memory or on disk) and concatenate once at the end?
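A minimal sketch of that idea, assuming the nested fields are dict-valued columns. `expand_dict_columns` is a hypothetical helper, not the project's `flatten_df`, and it expands only one level of nesting. New column blocks are collected in a list and `pd.concat` is called once, instead of once per column.

```python
import pandas as pd


def expand_dict_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Expand one level of dict-valued columns with a single concat (hypothetical helper)."""
    pieces = []  # column blocks are collected here and joined once at the end
    for name in df.columns:
        col = df[name]
        if col.map(lambda v: isinstance(v, dict)).any():
            # Build all sub-columns for this field in one go; non-dict cells become empty rows.
            expanded = pd.DataFrame(
                [v if isinstance(v, dict) else {} for v in col], index=df.index
            ).add_prefix(f"{name}_")
            pieces.append(expanded)
        else:
            pieces.append(col)
    # One concatenation for the whole frame instead of one per new column.
    return pd.concat(pieces, axis=1)


df = pd.DataFrame(
    {
        "id": [1, 2],
        "price": [
            {"value": 10, "currency": "USD"},
            {"value": 12, "currency": "EUR"},
        ],
    }
)
flat = expand_dict_columns(df)
print(list(flat.columns))
```

Repeated concatenation copies the growing frame on every call, so deferring it to a single call is usually where the saving comes from. Whether it helps here would need to be confirmed by profiling.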
I ran several experiments with different versions. Results for the latest one:
The notebook is in: /shared/qa/Experiments/flat.ipynb
@ivankivanov It might be a good idea to generate a df for testing so we can share notebooks here. The idea looks doable; basically we replace recursion with a loop.
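Both points can be sketched together, assuming the items are plain nested dicts and lists. `make_test_records` and `flatten_record` are hypothetical names, and the field names are made up for illustration. The recursion is replaced by an explicit stack, and the DataFrame is built once from the flattened records.

```python
import pandas as pd


def make_test_records(n_rows: int) -> list:
    """Generate synthetic nested items (made-up fields) so notebooks can be shared."""
    return [
        {
            "id": i,
            "seller": {"name": f"s{i}", "address": {"city": "X", "zip": i}},
            "tags": ["a", "b"],
        }
        for i in range(n_rows)
    ]


def flatten_record(record: dict, sep: str = "_") -> dict:
    """Flatten nested dicts and lists with an explicit stack instead of recursion."""
    flat = {}
    stack = [("", record)]
    while stack:
        prefix, value = stack.pop()
        if isinstance(value, dict):
            items = list(value.items())
        elif isinstance(value, list):
            items = list(enumerate(value))
        else:
            flat[prefix] = value  # leaf value: store under the joined key path
            continue
        # Push in reverse so children are popped in their original order.
        # Note: empty dicts and lists produce no columns in this sketch.
        for key, child in reversed(items):
            stack.append((f"{prefix}{sep}{key}" if prefix else str(key), child))
    return flat


records = make_test_records(3)
flat_df = pd.DataFrame([flatten_record(r) for r in records])
print(list(flat_df.columns))
```

Flattening each record to a flat dict first means pandas only sees the final column set, so no intermediate frames are created along the way.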
This is the pull request:
Notebook: shared/qa/Experiments/flattenFinal.ipynb
sync with master current code - 58.9 sec - full
Can it be rewritten to not use recursion?
If not, profile and see how to improve. To test, use jobs with nested fields and a considerable amount of items.
Add tests to check the speed
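A speed test along these lines might look like the sketch below. Everything in it is an assumption: `flatten_under_test` is a stand-in that calls `pd.json_normalize` and should be swapped for the real `flatten_df`, and the record shape, row count, and time budget are arbitrary placeholders to be tuned against real jobs.

```python
import time

import pandas as pd


def flatten_under_test(records: list) -> pd.DataFrame:
    # Stand-in only: replace this call with the project's flatten_df.
    return pd.json_normalize(records, sep="_")


def make_nested_records(n_rows: int, depth: int = 3) -> list:
    """Build synthetic items whose 'data' field is nested `depth` levels deep."""
    records = []
    for i in range(n_rows):
        nested = {"value": i}
        for level in range(depth):
            nested = {f"level{level}": nested, "n": level}
        records.append({"id": i, "data": nested})
    return records


def test_flatten_speed(n_rows: int = 2000, budget_sec: float = 5.0) -> float:
    """Fail if flattening n_rows nested items takes longer than the budget."""
    records = make_nested_records(n_rows)
    start = time.perf_counter()
    flat = flatten_under_test(records)
    elapsed = time.perf_counter() - start
    assert len(flat) == n_rows
    assert elapsed < budget_sec, f"flatten took {elapsed:.2f}s, budget {budget_sec}s"
    return elapsed


elapsed = test_flatten_speed()
print(f"flattened in {elapsed:.3f}s")
```

A wall-clock budget is sensitive to the machine running the tests, so a generous threshold or a comparison against a stored baseline tends to be less flaky than a tight absolute limit.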