PerformanceWarning: DataFrame is highly fragmented #340

Open
mikeperalta1 opened this issue Jul 13, 2021 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@mikeperalta1

mikeperalta1 commented Jul 13, 2021

Which version are you running? The latest version is on GitHub. Pip is for major releases.
0.3.2b0

Upgrade.
I appear to be running the latest (?)

Describe the bug
When adding technical indicators to an existing data frame, I receive the following warning:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider using pd.concat instead.  To get a de-fragmented frame, use `newframe = frame.copy()`

These warnings are reduced when I implement the suggestion of newframe = frame.copy(), but not eliminated. I believe fragmentation is happening during pandas-ta calls, but I can't tell for sure because there is no trace.
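One way to get the missing trace is to promote the warning to an exception for the duration of a call, so that Python reports the line that triggered it. The following is only a minimal sketch: `trace_performance_warning` and `noisy` are hypothetical names introduced here for illustration and are not part of pandas or pandas-ta.

```python
import traceback
import warnings

import pandas as pd


def trace_performance_warning(func, *args, **kwargs):
    """Run func; return a formatted traceback if it emits a PerformanceWarning, else None."""
    with warnings.catch_warnings():
        # Inside this block only, a PerformanceWarning is raised as an exception.
        warnings.simplefilter("error", category=pd.errors.PerformanceWarning)
        try:
            func(*args, **kwargs)
        except pd.errors.PerformanceWarning:
            return traceback.format_exc()
    return None


def noisy():
    # Hypothetical stand-in for the code under suspicion.
    warnings.warn("DataFrame is highly fragmented.", pd.errors.PerformanceWarning)


report = trace_performance_warning(noisy)
```

Passing the function that adds the indicators instead of `noisy` should show which call the warning originates from, assuming the warning is emitted at all on the installed pandas version.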

To Reproduce

I start with a DataFrame populated with just the right columns for pandas-ta. Then add technical indicators like so:

for n in self.__SMA_INTERVALS:
	
	key = "SMA_{}".format(n)
	
	df[key] = ta.sma(df["Close"], length=n)
	df[key] = df[key].fillna(0)
	
	df = df.copy()

I'm not sure whether the warnings come from the SMA indicator. I'm adding SMA, EMA, MACD, STOCH, RSI, ADX, CCI, AROON, BBANDS, CMF (Chaikin Money Flow, via ta.cmf), and OBV.

Expected behavior
No warnings or excessive fragmentation during the computations.

Screenshots
N/A

Additional context
pandas: 1.3.0
pandas-ta: 0.3.2b0
python: 3.9.6
numpy: 1.21.0

Thanks for using Pandas TA!

@mikeperalta1 mikeperalta1 added the bug Something isn't working label Jul 13, 2021
@twopirllc twopirllc added enhancement New feature or request help wanted Extra attention is needed and removed bug Something isn't working labels Jul 13, 2021
@twopirllc
Owner

twopirllc commented Jul 13, 2021

Hello @mikeperalta1,

I am aware of the PerformanceWarning (PW), but it's just a warning and not a bug. It was never a warning in prior versions of Pandas. So you can downgrade to an earlier Pandas version, or you can suppress the warning. In fact the current development version does suppress the PW until a permanent fix is in place.

import pandas as pd
from warnings import simplefilter
simplefilter(action="ignore", category=pd.errors.PerformanceWarning)
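
If silencing the warning for the whole process feels too broad, the filter can be limited to the block that adds the columns. This is a sketch using the standard `warnings.catch_warnings()` context manager; `add_columns_quietly` is a hypothetical helper, not something Pandas TA provides.

```python
import warnings

import pandas as pd


def add_columns_quietly(df: pd.DataFrame, columns: dict) -> pd.DataFrame:
    """Assign each entry of `columns` to df, ignoring PerformanceWarning only in here."""
    with warnings.catch_warnings():
        # The filter is restored to its previous state when the block exits.
        warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
        for name, values in columns.items():
            df[name] = values
    return df


frame = pd.DataFrame({"Close": [1.0, 2.0]})
frame = add_columns_quietly(frame, {f"c{i}": [i, i] for i in range(150)})
```

Warnings raised elsewhere in the program stay visible, which may be preferable while the underlying fragmentation is still being investigated.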

It occurs when you are appending a Pandas DataFrame to an existing Pandas DataFrame. The Pandas developers recommend using pd.concat() to quickly combine two or more DataFrames. It is not caused by an internal calculation of an indicator.
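
On the user side, the pd.concat() recommendation could look roughly like the sketch below: build all new columns first, then attach them in one call instead of assigning them one by one. `add_sma_columns` is a hypothetical helper, and `rolling(n).mean()` is used here only as a stand-in for `ta.sma` so the example does not depend on Pandas TA.

```python
import pandas as pd


def add_sma_columns(df: pd.DataFrame, intervals) -> pd.DataFrame:
    """Return df plus one SMA_<n> column per interval, attached with a single concat."""
    new_columns = {
        # Stand-in for ta.sma(df["Close"], length=n); an assumption for this sketch.
        f"SMA_{n}": df["Close"].rolling(n).mean().fillna(0)
        for n in intervals
    }
    return pd.concat([df, pd.DataFrame(new_columns, index=df.index)], axis=1)


prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]})
result = add_sma_columns(prices, [2, 3])
```

Because the frame is extended once rather than once per indicator, the repeated column insertion that the warning message mentions is avoided.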

In Pandas TA, it occurs in the internal method _append of the main Pandas TA DataFrame extension class when appending a resultant DataFrame (aroon, bbands, et al.) to the current DataFrame, and almost never when appending a resultant Series (sma, rsi, et al.) to the current DataFrame.

@pd.api.extensions.register_dataframe_accessor("ta")
class AnalysisIndicators(BasePandasObject):
    # ...
    def _append(self, result=None, **kwargs) -> None:
        # ...

Furthermore, I have already tried using pd.concat() and frame.copy() in the _append() method to make the PW disappear, to no avail. In fact, it would not append the resultant DataFrame to the current DataFrame as expected, the way the current for loop does:

pandas-ta/pandas_ta/core.py

Lines 416 to 418 in 1deb755

else:
    for i, column in enumerate(result.columns):
        df[column] = result.iloc[:, i]

which is generating this PW. Now this is either a Pandas bug or I am coding pd.concat incorrectly. 🤷🏼‍♂️ I am open to contributions to fix this Issue as it is obviously a concern. 😎
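
For anyone looking into a contribution, the column-by-column loop could in principle be replaced by a single concat along these lines. This is a sketch under stated assumptions, not the library's `_append`: `append_result` is a hypothetical function, and the `bands` frame is made-up data standing in for a multi-column indicator result.

```python
import pandas as pd


def append_result(df: pd.DataFrame, result: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with every column of `result` attached in one concat.

    Columns of `df` that share a name with a column of `result` are dropped
    first, so the values from `result` replace them.
    """
    kept = df.drop(columns=[c for c in result.columns if c in df.columns])
    return pd.concat([kept, result], axis=1)


df = pd.DataFrame({"Close": [10.0, 11.0, 12.0]})
# Made-up stand-in for a multi-column indicator result.
bands = pd.DataFrame(
    {"LOWER": [9.0, 10.0, 11.0], "UPPER": [11.0, 12.0, 13.0]}, index=df.index
)
df = append_result(df, bands)
```

Note that pd.concat() returns a new DataFrame rather than modifying `df` in place, so the caller has to rebind the result, as the last line does.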

Hope this helps!

Kind Regards,
KJ

@twopirllc twopirllc removed their assignment Jul 13, 2021
@mikeperalta1
Author

@twopirllc Thank you for that thorough explanation! I suppose for now I will simply mute the warning.

@LucianPopaLVP

Any fix for this issue? I have the same problem :(

@twopirllc
Owner

@LucianPopaLVP,

Which Pandas TA version are you running? It has already been silenced in version 0.3.14b. If you are concatenating yourself, then you need to apply the simplefilter for your situation.

As you know, it's under Pandas' care, so stay abreast of the discussion in Pandas Issue 42477.

Kind Regards,
KJ

@ClaoLinda

ClaoLinda commented Apr 6, 2022

What if you are not even using the "import pandas as pd" and you still get the error?

@SkyisjustTheBeginning

SkyisjustTheBeginning commented May 6, 2022

Well, if you don't import pandas as pd, you could try this:

from warnings import simplefilter
from pandas.errors import PerformanceWarning
simplefilter(action="ignore", category=PerformanceWarning)
