Memory leak in latest FW (15.4.0) #2823

Closed
Henkeh opened this issue Jan 24, 2024 · 32 comments
Labels
bug Something isn't working

Comments

@Henkeh

Henkeh commented Jan 24, 2024

The Problem

After upgrading to the latest FW (15.4.0) I noticed some readout issues after some time running it. This could possibly be related to #2811.

When I did some digging of my own to investigate what was going on, I noticed that the amount of free memory was steadily decreasing. This can be seen in the attached image, a screenshot from my HA history of the "Free Memory" property of the sensor that is exposed over MQTT. It clearly shows "Free Memory" declining steadily over time until it reaches a certain minimum, at which point the device either crashes and reboots or the FW decides to reboot (I am unsure whether that is implemented; I have not dug into this yet).

The FW was installed during the Christmas holidays (the log suggests the evening of the 25th of December, around 10:00 PM, as that is when this undesired behavior starts). Before the upgrade, the free memory was more or less stable over long periods of time.
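
Editor's sketch, not part of the original report: one way to tell a steady decline from normal jitter is to fit a straight line through the logged "Free Memory" samples and look at the slope. The function names and the sample values below are illustrative assumptions, not data from the attached log.

```python
def fit_line(samples):
    """Least-squares fit of free_bytes = slope * hours + intercept.

    samples: list of (hours_since_start, free_bytes) tuples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples have the same timestamp")
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    slope = cov / var_t
    intercept = mean_b - slope * mean_t
    return slope, intercept


def hours_until_floor(samples, floor_bytes=0):
    """Hours after the last sample until the fitted line reaches floor_bytes.

    Returns None when the fitted slope is not negative (no decline).
    """
    slope, intercept = fit_line(samples)
    if slope >= 0:
        return None
    last_t = samples[-1][0]
    return (floor_bytes - intercept) / slope - last_t


if __name__ == "__main__":
    # Made-up readings (hours, bytes) purely for illustration.
    readings = [(0, 1000), (1, 900), (2, 800)]
    slope, _ = fit_line(readings)
    print(f"slope: {slope:.1f} bytes/hour")
    print(f"hours until 0 bytes: {hours_until_floor(readings)}")
```

A clearly negative slope over many hours points to a leak, while a slope near zero matches the stable behaviour seen before the upgrade. The straight-line model is a simplification; the real curve may not be linear.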

Version

15.4.0

Logfile

https://www.dropbox.com/scl/fi/2dx9236oqlw3r1d8vjv1g/MemoryIssueLogs.7z?rlkey=z0anabue0rf0fahgvwgh1heno&dl=0

Expected Behavior

"Free Memory" should stay stable over time. This now looks like a memory leak somewhere to me.

Screenshots

image

Additional Context

No response

Henkeh added the bug label Jan 24, 2024
@Hootie81

I'm noticing the same thing with mine. I've also had two corrupt SD cards, and I'm wondering whether that is related too.
image
I replaced the corrupt SD card with a different one on the night of the 15th, when I noticed there was no water meter reading.
I also loaded a new tflite file in the early morning of the 16th and noticed a small jump in free memory afterwards, but at midnight the meter went offline.
Today is the first chance I've had to look at it, and the SD card is corrupt again.

@nliaudat
Contributor

I can confirm the memory leak on my two devices:
image
image

@friedpa

friedpa commented Jan 26, 2024

I can confirm this as well. For a couple of weeks I had been wondering why my two ESPs freeze periodically. After your post I also turned on recording of the free memory value, and I can see the same behaviour.

@nliaudat
Contributor

nliaudat commented Jan 26, 2024

We have to confirm that the memory leak was introduced with v15.4.0 (Commit: 74d4f20).

Can someone graph the memory with an older release (< 15.4.0)?

It's crucial for finding the cause.

I created a discussion to find more people willing to share their graphs:

#2829

@nliaudat
Contributor

Can someone test the https://github.com/VioletGiraffe/cppcheck-vs-addin plugin to find the memory leak?
I no longer have a dev environment, sorry.

@Henkeh
Author

Henkeh commented Jan 27, 2024

> We have to confirm that the memory leak was introduced with v15.4.0 (Commit: 74d4f20).
>
> Can someone graph the memory with an older release (< 15.4.0)?
>
> It's crucial for finding the cause.
>
> I created a discussion to find more people willing to share their graphs:
>
> #2829

The first part of the graph, before the 26th of December, shows exactly this. I was on the previous FW version then (15.3.0), since I always keep it up to date.

Or are you looking for something else?

@nliaudat
Contributor

> The first part of the graph, before the 26th of December, shows exactly this. I was on the previous FW version then (15.3.0), since I always keep it up to date.
>
> Or are you looking for something else?

This is exactly the proof that it was introduced in the 15.4.0 release.
Thanks.

Now we need to review the code changes.
Regards

@nliaudat
Contributor

v15.3.0...v15.4.0

The main changes are in the submodules:

code/components/esp-nn
code/components/esp32-camera

and in the new MakeStaticResolver() function in code/components/jomjol_tfliteclass/CTfLiteClass.cpp

@spanzetta

Same problem here: frequent reboots after a few hours (due to the memory leak).
I suspect (but am not 100% sure) that the problem started when I enabled MQTT.

@friedpa

friedpa commented Jan 28, 2024

See picture below:
grafik
Decreasing Free Memory of 15.4.0, stable Free Memory after OTA to 15.3.0

@caco3
Collaborator

caco3 commented Jan 28, 2024

I can confirm that there are issues with 15.4!
As a workaround, until we can find and solve it, simply revert to the 15.3 release.

@caco3 caco3 pinned this issue Jan 28, 2024
@caco3
Collaborator

caco3 commented Jan 28, 2024

I once wrote a memory usage visualizer: https://github.com/caco3/psram-usage-visualizer

I will see if I can find something...

@caco3
Collaborator

caco3 commented Jan 28, 2024

Between 15.3 and 15.4 we updated 3 ESP modules.

I tried to revert them, but then it no longer builds.
This would need deeper investigation, but I don't have the time at the moment, sorry.

@caco3
Collaborator

caco3 commented Jan 29, 2024

Attempt to find leak: #2837

@MihaiKrieger

I have not upgraded to 15.4. I am still on 15.3, and this is what my graph looks like.

image

@JanHBade

JanHBade commented Feb 1, 2024

My 15.4 has been running for 6 days without a restart, so I don't have a memory leak?!

grafik

Free memory is about 698683 (read via MQTT)

@Hootie81

Hootie81 commented Feb 1, 2024

@JanHBade That's interesting; the longest mine has been up is just over 2 days.
Interestingly, though, my free memory is 729319 right before the reboot and 772655 after.

I have 8 digital and 1 analog digits. I wonder if that makes a difference to the reboot?
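
Editor's sketch, not part of the original comment: for a rough sense of scale, the two readings above can be turned into an average loss rate. This is only back-of-the-envelope arithmetic. The 48-hour uptime is an assumption based on "just over 2 days", and the calculation assumes the post-reboot reading is representative of the level at the start of each cycle.

```python
# Readings quoted in the comment above (as reported over MQTT).
FREE_AFTER_REBOOT = 772_655
FREE_BEFORE_REBOOT = 729_319

# Assumption: "just over 2 days" is approximated as 48 hours.
ASSUMED_UPTIME_HOURS = 48.0


def average_loss_per_hour(start_free, end_free, hours):
    """Average drop in the free-memory reading per hour over one uptime cycle."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (start_free - end_free) / hours


lost = FREE_AFTER_REBOOT - FREE_BEFORE_REBOOT
rate = average_loss_per_hour(FREE_AFTER_REBOOT, FREE_BEFORE_REBOOT,
                             ASSUMED_UPTIME_HOURS)
print(f"lost {lost} over the cycle, roughly {rate:.0f} per hour")
```

The figure depends entirely on the assumed uptime, so treat it as an order-of-magnitude estimate rather than a measurement.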

@spanzetta

This is my memory situation with 15.4
I have 4 digital and 3 analog

image

@matze338

matze338 commented Feb 2, 2024

Same here. The internal memory decreases hour by hour until there is none left. Then the ESP restarts.
IMG_20240202_051717

@caco3
Collaborator

caco3 commented Feb 2, 2024

The memory leak will be fixed with #2842

You can already use the image from https://github.com/jomjol/AI-on-the-edge-device/actions/runs/7731455511?pr=2842 or wait until it is in rolling or until we create the next release.

> My 15.4 has been running for 6 days without a restart, so I don't have a memory leak?!

It depends on your configuration. I also have one device with only digits where the leak did not occur.

@friedpa

friedpa commented Feb 2, 2024

The above picture is taken after loading the latest rolling:

grafik

@caco3
Collaborator

caco3 commented Feb 2, 2024

Because it is not in rolling yet! See my post!

@caco3
Collaborator

caco3 commented Feb 2, 2024

There is a new release 15.5.0 which fixes this.

@caco3 caco3 closed this as completed Feb 2, 2024
@Henkeh
Author

Henkeh commented Feb 2, 2024

It does seem to work. Thanks for the quick fix!

@caco3 caco3 unpinned this issue Feb 3, 2024
@spanzetta

spanzetta commented Feb 4, 2024

> This is my memory situation with 15.4
> I have 4 digital and 3 analog

image

With 15.5 the situation has completely changed.

Now it's completely stable.

image

@caco3
Collaborator

caco3 commented Feb 4, 2024

@spanzetta Please stop spreading incorrect claims about the power usage. Your issue is an insufficient power supply! Otherwise I wonder why you are the only one with this issue!

@spanzetta

Removed comment about power supply

@spanzetta

> @spanzetta Please stop spreading incorrect claims about the power usage. Your issue is an insufficient power supply! Otherwise I wonder why you are the only one with this issue!

Apparently it's not only me..
#2863

@caco3
Collaborator

caco3 commented Feb 4, 2024

> Apparently it's not only me..
> #2863

Please point me to the comment there regarding power issues!

@friedpa

friedpa commented Feb 4, 2024

@spanzetta Hi Stefano, please don't make your problems with this open-source software ours. You are spreading incorrect facts because your skills are unfortunately not guiding you in the right direction. We are glad to support anyone with issues here, but please don't get on our nerves. If you have something to report, collect all the facts and report them, as is common in a bug-fix forum. If you don't have facts, please post in the comment forum; maybe someone can help you there.

@spanzetta

> @spanzetta Hi Stefano, please don't make your problems with this open-source software ours. You are spreading incorrect facts because your skills are unfortunately not guiding you in the right direction. We are glad to support anyone with issues here, but please don't get on our nerves. If you have something to report, collect all the facts and report them, as is common in a bug-fix forum. If you don't have facts, please post in the comment forum; maybe someone can help you there.

OK, I will stop reporting my experience.
I thought that sharing experience could be useful.

@nliaudat
Contributor

nliaudat commented Feb 5, 2024

@spanzetta
Feel free to open a discussion and I'll try to help you.
Regards
