[Bug]: Potential off-heap memory leak on Linux #147

Closed
jypma opened this issue Nov 3, 2023 · 15 comments · Fixed by #150, #133 or sblantipodi/glow_worm_luciferin#53

@jypma
Contributor

jypma commented Nov 3, 2023

Firefly Luciferin version

2.12.5

Glow Worm Luciferin version

5.11.8

Firmware type

FULL

What is the stream method?

MQTT Stream

Firefly Luciferin config file

mqttStream: true
wifiEnable: true
serialPort: "AUTO"
baudRate: "115200"
extendedLog: "INFO"
audioChannels: "2 channels"
audioDevice: "Default audio output (Native)"
audioLoopbackGain: 0.0
autoDetectBlackBars: true
bottomLeftLed: 27
bottomRightLed: 27
bottomRowLed: 26
brightness: 255
brightnessLimiter: 1.0
captureMethod: "XIMAGESRC"
checkForUpdates: false
colorChooser: "255,82,0,255"
colorMode: 1
configVersion: "2.12.5"
defaultLedMatrix: "FullScreen"
desiredFramerate: "30"
effect: "Bias light"
enableLDR: false
eyeCare: false
frameInsertion: "No smoothing"
gamma: 2.2
gapTypeSide: "0%"
gapTypeTopBottom: "8%"
grabberAreaTopBottom: "8%"
grabberSide: "8%"
groupBy: 1
language: "English"
ldrInterval: 0
ldrMin: 0
ldrTurnOff: false
ledStartOffset: 63
leftLed: 36
monitorNumber: 0
mqttDiscoveryTopic: "homeassistant"
mqttEnable: true
mqttServer: "tcp://192.168.0.144:1883"
mqttTopic: "glowwormluciferin"
mqttUsername: "firefly"
multiMonitor: 1
multiScreenSingleDevice: false
nightModeBrightness: "0%"
nightModeFrom: "22:00"
nightModeTo: "08:00"
numberOfCPUThreads: 1
orientation: "Clockwise"
osScaling: 100
powerSaving: "Disabled"
rightLed: 36
sampleRate: 0
screenResX: 1280
screenResY: 720
splitBottomMargin: "15%"
splitBottomRow: true
startWithSystem: true
streamType: "MQTT stream"
syncCheck: true
theme: "Classic theme"
threadPriority: "HIGH"
timeout: 100
toggleLed: true
topLed: 54
whiteTemperature: 70

Relevant log output

Nothing that stands out.

How to reproduce

Start the video grabber, everything with default options, except for streaming through MQTT.

I've manually started the application using

java -Xmx256M -XX:MaxMetaspaceSize=128M -Djpackage.app-version=2.12.5 -Djpackage.app-path=/opt/fireflyluciferin/bin/FireflyLuciferin -jar  /opt/fireflyluciferin/lib/app/FireflyLuciferin-jar-with-dependencies.jar

(on JDK21), to rule out heap usage. The heap and metaspace both look fine, but the RES memory usage of the process increases by ~100MB every few seconds. I've looked at some hints for diagnosing this, but nothing stands out there. NIO ByteBuffers aren't much in use. In pmap, however, I did find a lot of the following entries:

Address           Kbytes     RSS   Dirty Mode  Mapping
00007fbdb0000000   65536   65536   65536 rw---   [ anon ]
00007fbdb8000000   65536   65536   65536 rw---   [ anon ]
00007fbdc0000000   65536   65536   65536 rw---   [ anon ]
00007fbdc8000000   65536   65536   65536 rw---   [ anon ]
00007fbdd0000000   65412   65412   65412 rw---   [ anon ]

So something is allocating memory in 64MB chunks (and it's probably not heap).
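
To put a number on those regions, here is a minimal sketch that tallies the resident size of anonymous mappings close to 64 MB in `pmap -x`-style output. The class and method names are hypothetical, and the column layout is assumed to match the excerpt above. Regions of this size are often glibc malloc arenas, but that is only one possibility and is not confirmed anywhere in this thread.

```java
import java.util.List;

/** Sketch: tally anonymous mappings of roughly 64 MB in pmap -x style output. */
class PmapAnonTally {
    static final long ARENA_KB = 64 * 1024; // 64 MB expressed in KB

    /** Sums the RSS column (KB) of [ anon ] rows whose size is within 1 MB below 64 MB. */
    static long rssOfLargeAnonKb(List<String> pmapLines) {
        long totalKb = 0;
        for (String line : pmapLines) {
            if (!line.contains("[ anon ]")) {
                continue; // skips the header row and file-backed mappings
            }
            // Assumed columns: address, Kbytes, RSS, Dirty, Mode, Mapping
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 3) {
                continue;
            }
            long sizeKb = Long.parseLong(cols[1]);
            long rssKb = Long.parseLong(cols[2]);
            if (sizeKb > ARENA_KB - 1024 && sizeKb <= ARENA_KB) {
                totalKb += rssKb;
            }
        }
        return totalKb;
    }

    public static void main(String[] args) {
        // The five rows quoted in the report above.
        List<String> sample = List.of(
                "Address           Kbytes     RSS   Dirty Mode  Mapping",
                "00007fbdb0000000   65536   65536   65536 rw---   [ anon ]",
                "00007fbdb8000000   65536   65536   65536 rw---   [ anon ]",
                "00007fbdc0000000   65536   65536   65536 rw---   [ anon ]",
                "00007fbdc8000000   65536   65536   65536 rw---   [ anon ]",
                "00007fbdd0000000   65412   65412   65412 rw---   [ anon ]");
        System.out.println(rssOfLargeAnonKb(sample) + " KB resident in ~64 MB anonymous regions");
    }
}
```

Running the tally a few times while streaming would show whether this group of regions accounts for the growth seen in top.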

That's as far as I got...any hints? I'm unsure which version introduced this, it's been a while since I've attempted to upgrade :-)

@jypma jypma added the bug Something isn't working label Nov 3, 2023
@sblantipodi
Owner

@jypma
can you try the latest beta please?
https://github.com/sblantipodi/firefly_luciferin/wiki/How-to-install-a-BETA

Which Linux distro are you using? KDE, GNOME, or something else?

@jypma
Contributor Author

jypma commented Nov 4, 2023

I'll have a go with the beta, thanks!

I'm using Arch Linux, with neither KDE nor GNOME, just i3 as the window manager on plain X11.

@jypma
Contributor Author

jypma commented Nov 4, 2023

I've tried a beta version, and I'm still seeing the same behavior. The memory usage increases by about 2MB per second while streaming is active.

@sblantipodi
Owner

@jypma
I was able to reproduce the issue, I'll keep you posted. Thanks for reporting 👍

@sblantipodi
Owner

@jypma Correction: I'm not able to reproduce it after all... memory usage tops out a bit higher than -Xmx, since Luciferin uses "off heap" memory for native calls to the OS via JNA, but then it stops growing...

Using -Xmx256M is not recommended since it's not enough for all the "capture, post process"...

Using the default Xmx1024 my RAM usage tops out at 1.1GiB.

Have you tried waiting to see what the maximum memory usage is?
Which tool do you use to monitor RAM usage?

@jypma
Contributor Author

jypma commented Nov 6, 2023

I'm monitoring the heap usage using jconsole, and the process usage using top. With -Xmx256M, GC seems to be keeping up. Note that it's not the heap usage that's increasing, it's the non-heap usage (memory allocated by the Java process outside the heap). So while I'm running the application:

  • Heap usage stays under 100MB, GC is working fine (observed through jconsole).
  • Java process memory usage increases by about 2MB per second (observed through top).
  • If I leave it running, it'll eat all memory on the machine (8GB in this case), but even Linux's OOM killer doesn't shoot it down in time, so the computer just stalls.
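
The same comparison can be made from inside the JVM. This is a rough sketch, assuming a Linux host where `/proc/self/status` is readable. `RssVersusJvm` and its methods are hypothetical names, while the `java.lang.management` calls are standard JDK APIs. If the process RSS keeps climbing while the heap, non-heap, and NIO buffer figures stay flat, the growth is in memory the JVM does not track.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Sketch: compare what the JVM accounts for against the resident size of the process. */
class RssVersusJvm {
    /** Extracts the VmRSS value (KB) from /proc/<pid>/status lines, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // The line looks like "VmRSS:     123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Bytes held by the NIO buffer pools (for example "direct" and "mapped"). */
    static long bufferPoolBytes() {
        long total = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            total += Math.max(0L, pool.getMemoryUsed()); // -1 means "unknown"
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long heapKb = memory.getHeapMemoryUsage().getUsed() / 1024;
        long nonHeapKb = memory.getNonHeapMemoryUsage().getUsed() / 1024;
        System.out.printf("heap=%d KB, non-heap=%d KB, NIO buffers=%d KB%n",
                heapKb, nonHeapKb, bufferPoolBytes() / 1024);

        Path status = Path.of("/proc/self/status"); // Linux only
        if (Files.exists(status)) {
            System.out.printf("process RSS=%d KB%n", parseVmRssKb(Files.readAllLines(status)));
        }
    }
}
```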

Do you know which native libraries might be in use that stream the video frames into Java? Perhaps it's something outside of firefly that has the leak.

I'll also revert to 2.9.2 (the last version I was running before the upgrade), to see what behavior we get there.

@sblantipodi
Owner

@jypma GStreamer is a lib that uses native APIs, which version are you using?

you can check it via

gst-launch-1.0 --version

GStreamer is not bundled in Luciferin's deb or rpm packages, so this depends on your installation.

Reverting to a previous version of GStreamer may be worth a try, but be sure to use the package manager correctly so you don't "compromise" your installation.

@jypma
Contributor Author

jypma commented Nov 6, 2023

It's currently

gst-launch-1.0 version 1.22.6
GStreamer 1.22.6

and I seem to have upgraded from gstreamer-1.20.3 during the upgrade. I'll try those permutations as well, once I have some time :)

@sblantipodi
Owner

ok please keep me posted @jypma :)

@jypma
Contributor Author

jypma commented Nov 8, 2023

Here's what I've been able to test so far.

Downgrade to Luciferin 2.9.2

  • Comes up fine, streams to the LEDs for a few seconds, then prints lots of MQTTManager - Can't send MQTT msg and crashes. Memory usage does not seem to rise in the same way as 2.12.5 does in those seconds, though.

Back on 2.12.5

  • I now get a message on startup that Glow Worm has a firmware update available. It doesn't (it's on 5.11.8).
  • I tried to switch from MQTT to UDP streaming. Both work, but both leak memory in the same way.
  • I did notice a lot of occurrences of the following exception in the logs:
    On 2.12.5 proper:
WARNING: JNA: Callback org.freedesktop.gstreamer.elements.AppSink$2@705b6b8c threw the following exception
java.lang.IndexOutOfBoundsException: Index -634 out of bounds for length 57600
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Unknown Source)
	at java.base/java.util.Objects.checkIndex(Unknown Source)
	at java.base/java.nio.Buffer.checkIndex(Unknown Source)
	at java.base/java.nio.DirectIntBufferU.get(Unknown Source)
	at org.dpsoftware.grabber.ImageProcessor.calculateBlackPixels(ImageProcessor.java:282)
	at org.dpsoftware.grabber.ImageProcessor.autodetectBlackBars(ImageProcessor.java:225)
	at org.dpsoftware.grabber.GStreamerGrabber$AppSinkListener.rgbFrame(GStreamerGrabber.java:188)
	at org.dpsoftware.grabber.GStreamerGrabber$AppSinkListener.newSample(GStreamerGrabber.java:322)
	at org.freedesktop.gstreamer.elements.AppSink$2.callback(AppSink.java:232)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at com.sun.jna.CallbackReference$DefaultCallbackProxy.invokeCallback(CallbackReference.java:585)
	at com.sun.jna.CallbackReference$DefaultCallbackProxy.callback(CallbackReference.java:616)

and on the latest beta (slightly different):

WARNING: JNA: Callback org.freedesktop.gstreamer.elements.AppSink$2@157b24bd threw the following exception
java.lang.IndexOutOfBoundsException
	at java.base/java.nio.Buffer$1.apply(Buffer.java:757)
	at java.base/java.nio.Buffer$1.apply(Buffer.java:754)
	at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
	at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
	at java.base/java.nio.Buffer.checkIndex(Buffer.java:768)
	at java.base/java.nio.DirectIntBufferU.get(DirectIntBufferU.java:358)
	at org.dpsoftware.grabber.ImageProcessor.calculateBlackPixels(ImageProcessor.java:290)
	at org.dpsoftware.grabber.ImageProcessor.autodetectBlackBars(ImageProcessor.java:233)
	at org.dpsoftware.grabber.GStreamerGrabber$AppSinkListener.rgbFrame(GStreamerGrabber.java:187)
	at org.dpsoftware.grabber.GStreamerGrabber$AppSinkListener.newSample(GStreamerGrabber.java:323)
	at org.freedesktop.gstreamer.elements.AppSink$2.callback(AppSink.java:232)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.sun.jna.CallbackReference$DefaultCallbackProxy.invokeCallback(CallbackReference.java:585)
	at com.sun.jna.CallbackReference$DefaultCallbackProxy.callback(CallbackReference.java:616)

These are occurring continually, possibly on every frame. I didn't pay attention to them before, figuring it might be a different issue. I don't know JNA that well, but an exception thrown inside a callback there might be a reason for some memory to disappear.

Now that you mention gstreamer, it might be related? Still, it's strange... looking at ImageProcessor, it sensibly assumes that the IntBuffer has a size of width * height.
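
For reference, the exception itself is easy to reproduce in isolation. This is a minimal sketch, not code from ImageProcessor: it allocates a direct `IntBuffer` with the length from the first trace (57600) and reads index -634 from it. `NegativeIndexDemo` and `readThrows` are hypothetical names. An absolute `get(int)` rejects any index that is negative or not smaller than the buffer's limit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

/** Sketch: an absolute read with a negative index on a direct IntBuffer throws. */
class NegativeIndexDemo {
    /** Returns true if reading `index` from a direct IntBuffer of `length` ints throws. */
    static boolean readThrows(int length, int index) {
        IntBuffer pixels = ByteBuffer.allocateDirect(length * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        try {
            pixels.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Values taken from the first stack trace in this comment.
        System.out.println("index -634 throws: " + readThrows(57600, -634));
        System.out.println("index 0 throws: " + readThrows(57600, 0));
    }
}
```

So the buffer size assumption can be correct while the computed index is still wrong, which points at whatever produces the index rather than at the buffer.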

Downgrade gstreamer from 1.22.6 to 1.20.3

  • No effect, streaming works, but has the same memory leak.

Concluding

I think the above exception is the biggest smoking gun.

@sblantipodi
Owner

mmm... this is interesting...
but do the LEDs work correctly when the exception occurs?

can you attach the complete FireflyLuciferin.yaml you are using please?
please use the latest beta with the latest firmware,
and save the settings to overwrite the configuration file before sending it to me.

@jypma
Contributor Author

jypma commented Nov 8, 2023

The LEDs work fine when the exception occurs. Which is strange 😄

Current config is here: FireflyLuciferin.yaml.gz
(I actually have 180 LEDs but I haven't updated the config for that yet)

As mentioned above, I'm using firmware 5.11.8, which is the latest release I found on GitHub. Should I use a different one? I don't think it's likely that the firmware would be causing a memory leak in the client, though. It's displaying LED data from both Luciferin versions just fine.

jypma added a commit to jypma/firefly_luciferin that referenced this issue Nov 9, 2023
The magic numbers in this calculation return a negative border size when
running at 1280x720 resolution. Let's return at least 0 for all cases,
while a more appropriate calculation is derived.

Fixes sblantipodi#147.
@jypma
Contributor Author

jypma commented Nov 9, 2023

I decided to dive a little deeper, and a bit of extra logging showed offsetY becoming negative, resulting in the buffer index being negative as well. The root cause for that was calculateBorders diving below zero for low resolutions. Blame my still-not-dead low-res plasma TV :-)

PR is here: #150
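
The shape of the fix, as described in the commit message ("return at least 0 for all cases"), can be sketched as follows. The formula in `rawBorder` is a made-up placeholder chosen only because it goes negative at 720 lines. It is not the calculation used by calculateBorders, and `BorderClamp` and its methods are hypothetical names.

```java
/** Sketch: clamp a border calculation so it never goes below zero. */
class BorderClamp {
    /** Placeholder formula (NOT the project's): negative for small screen heights. */
    static int rawBorder(int screenHeight) {
        return screenHeight / 10 - 100;
    }

    /** Border size that is guaranteed to be non-negative. */
    static int borderFor(int screenHeight) {
        return Math.max(0, rawBorder(screenHeight));
    }

    public static void main(String[] args) {
        System.out.println("720p raw=" + rawBorder(720) + ", clamped=" + borderFor(720));
        System.out.println("2160p raw=" + rawBorder(2160) + ", clamped=" + borderFor(2160));
    }
}
```

Clamping keeps the derived offsets and buffer indices non-negative, though as the commit message notes it is a stopgap until a more appropriate calculation is derived.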

@sblantipodi
Owner

I love this kind of issue that ends with a pull request :D
thank you very much @jypma, this is much appreciated!!!

@sblantipodi
Owner

PS: A new release is coming soon and it will obviously include your fix.

sblantipodi pushed a commit that referenced this issue Nov 26, 2023
The magic numbers in this calculation return a negative border size when
running at 1280x720 resolution. Let's return at least 0 for all cases,
while a more appropriate calculation is derived.

Fixes #147.
sblantipodi added a commit to sblantipodi/glow_worm_luciferin that referenced this issue Nov 26, 2023
- ***Breaking changes***: requires `Firefly Luciferin` (v2.13.8).   
- **Introducing the [Luciferin surround lighting with satellites](https://github.com/sblantipodi/firefly_luciferin/wiki/Surround-lighting-with-satellites). Closes [#97](sblantipodi/firefly_luciferin#97
- **[Added Wayland support for Linux.](https://github.com/sblantipodi/firefly_luciferin/wiki/Linux-support#luciferin-supports-wayland)** Thanks @h7io for the contribution to this feature. Closes [#130](sblantipodi/firefly_luciferin#130).
- **It's now possible to [**disable Glow Worm device auto discovery**](https://github.com/sblantipodi/firefly_luciferin/wiki/Static-IP-and-auto-discovery) in Firefly Luciferin PC software.** This is useful when PC and ESP lives in separate VLANs/Subnets. Closes [#132](sblantipodi/firefly_luciferin#132).
- **Added the possibility to average the color of the screen on all the LEDs**. 
- **Big performance improvements for [Linux](https://github.com/sblantipodi/firefly_luciferin/wiki/Linux-support) while running on X11.**
- **Added an optimization for [Linux users that is specific for NVIDIA GPUs](https://github.com/sblantipodi/firefly_luciferin/wiki/Linux-support#nvidia-cuda).** Thanks to @Phoshi for the support on this feature.
- **Ram usage improvements.**
- **UI/UX improvements.** Revamped title bar; a single left click on the tray icon now opens settings. A double left click on the tray icon starts/stops screen capture, and a right click opens the menu as usual. (Windows only; the Linux version has no tray bar.)
- Added an option during the installation process to create a desktop shortcut to Firefly Luciferin (Windows only).
- Added an option during the installation process to create a start menu shortcut to Firefly Luciferin (Windows only).
- Potential off-heap memory leak on Linux. Fixed. Thanks @jypma for fixing this issue. Closes [#147](sblantipodi/firefly_luciferin#147).
- Firefly Luciferin caused a brief audio stutter on some systems during startup. Fixed. 
- Fixed sporadic crashes on ESP32-S3 devices.
- Fixed an issue that prevented Glow Worm Luciferin firmware to be [flashed using external tools like esptool](https://github.com/sblantipodi/firefly_luciferin/wiki/How-to-flash-Glow-Worm-Luciferin-firmware-via-esptool).
- Upgrade to Java 21 and JavaFX 21.
- [Arduino Bootstrapper](https://github.com/sblantipodi/arduino_bootstrapper/releases) update (v.1.15.3).
sblantipodi added a commit that referenced this issue Nov 26, 2023
- ***Breaking changes***: requires `Glow Worm Luciferin` firmware (v5.12.9).   
- **Introducing the [Luciferin surround lighting with satellites](https://github.com/sblantipodi/firefly_luciferin/wiki/Surround-lighting-with-satellites). Closes [#97](#97
- **[Added Wayland support for Linux.](https://github.com/sblantipodi/firefly_luciferin/wiki/Linux-support#luciferin-supports-wayland)** Thanks @h7io for the contribution to this feature. Closes [#130](#130).
- **It's now possible to [**disable Glow Worm device auto discovery**](https://github.com/sblantipodi/firefly_luciferin/wiki/Static-IP-and-auto-discovery) in Firefly Luciferin PC software.** This is useful when PC and ESP lives in separate VLANs/Subnets. Closes [#132](#132).
- **Added the possibility to average the color of the screen on all the LEDs**. 
- **Big performance improvements for [Linux](https://github.com/sblantipodi/firefly_luciferin/wiki/Linux-support) while running on X11.**
- **Added an optimization for [Linux users that is specific for NVIDIA GPUs](https://github.com/sblantipodi/firefly_luciferin/wiki/Linux-support#nvidia-cuda).** Thanks to @Phoshi for the support on this feature.
- **Ram usage improvements.**
- **UI/UX improvements.** Revamped title bar; a single left click on the tray icon now opens settings. A double left click on the tray icon starts/stops screen capture, and a right click opens the menu as usual. (Windows only; the Linux version has no tray bar.)
- Added an option during the installation process to create a desktop shortcut to Firefly Luciferin (Windows only).
- Added an option during the installation process to create a start menu shortcut to Firefly Luciferin (Windows only).
- Potential off-heap memory leak on Linux. Fixed. Thanks @jypma for fixing this issue. Closes [#147](#147).
- Firefly Luciferin caused a brief audio stutter on some systems during startup. Fixed. 
- Fixed sporadic crashes on ESP32-S3 devices.
- Fixed an issue that prevented Glow Worm Luciferin firmware to be [flashed using external tools like esptool](https://github.com/sblantipodi/firefly_luciferin/wiki/How-to-flash-Glow-Worm-Luciferin-firmware-via-esptool).
- Upgrade to Java 21 and JavaFX 21.
- [Arduino Bootstrapper](https://github.com/sblantipodi/arduino_bootstrapper/releases) update (v.1.15.3).