Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

midi import crashes semi-randomly with large files #6160

Closed
RiedleroD opened this issue Sep 12, 2021 · 26 comments
Closed

midi import crashes semi-randomly with large files #6160

RiedleroD opened this issue Sep 12, 2021 · 26 comments
Labels

Comments

@RiedleroD
Copy link
Contributor

Bug Summary

The midi import can just …crash with certain midis. I think the size of the midi imports plays a role, although I haven't yet found any specific file that crashes lmms every time. lmms only prints "killed" to output and leaves no backtrace.

Steps to reproduce

  • import a large midi file (best results from 200KiB and up, although I've observed crashes from 50KiB and upwards)

Expected behavior

midi file gets imported, or, if absolutely not possible otherwise, an error message gets displayed importing is stopped

Actual behavior

midi gets imported sometimes, lmms crashes sometimes (usually after a short wait or after some of the midi has been imported)

Affected LMMS versions

At least 1.2.0 and 1.3.0-alpha.1.126+gace502f1a on Arch linux 5.10.59.52 with wayland/sway 1.6.1 and jack/pipewire 0.3.35

Notes

It's not an out-of-memory exception, according to htop (and my combined RAM size of 7.76GiB) and neither is it a problem with wayland nor pipewire since nothing shows up in journalctl - I'm not a C++ dev, so idk how to debug further. I'm absolutely willing to do whatever it takes to get this fixed though, if I'm the only one able to reproduce this.

test midi - fails to import 9/10 times

@RiedleroD RiedleroD added the bug label Sep 12, 2021
@Veratil
Copy link
Contributor

Veratil commented Sep 12, 2021

Testing with latest master, I've imported the midi 4/4 times. The fourth though I did get a freeze only when closing LMMS which wouldn't be related to the midi itself.

There are known issues with 1.2.x and MIDI, which is why it will fail to import with 1.2.x. I'd say 99% of problems with MIDI loading were fixed with an update I did for 1.3.0 though.

Additionally with the latest master there have been two or three commits fixing a few issues with JACK.

@Veratil
Copy link
Contributor

Veratil commented Sep 12, 2021

In case there's something causing LMMS to crash, you can use gdb to hopefully get a backtrace and find out where it's dying at.

$ gdb lmms
(gdb) run

This will open lmms like normal, import the MIDI and if anything crashes during that it will catch it and you can get a backtrace to where it crashed ((gdb) bt). Oh and please use 1.3.0 to test and not 1.2.x. :)

@softrabbit
Copy link
Member

It's not an out-of-memory exception, according to htop (and my combined RAM size of 7.76GiB) and neither is it a problem with wayland nor pipewire since nothing shows up in journalctl - I'm not a C++ dev, so idk how to debug further. I'm absolutely willing to do whatever it takes to get this fixed though, if I'm the only one able to reproduce this.

I noticed the file provided took a good while to import, does your system kill "unresponsive" processes? That might be recorded in some log.

@allejok96
Copy link
Contributor

Tested on 1.3.0-alpha1.130, Arch 5.14, Xorg, during a RAID scrub... Cannot reproduce. Though I discovered that pressing Cancel during import does nothing.

@allejok96
Copy link
Contributor

You said it's not a memory problem, meh, I'll just link some stuff I found anyway
https://stackoverflow.com/questions/726690/what-killed-my-process-and-why
https://linux.die.net/man/5/limits.conf
https://ss64.com/bash/ulimit.html

@Veratil
Copy link
Contributor

Veratil commented Sep 13, 2021

Though I discovered that pressing Cancel during import does nothing.

You weren't suppose to find out about that! 😂 Yeah that's something I've known about but I just haven't fixed yet.

@midi-pascal
Copy link
Contributor

When loading the attached midi file in my own MIDI program, I discovered that it does not have a time signature event.
Could this be the source of the crash, i.e. some values not initialized inside LMMS when importing the file?
Just a thought...

@Veratil
Copy link
Contributor

Veratil commented Sep 13, 2021

When loading the attached midi file in my own MIDI program, I discovered that it does not have a time signature event.
Could this be the source of the crash, i.e. some values not initialized inside LMMS when importing the file?
Just a thought...

MIDI defines the default time signature as 4/4, it isn't needed.

EDIT: MIDI spec says
All MIDI Files should specify tempo and time signature. If they don't, the time signature is assumed to be 4/4, and the tempo 120 beats per minute.

EDIT2: LMMS doesn't care if the import has a time signature or not, too. It will default to whatever time signature you have set in LMMS.

@RiedleroD
Copy link
Contributor Author

RiedleroD commented Sep 14, 2021

tested again with 1.3.0 - gdb prints a bit more, but nothing useful to me either. Shortened gdb output:

Starting program: /data/diyfs/lmms/build/lmms --import "/home/riedler/lmms/templates/SMG_-_Gusty_Garden.mid"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff11c4640 (LWP 1520)]
[New Thread 0x7fffeb483640 (LWP 1521)]
[New Thread 0x7fffea68a640 (LWP 1522)]
[New Thread 0x7fffe9e89640 (LWP 1523)]
[New Thread 0x7fffe9688640 (LWP 1524)]
[New Thread 0x7fffe8e87640 (LWP 1525)]
[Thread 0x7fffe8e87640 (LWP 1525) exited]
[Thread 0x7fffe9e89640 (LWP 1523) exited]
[Thread 0x7fffe9688640 (LWP 1524) exited]
[Thread 0x7fffea68a640 (LWP 1522) exited]
[New Thread 0x7fffe8e87640 (LWP 1526)]
[New Thread 0x7fffe9688640 (LWP 1527)]
[New Thread 0x7fffe9e89640 (LWP 1528)]
Lv2 plugin SUMMARY: 5 of 8  loaded in 3 msecs. <blah about env-vars>
[New Thread 0x7fffea68a640 (LWP 1529)]
[New Thread 0x7fffdb2d6640 (LWP 1530)]
[New Thread 0x7fffdaa94640 (LWP 1531)]
[New Thread 0x7fffd9a53640 (LWP 1532)]
MidiImport::tryImport(): found MThd
[New Thread 0x7fffc73a2640 (LWP 1533)]
[New Thread 0x7fffc6ba1640 (LWP 1534)]
[New Thread 0x7fffc63a0640 (LWP 1535)]
<a few dozen warnings about fluidsynth not finding presets on certain channels mixed in with a few "using existing reference to (my soundfont)" and "really deleting (my soundfont)">
[Thread 0x7fffc63a0640 (LWP 1535) exited]
[Thread 0x7fffc6ba1640 (LWP 1534) exited]
[Thread 0x7fffc73a2640 (LWP 1533) exited]
[Thread 0x7fffd9a53640 (LWP 1532) exited]
[Thread 0x7fffdb2d6640 (LWP 1530) exited]
[Thread 0x7fffea68a640 (LWP 1529) exited]
[Thread 0x7fffe9e89640 (LWP 1528) exited]
[Thread 0x7fffe9688640 (LWP 1527) exited]
[Thread 0x7fffe8e87640 (LWP 1526) exited]
[Thread 0x7fffeb483640 (LWP 1521) exited]
[Thread 0x7ffff11c4640 (LWP 1520) exited]
[Thread 0x7ffff2f55cc0 (LWP 1519) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.

entering bt returns: no stack

my system doesn't kill unresponsive processes, otherwise a lot more would crash on my pentium potato :P besides, no log says anything about a process being killed (tried journalctl, /var/log and dmesg)

edit: oops forgot to compile debug version, I'll try it again with that in a bit.

@Veratil
Copy link
Contributor

Veratil commented Sep 14, 2021

my system doesn't kill unresponsive processes, otherwise a lot more would crash on my pentium potato :P besides, no log says anything about a process being killed (tried journalctl, /var/log and dmesg)

Unless you manually killed the process, your system killed it: Program terminated with signal SIGKILL, Killed.

@allejok96
Copy link
Contributor

This is turning into a murder mystery. There is no stack trace and all processes have perfect alibis!

@RiedleroD
Copy link
Contributor Author

RiedleroD commented Sep 14, 2021

So I did a bit more testing (this time with a debug build) - seems the crash only happens with the jack backend selected. That's odd considering any backend goes over pipewire anyway (including alsa) - I tested SDL, alsa, Pulseaudio and dummy, without any crashes.

I also tried to get more crash information from the jack version, but either I have no idea how to use gdb or it somehow doesn't work this way either. I tried this:

<in debug build folder>
gdb ./lmms
  GNU gdb (gdb) 10.2
  <license and help stuff>
  Reading symbols from ./lmms...
(gdb) run --import "/path/to.mid"
  <same output as before, although right before the fluidsynth warnings start, there was this:>
  Error reading song: song node not found
  <fluidsynth warnings as usual and then the same thread messages as before>
  Program terminated with signal SIGKILL, Killed.
  The program no longer exists.
(gdb) tb
  No default breakpoint address now.
(gdb) bt
  No stack.

very strange, I do not know what to make of this.

add. note: the debug build is from commit g770d2498b now

@RiedleroD
Copy link
Contributor Author

RiedleroD commented Sep 14, 2021

oh, there's something in journalctl now, I'll see what I can find there

@RiedleroD
Copy link
Contributor Author

okay so this is what's printed per-crash, in journalctl:

root[17022]: ACPI action undefined: PNP0C0A:00
audit[17001]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 pid=17001 comm="lmms" exe="/data/diyfs/lmms-debug/build/lmms" sig=6 res=1
kernel: audit: type=1701 audit(1631643041.701:125): auid=1000 uid=1000 gid=1000 ses=1 pid=17001 comm="lmms" exe="/data/diyfs/lmms-debug/build/lmms" sig=6 res=1
audit: BPF prog-id=37 op=LOAD
audit: BPF prog-id=38 op=LOAD
audit: BPF prog-id=39 op=LOAD
kernel: audit: type=1334 audit(1631643041.732:126): prog-id=37 op=LOAD
kernel: audit: type=1334 audit(1631643041.733:127): prog-id=38 op=LOAD
kernel: audit: type=1334 audit(1631643041.733:128): prog-id=39 op=LOAD
systemd[1]: Started Process Core Dump (PID 17029/UID 0).
 Subject: A start job for unit [email protected] has finished successfully
 Defined-By: systemd
 Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
 
 A start job for unit [email protected] has finished successfully.
 
 The job identifier is 1515.
TravelMateoD audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-17029-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
kernel: audit: type=1130 audit(1631643041.737:129): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-17029-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd-coredump[17031]: [🡕] Process 17001 (lmms) of user 1000 dumped core.
  <a few dozend times something along the lines of "Found module libdelay.so with build-id: 4689cde61eff298ffab75ae7b985ac5bdd339eca">
 Subject: Process 17001 (lmms) dumped core
 Defined-By: systemd
 Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
 Documentation: man:core(5)
 
 Process 17001 (lmms) crashed and dumped core.
 
 This usually indicates a programming error in the crashing program and
 should be reported to its vendor as a bug.
systemd[1]: [email protected]: Deactivated successfully.
 Subject: Unit succeeded
 Defined-By: systemd
 Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
 
 The unit [email protected] has successfully entered the 'dead' state.
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-17029-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd[1]: [email protected]: Consumed 1.890s CPU time.
 Subject: Resources consumed by unit runtime
 Defined-By: systemd
 Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
 
 The unit [email protected] completed and consumed the indicated resources.
kernel: audit: type=1131 audit(1631643044.042:130): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@3-17029-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
audit: BPF prog-id=39 op=UNLOAD
audit: BPF prog-id=38 op=UNLOAD
audit: BPF prog-id=37 op=UNLOAD
kernel: audit: type=1334 audit(1631643044.090:131): prog-id=39 op=UNLOAD
kernel: audit: type=1334 audit(1631643044.090:132): prog-id=38 op=UNLOAD
kernel: audit: type=1334 audit(1631643044.090:133): prog-id=37 op=UNLOAD

I'm not sure how much of that log is relevant, so I just pasted everything which seemed vaguely related (this all happened in 4 seconds, so there might be some other stuff in there)

@Veratil
Copy link
Contributor

Veratil commented Sep 14, 2021

Okay try this and we'll see if this is something happening in MIDI import itself or afterwards:

$ gdb ./lmms

# these will probably prompt about "depend on future import" or whatnot, just say 'yes'
(gdb) break MidiImport.cpp:307
(gdb) break MidiImport.cpp:586

# let's try using the GUI first, then we can try the --import path later
(gdb) run

# import the MIDI like normal, LMMS will freeze when it starts the import, click back in gdb window
(gdb) c

# we just want to know LMMS gets to the start of the main import function, "c" continues running LMMS
# the next breakpoint is the end of the main import function,
#  if we crash before it breaks then there's an issue in the import, otherwise we know it's after

@RiedleroD
Copy link
Contributor Author

the crash is after the first, but before the second breakpoint - I already mentioned this in the original issue report, but some of the midi does import. It crashes (in this case) at about 75%.

@Veratil
Copy link
Contributor

Veratil commented Sep 14, 2021

the crash is after the first, but before the second breakpoint - I already mentioned this in the original issue report, but some of the midi does import. It crashes (in this case) at about 75%.

Yeah this is just ensuring it's within that function.

Let's add a breakpoint to figure out what track is causing the failure (if it's during the track loading).

# you don't need the other breakpoints, only this one now
# add breakpoint at start of track handling loop
(gdb) break MidiImport.cpp:408

# every break, check the trackName and trk variable
# I forget if this is the right way to print the trackName data, this may not print the actual string
(gdb) p trackName.toUtf8().data()
# this will tell us if the trk is not null
(gdb) p trk

After one of the breaks we hopefully will get the track that's causing the issue and then we can work from there to figure out the issue.

@allejok96
Copy link
Contributor

I've only ever used QtCreator for debugging, but what about:

(gdb) catch throw
(gdb) run

@Veratil
Copy link
Contributor

Veratil commented Sep 14, 2021

Might work! Not sure anything is thrown though. Doesn't hurt to try.

@RiedleroD
Copy link
Contributor Author

RiedleroD commented Sep 19, 2021

@Veratil Shortened Output:

0x5555572de108 "Track 0"
(Alg_track_ptr) 0x5555573b4a80
0x555557c0ff88 "Track 1"
(Alg_track_ptr) 0x555557817540
0x55555866b278 "Track 2"
(Alg_track_ptr) 0x555557522fc0
0x55555900e638 "Track 3"
(Alg_track_ptr) 0x5555573b4a10
0x555559a4ab88 "Track 4"
(Alg_track_ptr) 0x5555573b4b80
0x55555a6a1cf8 "Track 5"
(Alg_track_ptr) 0x555557892720
0x55555b013ba8 "Track 6"
(Alg_track_ptr) 0x55555796b780
0x55555b3ed6c8 "Track 7"
(Alg_track_ptr) 0x55555796b9b0
0x55555c8bfa68 "Track 8"
(Alg_track_ptr) 0x55555798cba0
0x55555d927888 "Track 9"
(Alg_track_ptr) 0x5555579bbfe0
0x55555e2bb338 "Track 10"
(Alg_track_ptr) 0x5555579e6ca0
0x55555ecd6258 "Track 11"
(Alg_track_ptr) 0x555557a1c720
0x55556035b1e8 "Track 12"
(Alg_track_ptr) 0x555557a54330
0x555563676618 "Track 13"
(Alg_track_ptr) 0x555557955150
Program terminated with signal SIGKILL, Killed.
The program no longer exists.

The full output is here

Also I noticed that p trackName.toUtf8().data() took quite some time every run, so that's kind of weird.

@RiedleroD
Copy link
Contributor Author

@Veratil didn't work - so it doesn't throw anything, which is also very weird.

@Veratil
Copy link
Contributor

Veratil commented Sep 19, 2021

@Veratil didn't work - so it doesn't throw anything, which is also very weird.

I didn't expect anything to be thrown, there are no throws in portsmf (yet). Even though it's a C++ project it's written mostly in C.

0x555563676618 "Track 13"
(Alg_track_ptr) 0x555557955150
Program terminated with signal SIGKILL, Killed.
The program no longer exists.

Okay so something it happening during the 13th track. Looking at it, it has a lot of notes. It might be a OOM thing. Now that we know it's breaking on track 13, let's debug more during this loop. Do the same thing as before (you don't need to p trk, but still print the track name so we know which loop we're on). Once you get to Track 13 add some new breaks:

break MidiImport.cpp:420
break MidiImport.cpp:452
break MidiImport.cpp:465
break MidiImport.cpp:472
break MidiImport.cpp:494
break MidiImport.cpp:536

Once those breaks are in place, using c to continue like before. The first break at 420 is the start of the event processing loop. 452 is if the event is a note. 465 is if it's an update event. 472 and 494 are for update types. 536 is just another block that it might get killed at for allocating memory.

I'd say alongside this use top or similar to monitor your memory. My best guess is the OOM killer is causing this for some reason.

@RiedleroD
Copy link
Contributor Author

RiedleroD commented Sep 20, 2021

It was killed after 536, with no visible change in memory consumption. The output (after the last continue) is as follows:

[New Thread 0x7fffc568b640 (LWP 69203)]
[New Thread 0x7fffabfff640 (LWP 69204)]
[New Thread 0x7fffc4e8a640 (LWP 69205)]
[New Thread 0x7fffa9c7f640 (LWP 69206)]

Thread 21 "lmms" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc568b640 (LWP 69203)]
0x00007ffff5e97f87 in __memset_erms () from /usr/lib/libc.so.6

<16 threads exited>

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

@Veratil
Copy link
Contributor

Veratil commented Sep 20, 2021

A segfault is a better error now! Could you get a backtrace at that point?

@RiedleroD
Copy link
Contributor Author

nope, it still says No stack.

@RiedleroD
Copy link
Contributor Author

can't reproduce with master anymore, no idea what fixed it, but I'm glad it's gone now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants