[BUG] DSP Panic on Intel MTL #9695

Closed
as400l opened this issue Nov 29, 2024 · 33 comments · Fixed by thesofproject/linux#5267
Labels: bug (Something isn't working as expected), P2 (Critical bugs or normal features)
Milestone: v2.12

Comments

@as400l

as400l commented Nov 29, 2024

Describe the bug
A DSP panic occurs, followed by a full freeze of the OS.

To Reproduce
Open pavucontrol and mute/unmute the microphone a few times. Close pavucontrol. Wait for the freeze.

Reproduction Rate
100%

Expected behavior
No DSP Panic.

Impact
The built-in microphone cannot be used.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: 6.12.1
    • SOF: sof-bin 2024.09.1
  2. Name of the topology file
    • Topology: sof-hda-generic-2ch.tplg
  3. Name of the platform(s) on which the bug is observed.
    • Platform: Intel Meteor Lake Ultra 9 185H, Asus Zenbook 14 OLED UX3405M, Alpine Linux

Screenshots or console output

[  186.448058] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[  186.448069] sof-audio-pci-intel-mtl 0000:00:1f.3: DSP panic!
[  186.448071] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[  186.448078] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[  186.448083] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
[  186.448116] sof-audio-pci-intel-mtl 0000:00:1f.3: Unknown toolchain is used
[  186.448120] sof-audio-pci-intel-mtl 0000:00:1f.3: error: DSP Firmware Oops
[  186.448121] sof-audio-pci-intel-mtl 0000:00:1f.3: error: Exception Cause: AllocaCause, MOVSP instruction, if caller’s registers are not in the register file
[  186.448123] sof-audio-pci-intel-mtl 0000:00:1f.3: EXCCAUSE 0x00000005 EXCVADDR 0x00000000 PS       0x00060d20 SAR     0x0000000c
[  186.448126] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC1     0xa007626d EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
[  186.448128] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[  186.448129] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[  186.448131] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[  186.448132] sof-audio-pci-intel-mtl 0000:00:1f.3: stack dump from 0x00000000
[  186.448134] sof-audio-pci-intel-mtl 0000:00:1f.3: AR registers:
[  186.448136] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x0: a004ed15 a0111680 00000000 4015a7c0
[  186.448138] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x10: a0166b00 00000018 401492b0 a0111680
[  186.448140] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x20: a005fb41 a0111640 401492b0 a006506c
[  186.448142] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x30: a005fb41 a0111640 401492b0 a006506c
[  186.448144] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[  186.946817] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0xe030001|0x300
[  186.946837] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  186.946851] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x8e030001|0x300|0x0, target: 0x1b0a0000|0x0|0x0, ctl: 0x3
[  186.946856] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  186.946859] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[  186.946866] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[  186.946878]  HDMI2: ASoC: trigger FE cmd: 1 failed: -110
[  186.946897] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0xe010001|0x0 failed: -19
[  186.946902] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -19
[  186.946904]  HDMI2: ASoC: trigger FE cmd: 0 failed: -19
[  186.947086] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x13000003|0x1 failed: -19
[  186.947091] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to pause all pipelines
[  186.947093] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -19
[  186.947096]  DMIC Raw: ASoC: trigger FE cmd: 0 failed: -19
[  186.947198] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x46060004|0x19 failed: -19
[  186.947203] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to unbind modules module-copier.12.2:0 -> tdfb.11.1:0
[  186.947208] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x12040000|0x0 failed: -19
[  186.947212] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.12
[  186.947219] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x12050000|0x0 failed: -19
[  186.947222] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.11
[  186.947225] sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to free connected widgets
[  186.947233] sof-audio-pci-intel-mtl 0000:00:1f.3: sof_pcm_stream_free: sof_widget_list_free failed -19
[  186.947236] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_prepare on 0000:00:1f.3: -19
[  186.947240]  DMIC Raw: ASoC: error at __soc_pcm_prepare on DMIC Raw: -19
[  186.947243]  DMIC Raw: ASoC: error at dpcm_fe_dai_prepare on DMIC Raw: -19
[  186.947344] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x13020003|0x0 failed: -19
[  186.947348] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -19
[  186.947354]  HDA Analog: ASoC: error at dpcm_be_dai_trigger on HDA Analog: -19
[  186.947357]  HDA Analog: ASoC: trigger FE cmd: 0 failed: -19

The full dmesg is attached:
dmesg.txt

@as400l as400l added the bug Something isn't working as expected label Nov 29, 2024
@lgirdwood
Member

@as400l, are you able to reproduce this with alsamixer? If so, with which DMIC kcontrol?
@ujfalusi, any additional kernel debug options to enable?

@lgirdwood lgirdwood added this to the v2.12 milestone Nov 29, 2024
@lgirdwood lgirdwood added the P2 Critical bugs or normal features label Nov 29, 2024
@ujfalusi
Contributor

As usual, @as400l: can you add the attached file sof-dyndbg.conf.txt as /etc/modprobe.d/sof-dyndbg.conf, reboot, and re-attach a dmesg log that contains both the boot and the error itself?

If the log is truncated because of a small log buffer, please add log_buf_len=4M to the kernel command line (passed to the kernel by the bootloader).

@as400l
Author

as400l commented Nov 29, 2024

Here is the dmesg with the error, with sof-dyndbg.conf enabled.

BTW, isn't it strange that it uses the sof-hda-generic-2ch.tplg file?

@lgirdwood, I tried with alsamixer but can't reproduce it. On the other hand, I can't unmute the mic with alsamixer either: the microphone LED on the keyboard stays on no matter what I try, which means the mic was never unmuted.

dmesg.log.gz

@ujfalusi
Copy link
Contributor

@as400l, for some reason dyndbg did not enable the debug prints, so we cannot see the last message that was sent to the firmware. We only know that the next one would have been 0xe010002|0x0, which was not sent because the firmware had crashed.
Can you check again whether dyndbg is in place? Probing should be much more verbose, with lots of prints about modules and the like.

sof-hda-generic-2ch.tplg is chosen because you have DMICs in your system:

[   15.097047] sof-audio-pci-intel-mtl 0000:00:1f.3: DMICs detected in NHLT tables: 2

You also have BT offload advertised:

[   15.097044] sof-audio-pci-intel-mtl 0000:00:1f.3: NHLT device BT(0) detected, ssp_mask 0x4
[   15.097046] sof-audio-pci-intel-mtl 0000:00:1f.3: BT link detected in NHLT tables: 0x4

I'm not sure if that can cause any issues.

To test the analog path, you can disable the DMIC (you will lose the laptop microphones):

options snd_sof_intel_hda_generic dyndbg=+pmf dmic_num=0

placed in, for example, /etc/modprobe.d/no-dmic.conf.

@as400l
Author

as400l commented Nov 29, 2024

I tried multiple times with "wpctl set-mute @DEFAULT_AUDIO_SOURCE@ toggle" but could not reproduce this behaviour.

So maybe the real cause is actually an Xe DRM module crash or hang related to pavucontrol, which may be visible at the end of the dmesg I sent? Is that even possible?

As for the debug prints: my kernel is really slimmed down, so that may be the reason. I may have to try the default distro kernel.

@lgirdwood
Member

> Here is the dmesg with the error, with sof-dyndbg.conf enabled.
>
> BTW, isn't it strange that it uses the sof-hda-generic-2ch.tplg file?
>
> @lgirdwood, I tried with alsamixer but can't reproduce it. On the other hand, I can't unmute the mic with alsamixer either: the microphone LED on the keyboard stays on no matter what I try, which means the mic was never unmuted.
>
> dmesg.log.gz

OK, it's strange that alsamixer won't unmute the mic. I assume you tried alsamixer -c N (where N is the card number) to make sure all kcontrols were tried.

BTW, is the keyboard LED on a key, i.e. can it be pressed with Fn/Alt/Ctrl/Shift combinations to switch the LED on/off? That key should be mapped to the kcontrol that mutes/unmutes the mic.

Please do try the stock kernel; we need stock kernel logs to figure out what has happened here.

@as400l
Author

as400l commented Dec 2, 2024

@lgirdwood, as I mentioned above, I tried with wpctl and it correctly mutes/unmutes the microphone; the LED goes off/on as it should. But I could not reproduce this error with it.

I'm leaning towards something else causing this panic.

The stock Alpine kernel was not helpful either, since it is probably also stripped down.

@as400l
Copy link
Author

as400l commented Dec 2, 2024

@lgirdwood
@ujfalusi

I compiled a kernel with DYNAMIC_DEBUG; here are logs with the error. I had to try to trigger it multiple times, as this time it wasn't so eager to panic.
The panic is at timestamp 313.393689.

dmesg.log.gz

@ujfalusi
Contributor

ujfalusi commented Dec 3, 2024

Based on the log, I think ChainDMA (HDMI audio) is what is causing the firmware panic:

[  313.391960] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-mtl 0000:00:1f.3: pcm: trigger stream 4 dir 0 cmd 1
[  313.391963] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-mtl 0000:00:1f.3: trigger cmd: 1 state: 4
[  313.391966] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx      : 0xe030001|0x300
[  313.393682] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0
[  313.393687] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[  313.393689] sof-audio-pci-intel-mtl 0000:00:1f.3: DSP panic!
[  313.393691] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[  313.393695] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[  313.393700] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
[  313.393733] sof-audio-pci-intel-mtl 0000:00:1f.3: Unknown toolchain is used
[  313.393735] sof-audio-pci-intel-mtl 0000:00:1f.3: error: DSP Firmware Oops
[  313.393737] sof-audio-pci-intel-mtl 0000:00:1f.3: error: Exception Cause: AllocaCause, MOVSP instruction, if caller’s registers are not in the register file
[  313.393741] sof-audio-pci-intel-mtl 0000:00:1f.3: EXCCAUSE 0x00000005 EXCVADDR 0x00000000 PS       0x00060d20 SAR     0x0000000c
[  313.393745] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC1     0xa007626d EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
[  313.393748] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[  313.393750] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[  313.393752] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[  313.393754] sof-audio-pci-intel-mtl 0000:00:1f.3: stack dump from 0x00000000
[  313.393756] sof-audio-pci-intel-mtl 0000:00:1f.3: AR registers:
[  313.393759] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x0: a004ed15 a0111680 00000000 40152c80
[  313.393762] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x10: a0166740 00000018 40149740 a0111680
[  313.393764] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x20: a005fb41 a0111640 40149740 a006506c
[  313.393766] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x30: a005fb41 a0111640 40149740 a006506c
[  313.393768] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[  313.393770] snd_sof:sof_set_fw_state: sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state change: 7 -> 8
[  313.393774] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc rx done : 0x1b0a0000|0x0
[  313.898625] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0xe030001|0x300
[  313.898645] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  313.898657] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x8e030001|0x300|0x0, target: 0x1b0a0000|0x0|0x0, ctl: 0x3
[  313.898661] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  313.898663] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout

0xe030001 is a ChainDMA message with the ALLOCATE and ENABLE bits set, but what is not right is that the Host DMA ID is 1 while the Link DMA ID is 0.
We had a similar issue in the past (thesofproject/linux#5116), which was supposed to be fixed by thesofproject/linux#5119.

There are lots of things happening in the log, but it looks like something (PipeWire?) is exercising PCMs at random: keeping them open while stopping, starting, and reconfiguring them.

@as400l
Copy link
Author

as400l commented Dec 3, 2024

@ujfalusi, just a reminder: this happens only while using pavucontrol, which is actually a PulseAudio tool.
I could not reproduce it with the native WirePlumber tool, wpctl.

@ujfalusi
Contributor

ujfalusi commented Dec 3, 2024

OK, so to reproduce the issue:
aplay -Dhw:0,3 -c8 -r48000 -fS32_LE /dev/zero -d 120

[ 2810.282081] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: Entry: trigger (cmd: 1)
[ 2810.282087] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: cmd: 1, state: 4
[ 2810.282093] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0xe030000|0xc00: GLB_CHAIN_DMA
[ 2810.282656] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x2e000000|0xc00: GLB_CHAIN_DMA
[ 2810.282692] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx done : 0xe030000|0xc00: GLB_CHAIN_DMA
[ 2810.283232] snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x2014001e successful

Press <CTRL+z> to freeze aplay

[ 2814.029625] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: Entry: trigger (cmd: 0)
[ 2814.029633] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: cmd: 0, state: 3
[ 2814.029645] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0xe010000|0x0: GLB_CHAIN_DMA
[ 2814.030855] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x2e000000|0x0: GLB_CHAIN_DMA
[ 2814.031022] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx done : 0xe010000|0x0: GLB_CHAIN_DMA
[ 2814.031034] snd_soc_core:dpcm_be_dai_trigger:  iDisp1: ASoC: trigger BE iDisp1 cmd 0
[ 2814.031045] snd_sof_intel_hda_common:hda_dai_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: cmd=0 dai iDisp1 Pin direction 0

Wait a second or two, then start a new HDMI playback (while :0,3 is frozen):
aplay -Dhw:0,4 -c8 -r48000 -fS32_LE /dev/zero -d 120

[ 2823.354025] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm4 (HDMI2), dir 0: cmd: 1, state: 4
[ 2823.354033] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0xe030001|0xc00: GLB_CHAIN_DMA
[ 2823.357361] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[ 2823.357367] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 2823.357370] sof-audio-pci-intel-tgl 0000:00:1f.3: DSP panic!
[ 2823.357373] sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 2823.357381] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 2823.357490] sof-audio-pci-intel-tgl 0000:00:1f.3: FW is built with Zephyr toolchain
[ 2823.357493] sof-audio-pci-intel-tgl 0000:00:1f.3: error: DSP Firmware Oops
[ 2823.357496] sof-audio-pci-intel-tgl 0000:00:1f.3: error: Exception Cause: AllocaCause, MOVSP instruction, if caller’s registers are not in the register file
[ 2823.357499] sof-audio-pci-intel-tgl 0000:00:1f.3: EXCCAUSE 0x00000005 EXCVADDR 0x00000000 PS       0x00060f20 SAR     0x0000001d
[ 2823.357503] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC1     0xbe04126c EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
[ 2823.357507] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[ 2823.357510] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[ 2823.357513] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[ 2823.357515] sof-audio-pci-intel-tgl 0000:00:1f.3: stack dump from 0x00000000
[ 2823.357518] sof-audio-pci-intel-tgl 0000:00:1f.3: AR registers:
[ 2823.357521] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x0: be04156b be0a2eb0 9e0b1700 be0b17c0
[ 2823.357525] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x10: fff001ff 00000000 00003000 be0a2eb0
[ 2823.357540] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x20: 00000000 be0a2e90 9e0a8630 00060f25
[ 2823.357546] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x30: 00000000 be0a2e90 9e0a8630 00060f25
[ 2823.357551] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------

@ujfalusi
Contributor

ujfalusi commented Dec 3, 2024

Reverting 7eab5d86f218 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop") fixes this particular issue.

That patch was part of thesofproject/linux#5197, which fixed various metallic-noise issues around similar sequences.

The issue is not limited to ChainDMA.
On a TGL HDA machine, this sequence fails:

aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

and this one causes a firmware panic:

aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120

On LNL SoundWire it is the same with all endpoints:

aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,2 -c2 -r48000 -fS32_LE /dev/zero -d 120

Only the ChainDMA PCMs cause a panic; the others just fail.

@ranj063, I think it might be because we release the LinkDMA channel in sof/intel/ but don't inform the firmware about it (we don't do a full stop), and this causes a race if a new PCM comes in between the stop and the prepare/hw_params/start of the other PCM.

ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 9, 2024
…IPC4

We need to reclaim the link DMA channel after clearing it with IPC4 as
the pipelines are not cleared in firmware, the Link DMA channel is
preserved.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
@ujfalusi
Contributor

ujfalusi commented Dec 9, 2024

@ranj063, @as400l, this patch fixes the issue for me: thesofproject/linux#5267

@lgirdwood
Member

@abonislawski FYI: for the FW panic, it is probably worth checking whether it is due to a HW state transition (fixed above in SW) and whether it needs a FW fix too. Thanks!

ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 10, 2024
ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 11, 2024
The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 13, 2024
bardliao pushed a commit to thesofproject/linux that referenced this issue Dec 18, 2024
The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Liam Girdwood <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Dec 18, 2024
The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Cc: [email protected]
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Liam Girdwood <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Dec 28, 2024
commit e8d0ba1 upstream.

The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Cc: [email protected]
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Liam Girdwood <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Dec 29, 2024
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Dec 30, 2024
@lvanderree

I also had this bug on my Lenovo X1 Carbon Gen 12 with built-in audio (Meteor Lake-P HD), running Fedora 41 up to kernel 6.12.7-200, when using WebEx (44.10.2).

I could reproduce it by starting and stopping the audio test in the WebEx settings several times in a row.

I installed the experimental kernel 6.13.0-0.rc4.36.fc42.x86_64 (hoping this work was already merged), but unfortunately it didn't fix the problem.
Then I compiled the kernel from this source (517a41a6df6ac78e7a8da35856531ad432 (HEAD -> topic/sof-dev), 7aae4a3729e4627ca25bddba2023eb7f9afc95a4 (HEAD -> rawhide)), and I can confirm this FIXED the DSP panic.

Thanks!

mj22226 pushed a commit to mj22226/linux that referenced this issue Dec 30, 2024
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Dec 31, 2024
@ujfalusi
Contributor

ujfalusi commented Jan 2, 2025

@lvanderree, 6.13-rc4 should include the patch and should be working (6.12.8 will also have the backport).

I wonder what we are missing... Let me test 6.13-rc5 here.

@ujfalusi
Contributor

ujfalusi commented Jan 2, 2025

> @lvanderree, 6.13-rc4 should include the patch and should be working (6.12.8 will also have the backport).
>
> I wonder what we are missing... Let me test 6.13-rc5 here.

6.13-rc5 is working fine, and the patch is indeed first available in -rc5, so -rc4 should still fail (confirmed: it fails). @lvanderree, if you have time, can you re-test with -rc5 to confirm that you no longer need to do git merges to get things working?

e8d0ba147d90 ASoC: SOF: Intel: hda-dai: Do not release the link DMA on STOP
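Reports in this thread depend on which kernel actually contains e8d0ba147d90, so a quick version check may help when comparing them. This is a minimal sketch that assumes only the versions stated here: the fix is first in 6.13-rc5 and backported to 6.12.8. parse_release and has_linkdma_fix are hypothetical helpers. Distribution or self-built kernels (like the topic/sof-dev build above) may carry the patch regardless of their version string. Other stable branches are not covered either, so for those the function returns None rather than guess.

```python
import re

# Versions taken from this thread; anything else is reported as unknown (None).
FIXED_MAINLINE = (6, 13, 5)  # first mainline tag with the fix: 6.13-rc5
FIXED_STABLE = {(6, 12): 8}  # 6.12.8 is expected to carry the backport


def parse_release(release):
    """Split a `uname -r` string into (major, minor, patch, rc or None)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)
    # Mainline uses "6.13.0-rc5"; Fedora uses "6.13.0-0.rc5.42.fc42.x86_64".
    rc = re.search(r"rc(\d+)", release[m.end():])
    return major, minor, patch, int(rc.group(1)) if rc else None


def has_linkdma_fix(release):
    """True/False when the version alone decides it, None when it cannot tell."""
    major, minor, patch, rc = parse_release(release)
    fixed_major, fixed_minor, fixed_rc = FIXED_MAINLINE
    if (major, minor) > (fixed_major, fixed_minor):
        return True
    if (major, minor) == (fixed_major, fixed_minor):
        return rc is None or rc >= fixed_rc  # final 6.13, or -rc5 and later
    if (major, minor) in FIXED_STABLE:
        return rc is None and patch >= FIXED_STABLE[(major, minor)]
    return None  # other branches: check the distribution changelog instead
```

For example, has_linkdma_fix(platform.release()) checks the running kernel. The 6.13.0-0.rc4.36.fc42.x86_64 kernel mentioned earlier in the thread gives False.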

gregkh pushed a commit to gregkh/linux that referenced this issue Jan 2, 2025
@lvanderree

Unfortunately, I was immediately able to reproduce an audio problem with WebEx under kernel 6.13.0-0.rc5.42.fc42.x86_64.
When I play-paused the test tune in the WebEx settings, all audio output stopped (it was originally playing over the internal speakers).
However, there was no panic this time; journalctl only showed broken pipes:

jan 02 16:09:22 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (0 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:09:28 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:09:30 X1-Leon gnome-shell[5105]: JS ERROR: Error: Expected an object of type GvcMixerStream for argument 'stream' but got type undefined
readline_callback@file:///home/leon/.local/share/gnome-shell/extensions/[email protected]/libs/widgets.js:462:51
@resource:///org/gnome/shell/ui/init.js:21:20
jan 02 16:09:35 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:09:42 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:09:49 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:09:56 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:03 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:09 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:16 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:23 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:30 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:37 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:44 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:46 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadspp: (127 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:10:57 X1-Leon systemd[4874]: app-gnome-webex-17064.scope: Couldn't move process 17064 to requested cgroup '/user.slice/user-1000.slice/[email protected]/app.slice/app-gnome-webex-17064.scope': No such process
jan 02 16:10:57 X1-Leon systemd[4874]: app-gnome-webex-17064.scope: Failed to add PIDs to scope's control group: No such process
jan 02 16:10:57 X1-Leon systemd[4874]: app-gnome-webex-17064.scope: Failed with result 'resources'.
jan 02 16:10:57 X1-Leon systemd[4874]: Failed to start app-gnome-webex-17064.scope - Application launched by gnome-shell.
jan 02 16:11:01 X1-Leon systemd[4874]: app-gnome-webex-17111.scope: Couldn't move process 17111 to requested cgroup '/user.slice/user-1000.slice/[email protected]/app.slice/app-gnome-webex-17111.scope': No such process
jan 02 16:11:01 X1-Leon systemd[4874]: app-gnome-webex-17111.scope: Failed to add PIDs to scope's control group: No such process
jan 02 16:11:01 X1-Leon systemd[4874]: app-gnome-webex-17111.scope: Failed with result 'resources'.
jan 02 16:11:01 X1-Leon systemd[4874]: Failed to start app-gnome-webex-17111.scope - Application launched by gnome-shell.
jan 02 16:11:02 X1-Leon /usr/libexec/gdm-x-session[4945]: (II) modeset(0): EDID vendor "BOE", prod id 3072
jan 02 16:11:02 X1-Leon /usr/libexec/gdm-x-session[4945]: (II) modeset(0): Using hsync ranges from config file
jan 02 16:11:02 X1-Leon /usr/libexec/gdm-x-session[4945]: (II) modeset(0): Using vrefresh ranges from config file
jan 02 16:11:02 X1-Leon /usr/libexec/gdm-x-session[4945]: (II) modeset(0): Printing DDC gathered Modelines:
jan 02 16:11:02 X1-Leon /usr/libexec/gdm-x-session[4945]: (II) modeset(0): Modeline "1920x1200"x0.0 154.76 1920 1968 2000 2080 1200 1203 1209 1240 +hsync -vsync (74.4 kHz eP)
jan 02 16:11:09 X1-Leon pipewire[4892]: pw.node: (alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Mic1__source-63) graph xrun not-triggered (49 suppressed)
jan 02 16:11:09 X1-Leon pipewire[4892]: pw.node: (alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Mic1__source-63) xrun state:0x7fc776434008 pending:1/2 s:1164561764670 a:1164561796505 f:1164561798193 waiting:31835 process:1688 status:triggered

Besides that, I had no audio at all when setting the output to HDMI (1). Spotify would play over the internal speakers, but after switching to HDMI (1) I couldn't hear anything anymore. Switching back to the internal speakers brought the music back.
I could unplug and replug my screen (USB-C), but this didn't help.
The only thing that resolved this was suspending and resuming my system; after that, audio over HDMI (1) worked again.
However, I don't think this (no audio over HDMI until suspend/resume) is related to the original issue (DSP panic).

However, after rebooting again into my custom kernel 6.13.0-rc3-sof, I could reproduce both problems (audio breaking when testing with WebEx, and no audio over HDMI (1) until suspend/resume). I don't know why I didn't have these problems 3 days ago, but now I can reproduce the broken-pipe errors and the missing audio over HDMI (1) every time, with both kernels.

I found out about the WebEx issue because the audio disappeared after a few minutes of WebEx calls. After some testing in the WebEx settings while tailing journalctl, I saw the DSP panic. I usually take WebEx calls with audio over the built-in speakers and microphone of my X1 gen 12 (a setup that worked fine on my previous X1 Yoga gen 3), and I play music over my HDMI output.
What I noticed while testing: when Spotify plays music over HDMI (which it can do for hours without a problem) and I open the WebEx settings, I can make the music playback stutter and eventually crash just by setting the WebEx audio output to HDMI as well, without even starting the test tune. After changing the WebEx output back to the system default (the internal speaker), Spotify playback over HDMI is restored. See this journal output:

jan 02 16:42:20 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (277 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:22 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:24 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:26 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:28 X1-Leon pipewire[4892]: pw.node: (alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Speaker__sink-61) graph xrun not-triggered (88 suppressed)
jan 02 16:42:28 X1-Leon pipewire[4892]: pw.node: (alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Speaker__sink-61) xrun state:0x7fc785797008 pending:1/1 s:3040857921559 a:3040867940596 f:3040867954580 waiting:10019037 process:13984 status:triggered
jan 02 16:42:28 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:30 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:32 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 02 16:42:34 X1-Leon pipewire[4892]: spa.alsa: hw:sofhdadsp,3p: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp

If there is anything else I can do to provide more insight, please say so (and maybe point me in the right direction on how to get this info). Or, if this is not directly related to the original DSP panic and the link DMA release fix, I can open a new ticket instead.

@carlinigraphy

I am also experiencing this issue, also on a 12th Gen X1 Carbon.

If you need someone else to test, please let me know.

@ujfalusi
Contributor

ujfalusi commented Jan 3, 2025

@carlinigraphy, also triggered by WebEx?

@ujfalusi
Contributor

ujfalusi commented Jan 3, 2025

@lvanderree, can you clarify these:

  • with 6.13-rc5 there is no DSP panic
  • with topic/sof-dev kernel there is no DSP panic
  • With both kernels, when using WebEx you experience loss of audio, which can be fixed by suspend/resume

When the audio broke, can it be resurrected by stopping all audio activities and letting PA/PW close the PCM devices?
This can be checked with cat /proc/asound/card0/pcm*/sub0/status (or, better, for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done). If it returns only closed, then there is no audio activity going on and the card should power off; in theory, would audio then work again?
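The shell loop above can also be written as a small script. It reports the state of every sub0 substream and whether all of them are closed. This is a minimal sketch that assumes the /proc/asound layout shown later in this thread: a plain closed line for an idle substream, or a state: line for an open one. pcm_states and all_closed are hypothetical helper names. Like the loop, the script only looks at sub0.

```python
from pathlib import Path


def pcm_states(root="/proc/asound"):
    """Map 'cardN/pcmXY' to 'closed' or to the value of its 'state:' line."""
    states = {}
    for status in sorted(Path(root).glob("card*/pcm*/sub0/status")):
        pcm = str(status.parent.parent.relative_to(root))  # e.g. "card0/pcm0p"
        text = status.read_text().strip()
        state = "closed" if text == "closed" else "unknown"
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "state":
                state = value.strip()  # e.g. "RUNNING" or "SETUP"
                break
        states[pcm] = state
    return states


def all_closed(states):
    """True when every sub0 status reads 'closed' (no audio activity)."""
    return all(state == "closed" for state in states.values())
```

On an idle system, all_closed(pcm_states()) should return True. According to the comment above, that is the state in which the card should be able to power off.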

@ujfalusi
Copy link
Contributor

ujfalusi commented Jan 3, 2025

@lvanderree, @carlinigraphy, please follow #9695 (comment) and provide raw dmesg log without filtering which contains the events around the time the issue happens, but as long as possible to see the prior events.

If you don't see the DSP panic with 6.13-rc5 / 6.12.8 / sof-dev kernel then please open a new issue to track this, thank you.

@lvanderree

@lvanderree, can you clarify these:

  • with 6.13-rc5 there is no DSP panic
  • with topic/sof-dev kernel there is no DSP panic
  • with both kernels, when using WebEx you experience loss of audio (also of other sources playing at the same time), which can be fixed by closing the WebEx stream
  • Audio out via HDMI (1) does not work until after a suspend/resume

When the audio broke, can it be resurrected by stopping all audio activities and letting PA/PW close the PCM devices? This can be checked with cat /proc/asound/card0/pcm*/sub0/status (or, better, for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done). If it returns only closed, then there is no audio activity going on and the card should power off; in theory, would audio then work again?

When I play music (Spotify) and then start/stop/start/stop/start... the audio test in WebEx, I can break all audio. If I let Spotify keep playing but close the (already stopped) WebEx settings window, while WebEx itself continues to run, the music starts playing again.

So, in short, I don't even have to close all audio activity to restore audio; closing only the WebEx streams seems to be enough.

I haven't had time to reboot with the sof-dyndbg.conf.txt config, but I can do that in a new ticket, which is probably more suitable than continuing in this thread.

Some debug results:

When I've closed both Spotify and WebEx no PCM streams are opened:

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
closed
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
closed
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
closed

After starting both Spotify and WebEx, but before playing any sound, the PCM streams remain closed.

Then, when playing only Spotify over the internal speaker, I see:

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
state: RUNNING
owner_pid   : 3530
trigger_time: 10541.693975450
tstamp      : 10576.476283753
delay       : 3702
avail       : 29280
avail_max   : 32304
-----
hw_ptr      : 1669728
appl_ptr    : 1673216
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
closed
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
closed

Then, after opening the audio settings in WebEx while Spotify keeps playing, but without starting the test, I see:

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
state: RUNNING
owner_pid   : 3530
trigger_time: 10693.413010349
tstamp      : 10740.771640427
delay       : 374
avail       : 32608
avail_max   : 32736
-----
hw_ptr      : 2273376
appl_ptr    : 2273536
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
closed
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
state: RUNNING
owner_pid   : 3530
trigger_time: 10738.352621064
tstamp      : 10740.775385002
delay       : 2305843009213578122
avail       : 540
avail_max   : 572
-----
hw_ptr      : 116252
appl_ptr    : 115712

So an additional stream has started; I guess that is the microphone being read.

Then I start/stop/start/stop/... the WebEx audio test until audio completely breaks, and the result is:

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
state: RUNNING
owner_pid   : 3530
trigger_time: 10864.060850351
tstamp      : 10864.060873695
delay       : 0
avail       : 32832
avail_max   : 32832
-----
hw_ptr      : 192
appl_ptr    : 128
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
closed
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
state: RUNNING
owner_pid   : 3530
trigger_time: 10862.674775343
tstamp      : 10864.068168377
delay       : 2305843009213658112
avail       : 31020
avail_max   : 31292
-----
hw_ptr      : 66860
appl_ptr    : 35840

with this in journalctl:

jan 03 11:27:04 X1-Leon udc[1882]: 2025-01-03T11:27:04Z [INFO] [Syncer] Syncing item_event: DeviceStatus
jan 03 11:27:14 X1-Leon pipewire[3530]: spa.alsa: hw:sofhdadspp: (22 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 03 11:27:21 X1-Leon pipewire[3530]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 03 11:27:27 X1-Leon pipewire[3530]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 03 11:27:34 X1-Leon pipewire[3530]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp
jan 03 11:27:41 X1-Leon pipewire[3530]: spa.alsa: hw:sofhdadspp: (375 suppressed) snd_pcm_avail after recover: Gebroken pijp

When I keep the windows like this and rerun cat on /proc/asound/card0/pcm0p, I see that avail = avail_max, and sometimes avail_max = 0.

Then closing the audio settings screen in WebEx restores the audio, and the music from Spotify can be heard again.
I also see that the /proc/asound/card0/pcm6c stream is closed again, and /proc/asound/card0/pcm0p continues running with avail < avail_max and avail_max no longer 0:

/proc/asound/card0/pcm0p
state: RUNNING
owner_pid   : 3530
trigger_time: 11043.459009206
tstamp      : 11154.669993750
delay       : 3267
avail       : 29728
avail_max   : 30752
-----
hw_ptr      : 5338272
appl_ptr    : 5341312
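The avail/avail_max comparison used above can be scripted. This is a minimal sketch that assumes the "key : value" layout of these dumps; parse_status and looks_stalled are hypothetical names. The stall test only encodes the heuristic observed in this thread, not a rule documented by ALSA. When the sound was broken, avail had reached avail_max or avail_max had dropped to 0.

```python
def parse_status(text):
    """Parse one .../sub0/status dump into a dict.

    Integer fields become int, timestamps become float, and 'state' stays a
    string. The '-----' separator is skipped. 'closed' gives {'state': 'closed'}.
    """
    text = text.strip()
    if text == "closed":
        return {"state": "closed"}
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # the '-----' separator line
        key, value = key.strip(), value.strip()
        for convert in (int, float):
            try:
                fields[key] = convert(value)
                break
            except ValueError:
                pass
        else:
            fields[key] = value  # e.g. state: RUNNING
    return fields


def looks_stalled(fields):
    """Heuristic from this thread: avail == avail_max, or avail_max == 0."""
    if fields.get("state") == "closed":
        return False
    avail, avail_max = fields.get("avail"), fields.get("avail_max")
    if avail is None or avail_max is None:
        return False
    return avail == avail_max or avail_max == 0
```

For example, looks_stalled(parse_status(Path("/proc/asound/card0/pcm0p/sub0/status").read_text())) can be polled while reproducing the issue (with from pathlib import Path). For the healthy dump above it returns False.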

What I also noticed is that when playing Spotify over HDMI, I see that it uses pcm3p

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
closed
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
state: RUNNING
owner_pid   : 3530
trigger_time: 11249.070593003
tstamp      : 11273.013007550
delay       : 3540
avail       : 29228
avail_max   : 30956
-----
hw_ptr      : 1149484
appl_ptr    : 1153024
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
closed

When I then open the WebEx audio settings, pcm0p and pcm6c are opened again, in addition to the already opened pcm3p:

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
state: RUNNING
owner_pid   : 3530
trigger_time: 11331.773588943
tstamp      : 11334.647066680
delay       : 470
avail       : 32544
avail_max   : 32704
-----
hw_ptr      : 138144
appl_ptr    : 138368
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
state: RUNNING
owner_pid   : 3530
trigger_time: 11249.070593003
tstamp      : 11334.649165490
delay       : 2332
avail       : 30436
avail_max   : 30748
-----
hw_ptr      : 4108004
appl_ptr    : 4110336
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
state: RUNNING
owner_pid   : 3530
trigger_time: 11331.706623783
tstamp      : 11334.651337670
delay       : 2305843009213552988
avail       : 508
avail_max   : 604
-----
hw_ptr      : 141308
appl_ptr    : 140800

BUT when I then start/stop the tests to break the audio, the WebEx audio mutes and the Spotify audio over HDMI starts stuttering.

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
state: RUNNING
owner_pid   : 3530
trigger_time: 11473.847726298
tstamp      : 11473.847742545
delay       : 0
avail       : 32832
avail_max   : 32832
-----
hw_ptr      : 192
appl_ptr    : 128
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
state: RUNNING
owner_pid   : 3530
trigger_time: 11472.809807060
tstamp      : 11473.851675325
delay       : 936
avail       : 31832
avail_max   : 31832
-----
hw_ptr      : 50264
appl_ptr    : 51200
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
state: RUNNING
owner_pid   : 3530
trigger_time: 11472.750559634
tstamp      : 11473.853639429
delay       : 2305843009213662768
avail       : 21660
avail_max   : 22028
-----
hw_ptr      : 52892
appl_ptr    : 31232

And it keeps stuttering, even though pcm3p sometimes "recovers" (avail < avail_max) but sometimes is at max again:

/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
state: SETUP
owner_pid   : 3530
trigger_time: 11610.694379598
tstamp      : 11610.700505371
delay       : 0
avail       : 32832
avail_max   : 0
-----
hw_ptr      : 192
appl_ptr    : 128
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
state: RUNNING
owner_pid   : 3530
trigger_time: 11609.675993003
tstamp      : 11610.702031185
delay       : 1660
avail       : 31108
avail_max   : 31828
-----
hw_ptr      : 49540
appl_ptr    : 51200
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
state: RUNNING
owner_pid   : 3530
trigger_time: 11609.634805019
tstamp      : 11610.704091548
delay       : 2305843009213663792
avail       : 21100
avail_max   : 21132
-----
hw_ptr      : 51308
appl_ptr    : 30208

until I close the WebEx audio settings window, and the audio recovers again without stuttering:

for pcm in /proc/asound/card*/pcm*; do echo ${pcm}; cat ${pcm}/sub0/status; done
/proc/asound/card0/pcm0c
closed
/proc/asound/card0/pcm0p
closed
/proc/asound/card0/pcm31p
closed
/proc/asound/card0/pcm3p
state: RUNNING
owner_pid   : 3530
trigger_time: 11661.147549638
tstamp      : 11665.175596114
delay       : 2988
avail       : 29780
avail_max   : 32732
-----
hw_ptr      : 193620
appl_ptr    : 196608
/proc/asound/card0/pcm4p
closed
/proc/asound/card0/pcm5p
closed
/proc/asound/card0/pcm6c
closed

@ujfalusi
Contributor

ujfalusi commented Jan 3, 2025

@lvanderree, thank you for the details!
If I read it right, WebEx is involved in all cases. Have you checked whether it also happens when you run WebEx in the browser instead of the application?

I don't have a WebEx account, and it looks like the configuration is account-walled.

It is really surprising that WebEx, using analog out + DMIC, can break HDMI playback...
Can you check the PCM states:

  1. when WebEx test is started
  2. when WebEx test is stopped
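To answer the two questions above, capture the PCM states once while the WebEx test is playing and once after it is stopped, then compare them. This is a minimal sketch; snapshot and changed are hypothetical helpers. They read only the first line of each sub0/status file, which is either closed or a state: line.

```python
from pathlib import Path


def snapshot(root="/proc/asound"):
    """Return {'cardN/pcmXY': first line of sub0/status}."""
    snap = {}
    for status in sorted(Path(root).glob("card*/pcm*/sub0/status")):
        lines = status.read_text().splitlines()
        pcm = str(status.parent.parent.relative_to(root))
        snap[pcm] = lines[0].strip() if lines else ""
    return snap


def changed(before, after):
    """List (pcm, before, after) for every PCM whose first status line differs."""
    return [
        (pcm, before.get(pcm), after.get(pcm))
        for pcm in sorted(before.keys() | after.keys())
        if before.get(pcm) != after.get(pcm)
    ]
```

For example, run started = snapshot() while the test tone plays and stopped = snapshot() after stopping it, then print changed(started, stopped).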

What is even stranger is that when you play audio to the speaker and start WebEx (which will use the same PCM device), it breaks.
All audio goes through the PA/PW audio server, which is the only entity allowed to use the hardware. Adding or removing streams must not have any effect on the opened stream, since PA/PW does the mixing and keeps the audio device running.

Unless WebEx takes over the audio somehow...
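The point about the sound server can be shown with a toy model. One server owns the hardware PCM and mixes every client stream into it, so clients coming and going never reopen the device. This is a hypothetical sketch (ToyServer is a made-up class), not how PipeWire or PulseAudio are implemented. It only shows the behaviour the comment above says should hold.

```python
class ToyServer:
    """Toy sound server: one hardware PCM, client streams mixed into it."""

    def __init__(self):
        self.clients = {}        # client name -> samples still to be played
        self.device_open = False
        self.device_opens = 0    # how often the hardware PCM had to be opened

    def add_client(self, name, samples):
        if not self.device_open:  # only the first client opens the device
            self.device_open = True
            self.device_opens += 1
        self.clients[name] = list(samples)

    def remove_client(self, name):
        self.clients.pop(name, None)
        if not self.clients:      # idle: the server may close the device
            self.device_open = False

    def mix_period(self, frames):
        """Sum one period from every client into a single device buffer."""
        out = [0] * frames
        for samples in self.clients.values():
            chunk = samples[:frames]
            del samples[:frames]
            for i, sample in enumerate(chunk):
                out[i] += sample
        return out


server = ToyServer()
server.add_client("spotify", [1] * 12)
print(server.mix_period(4))         # [1, 1, 1, 1]
server.add_client("webex-test", [10, 10])
print(server.mix_period(4))         # [11, 11, 1, 1]
server.remove_client("webex-test")
print(server.mix_period(4))         # [1, 1, 1, 1]
print(server.device_opens)          # 1
```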

@ujfalusi
Contributor

ujfalusi commented Jan 3, 2025

OK, I can reproduce this with WebEx now, wow.

@lvanderree

For now, WebEx is the only app I've experienced issues with. Good to hear you were able to install it too, and can enjoy this wonderful product as well 🤞
I hadn't thought about using their web app yet, but I just tried it, and the web app didn't seem to have any trouble, even though it also has an extensive web implementation, capable of iterating over all my devices.

The web app doesn't have a settings window, so I cannot test the settings via the web app. However, I made a call from the web app to the native (Linux) app, and that did make my audio system crash (with broken pipes) again. But as soon as I ended the call on the app side, my Spotify sound was restored.

So it is definitely some magic the app does. I can partly follow what you say, and it indeed sounds surprising that they somehow interfere with PA/PW. Please let me know what else I can do, now that you can test and see for yourself as well.

@carlinigraphy

@ujfalusi,

I am not experiencing this with WebEx. I can reproduce it fairly consistently just by opening pavucontrol.

Please allow me some time; I'll try to follow the guide in the referenced comment and post the output.

gestionlin pushed a commit to gestionlin/linux that referenced this issue Jan 4, 2025
@ujfalusi
Contributor

ujfalusi commented Jan 7, 2025

@lvanderree, @carlinigraphy, quick update: something odd is going on when PipeWire is in use, and we are still debugging what the reason could be.
Switching to PulseAudio seems to get things working correctly; that could be a short-term workaround if you can live with it (please confirm).

@lvanderree

Thanks for the update, @ujfalusi.

I did a

sudo dnf swap --allowerasing pipewire-pulseaudio pulseaudio
systemctl --user stop pipewire.socket pipewire-pulse.service
pulseaudio -D

and eventually a reboot
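After swapping audio servers, it is worth confirming which one is actually answering clients. This is a minimal sketch that assumes pactl is installed and that its info output contains a Server Name: line. It also assumes that pipewire-pulse reports a name containing "on PipeWire" while a native PulseAudio daemon reports pulseaudio; treat that string check as an assumption. audio_server_name, running_on_pipewire and current_server are hypothetical helpers.

```python
import subprocess


def audio_server_name(pactl_info_output):
    """Return the 'Server Name:' value from `pactl info` output, or None."""
    for line in pactl_info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Server Name":
            return value.strip()
    return None


def running_on_pipewire(server_name):
    # Assumption: pipewire-pulse names itself "PulseAudio (on PipeWire x.y.z)".
    return server_name is not None and "PipeWire" in server_name


def current_server():
    """Run `pactl info` (raises if pactl is missing or fails) and return the name."""
    result = subprocess.run(["pactl", "info"], capture_output=True,
                            text=True, check=True)
    return audio_server_name(result.stdout)
```

After the swap above, running_on_pipewire(current_server()) would be expected to return False.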

After that, my audio came out of neither my internal speakers nor HDMI (via USB-C).
However, after adjusting the profile under Configuration in pavucontrol, the audio was restored.

Then I played Spotify over HDMI while testing the WebEx settings, and everything went fine.
Then I made a WebEx call from my browser to the native app, increasing the volume until it was singing around, all without any crash or technical problem.

So for now I can confirm the issue is related to PipeWire, and switching to PulseAudio is a perfectly fine fix for me for the moment (better than the web implementation of WebEx that I used today, which worked well with PipeWire).
If I can test something for PipeWire, please let me know!

@vanushwashere

I've got the same freezes on my laptop:
OS: KDE neon 6.2 x86_64
Host: 21MA002XRT ThinkPad E16 Gen 2
Kernel: 6.8.0-51-generic

2025-01-08T00:08:01.169603+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169613+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169614+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169615+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169616+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169617+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169628+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169629+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169630+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169631+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169632+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169633+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169634+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169634+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169635+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169636+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169636+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169637+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169638+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169639+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
2025-01-08T00:08:01.169640+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22
2025-01-08T00:08:01.169640+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
│2025-01-08T00:08:01.169641+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169653+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                                                                                                         │
│2025-01-08T00:08:01.169654+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169655+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                                                                                                         │
│2025-01-08T00:08:01.169656+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169657+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                                                                                                         │
│2025-01-08T00:08:01.169658+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169659+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                                                                                                         │
│2025-01-08T00:08:01.169660+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169661+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                                                                                                         │
│2025-01-08T00:08:01.169662+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169663+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                                                                                                         │
│2025-01-08T00:08:01.169664+04:00 mangata kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22                                                                                              │
│2025-01-08T00:08:01.169665+04:00 mangata kernel:  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22                                                  
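Assuming the `cmd` value printed by ASoC in "trigger FE cmd: %d" is the standard ALSA PCM trigger command, the repeated pair of lines above can be read as: every START trigger (cmd 1) on the HDA Analog front end fails because the CPU DAI trigger returns -22, i.e. -EINVAL. The helpers below are a hypothetical decoding aid, not part of ALSA or SOF; the numeric values are copied from the `SNDRV_PCM_TRIGGER_*` definitions in the kernel's `include/sound/pcm.h`, and `is_start_einval()` is an invented name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Map an ALSA PCM trigger command number to its SNDRV_PCM_TRIGGER_* name
 * (values as defined in the kernel's include/sound/pcm.h). */
const char *trigger_cmd_name(int cmd)
{
    switch (cmd) {
    case 0: return "SNDRV_PCM_TRIGGER_STOP";
    case 1: return "SNDRV_PCM_TRIGGER_START";
    case 3: return "SNDRV_PCM_TRIGGER_PAUSE_PUSH";
    case 4: return "SNDRV_PCM_TRIGGER_PAUSE_RELEASE";
    case 5: return "SNDRV_PCM_TRIGGER_SUSPEND";
    case 6: return "SNDRV_PCM_TRIGGER_RESUME";
    case 7: return "SNDRV_PCM_TRIGGER_DRAIN";
    default: return "unknown";
    }
}

/* Kernel functions report failures as negative errno values,
 * so the "-22" in the log is -EINVAL on Linux. */
const char *kernel_err_text(int ret)
{
    assert(ret < 0);
    return strerror(-ret);
}

/* Hypothetical helper: does a log line describe a START trigger
 * that failed with -EINVAL? */
bool is_start_einval(int cmd, int ret)
{
    return cmd == 1 && ret == -EINVAL;
}
```

For the lines above, `trigger_cmd_name(1)` yields `SNDRV_PCM_TRIGGER_START` and `is_start_einval(1, -22)` is true, which only narrows down which operation is failing; it does not explain why the DAI rejects the start.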

@vanushwashere

Switching to PulseAudio didn't help with the problem for me; I am still getting those errors and experiencing freezes.

@ujfalusi
Contributor

ujfalusi commented Jan 8, 2025

@vanushwashere, I think your issue is different and unrelated, but I cannot see what is going wrong.
Can you follow #9695 (comment) and open a new issue with the full kernel log attached?

Please also attach the output of alsa-info.sh.

Thank you.

@vanushwashere

It just spams those messages to the log every millisecond :/
I will open a new issue later.

tacitness pushed a commit to tacitness/linux that referenced this issue Jan 20, 2025
The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting in a firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Cc: [email protected]
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Liam Girdwood <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
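The race described in the commit message can be modelled in a few lines. The sketch below is a hypothetical user-space simulation, not the SOF/ASoC driver code: `link_dma_pool`, `pool_get()`, `pool_put()` and `stop_start_keeps_channel()` are invented names, and a "channel" stands in for the link DMA the firmware was configured with when the stream was set up. It compares releasing the channel on the stop trigger (the behaviour this patch removes) with keeping it across stop/start, using the same ordering as the aplay/CTRL+z reproduction: stream A stops, stream B opens in the gap, then A is restarted without being closed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a tiny pool of link DMA channels. None of these
 * names exist in the SOF/ASoC driver; they only illustrate the race. */
#define NUM_LINK_DMA 2
#define NO_CHANNEL (-1)

struct link_dma_pool {
    int owner[NUM_LINK_DMA]; /* owning stream id, or NO_CHANNEL if free */
};

void pool_init(struct link_dma_pool *p)
{
    for (int i = 0; i < NUM_LINK_DMA; i++)
        p->owner[i] = NO_CHANNEL;
}

/* Hand out the first free channel; NO_CHANNEL if the pool is exhausted. */
int pool_get(struct link_dma_pool *p, int stream_id)
{
    for (int i = 0; i < NUM_LINK_DMA; i++) {
        if (p->owner[i] == NO_CHANNEL) {
            p->owner[i] = stream_id;
            return i;
        }
    }
    return NO_CHANNEL;
}

void pool_put(struct link_dma_pool *p, int ch)
{
    assert(ch >= 0 && ch < NUM_LINK_DMA);
    p->owner[ch] = NO_CHANNEL;
}

/* Returns true if stream A restarts on the channel the firmware was
 * configured with and stream B did not take that channel. */
bool stop_start_keeps_channel(bool release_on_stop)
{
    struct link_dma_pool pool;
    pool_init(&pool);

    const int a_id = 1, b_id = 2;
    int a_ch = pool_get(&pool, a_id); /* setup: firmware configured for a_ch */
    const int a_configured = a_ch;

    if (release_on_stop) {            /* old behaviour: free on STOP trigger */
        pool_put(&pool, a_ch);
        a_ch = NO_CHANNEL;
    }

    int b_ch = pool_get(&pool, b_id); /* second stream opens in the gap */

    if (a_ch == NO_CHANNEL)           /* START re-acquires some channel */
        a_ch = pool_get(&pool, a_id);

    return a_ch == a_configured && b_ch != a_configured;
}
```

With `release_on_stop` set, stream B takes channel 0 while A's firmware configuration still points at it, and A comes back on channel 1, so the function returns false; without the release, B gets channel 1 and A keeps channel 0. This mirrors the commit's point that holding the link DMA until the stream is actually closed closes the window in which another stream can "steal" it.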
ninjafinne pushed a commit to data-respons-solutions/dr-kernel-mirror that referenced this issue Jan 23, 2025
commit e8d0ba147d901022bcb69da8d8fd817f84e9f3ca upstream.

The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.

This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting in a firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Cc: [email protected]
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Liam Girdwood <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
6 participants