[WHL HDA] firmware boot failure on WHL platform. #767

Jiangxinx · 2019-04-02T06:05:50Z

summary:
Firmware boot fails on the WHL platform. According to the call stack, it seems to be caused by commit 9c6b980.

Step:
1. aplay -l

Output:

aplay: device_list:270: no soundcards found...

Dmesg:

[    3.474102] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    3.474102] #PF error: [normal kernel read fault]
[    3.474102] PGD 0 P4D 0
[    3.474104] Oops: 0000 [#1] SMP NOPTI
[    3.474106] CPU: 1 PID: 2076 Comm: irq/144-AudioDS Not tainted 5.0.0-daily-hda-20190402 #f93c4ee9
[    3.474106] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[    3.474110] RIP: 0010:hda_dsp_ipc_get_reply+0x35/0x130 [snd_sof_intel_hda_common]
[    3.474110] Code: 53 48 89 fb 48 83 ec 18 48 8b af 88 01 00 00 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 e8 6e a8 ea cf 49 89 c6 <48> 8b 45 08 81 78 04 00 00 01 40 0f 85 8d 00 00 00 48 8b 55 20 48
[    3.474111] RSP: 0018:ffffb9be41017e28 EFLAGS: 00010046
[    3.474112] RAX: 0000000000000246 RBX: ffff9f3cdf881028 RCX: ffffffffc0304240
[    3.474112] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff9f3cdf881030
[    3.474113] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9f3ce3a60a40
[    3.474114] R10: ffffb9be412a7e08 R11: 0000000000000004 R12: ffff9f3cdf881030
[    3.474114] R13: 0000000000000002 R14: 0000000000000246 R15: ffff9f3cdf5f0000
[    3.474115] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474116] CR2: 0000000000000008 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474117] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474117] Call Trace:
[    3.474121]  ? snd_sof_dsp_update_bits+0x51/0x70 [snd_sof]
[    3.474123]  cnl_ipc_irq_thread+0x149/0x9f0 [snd_sof_intel_hda_common]
[    3.474126]  ? irq_forced_thread_fn+0x70/0x70
[    3.474127]  irq_thread_fn+0x1c/0x60
[    3.474129]  irq_thread+0xe2/0x160
[    3.474130]  ? wake_threads_waitq+0x30/0x30
[    3.474132]  ? irq_thread_dtor+0x90/0x90
[    3.474133]  kthread+0x10e/0x130
[    3.474134]  ? kthread_park+0x80/0x80
[    3.474136]  ret_from_fork+0x35/0x40
[    3.474137] Modules linked in: snd_soc_hdac_hdmi(+) snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic sof_pci_dev snd_sof_intel_hda_common snd_soc_hdac_hda iwlmvm(+) snd_sof_intel_hda snd_sof_intel_byt snd_sof_xtensa_dsp snd_sof snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core snd_soc_core snd_hda_codec snd_hwdep snd_hda_core snd_pcm x86_pkg_temp_thermal iwlwifi intel_lpss_pci intel_lpss mfd_core efivarfs sdhci_pci xhci_pci cqhci sdhci xhci_hcd
[    3.474148] CR2: 0000000000000008
[    3.474149] ---[ end trace b2191f1416986bb1 ]---
[    3.474151] RIP: 0010:hda_dsp_ipc_get_reply+0x35/0x130 [snd_sof_intel_hda_common]
[    3.474151] Code: 53 48 89 fb 48 83 ec 18 48 8b af 88 01 00 00 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 e8 6e a8 ea cf 49 89 c6 <48> 8b 45 08 81 78 04 00 00 01 40 0f 85 8d 00 00 00 48 8b 55 20 48
[    3.474152] RSP: 0018:ffffb9be41017e28 EFLAGS: 00010046
[    3.474153] RAX: 0000000000000246 RBX: ffff9f3cdf881028 RCX: ffffffffc0304240
[    3.474153] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff9f3cdf881030
[    3.474154] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9f3ce3a60a40
[    3.474154] R10: ffffb9be412a7e08 R11: 0000000000000004 R12: ffff9f3cdf881030
[    3.474155] R13: 0000000000000002 R14: 0000000000000246 R15: ffff9f3cdf5f0000
[    3.474156] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474157] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474157] CR2: 0000000000000008 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474161] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[    3.474161] #PF error: [INSTR]
[    3.474161] PGD 0 P4D 0
[    3.474163] Oops: 0010 [#2] SMP NOPTI
[    3.474164] CPU: 1 PID: 2076 Comm: irq/144-AudioDS Tainted: G      D           5.0.0-daily-hda-20190402 #f93c4ee9
[    3.474164] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[    3.474165] RIP: 0010:          (null)
[    3.474167] Code: Bad RIP value.
[    3.474167] RSP: 0018:ffffb9be41017ea8 EFLAGS: 00010282
[    3.474168] RAX: 0000000000000000 RBX: ffff9f3cdf5f0000 RCX: 0000000000000000
[    3.474169] RDX: ffffb9be41017ec8 RSI: 0000000000000000 RDI: ffffb9be41017ec8
[    3.474169] RBP: ffffffff912d69c0 R08: 0000000000000000 R09: 0000000000000000
[    3.474169] R10: 0000000000000246 R11: 0000000000000000 R12: 0000000000000000
[    3.474170] R13: ffff9f3cdf5f0704 R14: 0000000000000001 R15: ffff9f3cdf5f06d0
[    3.474171] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474172] CR2: ffffffffffffffd6 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474173] Call Trace:
[    3.474174]  ? task_work_run+0x79/0xa0
[    3.474176]  ? do_exit+0x2ca/0xbc0
[    3.474178]  ? irq_thread_dtor+0x90/0x90
[    3.474179]  ? kthread+0x10e/0x130
[    3.474180]  ? rewind_stack_do_exit+0x17/0x20
[    3.474181] Modules linked in: snd_soc_hdac_hdmi(+) snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic sof_pci_dev snd_sof_intel_hda_common snd_soc_hdac_hda iwlmvm(+) snd_sof_intel_hda snd_sof_intel_byt snd_sof_xtensa_dsp snd_sof snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core snd_soc_core snd_hda_codec snd_hwdep snd_hda_core snd_pcm x86_pkg_temp_thermal iwlwifi intel_lpss_pci intel_lpss mfd_core efivarfs sdhci_pci xhci_pci cqhci sdhci xhci_hcd
[    3.474186] CR2: 0000000000000000
[    3.474187] ---[ end trace b2191f1416986bb2 ]---
[    3.474188] RIP: 0010:hda_dsp_ipc_get_reply+0x35/0x130 [snd_sof_intel_hda_common]
[    3.474189] Code: 53 48 89 fb 48 83 ec 18 48 8b af 88 01 00 00 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 e8 6e a8 ea cf 49 89 c6 <48> 8b 45 08 81 78 04 00 00 01 40 0f 85 8d 00 00 00 48 8b 55 20 48
[    3.474189] RSP: 0018:ffffb9be41017e28 EFLAGS: 00010046
[    3.474190] RAX: 0000000000000246 RBX: ffff9f3cdf881028 RCX: ffffffffc0304240
[    3.474190] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff9f3cdf881030
[    3.474191] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9f3ce3a60a40
[    3.474191] R10: ffffb9be412a7e08 R11: 0000000000000004 R12: ffff9f3cdf881030
[    3.474192] R13: 0000000000000002 R14: 0000000000000246 R15: ffff9f3cdf5f0000
[    3.474193] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474194] CR2: ffffffffffffffd6 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474195] Fixing recursive fault but reboot is needed!
[    3.478124] HDMI HDA Codec ehdaudio0D2: Max dais supported: 3

Env:
sof master: acf2bd5
kernel sof-dev: f93c4ee
tplg: sof-hda-generic.tplg

lgirdwood · 2019-04-02T10:42:27Z

@Jiangxinx can you attach the full log? This may be a FW issue if we don't have ROM errors.

emilchudzik · 2019-04-02T11:48:11Z

Double-checked the issue on my side:

Python tests on Windows HDA work without issues (FW: master/d7b36bf3):

loading FW
basic playback and record on HDA (loopback on codec ALC700)

linux Env:
FW: master/d7b36bf3
tplg: master/d7b36bf3 -> sof-hda-generic.tplg
kernel: sof-dev/e29122d
HW: WHL (HD-A mode) with ALC700

The same as I described on thesofproject/sof#1173 (comment)

Loading is OK.
aplay -l returns the devices from the topology as expected.
Running the "aplay" command returns an error => unable to play back.

wenqingfu · 2019-04-02T21:40:34Z

> it seems to be caused by commit 9c6b980

9c6b980 is part of #740 patchset, so @lyakh

ranj063 · 2019-04-03T01:38:17Z

@Jiangxinx @emilchudzik can you please check if linux PR #775 helps?

xiulipan · 2019-04-03T02:18:30Z

@emilchudzik @ranj063 @Jiangxinx
e29122d is older than 9c6b980, so it should be OK.

f3adfd6 (HEAD -> sof-dev, sof/topic/sof-dev) ASoC:sof: remove duplicate posn message in kernel log
0967e07 ASoC: SOF: control: fix a PM put missing at error
d6d661c ASoC: SOF: Intel: fix period_bytes calculation at hw_params()
f93c4ee Merge pull request #740 from lyakh/ipc-20190321
f304b5c ASoC: sof: use iopoll.h macro for polled register reads
1a30233 ASoC: SOF: topology: use the default hw_config in DAI link init
c091fce ASoC: SOF: core: fix a typo in a comment
5c78516 ASoC: SOF: skl: remove the .cmd_done platform driver method
3dcbf59 ASoC: SOF: hsw: remove the .cmd_done platform driver method
a74c774 ASoC: SOF: spi: remove the .cmd_done platform driver method
2c787c5 ASoC: SOF: hsw: eliminate redundant mailbox reads
7f7d265 ASoC: SOF: skl: remove .get_reply()
bb7890c ASoC: SOF: hsw: remove .get_reply()
63f6059 ASoC: SOF: spi: remove .get_reply()
dc5f5f0 ASoC: SOF: hsw: remove an always true condition check
ae4e51e ASoC: SOF: hsw: print sizes in decimal format
d435088 ASoC: SOF: hsw: simplify .get_reply() implementations
9c6b980 ASoC: SOF: ipc: remove the .cmd_done platform driver method
04654c8 ASoC: SOF: Intel: eliminate redundant mailbox reads
df70b78 ASoC: SOF: ipc: remove .get_reply()
9bfc74c ASoC: SOF: Intel: remove an always true condition check
344bfbb ASoC: SOF: ipc: eliminate a trivial function
64b1050 ASoC: SOF: Intel: print sizes in decimal format
bedc1d3 ASoC: SOF: Intel: simplify multiple .get_reply() implementations
e29122d ASoC: Intel: sof_rt5682: test devm_ and return -ENOMEM

Jiangxinx · 2019-04-03T06:35:14Z

@ranj063 PR #775 doesn't help, got the same dmesg log.
Git diff:

commit ad88ef1313311b45dc97bd037a6bcaaccfeca353
Author: Ranjani Sridharan <[email protected]>
Date:   Tue Apr 2 17:22:02 2019 -0700

    ASoC: SOF: intel: ipc: don't read mailbox for CTX_SAVE

    If the reply from the DSP is for a CTX_SAVE ipc, don't
    read the mailbox, just return after setting the reply attributes.

    Signed-off-by: Ranjani Sridharan <[email protected]>

commit f3adfd66c97c730a9db99dafabab7a94f6bcf552
Author: Rander Wang <[email protected]>
Date:   Fri Mar 29 15:46:04 2019 +0800

    ASoC:sof: remove duplicate posn message in kernel log

    There is a lot of posn offset message in kernel log. Actually
    the posn in mailbox is never changed after it is set in hw_params.
    So now just print it once in hw_params and make kernel message lesser

    dmesg log example:
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c
    sof-audio-pci 0000:00:1f.3: posn : host 0x6c00 dai 0xcc660 wall 0x31e4fbd
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c
    sof-audio-pci 0000:00:1f.3: posn : host 0xab00 dai 0xd4460 wall 0x33d12bc
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c
    sof-audio-pci 0000:00:1f.3: posn : host 0xea00 dai 0xdc260 wall 0x35bd5bd
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c

    Signed-off-by: Rander Wang <[email protected]>

commit 0967e0766216940cbdcd2d1d3662ea9e5beb239d
Author: Keyon Jie <[email protected]>
Date:   Tue Apr 2 14:01:52 2019 +0800

    ASoC: SOF: control: fix a PM put missing at error

    We still need call pm_runtime_put_autosuspend() to release dev at
    failure of copy_to_user(), here correct it.

    Signed-off-by: Keyon Jie <[email protected]>

commit d6d661c7521a72c2bce6f2ced11b812f449f8e43
Author: Keyon Jie <[email protected]>
Date:   Tue Apr 2 14:06:27 2019 +0800

    ASoC: SOF: Intel: fix period_bytes calculation at hw_params()

    We should use params_period_bytes for hdac_stream period_bytes
    calculation, it used params_period_size so actually wrong period bytes
    have being used for long time, here correct it.

    Signed-off-by: Keyon Jie <[email protected]>

emilchudzik · 2019-04-03T07:28:42Z

> @Jiangxinx @emilchudzik can you please check if linux PR #775 helps?

@ranj063 is #775 supposed to fix the boot issue, or my observations? I can boot the firmware, but I have this issue: #774

Jiangxinx · 2019-04-03T07:36:44Z

@emilchudzik did you use the latest kernel commit to test the boot issue? e29122d is older than my test version.

emilchudzik · 2019-04-03T11:27:00Z

With the latest sof-dev on linux repo (f3adfd6) I can't boot to OS on WHL.
FW and tplg: master/d7b36bf3 (2 April)

On e29122d, booting to the OS and loading the above firmware worked correctly.

I see a panic while trying to boot the system. I took a quick screenshot:

mengdonglin · 2019-04-08T08:25:03Z

@lyakh It seems your IPC commit 9c6b980 triggers this regression. Please work with Libin on this issue.

lyakh · 2019-04-08T19:20:18Z

The fact that libinyang@1db36f3 fixes the problem makes me think that maybe #799 will fix it too.

libinyang · 2019-04-09T02:05:29Z

> The fact, that libinyang@1db36f3 fixes the problem makes me think, that maybe #799 will fix it too.

@lyakh Thanks. I will apply the patch and test it.

libinyang · 2019-04-09T02:08:53Z

Some more information:
The bug occurs because the code accepts an unknown IPC reply to a command that the driver never sent. The reply interrupt happens at the very beginning of boot, before any IPC cmd is sent out, and the cmd is "0x1010e0e", which the driver will never send.
I think this may be a bug in the FW or ROM. We need help from the FW team.
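
For reference, the fault address in the first oops is consistent with a field read through a NULL pointer: the faulting bytes `<48> 8b 45 08` in the `Code:` line decode to `mov 0x8(%rbp),%rax`, RBP is 0 in the register dump, and CR2 is 0x8. Below is a minimal user-space sketch of that arithmetic. The struct `mock_ipc_msg` and the helper `msg_data_load_address()` are hypothetical names invented for illustration, and the layout is an assumption; the driver's real struct is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the pending-message struct. The only
 * property that matters here is that the pointer field sits at
 * offset 8, matching the 0x8 displacement in the faulting load.
 */
struct mock_ipc_msg {
	uint64_t header;	/* 8 bytes, so the next field lands at offset 8 */
	void *msg_data;
};

/*
 * Address the CPU touches when it loads msg->msg_data.
 * With msg == NULL this is just the field offset, which is the
 * value the kernel reports as the faulting address (on mainstream
 * platforms a NULL pointer converts to integer 0).
 */
static uintptr_t msg_data_load_address(const struct mock_ipc_msg *msg)
{
	return (uintptr_t)msg + offsetof(struct mock_ipc_msg, msg_data);
}
```

Calling `msg_data_load_address(NULL)` yields the field offset, which is why a NULL `msg` shows up as a fault at a small non-zero address rather than at 0.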

libinyang · 2019-04-09T02:55:52Z

@lyakh
I tried your patch just now. It doesn't work, for two reasons:

1. cnl uses cnl_ipc_irq_thread() in cnl.c as the IRQ thread, not hda_dsp_ipc_irq_thread() in hda-ipc.c.
2. We should not use HDA_DSP_REG_HIPCI_MSG_MASK to get the message cmd. The register is used for the cmd id, not the cmd type.

libinyang · 2019-04-09T06:19:13Z

Correction to my previous comment: it seems the purge cmd does use HDA_DSP_REG_HIPCI_MSG_MASK. However, the msg reads as 0x0, while the cmd read from the mailbox is 0x1010e0e.

lyakh · 2019-04-09T06:44:03Z

@libinyang ok, thanks for testing. Sorry, I didn't realise, that WHL was using CNL. And I didn't find a full kernel log in this message, so, I couldn't check what message was actually coming from the DSP. But yes, it should be easy enough to filter out this specific message and you're right, we have to understand what it is and why it is coming.

libinyang · 2019-04-09T06:53:11Z

@lyakh Yes. And I found that the old code also encounters the unknown IPC. It handles this situation by complaining “error: no reply expected, received xxx” and ignoring the msg. So I think we can do the same as the old code and ignore the message.
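
The "complain and ignore" behaviour described above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the old kernel code: `mock_dev` and `mock_ipc_reply()` are hypothetical names, the log line is written to a caller-supplied buffer instead of the kernel log, and the `0x%x` formatting of the received value is a guess at what "xxx" stood for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device state: msg is NULL when no IPC is in flight. */
struct mock_dev {
	const void *msg;
};

/*
 * Handle a reply interrupt.
 * Returns 0 when a pending message consumes the reply, or -1 when
 * nothing was sent and the reply is logged and ignored.
 */
static int mock_ipc_reply(struct mock_dev *dev, uint32_t received,
			  char *log, size_t log_len)
{
	assert(dev && log && log_len > 0);

	if (!dev->msg) {
		/* Spurious reply: complain, then drop it. */
		snprintf(log, log_len,
			 "error: no reply expected, received 0x%x",
			 (unsigned int)received);
		return -1;
	}

	log[0] = '\0';	/* nothing to report on the normal path */
	return 0;
}
```

The point of the sketch is that the spurious reply is rejected before anything dereferences the pending message.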

lyakh · 2019-04-09T07:00:22Z

@libinyang yes, so, basically we could take the first part of your work-around that you referenced here (its second part seems to be a left-over and unrelated), and just replace the word "suspicious" with "spurious." Maybe even make that a dev_err() to raise the chance of us looking at every such case and checking it. But I also think we should avoid calling hda_dsp_ipc_get_reply(), like hda_dsp_ipc_irq_thread() does. For that we have to understand what this message is and how to reliably check for it.

ranj063 · 2019-04-09T20:29:55Z

@libinyang @lyakh I have a dumb suggestion to try. Basically, we're getting a reply from the DSP when not expecting it and in this case sdev->msg is NULL. Can we simply ignore this like this:

diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
index 6924d8504d09..7fe3888aee41 100644
--- a/sound/soc/sof/intel/hda-ipc.c
+++ b/sound/soc/sof/intel/hda-ipc.c
@@ -77,6 +77,9 @@ void hda_dsp_ipc_get_reply(struct snd_sof_dev *sdev)
 
        spin_lock_irqsave(&sdev->ipc_lock, flags);
 
+       if (!msg)
+               return 0;
+
        hdr = msg->msg_data;
        if (hdr->cmd == (SOF_IPC_GLB_PM_MSG | SOF_IPC_PM_CTX_SAVE)) {
                /*

libinyang · 2019-04-10T00:56:44Z

@ranj063 Yes, I think this is the right direction. And maybe we need to add some warnings.

ranj063 · 2019-04-10T01:48:46Z

> @ranj063 Yes, I think this should be a right direction. And maybe we need add some warnings.

@libinyang these should be info rather than warnings I think.

libinyang · 2019-04-10T02:06:05Z

@ranj063 OK, I will use the info. Thanks for the suggestion.

xun2z · 2019-04-10T02:19:38Z

@ranj063 @libinyang Simply returning from the irq thread may not be a good solution here, as it may cause a deadlock. I enabled deadlock debugging in Kconfig.

[   30.879950] ============================================
[   30.879953] WARNING: possible recursive locking detected
[   30.879956] 5.0.0-xun-hda #14 Not tainted
[   30.879959] --------------------------------------------
[   30.879962] irq/134-AudioDS/2760 is trying to acquire lock:
[   30.879965] 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.879977] 
               but task is already holding lock:
[   30.879980] 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: hda_dsp_ipc_get_reply+0x32/0x140 [snd_sof_intel_hda_common]
[   30.879990] 
               other info that might help us debug this:
[   30.879992]  Possible unsafe locking scenario:

[   30.879994]        CPU0
[   30.879996]        ----
[   30.879998]   lock(&(&sdev->ipc_lock)->rlock);
[   30.880009]   lock(&(&sdev->ipc_lock)->rlock);
[   30.880021] 
                *** DEADLOCK ***

[   30.880028]  May be due to missing lock nesting notation

[   30.880036] 1 lock held by irq/134-AudioDS/2760:
[   30.880043]  #0: 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: hda_dsp_ipc_get_reply+0x32/0x140 [snd_sof_intel_hda_common]
[   30.880055] 
               stack backtrace:
[   30.880060] CPU: 1 PID: 2760 Comm: irq/134-AudioDS Not tainted 5.0.0-xun-hda #14
[   30.880063] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[   30.880065] Call Trace:
[   30.880073]  dump_stack+0x67/0x9b
[   30.880079]  __lock_acquire+0x6e3/0x16b0
[   30.880085]  ? __lock_is_held+0x59/0xa0
[   30.880091]  ? dev_printk_emit+0x45/0x70
[   30.880096]  ? lock_acquire+0xa7/0x1b0
[   30.880100]  lock_acquire+0xa7/0x1b0
[   30.880112]  ? snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.880122]  _raw_spin_lock_irqsave+0x36/0x70
[   30.880135]  ? snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.880146]  snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.880155]  cnl_ipc_irq_thread+0x159/0x690 [snd_sof_intel_hda_common]
[   30.880161]  ? irq_forced_thread_fn+0x70/0x70
[   30.880165]  irq_thread_fn+0x1c/0x60
[   30.880170]  ? irq_thread+0x9c/0x1a0
[   30.880179]  irq_thread+0x102/0x1a0
[   30.880188]  ? wake_threads_waitq+0x30/0x30
[   30.880194]  ? irq_thread_dtor+0x90/0x90
[   30.880199]  kthread+0x11c/0x140
[   30.880206]  ? kthread_park+0x80/0x80
[   30.880211]  ret_from_fork+0x3a/0x50

deadlock.txt

ranj063 · 2019-04-10T02:31:00Z

@xun2z good point. Yes, we need to spin_unlock* before we return.

libinyang · 2019-04-10T02:33:05Z

Or we just move the check before spin_lock.

libinyang · 2019-04-10T02:33:52Z

On second thought, checking inside the lock may be better. Let me check the source code.
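
The two options raised in the last few comments (unlock before returning, or check before taking the lock) can be sketched in user-space C. This is only an illustration under stated assumptions: a C11 `atomic_flag` toy lock stands in for `sdev->ipc_lock` and `spin_lock_irqsave()`, and `mock_dev` and all function names are hypothetical. It is not the patch that was eventually submitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_dev {
	atomic_flag ipc_lock;	/* toy stand-in for sdev->ipc_lock */
	const int *msg;		/* pending message, NULL if none was sent */
	int handled;		/* number of replies actually processed */
};

static void mock_lock(struct mock_dev *dev)
{
	while (atomic_flag_test_and_set(&dev->ipc_lock))
		;	/* spin until the flag was previously clear */
}

static void mock_unlock(struct mock_dev *dev)
{
	atomic_flag_clear(&dev->ipc_lock);
}

/* Option 1: check under the lock and release it on every exit path. */
static void get_reply_check_locked(struct mock_dev *dev)
{
	assert(dev);
	mock_lock(dev);
	if (!dev->msg)
		goto out;	/* spurious reply: skip, but still unlock */
	dev->handled++;
out:
	mock_unlock(dev);
}

/* Option 2: check before the lock is taken, so there is nothing to release. */
static void get_reply_check_unlocked(struct mock_dev *dev)
{
	assert(dev);
	if (!dev->msg)
		return;
	mock_lock(dev);
	dev->handled++;
	mock_unlock(dev);
}

/* Test helper: true if the lock is currently free. */
static bool lock_is_free(struct mock_dev *dev)
{
	if (atomic_flag_test_and_set(&dev->ipc_lock))
		return false;
	atomic_flag_clear(&dev->ipc_lock);
	return true;
}
```

Either way the lock is free once the function returns, so a later acquisition on the same thread does not trip the recursive-locking report shown above. Option 2 reads `msg` without the lock, which is only safe if nothing else can change it concurrently; that is the trade-off being weighed here.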

xun2z · 2019-04-10T02:35:58Z

OK, I see the cause.

libinyang · 2019-04-10T07:11:10Z

A formal patch is at: #807
With this patch, the new kernel behavior is the same as the old kernel.

As in the old kernel, I don't check for a suspicious IPC reply interrupt in this function for now; snd_sof_ipc_reply() will check it.

I can add code to check for the suspicious IPC reply interrupt in hda_dsp_ipc_get_reply(), but I don't think it is necessary. We can discuss it if you think the check belongs in hda_dsp_ipc_get_reply().

lyakh · 2019-04-10T08:06:21Z

> @libinyang @lyakh I have a dumb suggestion to try. Basically, we're getting a reply from the DSP when not expecting it and in this case sdev->msg is NULL. Can we simply ignore this like this:
>
> diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
> index 6924d8504d09..7fe3888aee41 100644
> --- a/sound/soc/sof/intel/hda-ipc.c
> +++ b/sound/soc/sof/intel/hda-ipc.c
> @@ -77,6 +77,9 @@ void hda_dsp_ipc_get_reply(struct snd_sof_dev *sdev)
>
>         spin_lock_irqsave(&sdev->ipc_lock, flags);
>
> +       if (!msg)
> +               return 0;
> +
>         hdr = msg->msg_data;
>         if (hdr->cmd == (SOF_IPC_GLB_PM_MSG | SOF_IPC_PM_CTX_SAVE)) {
>                 /*

@ranj063 that's what the old patch was doing (almost; it was also releasing the spinlock, I think).
Edit: actually, it was doing that check before taking the spinlock.

libinyang · 2019-04-10T09:13:02Z

#807 is updated. Please take a look.

libinyang · 2019-04-12T13:25:31Z

@Jiangxinx Can this issue be closed?

Jiangxinx added bug Something isn't working WHL Applies to WhiskeyLake platform labels Apr 2, 2019

mengdonglin changed the title ~~[WHL] firmware boot failure on WHL-rvp-hda platform.~~ [WHL HDA] firmware boot failure on WHL platform. Apr 2, 2019

mengdonglin added the P1 Blocker bugs or important features label Apr 2, 2019

mengdonglin assigned RanderWang Apr 2, 2019

RanderWang mentioned this issue Apr 3, 2019

[RFC] ASoC:sof: fix FW loading failure in resuming occasion #779

Closed

emilchudzik mentioned this issue Apr 4, 2019

[WHL HDA] can't boot up with lastest Linux kernel code #784

Closed

mengdonglin added the HDA Applies to HD-Audio bus for codec connection label Apr 8, 2019

mengdonglin assigned libinyang and lyakh and unassigned RanderWang Apr 8, 2019

libinyang mentioned this issue Apr 10, 2019

ASoC: SOF: fix panic caused by sdev->msg being null #807

Merged

xun2z mentioned this issue Apr 10, 2019

[BUG][WHL HDA] incorrect sound when playback thesofproject/sof#1251

Closed

plbossart closed this as completed in #807 Apr 11, 2019

xun2z mentioned this issue May 8, 2019

ASoC: SOF: Intel: improve irq thread in handling empty message interrupt from firmware loader #916

Closed
