[WHL HDA] firmware boot failure on WHL platform. #767

Closed
Jiangxinx opened this issue Apr 2, 2019 · 31 comments · Fixed by #807
Assignees: libinyang, lyakh
Labels:

  • bug: Something isn't working
  • HDA: Applies to HD-Audio bus for codec connection
  • P1: Blocker bugs or important features
  • WHL: Applies to WhiskeyLake platform

Comments

@Jiangxinx

Jiangxinx commented Apr 2, 2019

Summary:
Firmware boot fails on the WHL platform. According to the call stack, it seems to be caused by commit 9c6b980.

Steps:
1. aplay -l

Output:

aplay: device_list:270: no soundcards found...

Dmesg:

[    3.474102] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    3.474102] #PF error: [normal kernel read fault]
[    3.474102] PGD 0 P4D 0
[    3.474104] Oops: 0000 [#1] SMP NOPTI
[    3.474106] CPU: 1 PID: 2076 Comm: irq/144-AudioDS Not tainted 5.0.0-daily-hda-20190402 #f93c4ee9
[    3.474106] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[    3.474110] RIP: 0010:hda_dsp_ipc_get_reply+0x35/0x130 [snd_sof_intel_hda_common]
[    3.474110] Code: 53 48 89 fb 48 83 ec 18 48 8b af 88 01 00 00 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 e8 6e a8 ea cf 49 89 c6 <48> 8b 45 08 81 78 04 00 00 01 40 0f 85 8d 00 00 00 48 8b 55 20 48
[    3.474111] RSP: 0018:ffffb9be41017e28 EFLAGS: 00010046
[    3.474112] RAX: 0000000000000246 RBX: ffff9f3cdf881028 RCX: ffffffffc0304240
[    3.474112] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff9f3cdf881030
[    3.474113] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9f3ce3a60a40
[    3.474114] R10: ffffb9be412a7e08 R11: 0000000000000004 R12: ffff9f3cdf881030
[    3.474114] R13: 0000000000000002 R14: 0000000000000246 R15: ffff9f3cdf5f0000
[    3.474115] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474116] CR2: 0000000000000008 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474117] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474117] Call Trace:
[    3.474121]  ? snd_sof_dsp_update_bits+0x51/0x70 [snd_sof]
[    3.474123]  cnl_ipc_irq_thread+0x149/0x9f0 [snd_sof_intel_hda_common]
[    3.474126]  ? irq_forced_thread_fn+0x70/0x70
[    3.474127]  irq_thread_fn+0x1c/0x60
[    3.474129]  irq_thread+0xe2/0x160
[    3.474130]  ? wake_threads_waitq+0x30/0x30
[    3.474132]  ? irq_thread_dtor+0x90/0x90
[    3.474133]  kthread+0x10e/0x130
[    3.474134]  ? kthread_park+0x80/0x80
[    3.474136]  ret_from_fork+0x35/0x40
[    3.474137] Modules linked in: snd_soc_hdac_hdmi(+) snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic sof_pci_dev snd_sof_intel_hda_common snd_soc_hdac_hda iwlmvm(+) snd_sof_intel_hda snd_sof_intel_byt snd_sof_xtensa_dsp snd_sof snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core snd_soc_core snd_hda_codec snd_hwdep snd_hda_core snd_pcm x86_pkg_temp_thermal iwlwifi intel_lpss_pci intel_lpss mfd_core efivarfs sdhci_pci xhci_pci cqhci sdhci xhci_hcd
[    3.474148] CR2: 0000000000000008
[    3.474149] ---[ end trace b2191f1416986bb1 ]---
[    3.474151] RIP: 0010:hda_dsp_ipc_get_reply+0x35/0x130 [snd_sof_intel_hda_common]
[    3.474151] Code: 53 48 89 fb 48 83 ec 18 48 8b af 88 01 00 00 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 e8 6e a8 ea cf 49 89 c6 <48> 8b 45 08 81 78 04 00 00 01 40 0f 85 8d 00 00 00 48 8b 55 20 48
[    3.474152] RSP: 0018:ffffb9be41017e28 EFLAGS: 00010046
[    3.474153] RAX: 0000000000000246 RBX: ffff9f3cdf881028 RCX: ffffffffc0304240
[    3.474153] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff9f3cdf881030
[    3.474154] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9f3ce3a60a40
[    3.474154] R10: ffffb9be412a7e08 R11: 0000000000000004 R12: ffff9f3cdf881030
[    3.474155] R13: 0000000000000002 R14: 0000000000000246 R15: ffff9f3cdf5f0000
[    3.474156] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474157] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474157] CR2: 0000000000000008 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474161] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[    3.474161] #PF error: [INSTR]
[    3.474161] PGD 0 P4D 0
[    3.474163] Oops: 0010 [#2] SMP NOPTI
[    3.474164] CPU: 1 PID: 2076 Comm: irq/144-AudioDS Tainted: G      D           5.0.0-daily-hda-20190402 #f93c4ee9
[    3.474164] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[    3.474165] RIP: 0010:          (null)
[    3.474167] Code: Bad RIP value.
[    3.474167] RSP: 0018:ffffb9be41017ea8 EFLAGS: 00010282
[    3.474168] RAX: 0000000000000000 RBX: ffff9f3cdf5f0000 RCX: 0000000000000000
[    3.474169] RDX: ffffb9be41017ec8 RSI: 0000000000000000 RDI: ffffb9be41017ec8
[    3.474169] RBP: ffffffff912d69c0 R08: 0000000000000000 R09: 0000000000000000
[    3.474169] R10: 0000000000000246 R11: 0000000000000000 R12: 0000000000000000
[    3.474170] R13: ffff9f3cdf5f0704 R14: 0000000000000001 R15: ffff9f3cdf5f06d0
[    3.474171] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474172] CR2: ffffffffffffffd6 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474173] Call Trace:
[    3.474174]  ? task_work_run+0x79/0xa0
[    3.474176]  ? do_exit+0x2ca/0xbc0
[    3.474178]  ? irq_thread_dtor+0x90/0x90
[    3.474179]  ? kthread+0x10e/0x130
[    3.474180]  ? rewind_stack_do_exit+0x17/0x20
[    3.474181] Modules linked in: snd_soc_hdac_hdmi(+) snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic sof_pci_dev snd_sof_intel_hda_common snd_soc_hdac_hda iwlmvm(+) snd_sof_intel_hda snd_sof_intel_byt snd_sof_xtensa_dsp snd_sof snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core snd_soc_core snd_hda_codec snd_hwdep snd_hda_core snd_pcm x86_pkg_temp_thermal iwlwifi intel_lpss_pci intel_lpss mfd_core efivarfs sdhci_pci xhci_pci cqhci sdhci xhci_hcd
[    3.474186] CR2: 0000000000000000
[    3.474187] ---[ end trace b2191f1416986bb2 ]---
[    3.474188] RIP: 0010:hda_dsp_ipc_get_reply+0x35/0x130 [snd_sof_intel_hda_common]
[    3.474189] Code: 53 48 89 fb 48 83 ec 18 48 8b af 88 01 00 00 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 e8 6e a8 ea cf 49 89 c6 <48> 8b 45 08 81 78 04 00 00 01 40 0f 85 8d 00 00 00 48 8b 55 20 48
[    3.474189] RSP: 0018:ffffb9be41017e28 EFLAGS: 00010046
[    3.474190] RAX: 0000000000000246 RBX: ffff9f3cdf881028 RCX: ffffffffc0304240
[    3.474190] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff9f3cdf881030
[    3.474191] RBP: 0000000000000000 R08: 0000000000000002 R09: ffff9f3ce3a60a40
[    3.474191] R10: ffffb9be412a7e08 R11: 0000000000000004 R12: ffff9f3cdf881030
[    3.474192] R13: 0000000000000002 R14: 0000000000000246 R15: ffff9f3cdf5f0000
[    3.474193] FS:  0000000000000000(0000) GS:ffff9f3ce3a40000(0000) knlGS:0000000000000000
[    3.474193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.474194] CR2: ffffffffffffffd6 CR3: 00000002606c8005 CR4: 00000000003606e0
[    3.474194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.474195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.474195] Fixing recursive fault but reboot is needed!
[    3.478124] HDMI HDA Codec ehdaudio0D2: Max dais supported: 3

Env:
sof master: acf2bd5
kernel sof-dev: f93c4ee
tplg: sof-hda-generic.tplg

@Jiangxinx Jiangxinx added bug Something isn't working WHL Applies to WhiskeyLake platform labels Apr 2, 2019
@mengdonglin mengdonglin changed the title [WHL] firmware boot failure on WHL-rvp-hda platform. [WHL HDA] firmware boot failure on WHL platform. Apr 2, 2019
@lgirdwood
Member

@Jiangxinx can you attach the full log? This may be a FW issue if we don't have ROM errors.

@mengdonglin mengdonglin added the P1 Blocker bugs or important features label Apr 2, 2019
@emilchudzik

I double-checked the issue on my side:

Python tests on Windows HDA work without issues (FW: master/d7b36bf3).

  • loading FW
  • basic playback and record on HDA (loopback on codec ALC700)

Linux env:
FW: master/d7b36bf3
tplg: master/d7b36bf3 -> sof-hda-generic.tplg
kernel: sof-dev/e29122d
HW: WHL (HD-A mode) with ALC700

The same as I described in thesofproject/sof#1173 (comment):

  • loading is OK
  • aplay -l returns the devices from the topology as expected
  • running the "aplay" command returns an error => unable to play back

@wenqingfu

it seems to be caused by commit 9c6b980

9c6b980 is part of #740 patchset, so @lyakh

@ranj063
Collaborator

ranj063 commented Apr 3, 2019

@Jiangxinx @emilchudzik can you please check if linux PR #775 helps?

@xiulipan

xiulipan commented Apr 3, 2019

@emilchudzik @ranj063 @Jiangxinx
e29122d is older than 9c6b980, so it should be OK.

f3adfd6 (HEAD -> sof-dev, sof/topic/sof-dev) ASoC:sof: remove duplicate posn message in kernel log
0967e07 ASoC: SOF: control: fix a PM put missing at error
d6d661c ASoC: SOF: Intel: fix period_bytes calculation at hw_params()
f93c4ee Merge pull request #740 from lyakh/ipc-20190321
f304b5c ASoC: sof: use iopoll.h macro for polled register reads
1a30233 ASoC: SOF: topology: use the default hw_config in DAI link init
c091fce ASoC: SOF: core: fix a typo in a comment
5c78516 ASoC: SOF: skl: remove the .cmd_done platform driver method
3dcbf59 ASoC: SOF: hsw: remove the .cmd_done platform driver method
a74c774 ASoC: SOF: spi: remove the .cmd_done platform driver method
2c787c5 ASoC: SOF: hsw: eliminate redundant mailbox reads
7f7d265 ASoC: SOF: skl: remove .get_reply()
bb7890c ASoC: SOF: hsw: remove .get_reply()
63f6059 ASoC: SOF: spi: remove .get_reply()
dc5f5f0 ASoC: SOF: hsw: remove an always true condition check
ae4e51e ASoC: SOF: hsw: print sizes in decimal format
d435088 ASoC: SOF: hsw: simplify .get_reply() implementations
9c6b980 ASoC: SOF: ipc: remove the .cmd_done platform driver method
04654c8 ASoC: SOF: Intel: eliminate redundant mailbox reads
df70b78 ASoC: SOF: ipc: remove .get_reply()
9bfc74c ASoC: SOF: Intel: remove an always true condition check
344bfbb ASoC: SOF: ipc: eliminate a trivial function
64b1050 ASoC: SOF: Intel: print sizes in decimal format
bedc1d3 ASoC: SOF: Intel: simplify multiple .get_reply() implementations
e29122d ASoC: Intel: sof_rt5682: test devm_ and return -ENOMEM

@Jiangxinx
Author

@ranj063 PR #775 doesn't help; I got the same dmesg log.
Git log:

commit ad88ef1313311b45dc97bd037a6bcaaccfeca353
Author: Ranjani Sridharan <[email protected]>
Date:   Tue Apr 2 17:22:02 2019 -0700

    ASoC: SOF: intel: ipc: don't read mailbox for CTX_SAVE

    If the reply from the DSP is for a CTX_SAVE ipc, don't
    read the mailbox, just return after setting the reply attributes.

    Signed-off-by: Ranjani Sridharan <[email protected]>

commit f3adfd66c97c730a9db99dafabab7a94f6bcf552
Author: Rander Wang <[email protected]>
Date:   Fri Mar 29 15:46:04 2019 +0800

    ASoC:sof: remove duplicate posn message in kernel log

    There is a lot of posn offset message in kernel log. Actually
    the posn in mailbox is never changed after it is set in hw_params.
    So now just print it once in hw_params and make kernel message lesser

    dmesg log example:
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c
    sof-audio-pci 0000:00:1f.3: posn : host 0x6c00 dai 0xcc660 wall 0x31e4fbd
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c
    sof-audio-pci 0000:00:1f.3: posn : host 0xab00 dai 0xd4460 wall 0x33d12bc
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c
    sof-audio-pci 0000:00:1f.3: posn : host 0xea00 dai 0xdc260 wall 0x35bd5bd
    sof-audio-pci 0000:00:1f.3: posn mailbox: posn offset is 0xc104c

    Signed-off-by: Rander Wang <[email protected]>

commit 0967e0766216940cbdcd2d1d3662ea9e5beb239d
Author: Keyon Jie <[email protected]>
Date:   Tue Apr 2 14:01:52 2019 +0800

    ASoC: SOF: control: fix a PM put missing at error

    We still need call pm_runtime_put_autosuspend() to release dev at
    failure of copy_to_user(), here correct it.

    Signed-off-by: Keyon Jie <[email protected]>

commit d6d661c7521a72c2bce6f2ced11b812f449f8e43
Author: Keyon Jie <[email protected]>
Date:   Tue Apr 2 14:06:27 2019 +0800

    ASoC: SOF: Intel: fix period_bytes calculation at hw_params()

    We should use params_period_bytes for hdac_stream period_bytes
    calculation, it used params_period_size so actually wrong period bytes
    have being used for long time, here correct it.

    Signed-off-by: Keyon Jie <[email protected]>

@emilchudzik

emilchudzik commented Apr 3, 2019

@Jiangxinx @emilchudzik can you please check if linux PR #775 helps?

@ranj063 is #775 supposed to fix the booting issue, or my observations? I can boot the firmware, but I have this issue: #774

@Jiangxinx
Author

Jiangxinx commented Apr 3, 2019

@emilchudzik did you use the latest kernel commit to test the boot issue? e29122d is older than my test version.

@emilchudzik

With the latest sof-dev on linux repo (f3adfd6) I can't boot to OS on WHL.
FW and tplg: master/d7b36bf3 (2 April)

On e29122d, booting to the OS and loading the above firmware worked correctly.

I see a panic while trying to boot the system. I made a quick screenshot:
[image: screenshot of the panic]

@mengdonglin mengdonglin added the HDA Applies to HD-Audio bus for codec connection label Apr 8, 2019
@mengdonglin mengdonglin assigned libinyang and lyakh and unassigned RanderWang Apr 8, 2019
@mengdonglin
Collaborator

@lyakh It seems your IPC commit 9c6b980 triggers this regression. Please work with Libin on this issue.

@lyakh
Collaborator

lyakh commented Apr 8, 2019

The fact that libinyang@1db36f3 fixes the problem makes me think that maybe #799 will fix it too.

@libinyang

libinyang commented Apr 9, 2019

The fact, that libinyang@1db36f3 fixes the problem makes me think, that maybe #799 will fix it too.

@lyakh Thanks. I will apply the patch and have a test.

@libinyang

libinyang commented Apr 9, 2019

Some more information:
The bug is caused by the code accepting an IPC reply to an unknown command, one that the driver did not send. The reply interrupt happens at the very beginning of boot, before any IPC cmd is sent out, and the cmd is "0x1010e0e", which the driver will never send.
I think this may be a FW or ROM bug. We need help from the FW team.
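
The oops in the report is consistent with a reply being processed while nothing is pending: the faulting instruction bytes `<48> 8b 45 08` decode to a load from 8(%rbp), RBP is 0 and CR2 is 0x8, so a field at offset 8 was read through a NULL pointer. A minimal sketch of that arithmetic follows; `demo_ipc_msg` and its fields are hypothetical stand-ins, not the real SOF structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending IPC message; not the real SOF layout. */
struct demo_ipc_msg {
	uint64_t header;   /* offset 0 */
	void *msg_data;    /* offset 8 */
};

/* Address the CPU would access when reading base->msg_data. */
static uintptr_t demo_fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_ipc_msg, msg_data);
}
```

With a NULL base, `demo_fault_address(0)` gives 8, matching the reported fault address.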

@libinyang

@lyakh
I tried your patch just now. It doesn't work, because:

  1. CNL uses cnl_ipc_irq_thread() in cnl.c as the IRQ thread, not hda_dsp_ipc_irq_thread() in hda-ipc.c.
  2. HDA_DSP_REG_HIPCI_MSG_MASK should not be used to get the message cmd. The register is used for the cmd id, not the cmd type.

@libinyang

Correcting my previous comment: it seems the purge cmd does use HDA_DSP_REG_HIPCI_MSG_MASK. However, the msg reads as 0x0, while the cmd read from the mailbox is 0x1010e0e.

@lyakh
Collaborator

lyakh commented Apr 9, 2019

@libinyang ok, thanks for testing. Sorry, I didn't realise that WHL was using CNL. I also didn't find a full kernel log in this issue, so I couldn't check what message was actually coming from the DSP. But yes, it should be easy enough to filter out this specific message, and you're right, we have to understand what it is and why it is coming.

@libinyang

@lyakh Yes. I also found that the old code hits the unknown IPC too. It handles this situation by complaining “error: no reply expected, received xxx” and ignoring the msg. So I think we can do the same as the old code and ignore the message.
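
As a rough illustration of that "complain and ignore" behaviour, here is a minimal userspace sketch. `demo_dev` and `demo_handle_reply()` are hypothetical names, not the driver's data structures or functions; only the error text is taken from the old code's message quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified device state; not the real struct snd_sof_dev. */
struct demo_dev {
	bool reply_expected;   /* true while a sent command awaits its reply */
	unsigned int ignored;  /* number of replies dropped as unexpected */
};

/* Returns true if the reply was consumed, false if it was ignored. */
static bool demo_handle_reply(struct demo_dev *dev, uint32_t msg_id)
{
	if (!dev->reply_expected) {
		/* Nothing was sent, so complain and drop the reply. */
		fprintf(stderr, "error: no reply expected, received 0x%x\n",
			(unsigned int)msg_id);
		dev->ignored++;
		return false;
	}

	/* A command is pending: accept the reply and clear the flag. */
	dev->reply_expected = false;
	return true;
}
```

Calling `demo_handle_reply(&dev, 0x1010e0e)` on a device with no command in flight logs the error and leaves the state untouched, which is the behaviour proposed here for the spurious boot-time reply.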

@lyakh
Collaborator

lyakh commented Apr 9, 2019

@libinyang yes, so basically we could take the first part of your work-around that you referenced here (its second part seems to be a left-over and unrelated), and just replace the word "suspicious" with "spurious". Maybe even make that a dev_err() to raise the chance of us looking at every such case and checking it. But I also think we should avoid calling hda_dsp_ipc_get_reply(), like hda_dsp_ipc_irq_thread() does. For that we have to understand what this message is and how to reliably check for it.

@ranj063
Collaborator

ranj063 commented Apr 9, 2019

@libinyang @lyakh I have a dumb suggestion to try. Basically, we're getting a reply from the DSP when not expecting it and in this case sdev->msg is NULL. Can we simply ignore this like this:

diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
index 6924d8504d09..7fe3888aee41 100644
--- a/sound/soc/sof/intel/hda-ipc.c
+++ b/sound/soc/sof/intel/hda-ipc.c
@@ -77,6 +77,9 @@ void hda_dsp_ipc_get_reply(struct snd_sof_dev *sdev)
 
        spin_lock_irqsave(&sdev->ipc_lock, flags);
 
+       if (!msg)
+               return 0;
+
        hdr = msg->msg_data;
        if (hdr->cmd == (SOF_IPC_GLB_PM_MSG | SOF_IPC_PM_CTX_SAVE)) {
                /*

@libinyang

@ranj063 Yes, I think this is the right direction. Maybe we also need to add some warnings.

@ranj063
Collaborator

ranj063 commented Apr 10, 2019

@ranj063 Yes, I think this is the right direction. Maybe we also need to add some warnings.

@libinyang these should be info rather than warnings I think.

@libinyang

@ranj063 OK, I will use info. Thanks for the suggestion.

@xun2z

xun2z commented Apr 10, 2019

@ranj063 @libinyang Simply returning from the IRQ thread may not be a good solution here; it may cause a deadlock. I enabled deadlock debugging in Kconfig.

[   30.879950] ============================================
[   30.879953] WARNING: possible recursive locking detected
[   30.879956] 5.0.0-xun-hda #14 Not tainted
[   30.879959] --------------------------------------------
[   30.879962] irq/134-AudioDS/2760 is trying to acquire lock:
[   30.879965] 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.879977] 
               but task is already holding lock:
[   30.879980] 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: hda_dsp_ipc_get_reply+0x32/0x140 [snd_sof_intel_hda_common]
[   30.879990] 
               other info that might help us debug this:
[   30.879992]  Possible unsafe locking scenario:

[   30.879994]        CPU0
[   30.879996]        ----
[   30.879998]   lock(&(&sdev->ipc_lock)->rlock);
[   30.880009]   lock(&(&sdev->ipc_lock)->rlock);
[   30.880021] 
                *** DEADLOCK ***

[   30.880028]  May be due to missing lock nesting notation

[   30.880036] 1 lock held by irq/134-AudioDS/2760:
[   30.880043]  #0: 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: hda_dsp_ipc_get_reply+0x32/0x140 [snd_sof_intel_hda_common]
[   30.880055] 
               stack backtrace:
[   30.880060] CPU: 1 PID: 2760 Comm: irq/134-AudioDS Not tainted 5.0.0-xun-hda #14
[   30.880063] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[   30.880065] Call Trace:
[   30.880073]  dump_stack+0x67/0x9b
[   30.880079]  __lock_acquire+0x6e3/0x16b0
[   30.880085]  ? __lock_is_held+0x59/0xa0
[   30.880091]  ? dev_printk_emit+0x45/0x70
[   30.880096]  ? lock_acquire+0xa7/0x1b0
[   30.880100]  lock_acquire+0xa7/0x1b0
[   30.880112]  ? snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.880122]  _raw_spin_lock_irqsave+0x36/0x70
[   30.880135]  ? snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.880146]  snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[   30.880155]  cnl_ipc_irq_thread+0x159/0x690 [snd_sof_intel_hda_common]
[   30.880161]  ? irq_forced_thread_fn+0x70/0x70
[   30.880165]  irq_thread_fn+0x1c/0x60
[   30.880170]  ? irq_thread+0x9c/0x1a0
[   30.880179]  irq_thread+0x102/0x1a0
[   30.880188]  ? wake_threads_waitq+0x30/0x30
[   30.880194]  ? irq_thread_dtor+0x90/0x90
[   30.880199]  kthread+0x11c/0x140
[   30.880206]  ? kthread_park+0x80/0x80
[   30.880211]  ret_from_fork+0x3a/0x50

deadlock.txt

@ranj063
Collaborator

ranj063 commented Apr 10, 2019

@xun2z good point. Yes, we need to spin_unlock* before we return.
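
The trace above shows snd_sof_ipc_reply() trying to take ipc_lock while hda_dsp_ipc_get_reply() still holds it. A minimal userspace sketch of that failure and of the fix follows. All `demo_*` names are hypothetical, and the lock is a toy flag standing in for a kernel spinlock, so this models only the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock that records whether it is held. */
struct demo_lock {
	bool held;
};

/* Returns false instead of spinning forever when the lock is already held. */
static bool demo_lock_acquire(struct demo_lock *l)
{
	if (l->held)
		return false;   /* a real spinlock would deadlock here */
	l->held = true;
	return true;
}

static void demo_lock_release(struct demo_lock *l)
{
	l->held = false;
}

struct demo_dev {
	struct demo_lock ipc_lock;
	const void *msg;   /* pending message, NULL when no reply is expected */
};

/* Buggy variant: returns early while still holding the lock. */
static void demo_get_reply_buggy(struct demo_dev *dev)
{
	demo_lock_acquire(&dev->ipc_lock);
	if (!dev->msg)
		return;   /* the lock is leaked on this path */
	demo_lock_release(&dev->ipc_lock);
}

/* Fixed variant: every exit path releases the lock. */
static void demo_get_reply_fixed(struct demo_dev *dev)
{
	demo_lock_acquire(&dev->ipc_lock);
	if (!dev->msg)
		goto out;
	/* ... the reply for dev->msg would be read here ... */
out:
	demo_lock_release(&dev->ipc_lock);
}
```

After `demo_get_reply_buggy()` a second acquire fails, which is the recursive-locking situation lockdep reported; after `demo_get_reply_fixed()` the lock can be taken again.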

@libinyang

Or we just move the check before spin_lock.

@libinyang

On second thought, maybe checking inside the lock is better. Let me check the source code.

@xun2z

xun2z commented Apr 10, 2019

OK, I see the cause.

@libinyang

A formal patch is at: #807
With this patch, the new kernel behaves the same as the old kernel.

I don't check for the suspicious IPC reply interrupt now, as in the old kernel, and the function snd_sof_ipc_reply() will check it.

I can add code to check for the suspicious IPC reply interrupt in hda_dsp_ipc_get_reply(), but I don't think it is necessary. We can discuss it if you think we should check in hda_dsp_ipc_get_reply().

@lyakh
Collaborator

lyakh commented Apr 10, 2019

@libinyang @lyakh I have a dumb suggestion to try. Basically, we're getting a reply from the DSP when not expecting it and in this case sdev->msg is NULL. Can we simply ignore this like this:

diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
index 6924d8504d09..7fe3888aee41 100644
--- a/sound/soc/sof/intel/hda-ipc.c
+++ b/sound/soc/sof/intel/hda-ipc.c
@@ -77,6 +77,9 @@ void hda_dsp_ipc_get_reply(struct snd_sof_dev *sdev)
 
        spin_lock_irqsave(&sdev->ipc_lock, flags);
 
+       if (!msg)
+               return 0;
+
        hdr = msg->msg_data;
        if (hdr->cmd == (SOF_IPC_GLB_PM_MSG | SOF_IPC_PM_CTX_SAVE)) {
                /*

@ranj063 that's what the old patch was doing (almost; it was also releasing the spinlock, I think).
Edit: actually, it was doing that check before taking the spinlock.

@libinyang

#807 is updated. Please take a look.

@libinyang

@Jiangxinx Can this issue be closed?
