-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WHL HDA] firmware boot failure on WHL platform. #767
Comments
@Jiangxinx can you attach full log. This may be a FW issue if we dont have ROM errors. |
Double check issue on my side: python tests in windows HDA works without issues. (FW: master/d7b36bf3).
linux Env: The same as I described on thesofproject/sof#1173 (comment)
|
@Jiangxinx @emilchudzik can you please check if linux PR #775 helps? |
@emilchudzik @ranj063 @Jiangxinx So it should be OK.
|
@ranj063 PR #775 doesn't help, got the same dmesg log.
|
@ranj063 is #775 should fix booting issue? |
@emilchudzik do you use the latest kernel commit to test the boot issue? e29122d is older than my test version. |
The fact, that libinyang@1db36f3 fixes the problem makes me think, that maybe #799 will fix it too. |
@lyakh Thanks. I will apply the patch and have a test. (libinyang@1db36f3) fixes the problem makes me think, that maybe #799 will fix it too. |
Add some more information: |
@lyakh
|
fix my previous comment: it seems the purge cmd is using the HDA_DSP_REG_HIPCI_MSG_MASK. However, the msg shows to be 0x0. And read from the mailbox, it shows the cmd is: 0x1010e0e. |
@libinyang ok, thanks for testing. Sorry, I didn't realise, that WHL was using CNL. And I didn't find a full kernel log in this message, so, I couldn't check what message was actually coming from the DSP. But yes, it should be easy enough to filter out this specific message and you're right, we have to understand what it is and why it is coming. |
@lyakh Yes. And I found that in the old code, we do meet the unknown IPC too. The code can handle this situation and complain “error: no reply expected, received xxx” and ignore this msg. So I think we can do like the old code that we ignore the message. |
@libinyang yes, so, basically we could take the first part of your work-around, that you referenced here (its second part seems to be a left-over and unrelated), just replace the word "suspicious" with "spurious." Maybe even make that a dev_err() to raise the chance of us looking at every such case and checking it. But I also think we should also avoid calling hda_dsp_ipc_get_reply() like hda_dsp_ipc_irq_thread() does. For that we have to understand what this message is and how to reliably check for it. |
@libinyang @lyakh I have a dumb suggestion to try. Basically, we're getting a reply from the DSP when not expecting it and in this case sdev->msg is NULL. Can we simply ignore this like this: diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
index 6924d8504d09..7fe3888aee41 100644
--- a/sound/soc/sof/intel/hda-ipc.c
+++ b/sound/soc/sof/intel/hda-ipc.c
@@ -77,6 +77,9 @@ void hda_dsp_ipc_get_reply(struct snd_sof_dev *sdev)
spin_lock_irqsave(&sdev->ipc_lock, flags);
+ if (!msg)
+ return 0;
+
hdr = msg->msg_data;
if (hdr->cmd == (SOF_IPC_GLB_PM_MSG | SOF_IPC_PM_CTX_SAVE)) {
/* |
@ranj063 Yes, I think this should be a right direction. And maybe we need add some warnings. |
@libinyang these should be info rather than warnings I think. |
@ranj063 OK, I will use the info. Thanks for the suggestion. |
@ranj063 @libinyang Simply return from irq thread may not be a good solution here, this may cause a deaklock. I enabled deadlock debug in kconfig. [ 30.879950] ============================================
[ 30.879953] WARNING: possible recursive locking detected
[ 30.879956] 5.0.0-xun-hda #14 Not tainted
[ 30.879959] --------------------------------------------
[ 30.879962] irq/134-AudioDS/2760 is trying to acquire lock:
[ 30.879965] 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[ 30.879977]
but task is already holding lock:
[ 30.879980] 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: hda_dsp_ipc_get_reply+0x32/0x140 [snd_sof_intel_hda_common]
[ 30.879990]
other info that might help us debug this:
[ 30.879992] Possible unsafe locking scenario:
[ 30.879994] CPU0
[ 30.879996] ----
[ 30.879998] lock(&(&sdev->ipc_lock)->rlock);
[ 30.880009] lock(&(&sdev->ipc_lock)->rlock);
[ 30.880021]
*** DEADLOCK ***
[ 30.880028] May be due to missing lock nesting notation
[ 30.880036] 1 lock held by irq/134-AudioDS/2760:
[ 30.880043] #0: 000000007591428d (&(&sdev->ipc_lock)->rlock){....}, at: hda_dsp_ipc_get_reply+0x32/0x140 [snd_sof_intel_hda_common]
[ 30.880055]
stack backtrace:
[ 30.880060] CPU: 1 PID: 2760 Comm: irq/134-AudioDS Not tainted 5.0.0-xun-hda #14
[ 30.880063] Hardware name: Intel Corporation CoffeeLake Client Platform/WhiskeyLake U DDR4 ERB, BIOS CNLSFWR1.R00.X137.B00.1803280218 03/28/2018
[ 30.880065] Call Trace:
[ 30.880073] dump_stack+0x67/0x9b
[ 30.880079] __lock_acquire+0x6e3/0x16b0
[ 30.880085] ? __lock_is_held+0x59/0xa0
[ 30.880091] ? dev_printk_emit+0x45/0x70
[ 30.880096] ? lock_acquire+0xa7/0x1b0
[ 30.880100] lock_acquire+0xa7/0x1b0
[ 30.880112] ? snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[ 30.880122] _raw_spin_lock_irqsave+0x36/0x70
[ 30.880135] ? snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[ 30.880146] snd_sof_ipc_reply+0x21/0x90 [snd_sof]
[ 30.880155] cnl_ipc_irq_thread+0x159/0x690 [snd_sof_intel_hda_common]
[ 30.880161] ? irq_forced_thread_fn+0x70/0x70
[ 30.880165] irq_thread_fn+0x1c/0x60
[ 30.880170] ? irq_thread+0x9c/0x1a0
[ 30.880179] irq_thread+0x102/0x1a0
[ 30.880188] ? wake_threads_waitq+0x30/0x30
[ 30.880194] ? irq_thread_dtor+0x90/0x90
[ 30.880199] kthread+0x11c/0x140
[ 30.880206] ? kthread_park+0x80/0x80
[ 30.880211] ret_from_fork+0x3a/0x50 |
@xun2z good point. Yes, we need to spin_unlock* before we return. |
or We just move the check before spin_lock |
After second thought, maybe in the lock is better. Let me have a check of the source code. |
OK, i see the cause. |
A formal patch is at: #807 I didn't check suspicious the ipc reply interrupt now as the old kernel, and the function snd_sof_ipc_reply() will check it. I can add the code to check the suspicious the ipc reply interrupt in hda_dsp_ipc_get_reply(), but I don't think it is necessary. We can have a discussion on it if you think we should check in hda_dsp_ipc_get_reply() |
@ranj063 that's what the old patch was doing (almost, it was also releasing the spinlock, I think) |
#807 is updated. Please have a check. |
@Jiangxinx Can this issue be closed? |
summary:
firmware boot failure on WHL platform. According to the call stack information,it seems to be caused by commit 9c6b980
Step:
1.Aplay -l
Output:
Dmesg:
Env:
sof master: acf2bd5
kernel sof-dev: f93c4ee
tplg: sof-hda-generic.tplg
The text was updated successfully, but these errors were encountered: