Skip to content

Commit

Permalink
linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimen…
Browse files Browse the repository at this point in the history
…tal arm64 allmodconfig boot)

Hi John,

thanks for testing and reporting this. See inline.

On 21.11.19 12:34:22, John Garry wrote:
> On 14/10/2019 16:18, John Garry wrote:
> JFYI, I see an issue on linuxnext-2019119, as follows:
>
>    21.645388] io scheduler kyber registered
> [   21.734011] input: Power Button as
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
> [   21.743295] ACPI: Power Button [PWRB]
> [   21.809644] [Firmware Bug]: APEI: Invalid bit width + offset in GAR
> [0x94110034/64/0/3/0]
> [   21.821974] EDAC MC0: Giving out device to module ghes_edac.c controller
> ghes_edac: DEV ghes (INTERRUPT)
> [   21.831763] ------------[ cut here ]------------
> [   21.836374] refcount_t: increment on 0; use-after-free.
> [   21.841620] WARNING: CPU: 36 PID: 1 at lib/refcount.c:156
> refcount_inc_checked+0x44/0x50
> [   21.849697] Modules linked in:
> [   21.852745] CPU: 36 PID: 1 Comm: swapper/0 Not tainted
> 5.4.0-rc8-next-20191119-00003-g141a9fef5092-dirty torvalds#650
> [   21.862645] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
> V1.16.01 03/15/2019
> [   21.871157] pstate: 60c00009 (nZCv daif +PAN +UAO)
> [   21.875936] pc : refcount_inc_checked+0x44/0x50
> [   21.880455] lr : refcount_inc_checked+0x44/0x50

This is a warning from the refcount framework. It warns if we
increment from zero. This is reasonable as typically a kernel object
is created with a refcount of 1 and thrown away once the refcount is
zero. Afterwards the object is used-after-free.

For ghes the refcount is initialized with zero, and that is why we see
this message. However, we protect the refcount with the ghes_reg_mutex
and thus there is no use after free. The device is allocated and
registered if the refcount is zero. So this works fine.

Enclosed a fix that avoids the warning, please test.

But see below...

> [   21.884972] sp : ffff00236ffbf8a0
> [   21.888274] x29: ffff00236ffbf8a0 x28: 0000000000000002
> [   21.893576] x27: ffff00236cd07900 x26: ffff002369063010
> [   21.898876] x25: 0000000000000000 x24: ffff00233c236824
> [   21.904177] x23: ffffa000137b9000 x22: ffffa00016fbb7c0
> [   21.909477] x21: ffffa00012dfd000 x20: 1fffe0046dff7f24
> [   21.914777] x19: ffff00233c236000 x18: 0000000000000000
> [   21.920077] x17: 0000000000000000 x16: 0000000000000000
> [   21.925377] x15: 0000000000007700 x14: 64655f7365686720
> [   21.930677] x13: 72656c6c6f72746e x12: 1ffff40002719618
> [   21.935977] x11: ffff940002719618 x10: dfffa00000000000
> [   21.941278] x9 : ffff940002719619 x8 : 0000000000000001
> [   21.946578] x7 : 0000000000000000 x6 : 0000000000000001
> [   21.951877] x5 : ffff940002719618 x4 : ffff00236ffb0010
> [   21.957178] x3 : ffffa000112415e4 x2 : ffff80046dff7ede
> [   21.962478] x1 : 5aff78756b1cf400 x0 : 0000000000000000
> [   21.967779] Call trace:
> [   21.970214]  refcount_inc_checked+0x44/0x50
> [   21.974389]  ghes_edac_register+0x258/0x388
> [   21.978562]  ghes_probe+0x28c/0x5f0
> [   21.982041]  platform_drv_probe+0x70/0xd8
> [   21.986039]  really_probe+0x174/0x468
> [   21.989690]  driver_probe_device+0x7c/0x148
> [   21.993862]  device_driver_attach+0x94/0xa0
> [   21.998033]  __driver_attach+0xa4/0x110
> [   22.001857]  bus_for_each_dev+0xe8/0x158
> [   22.005768]  driver_attach+0x30/0x40
> [   22.009331]  bus_add_driver+0x234/0x2f0
> [   22.013156]  driver_register+0xbc/0x1d0
> [   22.016981]  __platform_driver_register+0x7c/0x88
> [   22.021675]  ghes_init+0xbc/0x14c
> [   22.024979]  do_one_initcall+0xb4/0x254
> [   22.028805]  kernel_init_freeable+0x248/0x2f4
> [   22.033151]  kernel_init+0x10/0x118
> [   22.036628]  ret_from_fork+0x10/0x18
> [   22.040194] ---[ end trace 33655bb65a9835fe ]---
> [   22.046666] EDAC MC: bug in low-level driver: attempt to assign
> [   22.046666]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.058311] ghes_edac: Can't register at EDAC core
> [   22.065402] EDAC MC: bug in low-level driver: attempt to assign
> [   22.065402]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.077080] ghes_edac: Can't register at EDAC core
> [   22.084140] EDAC MC: bug in low-level driver: attempt to assign
> [   22.084140]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.095789] ghes_edac: Can't register at EDAC core
> [   22.102873] EDAC MC: bug in low-level driver: attempt to assign
> [   22.102873]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.115442] ghes_edac: Can't register at EDAC core
> [   22.122536] EDAC MC: bug in low-level driver: attempt to assign
> [   22.122536]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.134344] ghes_edac: Can't register at EDAC core
> [   22.141441] EDAC MC: bug in low-level driver: attempt to assign
> [   22.141441]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.153089] ghes_edac: Can't register at EDAC core
> [   22.160161] EDAC MC: bug in low-level driver: attempt to assign
> [   22.160161]     duplicate mc_idx 0 in add_mc_to_global_list()
> [   22.171810] ghes_edac: Can't register at EDAC core

What I am more concerned is this here. In total this implies 8 ghes
users that all try to register a (single-instance) ghes mc device. For
non-x86 only one instance is allowed (see ghes_edac_register(), idx =
0).

So on your platform, when parsing the HEST table
(hest_ghes_dev_register()), more than one "GHES" device is parsed,
allocated and registered. Mind sending me your HEST table
(/sys/firmware/acpi/tables/HEST), or explain what happens here? If
this is a valid use case, we need to change ghes_edac_register() to
support more than one instance.

Again, please try the patch below.

Thanks,

-Robert

From 6962f8af4a7c1051c9e87a5ac60571f70d2b6814 Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Thu, 21 Nov 2019 15:01:28 +0100
Subject: [PATCH] EDAC/ghes: Do not warn on when increment refcount on 0

Signed-off-by: Robert Richter <[email protected]>
  • Loading branch information
Robert Richter authored and intel-lab-lkp committed Nov 23, 2019
1 parent b9d3d01 commit 125068f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions drivers/edac/ghes_edac.c
Original file line number Diff line number Diff line change
Expand Up @@ -556,8 +556,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
ghes_pvt = pvt;
spin_unlock_irqrestore(&ghes_lock, flags);

/* only increment on success */
refcount_inc(&ghes_refcount);
/* only set on success */
refcount_set(&ghes_refcount, 1);

unlock:
mutex_unlock(&ghes_reg_mutex);
Expand Down

0 comments on commit 125068f

Please sign in to comment.