Android defconfig changes for hikey #2

johnstultz-work · 2016-02-02T06:23:03Z

Wanted to propose these for you, but I realize there may be some spots (specifically around frambuffer console, and other android specific configs like modules=n), where these may collide with the debian builds.

Don't know if you want to add a hikey_android_defconfig or what to try to work around this case.

Signed-off-by: Guodong Xu <[email protected]>

…X regulators Signed-off-by: Guodong Xu <[email protected]>

Signed-off-by: Guodong Xu <[email protected]>

enable hisi drm enable adv7511 Set cma size to 128M Enable dummy ion driver to use system heap before hisi ion driver is ready. Signed-off-by: Xinliang Liu <[email protected]> Signed-off-by: Guodong Xu <[email protected]>

Signed-off-by: Guodong Xu <[email protected]>

Signed-off-by: Xinliang Liu <[email protected]>

Moves many of the hikey_defconfig adjustments made for Android over to the 4.4 based tree. Includes: * Core android functions (binder, ashmem, lmk, etc) * Disables modules * Cgroup support * ipv6 and netfilter support * bluetooth & other input drivers * ppp support * configfs gadget support * squashfs * pstore Signed-off-by: John Stultz <[email protected]>

Adds the required config option for ION, as well as other tweaks needed to get Android graphics working on hikey. Signed-off-by: John Stultz <[email protected]>

docularxu · 2016-02-05T01:59:38Z

We are now using arch/arm64/configs/defconfig for hikey debian, and using hikey_defconfig for android plus-on configs.

It makes this pull request invalid.

Close it.

write_buildid() increments 'name_len' with intention to take into account trailing zero byte. However, 'name_len' was already incremented in machine__write_buildid_table() before. So this leads to out-of-bounds read in do_write(): $ ./perf record sleep 0 [ perf record: Woken up 1 times to write data ] ================================================================= ==15899==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000099fc92 at pc 0x7f1aa9c7eab5 bp 0x7fff940f84d0 sp 0x7fff940f7c78 READ of size 19 at 0x00000099fc92 thread T0 #0 0x7f1aa9c7eab4 (/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/libasan.so.2+0x44ab4) #1 0x649c5b in do_write util/header.c:67 #2 0x649c5b in write_padded util/header.c:82 #3 0x57e8bc in write_buildid util/build-id.c:239 #4 0x57e8bc in machine__write_buildid_table util/build-id.c:278 ... 0x00000099fc92 is located 0 bytes to the right of global variable '*.LC99' defined in 'util/symbol.c' (0x99fc80) of size 18 '*.LC99' is ascii string '[kernel.kallsyms]' ... Shadow bytes around the buggy address: 0x00008012bf80: f9 f9 f9 f9 00 00 00 00 00 00 03 f9 f9 f9 f9 f9 =>0x00008012bf90: 00 00[02]f9 f9 f9 f9 f9 00 00 00 00 00 05 f9 f9 0x00008012bfa0: f9 f9 f9 f9 00 03 f9 f9 f9 f9 f9 f9 00 00 00 00 Signed-off-by: Andrey Ryabinin <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Remove the off-by one at the origin, to keep len(s) == strlen(s) assumption ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

We must not attempt to send WMI packets while holding the data-lock, as it may deadlock: BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824 in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant ============================================= [ INFO: possible recursive locking detected ] 4.4.6+ #21 Tainted: G W O --------------------------------------------- wpa_supplicant/2878 is trying to acquire lock: (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core] but task is already holding lock: (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&ar->data_lock)->rlock); lock(&(&ar->data_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by wpa_supplicant/2878: #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816493ca>] rtnl_lock+0x12/0x14 #1: (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0706932>] ath10k_add_interface+0x3b/0xbda [ath10k_core] #2: (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core] #3: (rcu_read_lock){......}, at: [<ffffffffa062f304>] rcu_read_lock+0x0/0x66 [mac80211] stack backtrace: CPU: 3 PID: 2878 Comm: wpa_supplicant Tainted: G W O 4.4.6+ #21 Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013 0000000000000000 ffff8801fcadf8f0 ffffffff8137086d ffffffff82681720 ffffffff82681720 ffff8801fcadf9b0 ffffffff8112e3be ffff8801fcadf920 0000000100000000 ffffffff82681720 ffffffffa0721500 ffff8801fcb8d348 Call Trace: [<ffffffff8137086d>] dump_stack+0x81/0xb6 [<ffffffff8112e3be>] __lock_acquire+0xc5b/0xde7 [<ffffffffa0721500>] ? ath10k_wmi_tx_beacons_iter+0x15/0x11a [ath10k_core] [<ffffffff8112d0d0>] ? mark_lock+0x24/0x201 [<ffffffff8112e908>] lock_acquire+0x132/0x1cb [<ffffffff8112e908>] ? lock_acquire+0x132/0x1cb [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core] [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core] [<ffffffff816f9e2b>] _raw_spin_lock_bh+0x31/0x40 [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core] [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core] [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core] [<ffffffffa062eb18>] __iterate_interfaces+0x9d/0x13d [mac80211] [<ffffffffa062f609>] ieee80211_iterate_active_interfaces_atomic+0x32/0x3e [mac80211] [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core] [<ffffffffa071fa9f>] ath10k_wmi_tx_beacons_nowait.isra.13+0x14/0x16 [ath10k_core] [<ffffffffa0721676>] ath10k_wmi_cmd_send+0x71/0x242 [ath10k_core] [<ffffffffa07023f6>] ath10k_wmi_peer_delete+0x3f/0x42 [ath10k_core] [<ffffffffa0702557>] ath10k_peer_create+0x15e/0x1ae [ath10k_core] [<ffffffffa0707004>] ath10k_add_interface+0x70d/0xbda [ath10k_core] [<ffffffffa05fffcc>] drv_add_interface+0x123/0x1a5 [mac80211] [<ffffffffa061554b>] ieee80211_do_open+0x351/0x667 [mac80211] [<ffffffffa06158aa>] ieee80211_open+0x49/0x4c [mac80211] [<ffffffff8163ecf9>] __dev_open+0x88/0xde [<ffffffff8163ef6e>] __dev_change_flags+0xa4/0x13a [<ffffffff8163f023>] dev_change_flags+0x1f/0x54 [<ffffffff816a5532>] devinet_ioctl+0x2b9/0x5c9 [<ffffffff816514dd>] ? copy_to_user+0x32/0x38 [<ffffffff816a6115>] inet_ioctl+0x81/0x9d [<ffffffff816a6115>] ? inet_ioctl+0x81/0x9d [<ffffffff81621cf8>] sock_do_ioctl+0x20/0x3d [<ffffffff816223c4>] sock_ioctl+0x222/0x22e [<ffffffff8121cf95>] do_vfs_ioctl+0x453/0x4d7 [<ffffffff81625603>] ? __sys_recvmsg+0x4c/0x5b [<ffffffff81225af1>] ? __fget_light+0x48/0x6c [<ffffffff8121d06b>] SyS_ioctl+0x52/0x74 [<ffffffff816fa736>] entry_SYSCALL_64_fastpath+0x16/0x7a Signed-off-by: Ben Greear <[email protected]> Signed-off-by: Kalle Valo <[email protected]>

Nicolas Dichtel says: ==================== ovs: fix rtnl notifications on interface deletion There was no rtnl notifications for interfaces (gre, vxlan, geneve) created by ovs. This problem is fixed by adjusting the creation path. v1 -> v2: - add patch #1 and #4 - rework error handling in patch #2 ==================== Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>

When run tipcTS&tipcTC test suite, the following complaint appears: [ 56.926168] =============================== [ 56.926169] [ INFO: suspicious RCU usage. ] [ 56.926171] 4.7.0-rc1+ 96boards#160 Not tainted [ 56.926173] ------------------------------- [ 56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage! [ 56.926175] [ 56.926175] other info that might help us debug this: [ 56.926175] [ 56.926177] [ 56.926177] rcu_scheduler_active = 1, debug_locks = 1 [ 56.926179] 3 locks held by swapper/4/0: [ 56.926180] #0: (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340 [ 56.926203] #1: (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc] [ 56.926212] #2: (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc] [ 56.926218] [ 56.926218] stack backtrace: [ 56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ 96boards#160 [ 56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 56.926224] 0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0 [ 56.926227] 0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120 [ 56.926230] ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88 [ 56.926234] Call Trace: [ 56.926235] <IRQ> [<ffffffff813c4423>] dump_stack+0x67/0x94 [ 56.926250] [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120 [ 56.926256] [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc] [ 56.926261] [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc] [ 56.926266] [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc] [ 56.926273] [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc] [ 56.926278] [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc] [ 56.926283] [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc] [ 56.926288] [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340 [ 56.926291] [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340 [ 56.926296] [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc] [ 56.926300] [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390 [ 56.926306] [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130 [ 56.926316] [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2 [ 56.926323] [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0 [ 56.926327] [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60 [ 56.926331] [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90 [ 56.926333] <EOI> [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0 [ 56.926340] [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0 [ 56.926342] [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20 [ 56.926345] [<ffffffff810adf0f>] default_idle_call+0x2f/0x50 [ 56.926347] [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0 [ 56.926353] [<ffffffff81040ad9>] start_secondary+0xf9/0x100 The warning appears as rtnl_dereference() is wrongly used in tipc_l2_send_msg() under RCU read lock protection. Instead the proper usage should be that rcu_dereference_rtnl() is called here. Fixes: 5b7066c ("tipc: stricter filtering of packets in bearer layer") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>

On s390, there are two different hardware PMUs for counting and sampling. Previously, both PMUs have shared the perf_hw_context which is not correct and, recently, results in this warning: ------------[ cut here ]------------ WARNING: CPU: 5 PID: 1 at kernel/events/core.c:8485 perf_pmu_register+0x420/0x428 Modules linked in: CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc1+ #2 task: 00000009c5240000 ti: 00000009c5234000 task.ti: 00000009c5234000 Krnl PSW : 0704c00180000000 0000000000220c50 (perf_pmu_register+0x420/0x428) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: ffffffffffffffff 0000000000b15ac6 0000000000000000 00000009cb440000 000000000022087a 0000000000000000 0000000000b78fa0 0000000000000000 0000000000a9aa90 0000000000000084 0000000000000005 000000000088a97a 0000000000000004 0000000000749dd0 000000000022087a 00000009c5237cc0 Krnl Code: 0000000000220c44: a7f4ff54 brc 15,220aec 0000000000220c48: 92011000 mvi 0(%r1),1 #0000000000220c4c: a7f40001 brc 15,220c4e >0000000000220c50: a7f4ff12 brc 15,220a74 0000000000220c54: 0707 bcr 0,%r7 0000000000220c56: 0707 bcr 0,%r7 0000000000220c58: ebdff0800024 stmg %r13,%r15,128(%r15) 0000000000220c5e: a7f13fe0 tmll %r15,16352 Call Trace: ([<000000000022087a>] perf_pmu_register+0x4a/0x428) ([<0000000000b2c25c>] init_cpum_sampling_pmu+0x14c/0x1f8) ([<0000000000100248>] do_one_initcall+0x48/0x140) ([<0000000000b25d26>] kernel_init_freeable+0x1e6/0x2a0) ([<000000000072bda4>] kernel_init+0x24/0x138) ([<000000000073495e>] kernel_thread_starter+0x6/0xc) ([<0000000000734958>] kernel_thread_starter+0x0/0xc) Last Breaking-Event-Address: [<0000000000220c4c>] perf_pmu_register+0x41c/0x428 ---[ end trace 0c6ef9f5b771ad97 ]--- Using the perf_sw_context is an option because the cpum_cf PMU does not use interrupts. To make this more clear, initialize the capabilities in the PMU structure. Signed-off-by: Hendrik Brueckner <[email protected]> Suggested-by: Peter Zijlstra <[email protected]> Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>

While testing the deadline scheduler + cgroup setup I hit this warning. [ 132.612935] ------------[ cut here ]------------ [ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80 [ 132.612952] Modules linked in: (a ton of modules...) [ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2 [ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e [ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b [ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00 [ 132.612986] Call Trace: [ 132.612987] <IRQ> [<ffffffff813d229e>] dump_stack+0x63/0x85 [ 132.612994] [<ffffffff810a652b>] __warn+0xcb/0xf0 [ 132.612997] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.612999] [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20 [ 132.613000] [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80 [ 132.613008] [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20 [ 132.613010] [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10 [ 132.613015] [<ffffffff811388ac>] put_css_set+0x5c/0x60 [ 132.613016] [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0 [ 132.613017] [<ffffffff810a3912>] __put_task_struct+0x42/0x140 [ 132.613018] [<ffffffff810e776a>] dl_task_timer+0xca/0x250 [ 132.613027] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.613030] [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270 [ 132.613031] [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190 [ 132.613034] [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60 [ 132.613035] [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50 [ 132.613037] [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0 [ 132.613038] <EOI> [<ffffffff81063466>] ? native_safe_halt+0x6/0x10 [ 132.613043] [<ffffffff81037a4e>] default_idle+0x1e/0xd0 [ 132.613044] [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20 [ 132.613046] [<ffffffff810e8fda>] default_idle_call+0x2a/0x40 [ 132.613047] [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340 [ 132.613048] [<ffffffff81050235>] start_secondary+0x155/0x190 [ 132.613049] ---[ end trace f91934d162ce9977 ]--- The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid this problem - and other problems of sharing a spinlock with an interrupt. Cc: Tejun Heo <[email protected]> Cc: Li Zefan <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] # 4.5+ Cc: [email protected] Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: "Luis Claudio R. Goncalves" <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Acked-by: Zefan Li <[email protected]> Signed-off-by: Tejun Heo <[email protected]>

mem_cgroup_migrate() uses local_irq_disable/enable() but can be called with irq disabled from migrate_page_copy(). This ends up enabling irq while holding a irq context lock triggering the following lockdep warning. Fix it by using irq_save/restore instead. ================================= [ INFO: inconsistent lock state ] 4.7.0-rc1+ #52 Tainted: G W --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. kcompactd0/151 [HC0[0]:SC0[0]:HE1:SE1] takes: (&(&ctx->completion_lock)->rlock){+.?.-.}, at: [<000000000038fd96>] aio_migratepage+0x156/0x1e8 {IN-SOFTIRQ-W} state was registered at: __lock_acquire+0x5b6/0x1930 lock_acquire+0xee/0x270 _raw_spin_lock_irqsave+0x66/0xb0 aio_complete+0x98/0x328 dio_complete+0xe4/0x1e0 blk_update_request+0xd4/0x450 scsi_end_request+0x48/0x1c8 scsi_io_completion+0x272/0x698 blk_done_softirq+0xca/0xe8 __do_softirq+0xc8/0x518 irq_exit+0xee/0x110 do_IRQ+0x6a/0x88 io_int_handler+0x11a/0x25c __mutex_unlock_slowpath+0x144/0x1d8 __mutex_unlock_slowpath+0x140/0x1d8 kernfs_iop_permission+0x64/0x80 __inode_permission+0x9e/0xf0 link_path_walk+0x6e/0x510 path_lookupat+0xc4/0x1a8 filename_lookup+0x9c/0x160 user_path_at_empty+0x5c/0x70 SyS_readlinkat+0x68/0x140 system_call+0xd6/0x270 irq event stamp: 971410 hardirqs last enabled at (971409): migrate_page_move_mapping+0x3ea/0x588 hardirqs last disabled at (971410): _raw_spin_lock_irqsave+0x3c/0xb0 softirqs last enabled at (970526): __do_softirq+0x460/0x518 softirqs last disabled at (970519): irq_exit+0xee/0x110 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&ctx->completion_lock)->rlock); <Interrupt> lock(&(&ctx->completion_lock)->rlock); *** DEADLOCK *** 3 locks held by kcompactd0/151: #0: (&(&mapping->private_lock)->rlock){+.+.-.}, at: aio_migratepage+0x42/0x1e8 #1: (&ctx->ring_lock){+.+.+.}, at: aio_migratepage+0x5a/0x1e8 #2: (&(&ctx->completion_lock)->rlock){+.?.-.}, at: aio_migratepage+0x156/0x1e8 stack backtrace: CPU: 20 PID: 151 Comm: kcompactd0 Tainted: G W 4.7.0-rc1+ #52 Call Trace: show_trace+0xea/0xf0 show_stack+0x72/0xf0 dump_stack+0x9a/0xd8 print_usage_bug.part.27+0x2d4/0x2e8 mark_lock+0x17e/0x758 mark_held_locks+0xa2/0xd0 trace_hardirqs_on_caller+0x140/0x1c0 mem_cgroup_migrate+0x266/0x370 aio_migratepage+0x16a/0x1e8 move_to_new_page+0xb0/0x260 migrate_pages+0x8f4/0x9f0 compact_zone+0x4dc/0xdc8 kcompactd_do_work+0x1aa/0x358 kcompactd+0xba/0x2c8 kthread+0x10a/0x110 kernel_thread_starter+0x6/0xc kernel_thread_starter+0x0/0xc INFO: lockdep is turned off. Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/g/[email protected] Fixes: 74485cf ("mm: migrate: consolidate mem_cgroup_migrate() calls") Signed-off-by: Tejun Heo <[email protected]> Reported-by: Christian Borntraeger <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Cc: <[email protected]> [4.5+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

An ENOMEM when creating a pair tty in tty_ldisc_setup causes a null pointer dereference in devpts_kill_index because tty->link->driver_data is NULL. The oops was triggered with the pty stressor in stress-ng when in a low memory condition. tty_init_dev tries to clean up a tty_ldisc_setup ENOMEM error by calling release_tty, however, this ultimately tries to clean up the NULL pair'd tty in pty_unix98_remove, triggering the Oops. Add check to pty_unix98_remove to only clean up fsi if it is not NULL. Ooops: [ 23.020961] Oops: 0000 [#1] SMP [ 23.020976] Modules linked in: ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec parport_pc snd_hda_core snd_hwdep parport snd_pcm input_leds joydev snd_timer serio_raw snd soundcore i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel qxl aes_x86_64 ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect psmouse sysimgblt floppy fb_sys_fops drm pata_acpi jitterentropy_rng drbg ansi_cprng [ 23.020978] CPU: 0 PID: 1452 Comm: stress-ng-pty Not tainted 4.7.0-rc4+ #2 [ 23.020978] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 23.020979] task: ffff88007ba30000 ti: ffff880078ea8000 task.ti: ffff880078ea8000 [ 23.020981] RIP: 0010:[<ffffffff813f11ff>] [<ffffffff813f11ff>] ida_remove+0x1f/0x120 [ 23.020981] RSP: 0018:ffff880078eabb60 EFLAGS: 00010a03 [ 23.020982] RAX: 4444444444444567 RBX: 0000000000000000 RCX: 000000000000001f [ 23.020982] RDX: 000000000000014c RSI: 000000000000026f RDI: 0000000000000000 [ 23.020982] RBP: ffff880078eabb70 R08: 0000000000000004 R09: 0000000000000036 [ 23.020983] R10: 000000000000026f R11: 0000000000000000 R12: 000000000000026f [ 23.020983] R13: 000000000000026f R14: ffff88007c944b40 R15: 000000000000026f [ 23.020984] FS: 00007f9a2f3cc700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 23.020984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.020985] CR2: 0000000000000010 CR3: 000000006c81b000 CR4: 00000000001406f0 [ 23.020988] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 23.020988] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 23.020988] Stack: [ 23.020989] 0000000000000000 000000000000026f ffff880078eabb90 ffffffff812a5a99 [ 23.020990] 0000000000000000 00000000fffffff4 ffff880078eabba8 ffffffff814f9cbe [ 23.020991] ffff88007965c800 ffff880078eabbc8 ffffffff814eef43 fffffffffffffff4 [ 23.020991] Call Trace: [ 23.021000] [<ffffffff812a5a99>] devpts_kill_index+0x29/0x50 [ 23.021002] [<ffffffff814f9cbe>] pty_unix98_remove+0x2e/0x50 [ 23.021006] [<ffffffff814eef43>] release_tty+0xb3/0x1b0 [ 23.021007] [<ffffffff814f18d4>] tty_init_dev+0xd4/0x1c0 [ 23.021011] [<ffffffff814f9fae>] ptmx_open+0xae/0x190 [ 23.021013] [<ffffffff812254ef>] chrdev_open+0xbf/0x1b0 [ 23.021015] [<ffffffff8121d973>] do_dentry_open+0x203/0x310 [ 23.021016] [<ffffffff81225430>] ? cdev_put+0x30/0x30 [ 23.021017] [<ffffffff8121ee44>] vfs_open+0x54/0x80 [ 23.021018] [<ffffffff8122b8fc>] ? may_open+0x8c/0x100 [ 23.021019] [<ffffffff8122f26b>] path_openat+0x2eb/0x1440 [ 23.021020] [<ffffffff81230534>] ? putname+0x54/0x60 [ 23.021022] [<ffffffff814f6f97>] ? n_tty_ioctl_helper+0x27/0x100 [ 23.021023] [<ffffffff81231651>] do_filp_open+0x91/0x100 [ 23.021024] [<ffffffff81230596>] ? getname_flags+0x56/0x1f0 [ 23.021026] [<ffffffff8123fc66>] ? __alloc_fd+0x46/0x190 [ 23.021027] [<ffffffff8121f1e4>] do_sys_open+0x124/0x210 [ 23.021028] [<ffffffff8121f2ee>] SyS_open+0x1e/0x20 [ 23.021035] [<ffffffff81845576>] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 23.021044] Code: 63 28 45 31 e4 eb dd 0f 1f 44 00 00 55 4c 63 d6 48 ba 89 88 88 88 88 88 88 88 4c 89 d0 b9 1f 00 00 00 48 f7 e2 48 89 e5 41 54 53 <8b> 47 10 48 89 fb 8d 3c c5 00 00 00 00 48 c1 ea 09 b8 01 00 00 [ 23.021045] RIP [<ffffffff813f11ff>] ida_remove+0x1f/0x120 [ 23.021045] RSP <ffff880078eabb60> [ 23.021046] CR2: 0000000000000010 Signed-off-by: Colin Ian King <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Userspace can quite legitimately perform an exec() syscall with a suspended transaction. exec() does not return to the old process, rather it load a new one and starts that, the expectation therefore is that the new process starts not in a transaction. Currently exec() is not treated any differently to any other syscall which creates problems. Firstly it could allow a new process to start with a suspended transaction for a binary that no longer exists. This means that the checkpointed state won't be valid and if the suspended transaction were ever to be resumed and subsequently aborted (a possibility which is exceedingly likely as exec()ing will likely doom the transaction) the new process will jump to invalid state. Secondly the incorrect attempt to keep the transactional state while still zeroing state for the new process creates at least two TM Bad Things. The first triggers on the rfid to return to userspace as start_thread() has given the new process a 'clean' MSR but the suspend will still be set in the hardware MSR. The second TM Bad Thing triggers in __switch_to() as the processor is still transactionally suspended but __switch_to() wants to zero the TM sprs for the new process. This is an example of the outcome of calling exec() with a suspended transaction. Note the first 700 is likely the first TM bad thing decsribed earlier only the kernel can't report it as we've loaded userspace registers. c000000000009980 is the rfid in fast_exception_return() Bad kernel stack pointer 3fffcfa1a370 at c000000000009980 Oops: Bad kernel stack pointer, sig: 6 [#1] CPU: 0 PID: 2006 Comm: tm-execed Not tainted NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000 REGS: c00000003ffefd40 TRAP: 0700 Not tainted MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000 CFAR: c0000000000098b4 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000 NIP [c000000000009980] fast_exception_return+0xb0/0xb8 LR [0000000000000000] (null) Call Trace: Instruction dump: f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b Kernel BUG at c000000000043e80 [verbose debug info unavailable] Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033) Oops: Unrecoverable exception, sig: 6 [#2] CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000 NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000 REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000 CFAR: c000000000015a20 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000 GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000 GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004 GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000 GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000 GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80 NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c LR [c000000000015a24] __switch_to+0x1f4/0x420 Call Trace: Instruction dump: 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 This fixes CVE-2016-5828. Fixes: bc2a940 ("powerpc: Hook in new transactional memory code") Cc: [email protected] # v3.9+ Signed-off-by: Cyril Bur <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>

The USB core contains a bug that can show up when a USB-3 host controller is removed. If the primary (USB-2) hcd structure is released before the shared (USB-3) hcd, the core will try to do a double-free of the common bandwidth_mutex. The problem was described in graphical form by Chung-Geol Kim, who first reported it: ================================================= At *remove USB(3.0) Storage sequence <1> --> <5> ((Problem Case)) ================================================= VOLD ------------------------------------|------------ (uevent) ________|_________ |<1> | |dwc3_otg_sm_work | |usb_put_hcd | |peer_hcd(kref=2)| |__________________| ________|_________ |<2> | |New USB BUS #2 | | | |peer_hcd(kref=1) | | | --(Link)-bandXX_mutex| | |__________________| | ___________________ | |<3> | | |dwc3_otg_sm_work | | |usb_put_hcd | | |primary_hcd(kref=1)| | |___________________| | _________|_________ | |<4> | | |New USB BUS #1 | | |hcd_release | | |primary_hcd(kref=0)| | | | | |bandXX_mutex(free) |<- |___________________| (( VOLD )) ______|___________ |<5> | | SCSI | |usb_put_hcd | |peer_hcd(kref=0) | |*hcd_release | |bandXX_mutex(free*)|<- double free |__________________| ================================================= This happens because hcd_release() frees the bandwidth_mutex whenever it sees a primary hcd being released (which is not a very good idea in any case), but in the course of releasing the primary hcd, it changes the pointers in the shared hcd in such a way that the shared hcd will appear to be primary when it gets released. This patch fixes the problem by changing hcd_release() so that it deallocates the bandwidth_mutex only when the _last_ hcd structure referencing it is released. The patch also removes an unnecessary test, so that when an hcd is released, both the shared_hcd and primary_hcd pointers in the hcd's peer will be cleared. Signed-off-by: Alan Stern <[email protected]> Reported-by: Chung-Geol Kim <[email protected]> Tested-by: Chung-Geol Kim <[email protected]> CC: <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

I found a race condition triggering VM_BUG_ON() in freeze_page(), when running a testcase with 3 processes: - process 1: keep writing thp, - process 2: keep clearing soft-dirty bits from virtual address of process 1 - process 3: call migratepages for process 1, The kernel message is like this: kernel BUG at /src/linux-dev/mm/huge_memory.c:3096! invalid opcode: 0000 [#1] SMP Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000 RIP: 0010:[<ffffffff811f8e06>] [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590 RSP: 0018:ffff88007cdd3b70 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000 RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000 R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000 R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080 FS: 00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0 Call Trace: ? list_del+0xd/0x30 queue_pages_pte_range+0x4d1/0x590 __walk_page_range+0x204/0x4e0 walk_page_range+0x71/0xf0 queue_pages_range+0x75/0x90 ? queue_pages_hugetlb+0x190/0x190 ? new_node_page+0xc0/0xc0 ? change_prot_numa+0x40/0x40 migrate_to_node+0x71/0xd0 do_migrate_pages+0x1c3/0x210 SyS_migrate_pages+0x261/0x290 entry_SYSCALL_64_fastpath+0x1a/0xa4 Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff RIP split_huge_page_to_list+0x496/0x590 I'm not sure of the full scenario of the reproduction, but my debug showed that split_huge_pmd_address(freeze=true) returned without running main code of pmd splitting because pmd_present(*pmd) in precheck somehow returned 0. If this happens, the subsequent try_to_unmap() fails and returns non-zero (because page_mapcount() still > 0), and finally VM_BUG_ON() fires. This patch tries to fix it by prechecking pmd state inside ptl. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <[email protected]> Signed-off-by: Benjamin Block <[email protected]> Fixes: ea127f9 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Cc: <[email protected]> #2.6.32+ Signed-off-by: Steffen Maier <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

…t level Since quite a while, Linux issues enough SCSI commands per scsi_device which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE, and SAM_STAT_GOOD. This floods the HBA trace area and we cannot see other and important HBA trace records long enough. Therefore, do not trace HBA response errors for pure benign residual under counts at the default trace level. This excludes benign residual under count combined with other validity bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL. For all those other cases, we still do want to see both the HBA record and the corresponding SCSI record by default. Signed-off-by: Steffen Maier <[email protected]> Fixes: a54ca0f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <[email protected]> #2.6.37+ Reviewed-by: Benjamin Block <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport. However, for the case when the rport becomes available again, we should prevent unblocking the rport too early. In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending which in turn force zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY. This also opens a time window with unblocked rport (until the followup LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. Finished intentionally includes any recovery result no matter if successful or failed (still unblock rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery if there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics. For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level. Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence: ERP thread rport_work other context ------------------------- -------------- -------------------------------- port is unblocked, rport still blocked, due to pending/running ERP action, so ((port->status & ...UNBLOCK) != 0) and (port->rport == NULL) unlock ERP zfcp_erp_action_cleanup() case ZFCP_ERP_ACTION_REOPEN_LUN: zfcp_erp_try_rport_unblock() ((status & ...UNBLOCK) != 0) [OLD!] zfcp_erp_port_reopen() lock ERP zfcp_erp_port_block() port->status clear ...UNBLOCK unlock ERP zfcp_scsi_schedule_rport_block() port->rport_task = RPORT_DEL queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task != RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_block() if (!port->rport) return zfcp_scsi_schedule_rport_register() port->rport_task = RPORT_ADD queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task == RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_register() (port->rport == NULL) rport = fc_remote_port_add() port->rport = rport; Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed. In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN. With that, the possible corresponding rport state sequences would be: (unblock[ERP thread],block[other context]) if the ERP thread gets erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock after the other context has already cleard ...UNBLOCK from port->status. Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check status of zfcp_port and LUN since these are only changed under erp_lock elsewhere. Regarding the check of the proper status flags (port or port_forced are similar to the shown adapter recovery): [zfcp_erp_adapter_shutdown()] zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() * clear UNBLOCK ---------------------------------------+ zfcp_scsi_schedule_rports_block() | write_lock_irqsave(&adapter->erp_lock, flags);-------+ | zfcp_erp_action_enqueue() | | zfcp_erp_setup_act() | | * set ERP_INUSE -----------------------------------|--|--+ write_unlock_irqrestore(&adapter->erp_lock, flags);--+ | | .context-switch. | | zfcp_erp_thread() | | zfcp_erp_strategy() | | write_lock_irqsave(&adapter->erp_lock, flags);------+ | | ... | | | zfcp_erp_strategy_check_target() | | | zfcp_erp_strategy_check_adapter() | | | zfcp_erp_adapter_unblock() | | | * set UNBLOCK -----------------------------------|--+ | zfcp_erp_action_dequeue() | | * clear ERP_INUSE ---------------------------------|-----+ ... | write_unlock_irqrestore(&adapter->erp_lock, flags);-+ Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link down case which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval(). Signed-off-by: Steffen Maier <[email protected]> Fixes: 8830271 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport") Fixes: a2fa0ae ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: <[email protected]> #2.6.32+ Reviewed-by: Benjamin Block <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

…s enabled Nils Holland and Klaus Ethgen have reported unexpected OOM killer invocations with 32b kernel starting with 4.8 kernels kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0 kworker/u4:5 cpuset=/ mems_allowed=0 CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2 [...] Mem-Info: active_anon:58685 inactive_anon:90 isolated_anon:0 active_file:274324 inactive_file:281962 isolated_file:0 unevictable:0 dirty:649 writeback:0 unstable:0 slab_reclaimable:40662 slab_unreclaimable:17754 mapped:7382 shmem:202 pagetables:351 bounce:0 free:206736 free_pcp:332 free_cma:0 Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 813 3474 3474 Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB lowmem_reserve[]: 0 0 21292 21292 HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB the oom killer is clearly pre-mature because there there is still a lot of page cache in the zone Normal which should satisfy this lowmem request. Further debugging has shown that the reclaim cannot make any forward progress because the page cache is hidden in the active list which doesn't get rotated because inactive_list_is_low is not memcg aware. The code simply subtracts per-zone highmem counters from the respective memcg's lru sizes which doesn't make any sense. We can simply end up always seeing the resulting active and inactive counts 0 and return false. This issue is not limited to 32b kernels but in practice the effect on systems without CONFIG_HIGHMEM would be much harder to notice because we do not invoke the OOM killer for allocations requests targeting < ZONE_NORMAL. Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node and subtract per-memcg highmem counts when memcg is enabled. Introduce helper lruvec_zone_lru_size which redirects to either zone counters or mem_cgroup_get_zone_lru_size when appropriate. We are losing empty LRU but non-zero lru size detection introduced by ca70723 ("mm: update_lru_size warn and reset bad lru_size") because of the inherent zone vs. node discrepancy. Fixes: f8d1a31 ("mm: consider whether to decivate based on eligible zones inactive ratio") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Nils Holland <[email protected]> Tested-by: Nils Holland <[email protected]> Reported-by: Klaus Ethgen <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

commit ab2a4bf upstream. The USB core contains a bug that can show up when a USB-3 host controller is removed. If the primary (USB-2) hcd structure is released before the shared (USB-3) hcd, the core will try to do a double-free of the common bandwidth_mutex. The problem was described in graphical form by Chung-Geol Kim, who first reported it: ================================================= At *remove USB(3.0) Storage sequence <1> --> <5> ((Problem Case)) ================================================= VOLD ------------------------------------|------------ (uevent) ________|_________ |<1> | |dwc3_otg_sm_work | |usb_put_hcd | |peer_hcd(kref=2)| |__________________| ________|_________ |<2> | |New USB BUS #2 | | | |peer_hcd(kref=1) | | | --(Link)-bandXX_mutex| | |__________________| | ___________________ | |<3> | | |dwc3_otg_sm_work | | |usb_put_hcd | | |primary_hcd(kref=1)| | |___________________| | _________|_________ | |<4> | | |New USB BUS #1 | | |hcd_release | | |primary_hcd(kref=0)| | | | | |bandXX_mutex(free) |<- |___________________| (( VOLD )) ______|___________ |<5> | | SCSI | |usb_put_hcd | |peer_hcd(kref=0) | |*hcd_release | |bandXX_mutex(free*)|<- double free |__________________| ================================================= This happens because hcd_release() frees the bandwidth_mutex whenever it sees a primary hcd being released (which is not a very good idea in any case), but in the course of releasing the primary hcd, it changes the pointers in the shared hcd in such a way that the shared hcd will appear to be primary when it gets released. This patch fixes the problem by changing hcd_release() so that it deallocates the bandwidth_mutex only when the _last_ hcd structure referencing it is released. The patch also removes an unnecessary test, so that when an hcd is released, both the shared_hcd and primary_hcd pointers in the hcd's peer will be cleared. Signed-off-by: Alan Stern <[email protected]> Reported-by: Chung-Geol Kim <[email protected]> Tested-by: Chung-Geol Kim <[email protected]> Cc: Sumit Semwal <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

In the check_only mode, we should not make any dirty node pages. Otherwise, we can get this panic: F2FS-fs (nvme0n1p1): Need to recover fsync data ------------[ cut here ]------------ kernel BUG at fs/f2fs/node.c:2204! CPU: 7 PID: 19923 Comm: mount Tainted: G OE 4.9.8 #2 RIP: 0010:[<ffffffffc0979c0b>] [<ffffffffc0979c0b>] flush_nat_entries+0x43b/0x7d0 [f2fs] Call Trace: [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs] [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs] [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs] [<ffffffff860e450f>] ? up_write+0x1f/0x40 [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs] [<ffffffffc0969f04>] write_checkpoint+0x2f4/0xf20 [f2fs] [<ffffffff860e938d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0960bd5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffffc097b6de>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs] [<ffffffffc0977b64>] f2fs_write_node_pages+0x34/0x350 [f2fs] [<ffffffff860e5f42>] ? __lock_is_held+0x52/0x70 [<ffffffff861d9b31>] do_writepages+0x21/0x30 [<ffffffff86298ce1>] __writeback_single_inode+0x61/0x760 [<ffffffff86909127>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8629a735>] writeback_single_inode+0xd5/0x190 [<ffffffff8629a889>] write_inode_now+0x99/0xc0 [<ffffffff86283876>] iput+0x1f6/0x2c0 [<ffffffffc0964b52>] f2fs_fill_super+0xc32/0x10c0 [f2fs] [<ffffffff86266462>] mount_bdev+0x182/0x1b0 [<ffffffffc0963f20>] ? f2fs_commit_super+0x100/0x100 [f2fs] [<ffffffffc0960da5>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff86266e08>] mount_fs+0x38/0x170 [<ffffffff86288bab>] vfs_kern_mount+0x6b/0x160 [<ffffffff8628bcfe>] do_mount+0x1be/0xd60 Signed-off-by: Jaegeuk Kim <[email protected]>

…e_pmd() commit c9d398f upstream. I found the race condition which triggers the following bug when move_pages() and soft offline are called on a single hugetlb page concurrently. Soft offlining page 0x119400 at 0x700000000000 BUG: unable to handle kernel paging request at ffffea0011943820 IP: follow_huge_pmd+0x143/0x190 PGD 7ffd2067 PUD 7ffd1067 PMD 0 [61163.582052] Oops: 0000 [#1] SMP Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check] CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:follow_huge_pmd+0x143/0x190 RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202 RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000 RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80 RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000 R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800 R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000 FS: 00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0 Call Trace: follow_page_mask+0x270/0x550 SYSC_move_pages+0x4ea/0x8f0 SyS_move_pages+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fc976e03949 RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949 RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827 RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004 R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650 R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000 Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0 CR2: ffffea0011943820 ---[ end trace e4f81353a2d23232 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This bug is triggered when pmd_present() returns true for non-present hugetlb, so fixing the present check in follow_huge_pmd() prevents it. Using pmd_present() to determine present/non-present for hugetlb is not correct, because pmd_present() checks multiple bits (not only _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state. Fixes: e66f17f ("mm/hugetlb: take page table lock in follow_huge_pmd()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Gerald Schaefer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 0beb201 upstream. Holding the reconfig_mutex over a potential userspace fault sets up a lockdep dependency chain between filesystem-DAX and the libnvdimm ioctl path. Move the user access outside of the lock. [ INFO: possible circular locking dependency detected ] 4.11.0-rc3+ #13 Tainted: G W O ------------------------------------------------------- fallocate/16656 is trying to acquire lock: (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffa00080b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm] but task is already holding lock: (jbd2_handle){++++..}, at: [<ffffffff813b4944>] start_this_handle+0x104/0x460 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (jbd2_handle){++++..}: lock_acquire+0xbd/0x200 start_this_handle+0x16a/0x460 jbd2__journal_start+0xe9/0x2d0 __ext4_journal_start_sb+0x89/0x1c0 ext4_dirty_inode+0x32/0x70 __mark_inode_dirty+0x235/0x670 generic_update_time+0x87/0xd0 touch_atime+0xa9/0xd0 ext4_file_mmap+0x90/0xb0 mmap_region+0x370/0x5b0 do_mmap+0x415/0x4f0 vm_mmap_pgoff+0xd7/0x120 SyS_mmap_pgoff+0x1c5/0x290 SyS_mmap+0x22/0x30 entry_SYSCALL_64_fastpath+0x1f/0xc2 -> #1 (&mm->mmap_sem){++++++}: lock_acquire+0xbd/0x200 __might_fault+0x70/0xa0 __nd_ioctl+0x683/0x720 [libnvdimm] nvdimm_ioctl+0x8b/0xe0 [libnvdimm] do_vfs_ioctl+0xa8/0x740 SyS_ioctl+0x79/0x90 do_syscall_64+0x6c/0x200 return_from_SYSCALL_64+0x0/0x7a -> #0 (&nvdimm_bus->reconfig_mutex){+.+.+.}: __lock_acquire+0x16b6/0x1730 lock_acquire+0xbd/0x200 __mutex_lock+0x88/0x9b0 mutex_lock_nested+0x1b/0x20 nvdimm_bus_lock+0x21/0x30 [libnvdimm] nvdimm_forget_poison+0x25/0x50 [libnvdimm] nvdimm_clear_poison+0x106/0x140 [libnvdimm] pmem_do_bvec+0x1c2/0x2b0 [nd_pmem] pmem_make_request+0xf9/0x270 [nd_pmem] generic_make_request+0x118/0x3b0 submit_bio+0x75/0x150 Fixes: 62232e4 ("libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices") Cc: Dave Jiang <[email protected]> Reported-by: Vishal Verma <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Disable spi2 in dts. Enable it will cause kernel panic [ 1.334797] Unhandled fault: synchronous external abort (0x96000210) at 0xffff0000096ccfe0 [ 1.343078] Internal error: : 96000210 [#1] PREEMPT SMP [ 1.348296] Modules linked in: [ 1.351346] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S 4.12.0-rc1linaro-hikey960+ #2 [ 1.360211] Hardware name: HiKey960 (DT) [ 1.364124] task: ffff8000bb278000 task.stack: ffff8000bb280000 [ 1.370044] PC is at amba_device_try_add+0xe0/0x29c [ 1.374914] LR is at amba_device_try_add+0xd0/0x29c [ 1.379783] pc : [<ffff0000084e485c>] lr : [<ffff0000084e484c>] pstate: 60000045 [ 1.387172] sp : ffff8000bb283bd0 [ 1.390478] x29: ffff8000bb283bd0 x28: ffff000008eadb30 [ 1.395783] x27: ffff000008e39088 x26: 0000000000000000 [ 1.401089] x25: ffff8000bad38810 x24: ffff8000befe05e0 [ 1.406394] x23: ffff0000096cc000 x22: ffff8000baee66f8 [ 1.411699] x21: 0000000000001000 x20: 0000000000000000 [ 1.417004] x19: ffff8000baee6400 x18: 00000000000041ff [ 1.422309] x17: 0000000000000000 x16: 0000000000000001 [ 1.427614] x15: 0000000000000011 x14: ffffffffffffffff [ 1.432919] x13: 0000000000000020 x12: 0101010101010101 [ 1.438224] x11: 0000000000000020 x10: 0101010101010101 [ 1.443530] x9 : 0000000000000000 x8 : ffff8000baef6d00 [ 1.448835] x7 : 0000000000000000 x6 : 0000000000000000 [ 1.454140] x5 : 0000000000000000 x4 : 0000000000000000 [ 1.459444] x3 : 0000000000000000 x2 : 00000000000001eb [ 1.464749] x1 : 0000000000000000 x0 : ffff0000096ccfe0 [ 1.470055] Process swapper/0 (pid: 1, stack limit = 0xffff8000bb280000) [ 1.476749] Stack: (0xffff8000bb283bd0 to 0xffff8000bb284000) [ 1.482488] 3bc0: ffff8000bb283c10 ffff0000084e4a34 [ 1.490312] 3be0: ffff000008edca20 ffff8000baee6400 0000000000000000 ffff8000baee6768 [ 1.498136] 3c00: ffff8000baee6400 ffff8000befe05e0 ffff8000bb283c40 ffff000008852e44 [ 1.505959] 3c20: 0000000000000009 ffff8000befe0548 0000000000000000 ffff8000baee6768 [ 1.513783] 3c40: ffff8000bb283cd0 ffff000008852e84 ffff8000befe0548 ffff8000befcfd30 [ 1.521607] 3c60: ffff8000bad38810 0000000000000001 0000000000000000 ffff000008b12178 [ 1.529430] 3c80: ffff000008ced4c0 0000000000000000 ffff8000bb283cd0 ffff000008852eb0 [ 1.537254] 3ca0: ffff8000befe0548 ffff8000befcfd30 ffff8000bad38810 ffff8000befcfd30 [ 1.545077] 3cc0: ffff000008ced4c0 0000000000000000 ffff8000bb283d60 ffff000008853250 [ 1.552901] 3ce0: ffff8000befcfd30 ffff8000befcc1f0 ffff000008ced4c0 0000000000000000 [ 1.560724] 3d00: ffff000008b12178 0000000000000000 0000000000000000 ffff000008e39070 [ 1.568548] 3d20: ffff8000bb283d60 ffff00000885322c ffff8000befcfd30 ffff8000befcc1f0 [ 1.576372] 3d40: ffff000008ced4c0 ffff8000befcc1f0 ffff000008ced4c0 0000000000000000 [ 1.584195] 3d60: ffff8000bb283db0 ffff000008dffe34 ffff8000bb278000 ffff000008dffdcc [ 1.592019] 3d80: 0000000000000000 0000000000000003 ffff000008fd8000 ffff000008e39100 [ 1.599842] 3da0: ffff000008db0470 0000000000000040 ffff8000bb283dc0 ffff000008083130 [ 1.607666] 3dc0: ffff8000bb283e30 ffff000008db0d90 000000000000011f ffff000008fd8000 [ 1.615489] 3de0: ffff000008da6ba8 0000000000000003 ffff8000bb283e00 ffff000008c862e8 [ 1.623313] 3e00: ffff000008c85a48 0000000300000003 0000000000000000 0000000000000000 [ 1.631135] 3e20: ffff8000befff768 ffff8000befff770 ffff8000bb283e90 ffff0000089a1afc [ 1.638959] 3e40: ffff0000089a1aec 0000000000000000 0000000000000000 0000000000000000 [ 1.646783] 3e60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.654606] 3e80: 0000000000000000 0000000000000000 0000000000000000 ffff000008082ec0 [ 1.662429] 3ea0: ffff0000089a1aec 0000000000000000 ffff0000089a1aec 0000000000000000 [ 1.670252] 3ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.678075] 3ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.685898] 3f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.693721] 3f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.701545] 3f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.709368] 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.717191] 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.725014] 3fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.732837] 3fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000 [ 1.740660] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.748483] Call trace: [ 1.750922] Exception stack(0xffff8000bb283a00 to 0xffff8000bb283b30) [ 1.757356] 3a00: ffff8000baee6400 0001000000000000 ffff8000bb283bd0 ffff0000084e485c [ 1.765179] 3a20: 0000000000000007 ffff800000000000 ffff0000096ccfe0 ffff0000084e943c [ 1.773003] 3a40: ffff8000baef6c80 ffff8000bb82f380 ffff8000bb283a60 ffff0000084e71b0 [ 1.780827] 3a60: ffff8000bb283a70 ffff0000084e9470 ffff8000bb283aa0 ffff0000084e9eb8 [ 1.788650] 3a80: ffff8000bb283a90 ffff0000083fde20 ffff8000bb283aa0 ffff0000084e9e48 [ 1.796473] 3aa0: ffff0000096ccfe0 0000000000000000 00000000000001eb 0000000000000000 [ 1.804296] 3ac0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.812119] 3ae0: ffff8000baef6d00 0000000000000000 0101010101010101 0000000000000020 [ 1.819942] 3b00: 0101010101010101 0000000000000020 ffffffffffffffff 0000000000000011 [ 1.827765] 3b20: 0000000000000001 0000000000000000 [ 1.832636] [<ffff0000084e485c>] amba_device_try_add+0xe0/0x29c [ 1.838549] [<ffff0000084e4a34>] amba_device_add+0x1c/0xd8 [ 1.844033] [<ffff000008852e44>] of_platform_bus_create.part.5+0x21c/0x320 [ 1.850901] [<ffff000008852e84>] of_platform_bus_create.part.5+0x25c/0x320 [ 1.857769] [<ffff000008853250>] of_platform_populate+0x94/0xec [ 1.863687] [<ffff000008dffe34>] of_platform_default_populate_init+0x68/0x74 [ 1.870731] [<ffff000008083130>] do_one_initcall+0x38/0x124 [ 1.876299] [<ffff000008db0d90>] kernel_init_freeable+0x1a4/0x240 [ 1.882390] [<ffff0000089a1afc>] kernel_init+0x10/0x13c [ 1.887608] [<ffff000008082ec0>] ret_from_fork+0x10/0x50 [ 1.892916] Code: 2a0003f4 350007e0 d10082a0 8b0002e0 (b9400000) [ 1.899051] ---[ end trace 7b63cfe058bbfed5 ]--- [ 1.903678] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Guodong Xu <[email protected]>

[ Upstream commit ddc665a ] When the instruction right before the branch destination is a 64 bit load immediate, we currently calculate the wrong jump offset in the ctx->offset[] array as we only account one instruction slot for the 64 bit load immediate although it uses two BPF instructions. Fix it up by setting the offset into the right slot after we incremented the index. Before (ldimm64 test 1): [...] 00000020: 52800007 mov w7, #0x0 // #0 00000024: d2800060 mov x0, #0x3 // #3 00000028: d2800041 mov x1, #0x2 // #2 0000002c: eb01001f cmp x0, x1 00000030: 54ffff82 b.cs 0x00000020 00000034: d29fffe7 mov x7, #0xffff // #65535 00000038: f2bfffe7 movk x7, #0xffff, lsl #16 0000003c: f2dfffe7 movk x7, #0xffff, lsl #32 00000040: f2ffffe7 movk x7, #0xffff, lsl #48 00000044: d29dddc7 mov x7, #0xeeee // #61166 00000048: f2bdddc7 movk x7, #0xeeee, lsl #16 0000004c: f2ddddc7 movk x7, #0xeeee, lsl #32 00000050: f2fdddc7 movk x7, #0xeeee, lsl #48 [...] After (ldimm64 test 1): [...] 00000020: 52800007 mov w7, #0x0 // #0 00000024: d2800060 mov x0, #0x3 // #3 00000028: d2800041 mov x1, #0x2 // #2 0000002c: eb01001f cmp x0, x1 00000030: 540000a2 b.cs 0x00000044 00000034: d29fffe7 mov x7, #0xffff // #65535 00000038: f2bfffe7 movk x7, #0xffff, lsl #16 0000003c: f2dfffe7 movk x7, #0xffff, lsl #32 00000040: f2ffffe7 movk x7, #0xffff, lsl #48 00000044: d29dddc7 mov x7, #0xeeee // #61166 00000048: f2bdddc7 movk x7, #0xeeee, lsl #16 0000004c: f2ddddc7 movk x7, #0xeeee, lsl #32 00000050: f2fdddc7 movk x7, #0xeeee, lsl #48 [...] Also, add a couple of test cases to make sure JITs pass this test. Tested on Cavium ThunderX ARMv8. The added test cases all pass after the fix. Fixes: 8eee539 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()") Reported-by: David S. Miller <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Xi Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

this bug happened when amdgpu load failed. [ 75.740951] BUG: unable to handle kernel paging request at 00000000000031c0 [ 75.748167] IP: [<ffffffffa064a0e0>] amdgpu_fbdev_restore_mode+0x20/0x60 [amdgpu] [ 75.755774] PGD 0 [ 75.759185] Oops: 0000 [#1] SMP [ 75.762408] Modules linked in: amdgpu(OE-) ttm(OE) drm_kms_helper(OE) drm(OE) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) rpcsec_gss_krb5(E) nfsv4(E) nfs(E) fscache(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) intel_rapl(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hwdep(E) snd_pcm(E) snd_seq_midi(E) coretemp(E) kvm_intel(E) snd_seq_midi_event(E) snd_rawmidi(E) kvm(E) snd_seq(E) joydev(E) snd_seq_device(E) snd_timer(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) mei_me(E) ghash_clmulni_intel(E) snd(E) aesni_intel(E) mei(E) soundcore(E) aes_x86_64(E) shpchp(E) serio_raw(E) lrw(E) acpi_pad(E) gf128mul(E) glue_helper(E) ablk_helper(E) mac_hid(E) [ 75.835574] cryptd(E) parport_pc(E) ppdev(E) lp(E) nfsd(E) parport(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) autofs4(E) hid_generic(E) usbhid(E) mxm_wmi(E) psmouse(E) e1000e(E) ptp(E) pps_core(E) ahci(E) libahci(E) wmi(E) video(E) i2c_hid(E) hid(E) [ 75.858489] CPU: 5 PID: 1603 Comm: rmmod Tainted: G OE 4.9.0-custom #2 [ 75.866183] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 0901 08/31/2015 [ 75.875050] task: ffff88045d1bbb80 task.stack: ffffc90002de4000 [ 75.881094] RIP: 0010:[<ffffffffa064a0e0>] [<ffffffffa064a0e0>] amdgpu_fbdev_restore_mode+0x20/0x60 [amdgpu] [ 75.891238] RSP: 0018:ffffc90002de7d48 EFLAGS: 00010286 [ 75.896648] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 [ 75.903933] RDX: 0000000000000000 RSI: ffff88045d1bbb80 RDI: 0000000000000286 [ 75.911183] RBP: ffffc90002de7d50 R08: 0000000000000502 R09: 0000000000000004 [ 75.918449] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880464bf0000 [ 75.925675] R13: ffffffffa0853000 R14: 0000000000000000 R15: 0000564e44f88210 [ 75.932980] FS: 00007f13d5400700(0000) GS:ffff880476540000(0000) knlGS:0000000000000000 [ 75.941238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 75.947088] CR2: 00000000000031c0 CR3: 000000045fd0b000 CR4: 00000000003406e0 [ 75.954332] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 75.961566] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 75.968834] Stack: [ 75.970881] ffff880464bf0000 ffffc90002de7d60 ffffffffa0636592 ffffc90002de7d80 [ 75.978454] ffffffffa059015f ffff880464bf0000 ffff880464bf0000 ffffc90002de7da8 [ 75.986076] ffffffffa0595216 ffff880464bf0000 ffff880460f4d000 ffffffffa0853000 [ 75.993692] Call Trace: [ 75.996177] [<ffffffffa0636592>] amdgpu_driver_lastclose_kms+0x12/0x20 [amdgpu] [ 76.003700] [<ffffffffa059015f>] drm_lastclose+0x2f/0xd0 [drm] [ 76.009777] [<ffffffffa0595216>] drm_dev_unregister+0x16/0xd0 [drm] [ 76.016255] [<ffffffffa0595944>] drm_put_dev+0x34/0x70 [drm] [ 76.022139] [<ffffffffa062f365>] amdgpu_pci_remove+0x15/0x20 [amdgpu] [ 76.028800] [<ffffffff81416499>] pci_device_remove+0x39/0xc0 [ 76.034661] [<ffffffff81531caa>] __device_release_driver+0x9a/0x140 [ 76.041121] [<ffffffff81531e58>] driver_detach+0xb8/0xc0 [ 76.046575] [<ffffffff81530c95>] bus_remove_driver+0x55/0xd0 [ 76.052401] [<ffffffff815325fc>] driver_unregister+0x2c/0x50 [ 76.058244] [<ffffffff81416289>] pci_unregister_driver+0x29/0x90 [ 76.064466] [<ffffffffa0596c5e>] drm_pci_exit+0x9e/0xb0 [drm] [ 76.070507] [<ffffffffa0796d71>] amdgpu_exit+0x1c/0x32 [amdgpu] [ 76.076609] [<ffffffff81104810>] SyS_delete_module+0x1a0/0x200 [ 76.082627] [<ffffffff810e2b1a>] ? rcu_eqs_enter.isra.36+0x4a/0x50 [ 76.089001] [<ffffffff8100392e>] do_syscall_64+0x6e/0x180 [ 76.094583] [<ffffffff817e1d2f>] entry_SYSCALL64_slow_path+0x25/0x25 [ 76.101114] Code: 94 c0 c3 31 c0 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 31 c0 48 89 e5 53 48 89 fb 48 c7 c7 1d 21 84 a0 e8 ab 77 b3 e0 e8 fc 8b d7 e0 <48> 8b bb c0 31 00 00 48 85 ff 74 09 e8 ff eb fc ff 85 c0 75 03 [ 76.121432] RIP [<ffffffffa064a0e0>] amdgpu_fbdev_restore_mode+0x20/0x60 [amdgpu] Signed-off-by: Rex Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

… preemptibility bug With CONFIG_DEBUG_PREEMPT enabled, I get: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id save_microcode_in_initrd_amd ? microcode_init save_microcode_in_initrd ... because, well, it says it above, we're using smp_processor_id() in preemptible code. But passing the CPU number is not really needed. It is only used to determine whether we're on the BSP, and, if so, to save the microcode patch for early loading. [ We don't absolutely need to do it on the BSP but we do that customarily there. ] Instead, convert that function parameter to a boolean which denotes whether the patch should be saved or not, thereby avoiding the use of smp_processor_id() in preemptible code. Signed-off-by: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Since the introduction of .init_rq_fn() and .exit_rq_fn() it is essential that the memory allocated for struct request_queue stays around until all blk_exit_rl() calls have finished. Hence make blk_init_rl() take a reference on struct request_queue. This patch fixes the following crash: general protection fault: 0000 [#2] SMP CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G D 4.12.0-rc2-dbg+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 task: ffff88013a108040 task.stack: ffffc9000071c000 RIP: 0010:free_request_size+0x1a/0x30 RSP: 0018:ffffc9000071fd38 EFLAGS: 00010202 RAX: 6b6b6b6b6b6b6b6b RBX: ffff880067362a88 RCX: 0000000000000003 RDX: ffff880067464178 RSI: ffff880067362a88 RDI: ffff880135ea4418 RBP: ffffc9000071fd40 R08: 0000000000000000 R09: 0000000100180009 R10: ffffc9000071fd38 R11: ffffffff81110800 R12: ffff88006752d3d8 R13: ffff88006752d3d8 R14: ffff88013a108040 R15: 000000000000000a FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa8ec1edb00 CR3: 0000000138ee8000 CR4: 00000000001406e0 Call Trace: mempool_destroy.part.10+0x21/0x40 mempool_destroy+0xe/0x10 blk_exit_rl+0x12/0x20 blkg_free+0x4d/0xa0 __blkg_release_rcu+0x59/0x170 rcu_process_callbacks+0x260/0x4e0 __do_softirq+0x116/0x250 smpboot_thread_fn+0x123/0x1e0 kthread+0x109/0x140 ret_from_fork+0x31/0x40 Fixes: commit e9c787e ("scsi: allocate scsi_cmnd structures as part of struct request") Signed-off-by: Bart Van Assche <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Jan Kara <[email protected]> Cc: <[email protected]> # v4.11+ Signed-off-by: Jens Axboe <[email protected]>

…nt() In case oldtrig == trig == NULL (which happens when we set none trigger, when there is already none set) there is a NULL pointer dereference during iio_trigger_put(trig). Below is kernel output when this occurs: [ 26.741790] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 26.750179] pgd = cacc0000 [ 26.752936] [00000000] *pgd=8adc6835, *pte=00000000, *ppte=00000000 [ 26.759531] Internal error: Oops: 17 [#1] SMP ARM [ 26.764261] Modules linked in: usb_f_ncm u_ether usb_f_acm u_serial usb_f_fs libcomposite configfs evbug [ 26.773844] CPU: 0 PID: 152 Comm: synchro Not tainted 4.12.0-rc1 #2 [ 26.780128] Hardware name: Freescale i.MX6 Ultralite (Device Tree) [ 26.786329] task: cb1de200 task.stack: cac92000 [ 26.790892] PC is at iio_trigger_write_current+0x188/0x1f4 [ 26.796403] LR is at lock_release+0xf8/0x20c [ 26.800696] pc : [<c0736f34>] lr : [<c016efb0>] psr: 600d0013 [ 26.800696] sp : cac93e30 ip : cac93db0 fp : cac93e5c [ 26.812193] r10: c0e64fe8 r9 : 00000000 r8 : 00000001 [ 26.817436] r7 : cb190810 r6 : 00000010 r5 : 00000001 r4 : 00000000 [ 26.823982] r3 : 00000000 r2 : 00000000 r1 : cb1de200 r0 : 00000000 [ 26.830528] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 26.837683] Control: 10c5387d Table: 8acc006a DAC: 00000051 [ 26.843448] Process synchro (pid: 152, stack limit = 0xcac92210) [ 26.849475] Stack: (0xcac93e30 to 0xcac94000) [ 26.853857] 3e20: 00000001 c0736dac c054033c cae6b680 [ 26.862060] 3e40: cae6b680 00000000 00000001 cb3f8610 cac93e74 cac93e60 c054035c c0736db8 [ 26.870264] 3e60: 00000001 c054033c cac93e94 cac93e78 c029bf34 c0540348 00000000 00000000 [ 26.878469] 3e80: cb3f8600 cae6b680 cac93ed4 cac93e98 c029b320 c029bef0 00000000 00000000 [ 26.886672] 3ea0: 00000000 cac93f78 cb2d41fc caed3280 c029b214 cac93f78 00000001 000e20f8 [ 26.894874] 3ec0: 00000001 00000000 cac93f44 cac93ed8 c0221dcc c029b220 c0e1ca39 cb2d41fc [ 26.903079] 3ee0: cac93f04 cac93ef0 c0183ef0 c0183ab0 cb2d41fc 00000000 cac93f44 cac93f08 [ 26.911282] 3f00: c0225eec c0183ebc 00000001 00000000 c0223728 00000000 c0245454 00000001 [ 26.919485] 3f20: 00000001 caed3280 000e20f8 cac93f78 000e20f8 00000001 cac93f74 cac93f48 [ 26.927690] 3f40: c0223680 c0221da4 c0246520 c0245460 caed3283 caed3280 00000000 00000000 [ 26.935893] 3f60: 000e20f8 00000001 cac93fa4 cac93f78 c0224520 c02235e4 00000000 00000000 [ 26.944096] 3f80: 00000001 000e20f8 00000001 00000004 c0107f84 cac92000 00000000 cac93fa8 [ 26.952299] 3fa0: c0107de0 c02244e8 00000001 000e20f8 0000000e 000e20f8 00000001 fbad2484 [ 26.960502] 3fc0: 00000001 000e20f8 00000001 00000004 beb6b698 00064260 0006421c beb6b4b4 [ 26.968705] 3fe0: 00000000 beb6b450 b6f219a0 b6e2f268 800d0010 0000000e cac93ff4 cac93ffc [ 26.976896] Backtrace: [ 26.979388] [<c0736dac>] (iio_trigger_write_current) from [<c054035c>] (dev_attr_store+0x20/0x2c) [ 26.988289] r10:cb3f8610 r9:00000001 r8:00000000 r7:cae6b680 r6:cae6b680 r5:c054033c [ 26.996138] r4:c0736dac r3:00000001 [ 26.999747] [<c054033c>] (dev_attr_store) from [<c029bf34>] (sysfs_kf_write+0x50/0x54) [ 27.007686] r5:c054033c r4:00000001 [ 27.011290] [<c029bee4>] (sysfs_kf_write) from [<c029b320>] (kernfs_fop_write+0x10c/0x224) [ 27.019579] r7:cae6b680 r6:cb3f8600 r5:00000000 r4:00000000 [ 27.025271] [<c029b214>] (kernfs_fop_write) from [<c0221dcc>] (__vfs_write+0x34/0x120) [ 27.033214] r10:00000000 r9:00000001 r8:000e20f8 r7:00000001 r6:cac93f78 r5:c029b214 [ 27.041059] r4:caed3280 [ 27.043622] [<c0221d98>] (__vfs_write) from [<c0223680>] (vfs_write+0xa8/0x170) [ 27.050959] r9:00000001 r8:000e20f8 r7:cac93f78 r6:000e20f8 r5:caed3280 r4:00000001 [ 27.058731] [<c02235d8>] (vfs_write) from [<c0224520>] (SyS_write+0x44/0x98) [ 27.065806] r9:00000001 r8:000e20f8 r7:00000000 r6:00000000 r5:caed3280 r4:caed3283 [ 27.073582] [<c02244dc>] (SyS_write) from [<c0107de0>] (ret_fast_syscall+0x0/0x1c) [ 27.081179] r9:cac92000 r8:c0107f84 r7:00000004 r6:00000001 r5:000e20f8 r4:00000001 [ 27.088947] Code: 1a000009 e1a04009 e3a06010 e1a05008 (e5943000) [ 27.095244] ---[ end trace 06d1dab86d6e6bab ]--- To fix that problem call iio_trigger_put(trig) only when trig is not NULL. Fixes: d5d24bc ("iio: trigger: close race condition in acquiring trigger reference") Signed-off-by: Marcin Niestroj <[email protected]> Cc: <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]>

'perf annotate' is dropping the cr* fields from branch instructions. Fix it by adding support to display branch instructions having multiple operands. Power Arch objdump of int_sqrt: 20.36 | c0000000004d2694: subf r10,r10,r3 | c0000000004d2698: v bgt cr6,c0000000004d26a0 <int_sqrt+0x40> 1.82 | c0000000004d269c: mr r3,r10 29.18 | c0000000004d26a0: mr r10,r8 | c0000000004d26a4: v bgt cr7,c0000000004d26ac <int_sqrt+0x4c> | c0000000004d26a8: mr r10,r7 Power Arch Before Patch: 20.36 | subf r10,r10,r3 | v bgt 40 1.82 | mr r3,r10 29.18 | 40: mr r10,r8 | v bgt 4c | mr r10,r7 Power Arch After patch: 20.36 | subf r10,r10,r3 | v bgt cr6,40 1.82 | mr r3,r10 29.18 | 40: mr r10,r8 | v bgt cr7,4c | mr r10,r7 Also support AArch64 conditional branch instructions, which can have up to three operands: Aarch64 Non-simplified (raw objdump) view: │ffff0000083cd11c: ↑ cbz w0, ffff0000083cd100 <security_fil▒ ... 4.44 │ffff000│083cd134: ↓ tbnz w0, #26, ffff0000083cd190 <securit▒ ... 1.37 │ffff000│083cd144: ↓ tbnz w22, #5, ffff0000083cd1a4 <securit▒ │ffff000│083cd148: mov w19, #0x20000 //▒ 1.02 │ffff000│083cd14c: ↓ tbz w22, #2, ffff0000083cd1ac <securit▒ ... 0.68 │ffff000└──3cd16c: ↑ cbnz w0, ffff0000083cd120 <security_fil▒ Aarch64 Simplified, before this patch: │ ↑ cbz 40 ... 4.44 │ │↓ tbnz w0, #26, ffff0000083cd190 <security_file_permiss▒ ... 1.37 │ │↓ tbnz w22, #5, ffff0000083cd1a4 <security_file_permiss▒ │ │ mov w19, #0x20000 // #131072 1.02 │ │↓ tbz w22, #2, ffff0000083cd1ac <security_file_permiss▒ ... 0.68 │ └──cbnz 60 the cbz operand is missing, and the tbz doesn't get simplified processing at all because the parsing function failed to match an address. Aarch64 Simplified, After this patch applied: │ ↑ cbz w0, 40 ... 4.44 │ │↓ tbnz w0, #26, d0 ... 1.37 │ │↓ tbnz w22, #5, e4 │ │ mov w19, #0x20000 // #131072 1.02 │ │↓ tbz w22, #2, ec ... 0.68 │ └──cbnz w0, 60 Originally-by: Ravi Bangoria <[email protected]> Tested-by: Ravi Bangoria <[email protected]> Reported-by: Anton Blanchard <[email protected]> Reported-by: Robin Murphy <[email protected]> Signed-off-by: Kim Phillips <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Taeung Song <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Yuval Mintz says: ==================== bnx2x: Fix malicious VFs indication It was discovered that for a VF there's a simple [yet uncommon] scenario which would cause device firmware to declare that VF as malicious - Add a vlan interface on top of a VF and disable txvlan offloading for that VF [causing VF to transmit packets where vlan is on payload]. Patch #1 corrects driver transmission to prevent this issue. Patch #2 is a by-product correcting PF behavior once a VF is declared malicious. ==================== Signed-off-by: David S. Miller <[email protected]>

When HSR interface is setup using ip link command, an annoying warning appears with the trace as below:- [ 203.019828] hsr_get_node: Non-HSR frame [ 203.019833] Modules linked in: [ 203.019848] CPU: 0 PID: 158 Comm: sd-resolve Tainted: G W 4.12.0-rc3-00052-g9fa6bf70 #2 [ 203.019853] Hardware name: Generic DRA74X (Flattened Device Tree) [ 203.019869] [<c0110280>] (unwind_backtrace) from [<c010c2f4>] (show_stack+0x10/0x14) [ 203.019880] [<c010c2f4>] (show_stack) from [<c04b9f64>] (dump_stack+0xac/0xe0) [ 203.019894] [<c04b9f64>] (dump_stack) from [<c01374e8>] (__warn+0xd8/0x104) [ 203.019907] [<c01374e8>] (__warn) from [<c0137548>] (warn_slowpath_fmt+0x34/0x44) root@am57xx-evm:~# [ 203.019921] [<c0137548>] (warn_slowpath_fmt) from [<c081126c>] (hsr_get_node+0x148/0x170) [ 203.019932] [<c081126c>] (hsr_get_node) from [<c0814240>] (hsr_forward_skb+0x110/0x7c0) [ 203.019942] [<c0814240>] (hsr_forward_skb) from [<c0811d64>] (hsr_dev_xmit+0x2c/0x34) [ 203.019954] [<c0811d64>] (hsr_dev_xmit) from [<c06c0828>] (dev_hard_start_xmit+0xc4/0x3bc) [ 203.019963] [<c06c0828>] (dev_hard_start_xmit) from [<c06c13d8>] (__dev_queue_xmit+0x7c4/0x98c) [ 203.019974] [<c06c13d8>] (__dev_queue_xmit) from [<c0782f54>] (ip6_finish_output2+0x330/0xc1c) [ 203.019983] [<c0782f54>] (ip6_finish_output2) from [<c0788f0c>] (ip6_output+0x58/0x454) [ 203.019994] [<c0788f0c>] (ip6_output) from [<c07b16cc>] (mld_sendpack+0x420/0x744) As this is an expected path to hsr_get_node() with frame coming from the master interface, add a check to ensure packet is not from the master port and then warn. Signed-off-by: Murali Karicheri <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Every kernel build on x86 will result in some output: Setup is 13084 bytes (padded to 13312 bytes). System is 4833 kB CRC 6d35fa35 Kernel: arch/x86/boot/bzImage is ready (#2) This shuts it up, so that 'make -s' is truely silent as long as everything works. Building without '-s' should produce unchanged output. Signed-off-by: Arnd Bergmann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Allocating the crashkernel region outside lowmem causes the kernel to oops while trying to kexec into the new kernel: Loading crashdump kernel... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = edd70000 [00000000] *pgd=de19e835 Internal error: Oops: 817 [#2] SMP ARM Modules linked in: ... CPU: 0 PID: 689 Comm: sh Not tainted 4.12.0-rc3-next-20170601-04015-gc3a5a20 Hardware name: Generic DRA74X (Flattened Device Tree) task: edb32f00 task.stack: edf18000 PC is at memcpy+0x50/0x330 LR is at 0xe3c34001 pc : [<c04baf30>] lr : [<e3c34001>] psr: 800c0193 sp : edf19c2c ip : 0a000001 fp : c0553170 r10: c055316e r9 : 00000001 r8 : e3130001 r7 : e4903004 r6 : 0a000014 r5 : e3500000 r4 : e59f106c r3 : e59f0074 r2 : ffffffe8 r1 : c010fb88 r0 : 00000000 Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: add7006a DAC: 00000051 Process sh (pid: 689, stack limit = 0xedf18218) Stack: (0xedf19c2c to 0xedf1a000) ... [<c04baf30>] (memcpy) from [<c010fae0>] (machine_kexec+0xa8/0x12c) [<c010fae0>] (machine_kexec) from [<c01e4104>] (__crash_kexec+0x5c/0x98) [<c01e4104>] (__crash_kexec) from [<c01e419c>] (crash_kexec+0x5c/0x68) [<c01e419c>] (crash_kexec) from [<c010c5c0>] (die+0x228/0x490) [<c010c5c0>] (die) from [<c011e520>] (__do_kernel_fault.part.0+0x54/0x1e4) [<c011e520>] (__do_kernel_fault.part.0) from [<c082412c>] (do_page_fault+0x1e8/0x400) [<c082412c>] (do_page_fault) from [<c010135c>] (do_DataAbort+0x38/0xb8) [<c010135c>] (do_DataAbort) from [<c0823584>] (__dabt_svc+0x64/0xa0) This is caused by image->control_code_page being a highmem page, so page_address(image->control_code_page) returns NULL. In any case, we don't want the control page to be a highmem page. We already limit the crash kernel region to the top of 32-bit physical memory space. Also limit it to the top of lowmem in physical space. Reported-by: Keerthy <[email protected]> Tested-by: Keerthy <[email protected]> Signed-off-by: Russell King <[email protected]>

Subsystem migration methods shouldn't be called for empty migrations. cgroup_migrate_execute() implements this guarantee by bailing early if there are no source css_sets. This used to be correct before a79a908 ("cgroup: introduce cgroup namespaces"), but no longer since the commit because css_sets can stay pinned without tasks in them. This caused cgroup_migrate_execute() call into cpuset migration methods with an empty cgroup_taskset. cpuset migration methods correctly assume that cgroup_taskset_first() never returns NULL; however, due to the bug, it can, leading to the following oops. Unable to handle kernel paging request for data at address 0x00000960 Faulting instruction address: 0xc0000000001d6868 Oops: Kernel access of bad area, sig: 11 [#1] ... CPU: 14 PID: 16947 Comm: kworker/14:0 Tainted: G W 4.12.0-rc4-next-20170609 #2 Workqueue: events cpuset_hotplug_workfn task: c00000000ca60580 task.stack: c00000000c728000 NIP: c0000000001d6868 LR: c0000000001d6858 CTR: c0000000001d6810 REGS: c00000000c72b720 TRAP: 0300 Tainted: GW (4.12.0-rc4-next-20170609) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44722422 XER: 20000000 CFAR: c000000000008710 DAR: 0000000000000960 DSISR: 40000000 SOFTE: 1 GPR00: c0000000001d6858 c00000000c72b9a0 c000000001536e00 0000000000000000 GPR04: c00000000c72b9c0 0000000000000000 c00000000c72bad0 c000000766367678 GPR08: c000000766366d10 c00000000c72b958 c000000001736e00 0000000000000000 GPR12: c0000000001d6810 c00000000e749300 c000000000123ef8 c000000775af4180 GPR16: 0000000000000000 0000000000000000 c00000075480e9c0 c00000075480e9e0 GPR20: c00000075480e8c0 0000000000000001 0000000000000000 c00000000c72ba20 GPR24: c00000000c72baa0 c00000000c72bac0 c000000001407248 c00000000c72ba20 GPR28: c00000000141fc80 c00000000c72bac0 c00000000c6bc790 0000000000000000 NIP [c0000000001d6868] cpuset_can_attach+0x58/0x1b0 LR [c0000000001d6858] cpuset_can_attach+0x48/0x1b0 Call Trace: [c00000000c72b9a0] [c0000000001d6858] cpuset_can_attach+0x48/0x1b0 (unreliable) [c00000000c72ba00] [c0000000001cbe80] cgroup_migrate_execute+0xb0/0x450 [c00000000c72ba80] [c0000000001d3754] cgroup_transfer_tasks+0x1c4/0x360 [c00000000c72bba0] [c0000000001d923c] cpuset_hotplug_workfn+0x86c/0xa20 [c00000000c72bca0] [c00000000011aa44] process_one_work+0x1e4/0x580 [c00000000c72bd30] [c00000000011ae78] worker_thread+0x98/0x5c0 [c00000000c72bdc0] [c000000000124058] kthread+0x168/0x1b0 [c00000000c72be30] [c00000000000b2e8] ret_from_kernel_thread+0x5c/0x74 Instruction dump: f821ffa1 7c7d1b78 60000000 60000000 38810020 7fa3eb78 3f42ffed 4bff4c25 60000000 3b5a0448 3d420020 eb610020 <e9230960> 7f43d378 e9290000 f92af200 ---[ end trace dcaaf98fb36d9e64 ]--- This patch fixes the bug by adding an explicit nr_tasks counter to cgroup_taskset and skipping calling the migration methods if the counter is zero. While at it, remove the now spurious check on no source css_sets. Signed-off-by: Tejun Heo <[email protected]> Reported-and-tested-by: Abdul Haleem <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: [email protected] # v4.6+ Fixes: a79a908 ("cgroup: introduce cgroup namespaces") Link: http://lkml.kernel.org/r/[email protected]

Andre Wild reported the following warning: WARNING: CPU: 2 PID: 1205 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x4c/0x60 Modules linked in: CPU: 2 PID: 1205 Comm: bash Not tainted 4.13.0-rc2-00022-gfd2b2c57ec20 #10 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) task: 00000000701d8100 task.stack: 0000000073594000 Krnl PSW : 0704f00180000000 0000000000145e24 (lockdep_assert_cpus_held+0x4c/0x60) ... Call Trace: lockdep_assert_cpus_held+0x42/0x60) stop_machine_cpuslocked+0x62/0xf0 build_all_zonelists+0x92/0x150 numa_zonelist_order_handler+0x102/0x150 proc_sys_call_handler.isra.12+0xda/0x118 proc_sys_write+0x34/0x48 __vfs_write+0x3c/0x178 vfs_write+0xbc/0x1a0 SyS_write+0x66/0xc0 system_call+0xc4/0x2b0 locks held by bash/1205: #0: (sb_writers#4){.+.+.+}, at: vfs_write+0xa6/0x1a0 #1: (zl_order_mutex){+.+...}, at: numa_zonelist_order_handler+0x44/0x150 #2: (zonelists_mutex){+.+...}, at: numa_zonelist_order_handler+0xf4/0x150 Last Breaking-Event-Address: lockdep_assert_cpus_held+0x48/0x60 This can be easily triggered with e.g. echo n > /proc/sys/vm/numa_zonelist_order In commit 3f906ba ("mm/memory-hotplug: switch locking to a percpu rwsem") memory hotplug locking was changed to fix a potential deadlock. This also switched the stop_machine() invocation within build_all_zonelists() to stop_machine_cpuslocked() which now expects that online cpus are locked when being called. This assumption is not true if build_all_zonelists() is being called from numa_zonelist_order_handler(). In order to fix this simply add a mem_hotplug_begin()/mem_hotplug_done() pair to numa_zonelist_order_handler(). Link: http://lkml.kernel.org/r/[email protected] Fixes: 3f906ba ("mm/memory-hotplug: switch locking to a percpu rwsem") Signed-off-by: Heiko Carstens <[email protected]> Reported-by: Andre Wild <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

This patch fixes a bug associated with iscsit_reset_np_thread() that can occur during parallel configfs rmdir of a single iscsi_np used across multiple iscsi-target instances, that would result in hung task(s) similar to below where configfs rmdir process context was blocked indefinately waiting for iscsi_np->np_restart_comp to finish: [ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds. [ 6726.119440] Tainted: G W O 4.1.26-3321 #2 [ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 6726.132927] dcp_proxy_node_ D ffff8803f202bc88 0 15550 1 0x00000000 [ 6726.140058] ffff8803f202bc88 ffff88085c64d960 ffff88083b3b1ad0 ffff88087fffeb08 [ 6726.147593] ffff8803f202c000 7fffffffffffffff ffff88083f459c28 ffff88083b3b1ad0 [ 6726.155132] ffff88035373c100 ffff8803f202bca8 ffffffff8168ced2 ffff8803f202bcb8 [ 6726.162667] Call Trace: [ 6726.165150] [<ffffffff8168ced2>] schedule+0x32/0x80 [ 6726.170156] [<ffffffff8168f5b4>] schedule_timeout+0x214/0x290 [ 6726.176030] [<ffffffff810caef2>] ? __send_signal+0x52/0x4a0 [ 6726.181728] [<ffffffff8168d7d6>] wait_for_completion+0x96/0x100 [ 6726.187774] [<ffffffff810e7c80>] ? wake_up_state+0x10/0x10 [ 6726.193395] [<ffffffffa035d6e2>] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod] [ 6726.201278] [<ffffffffa0355d86>] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod] [ 6726.210033] [<ffffffffa0363f7f>] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod] [ 6726.218351] [<ffffffff81260c5a>] configfs_write_file+0xaa/0x110 [ 6726.224392] [<ffffffff811ea364>] vfs_write+0xa4/0x1b0 [ 6726.229576] [<ffffffff811eb111>] SyS_write+0x41/0xb0 [ 6726.234659] [<ffffffff8169042e>] system_call_fastpath+0x12/0x71 It would happen because each iscsit_reset_np_thread() sets state to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks waiting for completion on iscsi_np->np_restart_comp. However, if iscsi_np was active processing a login request and more than a single iscsit_reset_np_thread() caller to the same iscsi_np was blocked on iscsi_np->np_restart_comp, iscsi_np kthread process context in __iscsi_target_login_thread() would flush pending signals and only perform a single completion of np->np_restart_comp before going back to sleep within transport specific iscsit_transport->iscsi_accept_np code. To address this bug, add a iscsi_np->np_reset_count and update __iscsi_target_login_thread() to keep completing np->np_restart_comp until ->np_reset_count has reached zero. Reported-by: Gary Guo <[email protected]> Tested-by: Gary Guo <[email protected]> Cc: Mike Christie <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: [email protected] # 3.10+ Signed-off-by: Nicholas Bellinger <[email protected]>

Dean Jenkins says: ==================== asix: Improve robustness Please consider taking these patches to improve the robustness of the ASIX USB to Ethernet driver. Failures prompting an ASIX driver code review ============================================= On an ARM i.MX6 embedded platform some strange one-off and two-off failures were observed in and around the ASIX USB to Ethernet driver. This was observed on a highly modified kernel 3.14 with the ASIX driver containing back-ported changes from kernel.org up to kernel 4.8 approximately. a) A one-off failure in asix_rx_fixup_internal(): There was an occurrence of an attempt to write off the end of the netdev buffer which was trapped by skb_over_panic() in skb_put(). [20030.846440] skbuff: skb_over_panic: text:7f2271c0 len:120 put:60 head:8366ecc0 data:8366ed02 tail:0x8366ed7a end:0x8366ed40 dev:eth0 [20030.863007] Kernel BUG at 8044ce38 [verbose debug info unavailable] [20031.215345] Backtrace: [20031.217884] [<8044cde0>] (skb_panic) from [<8044d50c>] (skb_put+0x50/0x5c) [20031.227408] [<8044d4bc>] (skb_put) from [<7f2271c0>] (asix_rx_fixup_internal+0x1c4/0x23c [asix]) [20031.242024] [<7f226ffc>] (asix_rx_fixup_internal [asix]) from [<7f22724c>] (asix_rx_fixup_common+0x14/0x18 [asix]) [20031.260309] [<7f227238>] (asix_rx_fixup_common [asix]) from [<7f21f7d4>] (usbnet_bh+0x74/0x224 [usbnet]) [20031.269879] [<7f21f760>] (usbnet_bh [usbnet]) from [<8002f834>] (call_timer_fn+0xa4/0x1f0) [20031.283961] [<8002f790>] (call_timer_fn) from [<80030834>] (run_timer_softirq+0x230/0x2a8) [20031.302782] [<80030604>] (run_timer_softirq) from [<80028780>] (__do_softirq+0x15c/0x37c) [20031.321511] [<80028624>] (__do_softirq) from [<80028c38>] (irq_exit+0x8c/0xe8) [20031.339298] [<80028bac>] (irq_exit) from [<8000e9c8>] (handle_IRQ+0x8c/0xc8) [20031.350038] [<8000e93c>] (handle_IRQ) from [<800085c8>] (gic_handle_irq+0xb8/0xf8) [20031.365528] [<80008510>] (gic_handle_irq) from [<8050de80>] (__irq_svc+0x40/0x70) Analysis of the logic of the ASIX driver (containing backported changes from kernel.org up to kernel 4.8 approximately) suggested that the software could not trigger skb_over_panic(). The analysis of the kernel BUG() crash information suggested that the netdev buffer was written with 2 minimal 60 octet length Ethernet frames (ASIX hardware drops the 4 octet FCS field) and the 2nd Ethernet frame attempted to write off the end of the netdev buffer. Note that the netdev buffer should only contain 1 Ethernet frame so if an attempt to write 2 Ethernet frames into the buffer is made then that is wrong. However, the logic of the asix_rx_fixup_internal() only allows 1 Ethernet frame to be written into the netdev buffer. Potentially this failure was due to memory corruption because it was only seen once. b) Two-off failures in the NAPI layer's backlog queue: There were 2 crashes in the NAPI layer's backlog queue presumably after asix_rx_fixup_internal() called usbnet_skb_return(). [24097.273945] Unable to handle kernel NULL pointer dereference at virtual address 00000004 [24097.398944] PC is at process_backlog+0x80/0x16c [24097.569466] Backtrace: [24097.572007] [<8045ad98>] (process_backlog) from [<8045b64c>] (net_rx_action+0xcc/0x248) [24097.591631] [<8045b580>] (net_rx_action) from [<80028780>] (__do_softirq+0x15c/0x37c) [24097.610022] [<80028624>] (__do_softirq) from [<800289cc>] (run_ksoftirqd+0x2c/0x84) and [ 1059.828452] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 1059.953715] PC is at process_backlog+0x84/0x16c [ 1060.140896] Backtrace: [ 1060.143434] [<8045ad98>] (process_backlog) from [<8045b64c>] (net_rx_action+0xcc/0x248) [ 1060.163075] [<8045b580>] (net_rx_action) from [<80028780>] (__do_softirq+0x15c/0x37c) [ 1060.181474] [<80028624>] (__do_softirq) from [<80028c38>] (irq_exit+0x8c/0xe8) [ 1060.199256] [<80028bac>] (irq_exit) from [<8000e9c8>] (handle_IRQ+0x8c/0xc8) [ 1060.210006] [<8000e93c>] (handle_IRQ) from [<800085c8>] (gic_handle_irq+0xb8/0xf8) [ 1060.225492] [<80008510>] (gic_handle_irq) from [<8050de80>] (__irq_svc+0x40/0x70) The embedded board was only using an ASIX USB to Ethernet adaptor eth0. Analysis suggested that the doubly-linked list pointers of the backlog queue had been corrupted because one of the link pointers was NULL. Potentially this failure was due to memory corruption because it was only seen twice. Results of the ASIX driver code review ====================================== During the code review some weaknesses were observed in the ASIX driver and the following patches have been created to improve the robustness. Brief overview of the patches ----------------------------- 1. asix: Add rx->ax_skb = NULL after usbnet_skb_return() The current ASIX driver sends the received Ethernet frame to the NAPI layer of the network stack via the call to usbnet_skb_return() in asix_rx_fixup_internal() but retains the rx->ax_skb pointer to the netdev buffer. The driver no longer needs the rx->ax_skb pointer at this point because the NAPI layer now has the Ethernet frame. This means that asix_rx_fixup_internal() must not use rx->ax_skb after the call to usbnet_skb_return() because it could corrupt the handling of the Ethernet frame within the network layer. Therefore, to remove the risk of erroneous usage of rx->ax_skb, set rx->ax_skb to NULL after the call to usbnet_skb_return(). This avoids potential erroneous freeing of rx->ax_skb and erroneous writing to the netdev buffer. If the software now somehow inappropriately reused rx->ax_skb, then a NULL pointer dereference of rx->ax_skb would occur which makes investigation easier. 2. asix: Ensure asix_rx_fixup_info members are all reset This patch creates reset_asix_rx_fixup_info() to allow all the asix_rx_fixup_info structure members to be consistently reset to initial conditions. Call reset_asix_rx_fixup_info() upon each detectable error condition so that the next URB is processed from a known state. Otherwise, there is a risk that some members of the asix_rx_fixup_info structure may be incorrect after an error occurred so potentially leading to a malfunction. 3. asix: Fix small memory leak in ax88772_unbind() This patch creates asix_rx_fixup_common_free() to allow the rx->ax_skb to be freed when necessary. asix_rx_fixup_common_free() is called from ax88772_unbind() before the parent private data structure is freed. Without this patch, there is a risk of a small netdev buffer memory leak each time ax88772_unbind() is called during the reception of an Ethernet frame that spans across 2 URBs. Testing ======= The patches have been sanity tested on a 64-bit Linux laptop running kernel 4.13-rc2 with the 3 patches applied on top. The ASIX USB to Adaptor used for testing was (output of lsusb): ID 0b95:772b ASIX Electronics Corp. AX88772B Test #1 ------- The test ran a flood ping test script which slowly incremented the ICMP Echo Request's payload from 0 to 5000 octets. This eventually causes IPv4 fragmentation to occur which causes Ethernet frames to be sent very close to each other so increases the probability that an Ethernet frame will span 2 URBs. The test showed that all pings were successful. The test took about 15 minutes to complete. Test #2 ------- A script was run on the laptop to periodically run ifdown and ifup every second so that the ASIX USB to Adaptor was up for 1 second and down for 1 second. From a Linux PC connected to the laptop, the following ping command was used ping -f -s 5000 <ip address of laptop> The large ICMP payload causes IPv4 fragmentation resulting in multiple Ethernet frames per original IP packet. Kernel debug within the ASIX driver was enabled to see whether any ASIX errors were generated. The test was run for about 24 hours and no ASIX errors were seen. Patches ======= The 3 patches have been rebased off the net-next repo master branch with HEAD fbbeefd net: fec: Allow reception of frames bigger than 1522 bytes ==================== Signed-off-by: David S. Miller <[email protected]>

MEI device performs link reset during system suspend sequence. The link reset cannot be performed while device is in runtime suspend state. The resume sequence is bypassed with suspend direct complete optimization,so the optimization should be disabled for mei devices. Fixes: [ 192.940537] Restarting tasks ... [ 192.940610] PGI is not set [ 192.940619] ------------[ cut here ]------------ [ 192.940623] WARNING: CPU: 0 me.c:653 mei_me_pg_exit_sync+0x351/0x360 [ 192.940624] Modules linked in: [ 192.940627] CPU: 0 PID: 1661 Comm: kworker/0:3 Not tainted 4.13.0-rc2+ #2 [ 192.940628] Hardware name: Dell Inc. XPS 13 9343/0TM99H, BIOS A11 12/08/2016 [ 192.940630] Workqueue: pm pm_runtime_work <snip> [ 192.940642] Call Trace: [ 192.940646] ? pci_pme_active+0x1de/0x1f0 [ 192.940649] ? pci_restore_standard_config+0x50/0x50 [ 192.940651] ? kfree+0x172/0x190 [ 192.940653] ? kfree+0x172/0x190 [ 192.940655] ? pci_restore_standard_config+0x50/0x50 [ 192.940663] mei_me_pm_runtime_resume+0x3f/0xc0 [ 192.940665] pci_pm_runtime_resume+0x7a/0xa0 [ 192.940667] __rpm_callback+0xb9/0x1e0 [ 192.940668] ? preempt_count_add+0x6d/0xc0 [ 192.940670] rpm_callback+0x24/0x90 [ 192.940672] ? pci_restore_standard_config+0x50/0x50 [ 192.940674] rpm_resume+0x4e8/0x800 [ 192.940676] pm_runtime_work+0x55/0xb0 [ 192.940678] process_one_work+0x184/0x3e0 [ 192.940680] worker_thread+0x4d/0x3a0 [ 192.940681] ? preempt_count_sub+0x9b/0x100 [ 192.940683] kthread+0x122/0x140 [ 192.940684] ? process_one_work+0x3e0/0x3e0 [ 192.940685] ? __kthread_create_on_node+0x1a0/0x1a0 [ 192.940688] ret_from_fork+0x27/0x40 [ 192.940690] Code: 96 3a 9e ff 48 8b 7d 98 e8 cd 21 58 00 83 bb bc 01 00 00 04 0f 85 40 fe ff ff e9 41 fe ff ff 48 c7 c7 5f 04 99 96 e8 93 6b 9f ff <0f> ff e9 5d fd ff ff e8 33 fe 99 ff 0f 1f 00 0f 1f 44 00 00 55 [ 192.940719] ---[ end trace a86955597774ead8 ]--- [ 192.942540] done. Suggested-by: Rafael J. Wysocki <[email protected]> Reported-by: Dominik Brodowski <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Dominik Brodowski <[email protected]> Signed-off-by: Alexander Usyskin <[email protected]> Signed-off-by: Tomas Winkler <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

pfkey_broadcast() might be called from non process contexts, we can not use GFP_KERNEL in these cases [1]. This patch partially reverts commit ba51b6b ("net: Fix RCU splat in af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock() section. [1] : syzkaller reported : in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439 3 locks held by syzkaller183439/2932: #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649 #1: (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293 #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline] #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028 CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994 __might_sleep+0x95/0x190 kernel/sched/core.c:5947 slab_pre_alloc_hook mm/slab.h:416 [inline] slab_alloc mm/slab.c:3383 [inline] kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559 skb_clone+0x1a0/0x400 net/core/skbuff.c:1037 pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207 pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281 dump_sp+0x3d6/0x500 net/key/af_key.c:2685 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722 pfkey_process+0x606/0x710 net/key/af_key.c:2814 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x755/0x890 net/socket.c:2035 __sys_sendmsg+0xe5/0x210 net/socket.c:2069 SYSC_sendmsg net/socket.c:2080 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2076 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x445d79 RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79 RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008 RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700 R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000 R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000 Fixes: ba51b6b ("net: Fix RCU splat in af_key") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: David Ahern <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>

syzkaller reported that DCCP could have a non empty write queue at dismantle time. WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199 RSP: 0018:ffff8801d182f108 EFLAGS: 00010297 RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280 RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0 R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8 inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835 dccp_close+0x84d/0xc10 net/dccp/proto.c:1067 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1126 __fput+0x327/0x7e0 fs/file_table.c:210 ____fput+0x15/0x20 fs/file_table.c:246 task_work_run+0x18a/0x260 kernel/task_work.c:116 exit_task_work include/linux/task_work.h:21 [inline] do_exit+0xa32/0x1b10 kernel/exit.c:865 do_group_exit+0x149/0x400 kernel/exit.c:969 get_signal+0x7e8/0x17e0 kernel/signal.c:2330 do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808 exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline] syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263 Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: David S. Miller <[email protected]>

The following commit: 39a0526 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73 ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: <[email protected]> [v4.6+] Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 39a0526 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

v4.10 commit 6f2ce1c ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit <address>, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit <address> struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier <[email protected]> Fixes: 6f2ce1c ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: <[email protected]> #2.6.32+ Reviewed-by: Benjamin Block <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL pointer deref when he ran it on a data with namespace events. It was because the buildid_id__mark_dso_hit_ops lacks the namespace event handler and perf_too__fill_default() didn't set it. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x (gdb) where #0 0x0000000000000000 in ?? () #1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287 #2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340 #3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522 #4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899 #5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953 #6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83 #7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115 #8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296 #9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348 #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392 #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536 (gdb) Fix it by adding a stub event handler for namespace event. Committer testing: Further clarifying, plain using 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info: # perf record -a --namespaces sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ] # perf buildid-list | wc -l 38 # perf buildid-list | head -5 e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux 874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko 5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so # It is only when one asks for checking what of those entries actually had samples, i.e. when we use either -H or --with-hits, that we will process all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c neither explicitely set a perf_tool.namespaces() callback nor the default stub was set that we end up, when processing a PERF_RECORD_NAMESPACE record, causing a SEGFAULT: # perf buildid-list -H Segmentation fault (core dumped) ^C # Reported-and-Tested-by: Thomas-Mich Richter <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Hendrik Brueckner <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas-Mich Richter <[email protected]> Fixes: f3b3614 ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

When an EMAD is transmitted, a timeout work item is scheduled with a delay of 200ms, so that another EMAD will be retried until a maximum of five retries. In certain situations, it's possible for the function waiting on the EMAD to be associated with a work item that is queued on the same workqueue (`mlxsw_core`) as the timeout work item. This results in flushing a work item on the same workqueue. According to commit e159489 ("workqueue: relax lockdep annotation on flush_work()") the above may lead to a deadlock in case the workqueue has only one worker active or if the system in under memory pressure and the rescue worker is in use. The latter explains the very rare and random nature of the lockdep splats we have been seeing: [ 52.730240] ============================================ [ 52.736179] WARNING: possible recursive locking detected [ 52.742119] 4.14.0-rc3jiri+ #4 Not tainted [ 52.746697] -------------------------------------------- [ 52.752635] kworker/1:3/599 is trying to acquire lock: [ 52.758378] (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0 [ 52.767837] but task is already holding lock: [ 52.774360] (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0 [ 52.784495] other info that might help us debug this: [ 52.791794] Possible unsafe locking scenario: [ 52.798413] CPU0 [ 52.801144] ---- [ 52.803875] lock(mlxsw_core_driver_name); [ 52.808556] lock(mlxsw_core_driver_name); [ 52.813236] *** DEADLOCK *** [ 52.819857] May be due to missing lock nesting notation [ 52.827450] 3 locks held by kworker/1:3/599: [ 52.832221] #0: (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0 [ 52.842846] #1: ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0 [ 52.854537] #2: (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20 [ 52.863021] stack backtrace: [ 52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4 [ 52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum] [ 52.894060] Call Trace: [ 52.909122] __lock_acquire+0xf6f/0x2a10 [ 53.025412] lock_acquire+0x158/0x440 [ 53.047557] flush_work+0x3c4/0x5e0 [ 53.087571] __cancel_work_timer+0x3ca/0x5e0 [ 53.177051] cancel_delayed_work_sync+0x13/0x20 [ 53.182142] mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core] [ 53.194571] mlxsw_core_reg_access+0x586/0x990 [mlxsw_core] [ 53.225365] mlxsw_reg_query+0x10/0x20 [mlxsw_core] [ 53.230882] mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum] [ 53.237801] process_one_work+0x8f1/0x12f0 [ 53.321804] worker_thread+0x1fd/0x10c0 [ 53.435158] kthread+0x28e/0x370 [ 53.448703] ret_from_fork+0x2a/0x40 [ 53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774) [ 53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770) [ 53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed)) [ 53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications Fix this by creating another workqueue for EMAD timeouts, thereby preventing the situation of a work item trying to flush a work item queued on the same workqueue. Fixes: caf7297 ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Jiri Pirko <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>

docularxu and others added 12 commits January 11, 2016 14:45

hikey: config: create from defconfig

c833359

Signed-off-by: Guodong Xu <[email protected]>

hikey: config: enable CONFIG_PINCTRL_SINGLE

cae1114

Signed-off-by: Guodong Xu <[email protected]>

hikey: config: enable HI6220 MTCMOS regulators, HI655x PMIC and HI655…

aede479

…X regulators Signed-off-by: Guodong Xu <[email protected]>

hikey: config: enable dw mmc k3

2a2cf20

Signed-off-by: Guodong Xu <[email protected]>

hikey: config: enable configs for TI WL1835, built as module

b0507cc

Signed-off-by: Guodong Xu <[email protected]>

hikey: config: add usb

fcfa85d

Signed-off-by: Guodong Xu <[email protected]>

hikey: config: enable CONFIG_I2C_DESIGNWARE_PLATFORM

030c092

Signed-off-by: Guodong Xu <[email protected]>

hikey: config: Enable hisi drm drivers

d26a00e

enable hisi drm enable adv7511 Set cma size to 128M Enable dummy ion driver to use system heap before hisi ion driver is ready. Signed-off-by: Xinliang Liu <[email protected]> Signed-off-by: Guodong Xu <[email protected]>

hikey: config: enable hi6220 mailbox

f8dae23

Signed-off-by: Guodong Xu <[email protected]>

hikey_defconfig: Add gpu configures

0559613

Signed-off-by: Xinliang Liu <[email protected]>

hikey: config: Add required config for HiKey ION and Android graphics

10d2fd0

Adds the required config option for ION, as well as other tweaks needed to get Android graphics working on hikey. Signed-off-by: John Stultz <[email protected]>

docularxu force-pushed the hikey-tracking-config branch from ccf1853 to 1ffa2a7 Compare February 3, 2016 14:32

docularxu closed this Feb 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Android defconfig changes for hikey #2

Android defconfig changes for hikey #2

johnstultz-work commented Feb 2, 2016

docularxu commented Feb 5, 2016

Android defconfig changes for hikey #2

Android defconfig changes for hikey #2

Conversation

johnstultz-work commented Feb 2, 2016

docularxu commented Feb 5, 2016