-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASUS Tinker Board Kernel Modules #20
Comments
WIFI and Bluetooth driver for tinker board are not included in main 4.4 kernel branch. https://github.com/rockchip-linux/kernel/tree/release-20160818-miniarm |
If that branch does not work for your needs, the Tinkerboard 4.4.76 kernel at Armbian.com has Bluetooth and rtl8723bs driver 4.4.0, as well as the Rockchip reboot fix. The build environment allows for creating user patches for any other adjustments you may wish to make. I follow the patches here and on the Tinker Board repo for any fixes. |
I will rebase release-20160818-miniarm branch to the latest release-4.4 and re-organiza the code to make it easy to merge release-4.4 branch. Maybe armbian could switch to that branch. |
Thanks for the update, Armbian tries to use a single kernel source with patches to support all devices using a particular SoC. Switching to the Rockchip miniarm branch would most likely break the MiQi, or we'd have to have an entirely separate config and patchset for it. |
Also, I've had very good success (as has @Miouyouyou) running Mainline kernel on RK3288 devices, the patches for Mainline are primarily dts and Mali. |
Yeah, we're only missing bluetooth and VPU (well, I didn't the test the VPU packages ATM) with the Mainline kernel. |
You can look at this repo for VPU driver. I remember 4K HDMI is also missing, maybe you folks can have a look. ; ). |
Haha yes, 4k is missing as well, I was focusing on the reboot thing first. Thanks for the link |
I'll take a look at the VPU, yeah ! I also take generous 4K screens donations ! Meanwhile, I don't have 4K screens so testing this will be fairly difficult (´・ω・`). The 4.13.0-rc2 update should be available very soon. This update includes various DTS/DTSI fixes and the reboot patch. Turns out I missed regulator-always-on and regulator-boot-on properties on an SD vcc node. |
I have a 4k screen, I was looking at some of the code, let me get the Armbian Dev up to 4.13 first. ;-) |
commit 6f48655 upstream. This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0 regression, that was introduced by commit 01d4d67 Author: Nicholas Bellinger <[email protected]> Date: Wed Dec 7 12:55:54 2016 -0800 which originally had the proper list_del_init() usage, but was dropped during list review as it was thought unnecessary by HCH. However, list_del_init() usage is required during the special generate_node_acls = 1 + cache_dynamic_acls = 0 case when transport_free_session() does a list_del(&se_nacl->acl_list), followed by target_complete_nacl() doing the same thing. This was manifesting as a general protection fault as reported by Justin: kernel: general protection fault: 0000 [#1] SMP kernel: Modules linked in: kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20 kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014 kernel: task: ffff88026939e800 task.stack: ffffc90007884000 kernel: RIP: 0010:target_put_nacl+0x49/0xb0 kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246 kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000 kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028 kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000 kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020 kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540 kernel: FS: 0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: transport_free_session+0x67/0x140 kernel: transport_deregister_session+0x7a/0xc0 kernel: iscsit_close_session+0x92/0x210 kernel: iscsit_close_connection+0x5f9/0x840 kernel: iscsit_take_action_for_connection_exit+0xfe/0x110 kernel: iscsi_target_tx_thread+0x140/0x1e0 kernel: ? wait_woken+0x90/0x90 kernel: kthread+0x124/0x160 kernel: ? iscsit_thread_get_cpumask+0x90/0x90 kernel: ? kthread_create_on_node+0x40/0x40 kernel: ret_from_fork+0x22/0x30 kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c 89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89 ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20 kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70 kernel: ---[ end trace f12821adbfd46fed ]--- To address this, go ahead and use proper list_del_list() for all cases of se_nacl->acl_list deletion. Reported-by: Justin Maggard <[email protected]> Tested-by: Justin Maggard <[email protected]> Cc: Justin Maggard <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Kms hotplug event may race into fbdev helper initial, that would cause the bug: [ 0.735411] [00000200] *pgd=00000000f6ffe003, *pud=00000000f6ffe003, *pmd=0000000000000000 [ 0.736156] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 0.736648] Modules linked in: [ 0.736930] CPU: 2 PID: 20 Comm: kworker/2:0 Not tainted 4.4.41 rockchip-linux#20 [ 0.737480] Hardware name: Rockchip RK3399 Board rev2 (BOX) (DT) [ 0.738020] Workqueue: events cdn_dp_pd_event_work [ 0.738447] task: ffffffc0f21f3100 ti: ffffffc0f2218000 task.ti: ffffffc0f2218000 [ 0.739109] PC is at mutex_lock+0x14/0x44 [ 0.739469] LR is at drm_fb_helper_hotplug_event+0x30/0x114 [ 0.756253] [<ffffff8008a344f4>] mutex_lock+0x14/0x44 [ 0.756260] [<ffffff8008445708>] drm_fb_helper_hotplug_event+0x30/0x114 [ 0.756271] [<ffffff8008473c84>] rockchip_drm_output_poll_changed+0x18/0x20 [ 0.756280] [<ffffff8008439fcc>] drm_kms_helper_hotplug_event+0x28/0x34 [ 0.756286] [<ffffff800846c444>] cdn_dp_pd_event_work+0x394/0x3c4 [ 0.756295] [<ffffff80080b2b38>] process_one_work+0x218/0x3e0 [ 0.756302] [<ffffff80080b3538>] worker_thread+0x2e8/0x404 [ 0.756308] [<ffffff80080b7e70>] kthread+0xe8/0xf0 [ 0.756316] [<ffffff8008082690>] ret_from_fork+0x10/0x40 Change-Id: I8d594183cf8187131418a0096dde840cbf01ed6b Signed-off-by: Mark Yao <[email protected]>
This patch aims to fix possible unsafe locking scenario when dwc3 probe failed. The possible deadlock warning info is: [4.254320] ====================================================== [4.254327] [ INFO: possible circular locking dependency detected ] [4.254334] 4.4.55 rockchip-linux#20 Not tainted [4.254339] ------------------------------------------------------- [4.254346] kworker/4:1/47 is trying to acquire lock: [4.254352] (&rockchip->lock){+.+.+.}, at: [<ffffff8008730d58>] dwc3_rockchip_otg_extcon_evt_work+0x30/0x42c [4.254382] [4.254382] but task is already holding lock: [4.254388] ((&rockchip->otg_work)){+.+...}, at: [<ffffff80080b7a78>] process_one_work+0x1c4/0x6ac [4.254411] [4.254411] which lock already depends on the new lock. [4.254411] [4.254418] [4.254418] the existing dependency chain (in reverse order) is: [4.254424] -> #1 ((&rockchip->otg_work)){+.+...}: [4.254440] [<ffffff80080f6a58>] __lock_acquire+0x15b4/0x1950 [4.254452] [<ffffff80080f75fc>] lock_acquire+0x190/0x250 [4.254461] [<ffffff80080b88c8>] flush_work+0x4c/0x274 [4.254471] [<ffffff80080b8cd8>] __cancel_work_timer+0x130/0x1c0 [4.254480] [<ffffff80080b8d78>] cancel_work_sync+0x10/0x18 [4.254489] [<ffffff8008731648>] dwc3_rockchip_probe+0x4f4/0x59c [4.254498] [<ffffff8008525a9c>] platform_drv_probe+0x58/0xa4 [4.254509] [<ffffff800852397c>] driver_probe_device+0x118/0x2b0 [4.254521] [<ffffff8008523b80>] __driver_attach+0x6c/0x98 [4.254531] [<ffffff80085229d8>] bus_for_each_dev+0x80/0xb0 [4.254543] [<ffffff80085234b0>] driver_attach+0x20/0x28 [4.254554] [<ffffff8008523040>] bus_add_driver+0xe8/0x1ec [4.254565] [<ffffff8008524a9c>] driver_register+0x94/0xe0 [4.254576] [<ffffff80085259f4>] __platform_driver_register+0x48/0x50 [4.254585] [<ffffff8009130bf4>] dwc3_rockchip_driver_init+0x18/0x20 [4.254598] [<ffffff8008082ba8>] do_one_initcall+0x180/0x19c [4.254609] [<ffffff8009100e58>] kernel_init_freeable+0x208/0x2c0 [4.254621] [<ffffff8008c00748>] kernel_init+0x10/0xf8 [4.254632] [<ffffff80080826d0>] ret_from_fork+0x10/0x40 [4.254641] -> #0 (&rockchip->lock){+.+.+.}: [4.254656] [<ffffff80080f3af8>] print_circular_bug+0x64/0x2c4 [4.254666] [<ffffff80080f6728>] __lock_acquire+0x1284/0x1950 [4.254675] [<ffffff80080f75fc>] lock_acquire+0x190/0x250 [4.254685] [<ffffff8008c03aac>] mutex_lock_nested+0x80/0x3d0 [4.254695] [<ffffff8008730d58>] dwc3_rockchip_otg_extcon_evt_work+0x30/0x42c [4.254704] [<ffffff80080b7bec>] process_one_work+0x338/0x6ac [4.254714] [<ffffff80080b90e8>] worker_thread+0x300/0x428 [4.254723] [<ffffff80080bead8>] kthread+0xf4/0xfc [4.254732] [<ffffff80080826d0>] ret_from_fork+0x10/0x40 [4.254741] [4.254741] other info that might help us debug this: [4.254741] [4.254749] Possible unsafe locking scenario: [4.254749] [4.254755] CPU0 CPU1 [4.254760] ---- ---- [4.254765] lock((&rockchip->otg_work)); [4.254775] lock(&rockchip->lock); [4.254786] lock((&rockchip->otg_work)); [4.254796] lock(&rockchip->lock); [4.254805] [4.254805] *** DEADLOCK *** [4.254805] [4.254813] 2 locks held by kworker/4:1/47: [4.254818] #0: ("events"){.+.+.+}, at: [<ffffff80080b7a78>] process_one_work+0x1c4/0x6ac [4.254839] #1: ((&rockchip->otg_work)){+.+...}, at: [<ffffff80080b7a78>] process_one_work+0x1c4/0x6ac [4.254860] [4.254860] stack backtrace: [4.254869] CPU: 4 PID: 47 Comm: kworker/4:1 Not tainted 4.4.55 rockchip-linux#20 [4.254875] Hardware name: Rockchip RK3399 Evaluation Board v3 (Android) (DT) [4.254885] Workqueue: events dwc3_rockchip_otg_extcon_evt_work [4.254893] Call trace: [4.254902] [<ffffff8008088a1c>] dump_backtrace+0x0/0x1c8 [4.254909] [<ffffff8008088bf8>] show_stack+0x14/0x1c [4.254917] [<ffffff80083a5fa0>] dump_stack+0xb0/0xec [4.254925] [<ffffff80080f3d3c>] print_circular_bug+0x2a8/0x2c4 [4.254931] [<ffffff80080f6728>] __lock_acquire+0x1284/0x1950 [4.254938] [<ffffff80080f75fc>] lock_acquire+0x190/0x250 [4.254946] [<ffffff8008c03aac>] mutex_lock_nested+0x80/0x3d0 [4.254952] [<ffffff8008730d58>] dwc3_rockchip_otg_extcon_evt_work+0x30/0x42c [4.254960] [<ffffff80080b7bec>] process_one_work+0x338/0x6ac [4.254967] [<ffffff80080b90e8>] worker_thread+0x300/0x428 [4.254973] [<ffffff80080bead8>] kthread+0xf4/0xfc [4.254979] [<ffffff80080826d0>] ret_from_fork+0x10/0x40 [4.275124] rockchip-dwc3 usb@fe800000: USB peripheral connected Change-Id: I9d34724e1c2af9b8472f79bfe4b088cbdde6394d Signed-off-by: William Wu <[email protected]>
[ Upstream commit 560d388 ] cifs_relock_file() can perform a down_write() on the inode's lock_sem even though it was already performed in cifs_strict_readv(). Lockdep complains about this. AFAICS, there is no problem here, and lockdep just needs to be told that this nesting is OK. ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ #20 Not tainted --------------------------------------------- cat/701 is trying to acquire lock: (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00 but task is already holding lock: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&cifsi->lock_sem); lock(&cifsi->lock_sem); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by cat/701: #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 stack backtrace: CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20 Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? trace_hardirqs_on_thunk+0x1a/0x1c ? preempt_schedule_irq+0x6b/0x80 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? cifs_reopen_file+0x7a7/0xc00 down_read+0x2d/0x70 ? cifs_reopen_file+0x7a7/0xc00 cifs_reopen_file+0x7a7/0xc00 ? printk+0x43/0x4b cifs_readpage_worker+0x327/0x8a0 cifs_readpage+0x8c/0x2a0 generic_file_read_iter+0x692/0xd00 cifs_strict_readv+0x29f/0x310 generic_file_splice_read+0x11c/0x1c0 do_splice_to+0xa5/0xc0 splice_direct_to_actor+0xfa/0x350 ? generic_pipe_buf_nosteal+0x10/0x10 do_splice_direct+0xb5/0xe0 do_sendfile+0x278/0x3a0 SyS_sendfile64+0xc4/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe Signed-off-by: Rabin Vincent <[email protected]> Acked-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 2bbea6e ] when mounting an ISO filesystem sometimes (very rarely) the system hangs because of a race condition between two tasks. PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount" #0 [ffff880078447ae0] __schedule at ffffffff8168d605 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod] #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs] #8 [ffff880078447da8] mount_bdev at ffffffff81202570 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs] #10 [ffff880078447e28] mount_fs at ffffffff81202d09 #11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f #12 [ffff880078447ea8] do_mount at ffffffff81220fee #13 [ffff880078447f28] sys_mount at ffffffff812218d6 #14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49 RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246 RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010 RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010 R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040 R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b This task was trying to mount the cdrom. It allocated and configured a super_block struct and owned the write-lock for the super_block->s_umount rwsem. While exclusively owning the s_umount lock, it called sr_block_ioctl and waited to acquire the global sr_mutex lock. PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd" #0 [ffff880078417898] __schedule at ffffffff8168d605 #1 [ffff880078417900] schedule at ffffffff8168dc59 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838 #4 [ffff8800784179d0] down_read at ffffffff8168cde0 #5 [ffff8800784179e8] get_super at ffffffff81201cc7 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de #7 [ffff880078417a40] flush_disk at ffffffff8123a94b #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom] #10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod] #11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86 #12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65 #13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b #14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7 #15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf #16 [ffff880078417d00] do_last at ffffffff8120d53d #17 [ffff880078417db0] path_openat at ffffffff8120e6b2 #18 [ffff880078417e48] do_filp_open at ffffffff8121082b #19 [ffff880078417f18] do_sys_open at ffffffff811fdd33 #20 [ffff880078417f70] sys_open at ffffffff811fde4e #21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49 RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000 RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70 RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020 R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b This task tried to open the cdrom device, the sr_block_open function acquired the global sr_mutex lock. The call to check_disk_change() then saw an event flag indicating a possible media change and tried to flush any cached data for the device. As part of the flush, it tried to acquire the super_block->s_umount lock associated with the cdrom device. This was the same super_block as created and locked by the previous task. The first task acquires the s_umount lock and then the sr_mutex_lock; the second task acquires the sr_mutex_lock and then the s_umount lock. This patch fixes the issue by moving check_disk_change() out of cdrom_open() and let the caller take care of it. Signed-off-by: Maurizio Lombardi <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 2bbea6e ] when mounting an ISO filesystem sometimes (very rarely) the system hangs because of a race condition between two tasks. PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount" #0 [ffff880078447ae0] __schedule at ffffffff8168d605 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod] #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs] #8 [ffff880078447da8] mount_bdev at ffffffff81202570 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs] #10 [ffff880078447e28] mount_fs at ffffffff81202d09 #11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f #12 [ffff880078447ea8] do_mount at ffffffff81220fee #13 [ffff880078447f28] sys_mount at ffffffff812218d6 #14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49 RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246 RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010 RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010 R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040 R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b This task was trying to mount the cdrom. It allocated and configured a super_block struct and owned the write-lock for the super_block->s_umount rwsem. While exclusively owning the s_umount lock, it called sr_block_ioctl and waited to acquire the global sr_mutex lock. PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd" #0 [ffff880078417898] __schedule at ffffffff8168d605 #1 [ffff880078417900] schedule at ffffffff8168dc59 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838 #4 [ffff8800784179d0] down_read at ffffffff8168cde0 #5 [ffff8800784179e8] get_super at ffffffff81201cc7 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de #7 [ffff880078417a40] flush_disk at ffffffff8123a94b #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom] #10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod] #11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86 #12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65 #13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b #14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7 #15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf #16 [ffff880078417d00] do_last at ffffffff8120d53d #17 [ffff880078417db0] path_openat at ffffffff8120e6b2 #18 [ffff880078417e48] do_filp_open at ffffffff8121082b #19 [ffff880078417f18] do_sys_open at ffffffff811fdd33 #20 [ffff880078417f70] sys_open at ffffffff811fde4e #21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49 RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000 RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70 RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020 R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b This task tried to open the cdrom device, the sr_block_open function acquired the global sr_mutex lock. The call to check_disk_change() then saw an event flag indicating a possible media change and tried to flush any cached data for the device. As part of the flush, it tried to acquire the super_block->s_umount lock associated with the cdrom device. This was the same super_block as created and locked by the previous task. The first task acquires the s_umount lock and then the sr_mutex_lock; the second task acquires the sr_mutex_lock and then the s_umount lock. This patch fixes the issue by moving check_disk_change() out of cdrom_open() and let the caller take care of it. Signed-off-by: Maurizio Lombardi <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Obsolete issue |
Function ib_create_qp() was failing to return an error when rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp() when trying to dereferece the qp pointer which was actually a negative errno. The crash: crash> log|grep BUG [ 136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 crash> bt PID: 3736 TASK: ffff8808543215c0 CPU: 2 COMMAND: "kworker/u64:2" #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0 FireflyTeam#1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758 FireflyTeam#2 [ffff88084d323480] crash_kexec at ffffffff8111682d FireflyTeam#3 [ffff88084d3234b0] oops_end at ffffffff81032bd6 FireflyTeam#4 [ffff88084d3234e0] no_context at ffffffff8106e431 FireflyTeam#5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610 FireflyTeam#6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4 FireflyTeam#7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc FireflyTeam#8 [ffff88084d323620] do_page_fault at ffffffff8106f057 FireflyTeam#9 [ffff88084d323660] page_fault at ffffffff816e3148 [exception RIP: ib_create_qp+427] RIP: ffffffffa02554fb RSP: ffff88084d323718 RFLAGS: 00010246 RAX: 0000000000000004 RBX: fffffffffffffff4 RCX: 000000018020001f RDX: ffff880830997fc0 RSI: 0000000000000001 RDI: ffff88085f407200 RBP: ffff88084d323778 R8: 0000000000000001 R9: ffffea0020bae210 R10: ffffea0020bae218 R11: 0000000000000001 R12: ffff88084d3237c8 R13: 00000000fffffff4 R14: ffff880859fa5000 R15: ffff88082eb89800 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 FireflyTeam#10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm] FireflyTeam#11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma] FireflyTeam#12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma] FireflyTeam#13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma] FireflyTeam#14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma] FireflyTeam#15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm] FireflyTeam#16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm] rockchip-linux#17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm] rockchip-linux#18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm] rockchip-linux#19 [ffff88084d323cb0] process_one_work at ffffffff810a1483 rockchip-linux#20 [ffff88084d323d90] worker_thread at ffffffff810a211d rockchip-linux#21 [ffff88084d323ec0] kthread at ffffffff810a6c5c rockchip-linux#22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit") Signed-off-by: Steve Wise <[email protected]> Cc: [email protected] Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] FireflyTeam#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] FireflyTeam#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] FireflyTeam#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] FireflyTeam#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] FireflyTeam#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] FireflyTeam#14 [ffff8800a57bbe00] evict at ffffffff811e1b67 FireflyTeam#15 [ffff8800a57bbe28] iput at ffffffff811e23a5 FireflyTeam#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 rockchip-linux#17 [ffff8800a57bbe88] dput at ffffffff811dd06c rockchip-linux#18 [ffff8800a57bbea8] __fput at ffffffff811c823b rockchip-linux#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e rockchip-linux#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 rockchip-linux#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c rockchip-linux#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Cc: [email protected] Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
Remove spinlock and use the "rtc->ops_lock" from RTC subsystem instead. spin_lock_irqsave() is not needed here because we do not have hard IRQs. This patch fixes the following issue: root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc [ 82.108175] BUG: spinlock wrong CPU on CPU#0, hwclock/855 [ 82.113660] lock: 0xedb4899c, .magic: dead4ead, .owner: hwclock/855, .owner_cpu: 1 [ 82.121329] CPU: 0 PID: 855 Comm: hwclock Not tainted 4.8.0-00042-g09d5410-dirty rockchip-linux#20 [ 82.129078] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 82.135609] Backtrace: [ 82.138090] [<8010d378>] (dump_backtrace) from [<8010d5c0>] (show_stack+0x20/0x24) [ 82.145664] r7:ec936000 r6:600a0013 r5:00000000 r4:81031680 [ 82.151402] [<8010d5a0>] (show_stack) from [<80401518>] (dump_stack+0xb4/0xe8) [ 82.158636] [<80401464>] (dump_stack) from [<8017b8b0>] (spin_dump+0x84/0xcc) [ 82.165775] r10:00000000 r9:ec936000 r8:81056090 r7:600a0013 r6:edb4899c r5:edb4899c [ 82.173691] r4:e5033e00 r3:00000000 [ 82.177308] [<8017b82c>] (spin_dump) from [<8017bcb0>] (do_raw_spin_unlock+0x108/0x130) [ 82.185314] r5:edb4899c r4:edb4899c [ 82.188938] [<8017bba8>] (do_raw_spin_unlock) from [<8094b93c>] (_raw_spin_unlock_irqrestore+0x34/0x54) [ 82.198333] r5:edb4899c r4:600a0013 [ 82.201953] [<8094b908>] (_raw_spin_unlock_irqrestore) from [<8065b090>] (rx8010_set_time+0x14c/0x188) [ 82.211261] r5:00000020 r4:edb48990 [ 82.214882] [<8065af44>] (rx8010_set_time) from [<80653fe4>] (rtc_set_time+0x70/0x104) [ 82.222801] r7:00000051 r6:edb39da0 r5:edb39c00 r4:ec937e8c [ 82.228535] [<80653f74>] (rtc_set_time) from [<80655774>] (rtc_dev_ioctl+0x3c4/0x674) [ 82.236368] r7:00000051 r6:7ecf1b74 r5:00000000 r4:edb39c00 [ 82.242106] [<806553b0>] (rtc_dev_ioctl) from [<80284034>] (do_vfs_ioctl+0xa4/0xa6c) [ 82.249851] r8:00000003 r7:80284a40 r6:ed1e9c80 r5:edb44e60 r4:7ecf1b74 [ 82.256642] [<80283f90>] (do_vfs_ioctl) from [<80284a40>] (SyS_ioctl+0x44/0x6c) [ 82.263953] r10:00000000 r9:ec936000 r8:7ecf1b74 r7:4024700a r6:ed1e9c80 r5:00000003 [ 82.271869] r4:ed1e9c80 [ 82.274432] [<802849fc>] (SyS_ioctl) from [<80108520>] (ret_fast_syscall+0x0/0x1c) [ 82.282005] r9:ec936000 r8:801086c4 r7:00000036 r6:00000000 r5:00000003 r4:0008e1bc root@GE004097290448 b850v3:~# Message from syslogd@GE004097290448 at Dec 3 11:17:08 ... kernel:[ 82.108175] BUG: spinlock wrong CPU on CPU#0, hwclock/855 Message from syslogd@GE004097290448 at Dec 3 11:17:08 ... kernel:[ 82.113660] lock: 0xedb4899c, .magic: dead4ead, .owner: hwclock/855, .owner_cpu: 1 hwclock --systohc root@GE004097290448 b850v3:~# Signed-off-by: Fabien Lahoudere <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]>
As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: FireflyTeam#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . FireflyTeam#9 [] tcp_rcv_established at ffffffff81580b64 FireflyTeam#10 [] tcp_v4_do_rcv at ffffffff8158b54a FireflyTeam#11 [] tcp_v4_rcv at ffffffff8158cd02 FireflyTeam#12 [] ip_local_deliver_finish at ffffffff815668f4 FireflyTeam#13 [] ip_local_deliver at ffffffff81566bd9 FireflyTeam#14 [] ip_rcv_finish at ffffffff8156656d FireflyTeam#15 [] ip_rcv at ffffffff81566f06 FireflyTeam#16 [] __netif_receive_skb_core at ffffffff8152b3a2 rockchip-linux#17 [] __netif_receive_skb at ffffffff8152b608 rockchip-linux#18 [] netif_receive_skb at ffffffff8152b690 rockchip-linux#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] rockchip-linux#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] rockchip-linux#21 [] net_rx_action at ffffffff8152bac2 rockchip-linux#22 [] __do_softirq at ffffffff81084b4f rockchip-linux#23 [] call_softirq at ffffffff8164845c rockchip-linux#24 [] do_softirq at ffffffff81016fc5 rockchip-linux#25 [] irq_exit at ffffffff81084ee5 rockchip-linux#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <[email protected]> Cc: Hannes Sowa <[email protected]> Signed-off-by: Jon Maxwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Commit 57e5568 ("sata_via: Implement hotplug for VT6421") adds hotplug IRQ handler for VT6421 but enables hotplug on all chips. This is a bug because it causes "irq xx: nobody cared" error on VT6420 when hot-(un)plugging a drive: [ 381.839948] irq 20: nobody cared (try booting with the "irqpoll" option) [ 381.840014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc5+ rockchip-linux#148 [ 381.840066] Hardware name: P4VM800/P4VM800, BIOS P1.60 05/29/2006 [ 381.840117] Call Trace: [ 381.840167] <IRQ> [ 381.840225] ? dump_stack+0x44/0x58 [ 381.840278] ? __report_bad_irq+0x14/0x97 [ 381.840327] ? handle_edge_irq+0xa5/0xa5 [ 381.840376] ? note_interrupt+0x155/0x1cf [ 381.840426] ? handle_edge_irq+0xa5/0xa5 [ 381.840474] ? handle_irq_event_percpu+0x32/0x38 [ 381.840524] ? handle_irq_event+0x1f/0x38 [ 381.840573] ? handle_fasteoi_irq+0x69/0xb8 [ 381.840625] ? handle_irq+0x4f/0x5d [ 381.840672] </IRQ> [ 381.840726] ? do_IRQ+0x2e/0x8b [ 381.840782] ? common_interrupt+0x2c/0x34 [ 381.840836] ? mwait_idle+0x60/0x82 [ 381.840892] ? arch_cpu_idle+0x6/0x7 [ 381.840949] ? do_idle+0x96/0x18e [ 381.841002] ? cpu_startup_entry+0x16/0x1a [ 381.841057] ? start_kernel+0x319/0x31c [ 381.841111] ? startup_32_smp+0x166/0x168 [ 381.841165] handlers: [ 381.841219] [<c12a7263>] ata_bmdma_interrupt [ 381.841274] Disabling IRQ rockchip-linux#20 Seems that VT6420 can do hotplug too (there's no documentation) but the comments say that SCR register access (required for detecting hotplug events) can cause problems on these chips. For now, just keep hotplug disabled on anything other than VT6421. Signed-off-by: Ondrej Zary <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
cifs_relock_file() can perform a down_write() on the inode's lock_sem even though it was already performed in cifs_strict_readv(). Lockdep complains about this. AFAICS, there is no problem here, and lockdep just needs to be told that this nesting is OK. ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ rockchip-linux#20 Not tainted --------------------------------------------- cat/701 is trying to acquire lock: (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00 but task is already holding lock: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&cifsi->lock_sem); lock(&cifsi->lock_sem); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by cat/701: #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 stack backtrace: CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ rockchip-linux#20 Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? trace_hardirqs_on_thunk+0x1a/0x1c ? preempt_schedule_irq+0x6b/0x80 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? cifs_reopen_file+0x7a7/0xc00 down_read+0x2d/0x70 ? cifs_reopen_file+0x7a7/0xc00 cifs_reopen_file+0x7a7/0xc00 ? printk+0x43/0x4b cifs_readpage_worker+0x327/0x8a0 cifs_readpage+0x8c/0x2a0 generic_file_read_iter+0x692/0xd00 cifs_strict_readv+0x29f/0x310 generic_file_splice_read+0x11c/0x1c0 do_splice_to+0xa5/0xc0 splice_direct_to_actor+0xfa/0x350 ? generic_pipe_buf_nosteal+0x10/0x10 do_splice_direct+0xb5/0xe0 do_sendfile+0x278/0x3a0 SyS_sendfile64+0xc4/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe Signed-off-by: Rabin Vincent <[email protected]> Acked-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]>
This prevents a deadlock that somehow results from the suspend() -> forbid() -> resume() callchain. [ 125.266960] [drm] Initialized nouveau 1.3.1 20120801 for 0000:02:00.0 on minor 1 [ 370.120872] INFO: task kworker/4:1:77 blocked for more than 120 seconds. [ 370.120920] Tainted: G O 4.12.0-rc3 rockchip-linux#20 [ 370.120947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 370.120982] kworker/4:1 D13808 77 2 0x00000000 [ 370.120998] Workqueue: pm pm_runtime_work [ 370.121004] Call Trace: [ 370.121018] __schedule+0x2bf/0xb40 [ 370.121025] ? mark_held_locks+0x5f/0x90 [ 370.121038] schedule+0x3d/0x90 [ 370.121044] rpm_resume+0x107/0x870 [ 370.121052] ? finish_wait+0x90/0x90 [ 370.121065] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121070] pm_runtime_forbid+0x4c/0x60 [ 370.121129] nouveau_pmops_runtime_suspend+0xaf/0xc0 [nouveau] [ 370.121139] pci_pm_runtime_suspend+0x5f/0x170 [ 370.121147] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121152] __rpm_callback+0xb9/0x1e0 [ 370.121159] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121166] rpm_callback+0x24/0x80 [ 370.121171] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121176] rpm_suspend+0x138/0x6e0 [ 370.121192] pm_runtime_work+0x7b/0xc0 [ 370.121199] process_one_work+0x253/0x6a0 [ 370.121216] worker_thread+0x4d/0x3b0 [ 370.121229] kthread+0x133/0x150 [ 370.121234] ? process_one_work+0x6a0/0x6a0 [ 370.121238] ? kthread_create_on_node+0x70/0x70 [ 370.121246] ret_from_fork+0x2a/0x40 [ 370.121283] Showing all locks held in the system: [ 370.121291] 2 locks held by kworker/4:1/77: [ 370.121298] #0: ("pm"){.+.+.+}, at: [<ffffffffac0d3530>] process_one_work+0x1d0/0x6a0 [ 370.121315] FireflyTeam#1: ((&dev->power.work)){+.+.+.}, at: [<ffffffffac0d3530>] process_one_work+0x1d0/0x6a0 [ 370.121330] 1 lock held by khungtaskd/81: [ 370.121333] #0: (tasklist_lock){.+.+..}, at: [<ffffffffac10fc8d>] debug_show_all_locks+0x3d/0x1a0 [ 370.121355] 1 lock held by dmesg/1639: [ 370.121358] #0: (&user->lock){+.+.+.}, at: [<ffffffffac124b6d>] devkmsg_read+0x4d/0x360 [ 370.121377] ============================================= Signed-off-by: Ben Skeggs <[email protected]>
This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0 regression, that was introduced by commit 01d4d67 Author: Nicholas Bellinger <[email protected]> Date: Wed Dec 7 12:55:54 2016 -0800 which originally had the proper list_del_init() usage, but was dropped during list review as it was thought unnecessary by HCH. However, list_del_init() usage is required during the special generate_node_acls = 1 + cache_dynamic_acls = 0 case when transport_free_session() does a list_del(&se_nacl->acl_list), followed by target_complete_nacl() doing the same thing. This was manifesting as a general protection fault as reported by Justin: kernel: general protection fault: 0000 [FireflyTeam#1] SMP kernel: Modules linked in: kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ rockchip-linux#20 kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014 kernel: task: ffff88026939e800 task.stack: ffffc90007884000 kernel: RIP: 0010:target_put_nacl+0x49/0xb0 kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246 kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000 kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028 kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000 kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020 kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540 kernel: FS: 0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: transport_free_session+0x67/0x140 kernel: transport_deregister_session+0x7a/0xc0 kernel: iscsit_close_session+0x92/0x210 kernel: iscsit_close_connection+0x5f9/0x840 kernel: iscsit_take_action_for_connection_exit+0xfe/0x110 kernel: iscsi_target_tx_thread+0x140/0x1e0 kernel: ? wait_woken+0x90/0x90 kernel: kthread+0x124/0x160 kernel: ? iscsit_thread_get_cpumask+0x90/0x90 kernel: ? kthread_create_on_node+0x40/0x40 kernel: ret_from_fork+0x22/0x30 kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c 89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89 ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20 kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70 kernel: ---[ end trace f12821adbfd46fed ]--- To address this, go ahead and use proper list_del_list() for all cases of se_nacl->acl_list deletion. Reported-by: Justin Maggard <[email protected]> Tested-by: Justin Maggard <[email protected]> Cc: Justin Maggard <[email protected]> Cc: [email protected] # 4.1+ Signed-off-by: Nicholas Bellinger <[email protected]>
syzkaller reported a refcount_t warning [1] Issue here is that noop_qdisc refcnt was never really considered as a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set : if (qdisc->flags & TCQ_F_BUILTIN || !refcount_dec_and_test(&qdisc->refcnt))) return; Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not really needed, but harmless until refcount_t came. To fix this problem, we simply need to not increment noop_qdisc.refcnt, since we never decrement it. [1] refcount_t: increment on 0; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ rockchip-linux#20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152 RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282 RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000 RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8 RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0 R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000 attach_default_qdiscs net/sched/sch_generic.c:792 [inline] dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833 __dev_open+0x227/0x330 net/core/dev.c:1380 __dev_change_flags+0x695/0x990 net/core/dev.c:6726 dev_change_flags+0x88/0x140 net/core/dev.c:6792 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554 sock_do_ioctl+0x94/0xb0 net/socket.c:968 sock_ioctl+0x2c2/0x440 net/socket.c:1058 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 Fixes: 7b93640 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: Reshetova, Elena <[email protected]> Signed-off-by: David S. Miller <[email protected]>
According to the kerneldoc[0], should do fbdev setup before calling drm_kms_helper_poll_init(), otherwise, Kms hotplug event may race into fbdev helper initial, and fb_helper->dev may be NULL pointer, that would cause the bug: [ 0.735411] [00000200] *pgd=00000000f6ffe003, *pud=00000000f6ffe003, *pmd=0000000000000000 [ 0.736156] Internal error: Oops: 96000005 [FireflyTeam#1] PREEMPT SMP [ 0.736648] Modules linked in: [ 0.736930] CPU: 2 PID: 20 Comm: kworker/2:0 Not tainted 4.4.41 rockchip-linux#20 [ 0.737480] Hardware name: Rockchip RK3399 Board rev2 (BOX) (DT) [ 0.738020] Workqueue: events cdn_dp_pd_event_work [ 0.738447] task: ffffffc0f21f3100 ti: ffffffc0f2218000 task.ti: ffffffc0f2218000 [ 0.739109] PC is at mutex_lock+0x14/0x44 [ 0.739469] LR is at drm_fb_helper_hotplug_event+0x30/0x114 [ 0.756253] [<ffffff8008a344f4>] mutex_lock+0x14/0x44 [ 0.756260] [<ffffff8008445708>] drm_fb_helper_hotplug_event+0x30/0x114 [ 0.756271] [<ffffff8008473c84>] rockchip_drm_output_poll_changed+0x18/0x20 [ 0.756280] [<ffffff8008439fcc>] drm_kms_helper_hotplug_event+0x28/0x34 [ 0.756286] [<ffffff800846c444>] cdn_dp_pd_event_work+0x394/0x3c4 [ 0.756295] [<ffffff80080b2b38>] process_one_work+0x218/0x3e0 [ 0.756302] [<ffffff80080b3538>] worker_thread+0x2e8/0x404 [ 0.756308] [<ffffff80080b7e70>] kthread+0xe8/0xf0 [ 0.756316] [<ffffff8008082690>] ret_from_fork+0x10/0x40 [0]: https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-kms-helpers.html Signed-off-by: Mark Yao <[email protected]> Reviewed-by: Sandy huang <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
…-text symbols" This reverts commit 83e840c ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols"). Chandan reported that on newer kernels, trying to enable function_graph tracer on ppc64 (BE) locks up the system with the following trace: Unable to handle kernel paging request for data at address 0x600000002fa30010 Faulting instruction address: 0xc0000000001f1300 Thread overran stack, or stack corrupted Oops: Kernel access of bad area, sig: 11 [FireflyTeam#1] BE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries Modules linked in: CPU: 1 PID: 6586 Comm: bash Not tainted 4.14.0-rc3-00162-g6e51f1f-dirty rockchip-linux#20 task: c000000625c07200 task.stack: c000000625c07310 NIP: c0000000001f1300 LR: c000000000121cac CTR: c000000000061af8 REGS: c000000625c088c0 TRAP: 0380 Not tainted (4.14.0-rc3-00162-g6e51f1f-dirty) MSR: 8000000000001032 <SF,ME,IR,DR,RI> CR: 28002848 XER: 00000000 CFAR: c0000000001f1320 SOFTE: 0 ... NIP [c0000000001f1300] .__is_insn_slot_addr+0x30/0x90 LR [c000000000121cac] .kernel_text_address+0x18c/0x1c0 Call Trace: [c000000625c08b40] [c0000000001bd040] .is_module_text_address+0x20/0x40 (unreliable) [c000000625c08bc0] [c000000000121cac] .kernel_text_address+0x18c/0x1c0 [c000000625c08c50] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c08cf0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c08d60] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c08df0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 ... [c000000625c0ab30] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0abd0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ac40] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c0acd0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0ad70] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ade0] [c000000000121b40] .kernel_text_address+0x20/0x1c0 This is because ftrace is using ppc_function_entry() for obtaining the address of return_to_handler() in prepare_ftrace_return(). The call to kernel_text_address() itself gets traced and we end up in a recursive loop. Fixes: 83e840c ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") Cc: [email protected] # v4.13+ Reported-by: Chandan Rajendra <[email protected]> Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
When a GSO skb of truesize O is segmented into 2 new skbs of truesize N1 and N2, we want to transfer socket ownership to the new fresh skbs. In order to avoid expensive atomic operations on a cache line subject to cache bouncing, we replace the sequence : refcount_add(N1, &sk->sk_wmem_alloc); refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments refcount_sub(O, &sk->sk_wmem_alloc); by a single refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc); Problem is : In some pathological cases, sum(N) - O might be a negative number, and syzkaller bot was apparently able to trigger this trace [1] atomic_t was ok with this construct, but we need to take care of the negative delta with refcount_t [1] refcount_t: saturated; leaking memory. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ rockchip-linux#20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1c4/0x1e0 kernel/panic.c:546 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177 do_trap_no_signal arch/x86/kernel/traps.c:211 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:260 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77 RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282 RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000 RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68 RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f refcount_add+0x1b/0x60 lib/refcount.c:101 tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155 tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51 inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271 skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749 __skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821 skb_gso_segment include/linux/netdevice.h:3971 [inline] validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074 __dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497 dev_queue_xmit+0x17/0x20 net/core/dev.c:3538 neigh_hh_output include/net/neighbour.h:471 [inline] neigh_output include/net/neighbour.h:479 [inline] ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229 ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:238 [inline] ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:459 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137 tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341 __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513 tcp_push_pending_frames include/net/tcp.h:1722 [inline] tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline] tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x124/0x360 net/core/sock.c:2264 release_sock+0xa4/0x2a0 net/core/sock.c:2776 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:632 [inline] sock_sendmsg+0xca/0x110 net/socket.c:642 ___sys_sendmsg+0x31c/0x890 net/socket.c:2048 __sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138 Fixes: 14afee4 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
commit 46cc0b4 upstream. Current snapshot implementation swaps two ring_buffers even though their sizes are different from each other, that can cause an inconsistency between the contents of buffer_size_kb file and the current buffer size. For example: # cat buffer_size_kb 7 (expanded: 1408) # echo 1 > events/enable # grep bytes per_cpu/cpu0/stats bytes: 1441020 # echo 1 > snapshot // current:1408, spare:1408 # echo 123 > buffer_size_kb // current:123, spare:1408 # echo 1 > snapshot // current:1408, spare:123 # grep bytes per_cpu/cpu0/stats bytes: 1443700 # cat buffer_size_kb 123 // != current:1408 And also, a similar per-cpu case hits the following WARNING: Reproducer: # echo 1 > per_cpu/cpu0/snapshot # echo 123 > buffer_size_kb # echo 1 > per_cpu/cpu0/snapshot WARNING: WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380 Modules linked in: CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380 Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48 RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093 RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8 RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005 RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000 R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060 FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0 Call Trace: ? trace_array_printk_buf+0x140/0x140 ? __mutex_lock_slowpath+0x10/0x10 tracing_snapshot_write+0x4c8/0x7f0 ? trace_printk_init_buffers+0x60/0x60 ? selinux_file_permission+0x3b/0x540 ? tracer_preempt_off+0x38/0x506 ? trace_printk_init_buffers+0x60/0x60 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe This patch adds resize_buffer_duplicate_size() to check if there is a difference between current/spare buffer sizes and resize a spare buffer if necessary. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: ad909e2 ("tracing: Add internal tracing_snapshot() functions") Signed-off-by: Eiichi Tsukata <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Nobuhiro Iwamatsu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
When RCU stall warning triggers, it can print out a lot of messages while holding spinlocks. If the console device is slow (e.g. an actual or IPMI serial console), it may end up triggering NMI hard lockup watchdog like the following. *** CPU printking while holding RCU spinlock PID: 4149739 TASK: ffff881a46baa880 CPU: 13 COMMAND: "CPUThreadPool8" #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0 FireflyTeam#1 [ffff881fff945e58] nmi_handle at ffffffff81020653 FireflyTeam#2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36 FireflyTeam#3 [ffff881fff945ed0] do_nmi at ffffffff81020d32 FireflyTeam#4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: io_serial_in+21] RIP: ffffffff81630e55 RSP: ffff881fff943b88 RFLAGS: 00000002 RAX: 000000000000ca00 RBX: ffffffff8230e188 RCX: 0000000000000000 RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff8230e188 RBP: ffff881fff943bb0 R8: 0000000000000000 R9: ffffffff820cb3c4 R10: 0000000000000019 R11: 0000000000002000 R12: 00000000000026e1 R13: 0000000000000020 R14: ffffffff820cd398 R15: 0000000000000035 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 --- <NMI exception stack> --- FireflyTeam#5 [ffff881fff943b88] io_serial_in at ffffffff81630e55 FireflyTeam#6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c FireflyTeam#7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc FireflyTeam#8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00 FireflyTeam#9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691 FireflyTeam#10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2 FireflyTeam#11 [ffff881fff943c90] console_unlock at ffffffff810dfc55 FireflyTeam#12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5 FireflyTeam#13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf FireflyTeam#14 [ffff881fff943d60] vprintk_func at ffffffff810e1127 FireflyTeam#15 [ffff881fff943d70] printk at ffffffff8119a8a4 FireflyTeam#16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c rockchip-linux#17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133 rockchip-linux#18 [ffff881fff943ee8] update_process_times at ffffffff810f3497 rockchip-linux#19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037 rockchip-linux#20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38 rockchip-linux#21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b *** CPU triggering the hardlockup watchdog PID: 4149709 TASK: ffff88010f88c380 CPU: 26 COMMAND: "CPUThreadPool35" #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874 FireflyTeam#1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc FireflyTeam#2 [ffff883fff105af0] __crash_kexec at ffffffff81111795 FireflyTeam#3 [ffff883fff105b08] panic at ffffffff8119a6ae FireflyTeam#4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd FireflyTeam#5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866 FireflyTeam#6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4 FireflyTeam#7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265 FireflyTeam#8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f FireflyTeam#9 [ffff883fff105e58] nmi_handle at ffffffff81020653 FireflyTeam#10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94 FireflyTeam#11 [ffff883fff105ed0] do_nmi at ffffffff81020d32 FireflyTeam#12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e [exception RIP: queued_spin_lock_slowpath+248] RIP: ffffffff810da958 RSP: ffff883fff103e68 RFLAGS: 00000046 RAX: 0000000000000000 RBX: 0000000000000046 RCX: 00000000006d0000 RDX: ffff883fff49a950 RSI: 0000000000d10101 RDI: ffffffff81e54300 RBP: ffff883fff103e80 R8: ffff883fff11a950 R9: 0000000000000000 R10: 000000000e5873ba R11: 000000000000010f R12: ffffffff81e54300 R13: 0000000000000000 R14: ffff88010f88c380 R15: ffffffff81e54300 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- FireflyTeam#13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958 FireflyTeam#14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b FireflyTeam#15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18 FireflyTeam#16 [ffff883fff103ee8] update_process_times at ffffffff810f3497 rockchip-linux#17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037 rockchip-linux#18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38 rockchip-linux#19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b --- <IRQ stack> --- Avoid spuriously triggering NMI hardlockup watchdog by touching it from the print functions. show_state_filter() shares the same problem and solution. v2: Relocate the comment to where it belongs. Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
when mounting an ISO filesystem sometimes (very rarely) the system hangs because of a race condition between two tasks. PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount" #0 [ffff880078447ae0] __schedule at ffffffff8168d605 FireflyTeam#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49 FireflyTeam#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995 FireflyTeam#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef FireflyTeam#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod] FireflyTeam#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50 FireflyTeam#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3 FireflyTeam#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs] FireflyTeam#8 [ffff880078447da8] mount_bdev at ffffffff81202570 FireflyTeam#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs] FireflyTeam#10 [ffff880078447e28] mount_fs at ffffffff81202d09 FireflyTeam#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f FireflyTeam#12 [ffff880078447ea8] do_mount at ffffffff81220fee FireflyTeam#13 [ffff880078447f28] sys_mount at ffffffff812218d6 FireflyTeam#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49 RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246 RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010 RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010 R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040 R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b This task was trying to mount the cdrom. It allocated and configured a super_block struct and owned the write-lock for the super_block->s_umount rwsem. While exclusively owning the s_umount lock, it called sr_block_ioctl and waited to acquire the global sr_mutex lock. PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd" #0 [ffff880078417898] __schedule at ffffffff8168d605 FireflyTeam#1 [ffff880078417900] schedule at ffffffff8168dc59 FireflyTeam#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605 FireflyTeam#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838 FireflyTeam#4 [ffff8800784179d0] down_read at ffffffff8168cde0 FireflyTeam#5 [ffff8800784179e8] get_super at ffffffff81201cc7 FireflyTeam#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de FireflyTeam#7 [ffff880078417a40] flush_disk at ffffffff8123a94b FireflyTeam#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50 FireflyTeam#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom] FireflyTeam#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod] FireflyTeam#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86 FireflyTeam#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65 FireflyTeam#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b FireflyTeam#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7 FireflyTeam#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf FireflyTeam#16 [ffff880078417d00] do_last at ffffffff8120d53d rockchip-linux#17 [ffff880078417db0] path_openat at ffffffff8120e6b2 rockchip-linux#18 [ffff880078417e48] do_filp_open at ffffffff8121082b rockchip-linux#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33 rockchip-linux#20 [ffff880078417f70] sys_open at ffffffff811fde4e rockchip-linux#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49 RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000 RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70 RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020 R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b This task tried to open the cdrom device, the sr_block_open function acquired the global sr_mutex lock. The call to check_disk_change() then saw an event flag indicating a possible media change and tried to flush any cached data for the device. As part of the flush, it tried to acquire the super_block->s_umount lock associated with the cdrom device. This was the same super_block as created and locked by the previous task. The first task acquires the s_umount lock and then the sr_mutex_lock; the second task acquires the sr_mutex_lock and then the s_umount lock. This patch fixes the issue by moving check_disk_change() out of cdrom_open() and let the caller take care of it. Signed-off-by: Maurizio Lombardi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
commit 46cc0b4 upstream. Current snapshot implementation swaps two ring_buffers even though their sizes are different from each other, that can cause an inconsistency between the contents of buffer_size_kb file and the current buffer size. For example: # cat buffer_size_kb 7 (expanded: 1408) # echo 1 > events/enable # grep bytes per_cpu/cpu0/stats bytes: 1441020 # echo 1 > snapshot // current:1408, spare:1408 # echo 123 > buffer_size_kb // current:123, spare:1408 # echo 1 > snapshot // current:1408, spare:123 # grep bytes per_cpu/cpu0/stats bytes: 1443700 # cat buffer_size_kb 123 // != current:1408 And also, a similar per-cpu case hits the following WARNING: Reproducer: # echo 1 > per_cpu/cpu0/snapshot # echo 123 > buffer_size_kb # echo 1 > per_cpu/cpu0/snapshot WARNING: WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380 Modules linked in: CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 rockchip-linux#20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380 Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48 RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093 RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8 RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005 RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000 R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060 FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0 Call Trace: ? trace_array_printk_buf+0x140/0x140 ? __mutex_lock_slowpath+0x10/0x10 tracing_snapshot_write+0x4c8/0x7f0 ? trace_printk_init_buffers+0x60/0x60 ? selinux_file_permission+0x3b/0x540 ? tracer_preempt_off+0x38/0x506 ? trace_printk_init_buffers+0x60/0x60 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe This patch adds resize_buffer_duplicate_size() to check if there is a difference between current/spare buffer sizes and resize a spare buffer if necessary. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: ad909e2 ("tracing: Add internal tracing_snapshot() functions") Signed-off-by: Eiichi Tsukata <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 FireflyTeam#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 FireflyTeam#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c FireflyTeam#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a FireflyTeam#4 [ffff891ee00039e0] no_context at ffffffffbd074643 FireflyTeam#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e FireflyTeam#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 FireflyTeam#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a FireflyTeam#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 FireflyTeam#9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 FireflyTeam#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] FireflyTeam#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] FireflyTeam#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] FireflyTeam#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] FireflyTeam#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 FireflyTeam#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 FireflyTeam#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 rockchip-linux#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 rockchip-linux#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 rockchip-linux#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- rockchip-linux#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame rockchip-linux#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd rockchip-linux#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 rockchip-linux#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 rockchip-linux#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 rockchip-linux#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 rockchip-linux#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa rockchip-linux#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca rockchip-linux#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 rockchip-linux#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb rockchip-linux#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <[email protected]> # v4.17+ Signed-off-by: Chuck Anderson <[email protected]> Signed-off-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
[ Upstream commit 1bc7896 ] When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: friendlyarm#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 friendlyarm#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 friendlyarm#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd friendlyarm#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 friendlyarm#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 rockchip-linux#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a rockchip-linux#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 rockchip-linux#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 rockchip-linux#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf rockchip-linux#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 rockchip-linux#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 rockchip-linux#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d rockchip-linux#17 [ffffc9002219fc90] schedule at ffffffff81a3e219 rockchip-linux#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 rockchip-linux#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 rockchip-linux#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc rockchip-linux#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c rockchip-linux#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 rockchip-linux#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: # --- a/tools/trace.py # +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> friendlyarm#1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Cc: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ no upstream commit ] Switch the comparison, so that is_branch_taken() will recognize that below branch is never taken: [...] 17: [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...] 17: (67) r8 <<= 32 18: [...] R8_w=inv(id=0,smax_value=-4294967296,umin_value=9223372036854775808,umax_value=18446744069414584320,var_off=(0x8000000000000000; 0x7fffffff00000000)) [...] 18: (c7) r8 s>>= 32 19: [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...] 19: (6d) if r1 s> r8 goto pc+16 [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...] [...] Currently we check for is_branch_taken() only if either K is source, or source is a scalar value that is const. For upstream it would be good to extend this properly to check whether dst is const and src not. For the sake of the test_verifier, it is probably not needed here: # ./test_verifier 101 rockchip-linux#101/p bpf_get_stack return R0 within range OK Summary: 1 PASSED, 0 SKIPPED, 0 FAILED I haven't seen this issue in test_progs* though, they are passing fine: # ./test_progs-no_alu32 -t get_stack Switching to flavor 'no_alu32' subdirectory... rockchip-linux#20 get_stack_raw_tp:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED # ./test_progs -t get_stack rockchip-linux#20 get_stack_raw_tp:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 rockchip-linux#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 rockchip-linux#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 rockchip-linux#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 rockchip-linux#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 rockchip-linux#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 rockchip-linux#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 rockchip-linux#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 rockchip-linux#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 rockchip-linux#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 rockchip-linux#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 rockchip-linux#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 rockchip-linux#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 rockchip-linux#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 rockchip-linux#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 rockchip-linux#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 rockchip-linux#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 rockchip-linux#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/linux-trace-devel/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit ab0db04 ] When running with -o enospc_debug you can get the following splat if one of the dump_space_info's trip ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ rockchip-linux#20 Tainted: G OE ------------------------------------------------------ dd/563090 is trying to acquire lock: ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs] but task is already holding lock: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&cache->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs] find_free_extent+0x7ef/0x13b0 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x340 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x530 [btrfs] btrfs_cow_block+0x106/0x210 [btrfs] commit_cowonly_roots+0x55/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&space_info->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs] btrfs_inode_rsv_release+0x4f/0x170 [btrfs] btrfs_clear_delalloc_extent+0x155/0x480 [btrfs] clear_state_bit+0x81/0x1a0 [btrfs] __clear_extent_bit+0x25c/0x5d0 [btrfs] clear_extent_bit+0x15/0x20 [btrfs] btrfs_invalidatepage+0x2b7/0x3c0 [btrfs] truncate_cleanup_page+0x47/0xe0 truncate_inode_pages_range+0x238/0x840 truncate_pagecache+0x44/0x60 btrfs_setattr+0x202/0x5e0 [btrfs] notify_change+0x33b/0x490 do_truncate+0x76/0xd0 path_openat+0x687/0xa10 do_filp_open+0x91/0x100 do_sys_openat2+0x215/0x2d0 do_sys_open+0x44/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&tree->lock#2){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 find_first_extent_bit+0x32/0x150 [btrfs] write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs] __btrfs_write_out_cache+0x172/0x480 [btrfs] btrfs_write_out_cache+0x7a/0xf0 [btrfs] btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs] commit_cowonly_roots+0x245/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] close_ctree+0xf9/0x2f5 [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&ctl->tree_lock){+.+.}-{2:2}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 _raw_spin_lock+0x25/0x30 btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &ctl->tree_lock --> &space_info->lock --> &cache->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&cache->lock); lock(&space_info->lock); lock(&cache->lock); lock(&ctl->tree_lock); *** DEADLOCK *** 6 locks held by dd/563090: #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200 #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs] #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs] #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs] #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs] #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] stack backtrace: CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ rockchip-linux#20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? wake_up_klogd.part.0+0x30/0x40 lock_acquire+0xab/0x360 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] _raw_spin_lock+0x25/0x30 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] ? start_transaction+0xe0/0x5b0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] ? ktime_get_coarse_real_ts64+0xa8/0xd0 ? trace_hardirqs_on+0x1c/0xe0 btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because we're holding the block_group->lock while trying to dump the free space cache. However we don't need this lock, we just need it to read the values for the printk, so move the free space cache dumping outside of the block group lock. Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 01d01ca upstream. We are currently getting this lockdep splat in btrfs/161: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ rockchip-linux#20 Tainted: G E ------------------------------------------------------ mount/678048 is trying to acquire lock: ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs] but task is already holding lock: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x8b/0x8f0 btrfs_init_new_device+0x2d2/0x1240 [btrfs] btrfs_ioctl+0x1de/0x2d20 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 __mutex_lock+0x8b/0x8f0 clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); *** DEADLOCK *** 3 locks held by mount/678048: #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs] #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] stack backtrace: CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ rockchip-linux#20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 lock_acquire+0xab/0x360 ? clone_fs_devices+0x4d/0x170 [btrfs] __mutex_lock+0x8b/0x8f0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? rcu_read_lock_sched_held+0x52/0x60 ? cpumask_next+0x16/0x20 ? module_assert_mutex_or_preempt+0x14/0x40 ? __module_address+0x28/0xf0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? static_obj+0x4f/0x60 ? lockdep_init_map_waits+0x43/0x200 ? clone_fs_devices+0x4d/0x170 [btrfs] clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] ? super_setup_bdi_name+0x79/0xd0 btrfs_mount_root.cold+0x13/0xfa [btrfs] ? vfs_parse_fs_string+0x84/0xb0 ? rcu_read_lock_sched_held+0x52/0x60 ? kfree+0x2b5/0x310 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? rcu_read_lock_sched_held+0x52/0x60 ? legacy_get_tree+0x30/0x50 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 ? memdup_user+0x4e/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT's and then read the device, which takes the device_list_mutex. The device_list_mutex needs to be taken before the chunk_mutex, so this is a problem. We only really need the chunk mutex around adding the chunk, so move the mutex around read_one_chunk. An argument could be made that we don't even need the chunk_mutex here as it's during mount, and we are protected by various other locks. However we already have special rules for ->device_list_mutex, and I'd rather not have another special case for ->chunk_mutex. CC: [email protected] # 4.19+ Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4d14c5c upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 rockchip-linux#10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED rockchip-linux#11 touch_atime at ffffffffa4cf0000 rockchip-linux#12 generic_file_read_iter at ffffffffa4c1f123 rockchip-linux#13 new_sync_read at ffffffffa4ccdc8a rockchip-linux#14 vfs_read at ffffffffa4cd0849 rockchip-linux#15 ksys_read at ffffffffa4cd0bd1 rockchip-linux#16 do_syscall_64 at ffffffffa4a052eb rockchip-linux#17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f rockchip-linux#10 __extent_writepage at ffffffffc03a4c01 rockchip-linux#11 extent_write_cache_pages at ffffffffc03a500b rockchip-linux#12 extent_writepages at ffffffffc03a6de2 rockchip-linux#13 do_writepages at ffffffffa4c277eb rockchip-linux#14 __filemap_fdatawrite_range at ffffffffa4c1e5bb rockchip-linux#15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes rockchip-linux#16 normal_work_helper at ffffffffc03b706c rockchip-linux#17 process_one_work at ffffffffa4aba4e4 rockchip-linux#18 worker_thread at ffffffffa4aba6fd rockchip-linux#19 kthread at ffffffffa4ac0a3d rockchip-linux#20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David Sterba <[email protected]> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 8b59b0a upstream. arm32 uses software to simulate the instruction replaced by kprobe. some instructions may be simulated by constructing assembly functions. therefore, before executing instruction simulation, it is necessary to construct assembly function execution environment in C language through binding registers. after kasan is enabled, the register binding relationship will be destroyed, resulting in instruction simulation errors and causing kernel panic. the kprobe emulate instruction function is distributed in three files: actions-common.c actions-arm.c actions-thumb.c, so disable KASAN when compiling these files. for example, use kprobe insert on cap_capable+20 after kasan enabled, the cap_capable assembly code is as follows: <cap_capable>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e1a05000 mov r5, r0 e280006c add r0, r0, rockchip-linux#108 ; 0x6c e1a04001 mov r4, r1 e1a06002 mov r6, r2 e59fa090 ldr sl, [pc, rockchip-linux#144] ; ebfc7bf8 bl c03aa4b4 <__asan_load4> e595706c ldr r7, [r5, rockchip-linux#108] ; 0x6c e2859014 add r9, r5, rockchip-linux#20 ...... The emulate_ldr assembly code after enabling kasan is as follows: c06f1384 <emulate_ldr>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e282803c add r8, r2, rockchip-linux#60 ; 0x3c e1a05000 mov r5, r0 e7e37855 ubfx r7, r5, rockchip-linux#16, #4 e1a00008 mov r0, r8 e1a09001 mov r9, r1 e1a04002 mov r4, r2 ebf35462 bl c03c6530 <__asan_load4> e357000f cmp r7, rockchip-linux#15 e7e36655 ubfx r6, r5, rockchip-linux#12, #4 e205a00f and sl, r5, rockchip-linux#15 0a000001 beq c06f13bc <emulate_ldr+0x38> e0840107 add r0, r4, r7, lsl #2 ebf3545c bl c03c6530 <__asan_load4> e084010a add r0, r4, sl, lsl #2 ebf3545a bl c03c6530 <__asan_load4> e2890010 add r0, r9, rockchip-linux#16 ebf35458 bl c03c6530 <__asan_load4> e5990010 ldr r0, [r9, rockchip-linux#16] e12fff30 blx r0 e356000f cm r6, rockchip-linux#15 1a000014 bne c06f1430 <emulate_ldr+0xac> e1a06000 mov r6, r0 e2840040 add r0, r4, rockchip-linux#64 ; 0x40 ...... when running in emulate_ldr to simulate the ldr instruction, panic occurred, and the log is as follows: Unable to handle kernel NULL pointer dereference at virtual address 00000090 pgd = ecb46400 [00000090] *pgd=2e0fa003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM PC is at cap_capable+0x14/0xb0 LR is at emulate_ldr+0x50/0xc0 psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c r10: 00000000 r9 : c30897f4 r8 : ecd63cd4 r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98 r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 32c5387d Table: 2d546400 DAC: 55555555 Process bash (pid: 1643, stack limit = 0xecd60190) (cap_capable) from (kprobe_handler+0x218/0x340) (kprobe_handler) from (kprobe_trap_handler+0x24/0x48) (kprobe_trap_handler) from (do_undefinstr+0x13c/0x364) (do_undefinstr) from (__und_svc_finish+0x0/0x30) (__und_svc_finish) from (cap_capable+0x18/0xb0) (cap_capable) from (cap_vm_enough_memory+0x38/0x48) (cap_vm_enough_memory) from (security_vm_enough_memory_mm+0x48/0x6c) (security_vm_enough_memory_mm) from (copy_process.constprop.5+0x16b4/0x25c8) (copy_process.constprop.5) from (_do_fork+0xe8/0x55c) (_do_fork) from (SyS_clone+0x1c/0x24) (SyS_clone) from (__sys_trace_return+0x0/0x10) Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7) Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support") Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: huangshaobo <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f9c6456 upstream. Masoud Sharbiani noticed that commit 29ef680 ("memcg, oom: move out_of_memory back to the charge path") broke memcg OOM called from __xfs_filemap_fault() path. It turned out that try_charge() is retrying forever without making forward progress because mem_cgroup_oom(GFP_NOFS) cannot invoke the OOM killer due to commit 3da88fb ("mm, oom: move GFP_NOFS check to out_of_memory"). Allowing forced charge due to being unable to invoke memcg OOM killer will lead to global OOM situation. Also, just returning -ENOMEM will be risky because OOM path is lost and some paths (e.g. get_user_pages()) will leak -ENOMEM. Therefore, invoking memcg OOM killer (despite GFP_NOFS) will be the only choice we can choose for now. Until 29ef680, we were able to invoke memcg OOM killer when GFP_KERNEL reclaim failed [1]. But since 29ef680, we need to invoke memcg OOM killer when GFP_NOFS reclaim failed [2]. Although in the past we did invoke memcg OOM killer for GFP_NOFS [3], we might get pre-mature memcg OOM reports due to this patch. [1] leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ rockchip-linux#19 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: dump_stack+0x63/0x88 dump_header+0x67/0x27a ? mem_cgroup_scan_tasks+0x91/0xf0 oom_kill_process+0x210/0x410 out_of_memory+0x10a/0x2c0 mem_cgroup_out_of_memory+0x46/0x80 mem_cgroup_oom_synchronize+0x2e4/0x310 ? high_work_func+0x20/0x20 pagefault_out_of_memory+0x31/0x76 mm_fault_error+0x55/0x115 ? handle_mm_fault+0xfd/0x220 __do_page_fault+0x433/0x4e0 do_page_fault+0x22/0x30 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x4009f0 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 158965 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [2] leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0 CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ rockchip-linux#20 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: dump_stack+0x63/0x88 dump_header+0x67/0x27a ? mem_cgroup_scan_tasks+0x91/0xf0 oom_kill_process+0x210/0x410 out_of_memory+0x109/0x2d0 mem_cgroup_out_of_memory+0x46/0x80 try_charge+0x58d/0x650 ? __radix_tree_replace+0x81/0x100 mem_cgroup_try_charge+0x7a/0x100 __add_to_page_cache_locked+0x92/0x180 add_to_page_cache_lru+0x4d/0xf0 iomap_readpages_actor+0xde/0x1b0 ? iomap_zero_range_actor+0x1d0/0x1d0 iomap_apply+0xaf/0x130 iomap_readpages+0x9f/0x150 ? iomap_zero_range_actor+0x1d0/0x1d0 xfs_vm_readpages+0x18/0x20 [xfs] read_pages+0x60/0x140 __do_page_cache_readahead+0x193/0x1b0 ondemand_readahead+0x16d/0x2c0 page_cache_async_readahead+0x9a/0xd0 filemap_fault+0x403/0x620 ? alloc_set_pte+0x12c/0x540 ? _cond_resched+0x14/0x30 __xfs_filemap_fault+0x66/0x180 [xfs] xfs_filemap_fault+0x27/0x30 [xfs] __do_fault+0x19/0x40 __handle_mm_fault+0x8e8/0xb60 handle_mm_fault+0xfd/0x220 __do_page_fault+0x238/0x4e0 do_page_fault+0x22/0x30 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x4009f0 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 RSP: 002b:00007ffda45c9290 EFLAGS: 00010206 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 7221 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [3] leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0 leaker cpuset=/ mems_allowed=0 CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 FireflyTeam#1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: [<ffffffffaf364147>] dump_stack+0x19/0x1b [<ffffffffaf35eb6a>] dump_header+0x90/0x229 [<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0 [<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60 [<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0 [<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570 [<ffffffffaee360b0>] ? mem_cgroup_charge_common+0xc0/0xc0 [<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90 [<ffffffffaf35d072>] mm_fault_error+0x6a/0x157 [<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0 [<ffffffffaf371925>] do_page_fault+0x35/0x90 [<ffffffffaf36d768>] page_fault+0x28/0x30 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 20628 memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0 kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB Bisected by Masoud Sharbiani. Link: http://lkml.kernel.org/r/[email protected] Fixes: 3da88fb ("mm, oom: move GFP_NOFS check to out_of_memory") [necessary after 29ef680] Signed-off-by: Tetsuo Handa <[email protected]> Reported-by: Masoud Sharbiani <[email protected]> Tested-by: Masoud Sharbiani <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: <[email protected]> [4.19+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit a154f5f ] The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 rockchip-linux#10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] rockchip-linux#11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc rockchip-linux#12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] rockchip-linux#13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] rockchip-linux#14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] rockchip-linux#15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] rockchip-linux#16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 rockchip-linux#17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] rockchip-linux#18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] rockchip-linux#19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 rockchip-linux#20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Christie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Hi,
I have tried to test the kernel on this repository for ASUS Tinker Board.(using rk3288_linux_defconfig)
I have compiled the kernel and modules and boot the device with created kernel.
The kernel could load the device successfully however, modules does not load at all as this repository do not create all the required modules.
Device Tree seems different compare to official ASUS DTB file for miniarm.
Can you please advise if modules like WIFI and Bluetooth are included in this repository or not?
Thanks
The text was updated successfully, but these errors were encountered: