forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multiple ssp and icelake support #1
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
keyonjie
commented
Jun 22, 2018
- multiple ssp support.
- add icl support.
drivers. Register all DAI from platform driver Signed-off-by: Liam Girdwood <[email protected]> Signed-off-by: Keyon Jie <[email protected]>
Signed-off-by: Liam Girdwood <[email protected]> Signed-off-by: Keyon Jie <[email protected]>
Signed-off-by: Liam Girdwood <[email protected]>
Signed-off-by: Liam Girdwood <[email protected]> Signed-off-by: Keyon Jie <[email protected]>
Signed-off-by: Liam Girdwood <[email protected]>
Signed-off-by: Liam Girdwood <[email protected]> Signed-off-by: Keyon Jie <[email protected]>
Signed-off-by: Liam Girdwood <[email protected]>
This patch create nocodec BE DAI links based on platform dai drivers, with it, we can support multiple SSP/BEs for nocodec mode. Signed-off-by: Keyon Jie <[email protected]>
Signed-off-by: Keyon Jie <[email protected]>
…dynamic We should use .dynamic for all FE dai_links, so change probe and trace buffer ones here to align to that. Signed-off-by: Keyon Jie <[email protected]>
Signed-off-by: yuanwu <[email protected]> Signed-off-by: Keyon Jie <[email protected]>
plbossart
requested changes
Jun 22, 2018
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please split this in separate PR, and even for ICL don't combine stuff for sound/soc/intel/common and sound/soc/sof in the the commit
OK, let me split them and resend. |
Closed
plbossart
pushed a commit
that referenced
this pull request
Aug 13, 2018
…ment enabled Surround scsi_execute() calls with scsi_autopm_get_device() and scsi_autopm_put_device(). Note: removing sr_mutex protection from the scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of sr_mutex is to serialize cdrom_*() calls. This patch avoids that complaints similar to the following appear in the kernel log if runtime power management is enabled: INFO: task systemd-udevd:650 blocked for more than 120 seconds. Not tainted 4.18.0-rc7-dbg+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. systemd-udevd D28176 650 513 0x00000104 Call Trace: __schedule+0x444/0xfe0 schedule+0x4e/0xe0 schedule_preempt_disabled+0x18/0x30 __mutex_lock+0x41c/0xc70 mutex_lock_nested+0x1b/0x20 __blkdev_get+0x106/0x970 blkdev_get+0x22c/0x5a0 blkdev_open+0xe9/0x100 do_dentry_open.isra.19+0x33e/0x570 vfs_open+0x7c/0xd0 path_openat+0x6e3/0x1120 do_filp_open+0x11c/0x1c0 do_sys_open+0x208/0x2d0 __x64_sys_openat+0x59/0x70 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Johannes Thumshirn <[email protected]> Cc: Alan Stern <[email protected]> Cc: <[email protected]> Tested-by: Johannes Thumshirn <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
plbossart
pushed a commit
that referenced
this pull request
Aug 13, 2018
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value can lead to undefined behavior [1]. In order to fix this use a gradual shift of the window with a 'while' loop, similar to what tcp_cwnd_restart() is doing. When comparing delta and RTO there is a minor difference between TCP and DCCP, the last one also invokes dccp_cwnd_restart() and reduces 'cwnd' if delta equals RTO. That case is preserved in this change. [1]: [40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7 [40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int' [40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1 ... [40851.377176] Call Trace: [40851.408503] dump_stack+0xf1/0x17b [40851.451331] ? show_regs_print_info+0x5/0x5 [40851.503555] ubsan_epilogue+0x9/0x7c [40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4 [40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f [40851.686796] ? xfrm4_output_finish+0x80/0x80 [40851.739827] ? lock_downgrade+0x6d0/0x6d0 [40851.789744] ? xfrm4_prepare_output+0x160/0x160 [40851.845912] ? ip_queue_xmit+0x810/0x1db0 [40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp] [40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp] [40852.142412] dccp_sendmsg+0x428/0xb20 [dccp] [40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp] [40852.254833] ? sched_clock+0x5/0x10 [40852.298508] ? sched_clock+0x5/0x10 [40852.342194] ? inet_create+0xdf0/0xdf0 [40852.388988] sock_sendmsg+0xd9/0x160 ... Fixes: 113ced1 ("dccp ccid-2: Perform congestion-window validation") Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
plbossart
pushed a commit
that referenced
this pull request
Aug 13, 2018
If zram supports writeback feature, it's no longer a BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations for incompressible pages. Do not pretend to be synchronous IO device. It makes the system very sluggish due to waiting for IO completion from upper layers. Furthermore, it causes a user-after-free problem because swap thinks the opearion is done when the IO functions returns so it can free the page (e.g., lock_page_or_retry and goto out_release in do_swap_page) but in fact, IO is asynchronous so the driver could access a just freed page afterward. This patch fixes the problem. BUG: Bad page state in process qemu-system-x86 pfn:3dfab21 page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1 flags: 0x17fffc000000008(uptodate) raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x8(uptodate) CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 Call Trace: dump_stack+0x5c/0x7b bad_page+0xba/0x120 get_page_from_freelist+0x1016/0x1250 __alloc_pages_nodemask+0xfa/0x250 alloc_pages_vma+0x7c/0x1c0 do_swap_page+0x347/0x920 __handle_mm_fault+0x7b4/0x1110 handle_mm_fault+0xfc/0x1f0 __get_user_pages+0x12f/0x690 get_user_pages_unlocked+0x148/0x1f0 __gfn_to_pfn_memslot+0xff/0x3c0 [kvm] try_async_pf+0x87/0x230 [kvm] tdp_page_fault+0x132/0x290 [kvm] kvm_mmu_page_fault+0x74/0x570 [kvm] kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm] kvm_vcpu_ioctl+0x388/0x5d0 [kvm] do_vfs_ioctl+0xa2/0x630 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: https://lore.kernel.org/lkml/[email protected]/ Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix changelog, add comment] Link: https://lore.kernel.org/lkml/[email protected]/ Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: coding-style fixes] Signed-off-by: Minchan Kim <[email protected]> Reported-by: Tino Lehnig <[email protected]> Tested-by: Tino Lehnig <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Jens Axboe <[email protected]> Cc: <[email protected]> [4.15+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
In case local OOB data was generated and other device initiated pairing claiming that it has got OOB data, following crash occurred: [ 222.847853] general protection fault: 0000 [#1] SMP PTI [ 222.848025] CPU: 1 PID: 42 Comm: kworker/u5:0 Tainted: G C 4.18.0-custom #4 [ 222.848158] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 222.848307] Workqueue: hci0 hci_rx_work [bluetooth] [ 222.848416] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth] [ 222.848540] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48 [ 222.848914] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293 [ 222.849021] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a [ 222.849111] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800 [ 222.849192] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800 [ 222.849269] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00 [ 222.849350] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101 [ 222.849429] FS: 0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000 [ 222.849518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.849586] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0 [ 222.849671] Call Trace: [ 222.849745] ? sc_send_public_key+0x110/0x2a0 [bluetooth] [ 222.849825] ? sc_send_public_key+0x115/0x2a0 [bluetooth] [ 222.849925] smp_recv_cb+0x959/0x2490 [bluetooth] [ 222.850023] ? _cond_resched+0x19/0x40 [ 222.850105] ? mutex_lock+0x12/0x40 [ 222.850202] l2cap_recv_frame+0x109d/0x3420 [bluetooth] [ 222.850315] ? l2cap_recv_frame+0x109d/0x3420 [bluetooth] [ 222.850426] ? __switch_to_asm+0x34/0x70 [ 222.850515] ? __switch_to_asm+0x40/0x70 [ 222.850625] ? __switch_to_asm+0x34/0x70 [ 222.850724] ? __switch_to_asm+0x40/0x70 [ 222.850786] ? __switch_to_asm+0x34/0x70 [ 222.850846] ? __switch_to_asm+0x40/0x70 [ 222.852581] ? __switch_to_asm+0x34/0x70 [ 222.854976] ? __switch_to_asm+0x40/0x70 [ 222.857475] ? __switch_to_asm+0x40/0x70 [ 222.859775] ? __switch_to_asm+0x34/0x70 [ 222.861218] ? __switch_to_asm+0x40/0x70 [ 222.862327] ? __switch_to_asm+0x34/0x70 [ 222.863758] l2cap_recv_acldata+0x266/0x3c0 [bluetooth] [ 222.865122] hci_rx_work+0x1c9/0x430 [bluetooth] [ 222.867144] process_one_work+0x210/0x4c0 [ 222.868248] worker_thread+0x41/0x4d0 [ 222.869420] kthread+0x141/0x160 [ 222.870694] ? process_one_work+0x4c0/0x4c0 [ 222.871668] ? kthread_create_worker_on_cpu+0x90/0x90 [ 222.872896] ret_from_fork+0x35/0x40 [ 222.874132] Modules linked in: algif_hash algif_skcipher af_alg rfcomm bnep btusb btrtl btbcm btintel snd_intel8x0 cmac intel_rapl_perf vboxvideo(C) snd_ac97_codec bluetooth ac97_bus joydev ttm snd_pcm ecdh_generic drm_kms_helper snd_timer snd input_leds drm serio_raw fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci psmouse libahci i2c_piix4 video e1000 pata_acpi [ 222.883153] fbcon_switch: detected unhandled fb_set_par error, error code -16 [ 222.886774] fbcon_switch: detected unhandled fb_set_par error, error code -16 [ 222.890503] ---[ end trace 6504aa7a777b5316 ]--- [ 222.890541] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth] [ 222.890551] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48 [ 222.890555] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293 [ 222.890561] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a [ 222.890565] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800 [ 222.890571] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800 [ 222.890576] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00 [ 222.890581] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101 [ 222.890586] FS: 0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000 [ 222.890591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.890594] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0 This commit fixes a bug where invalid pointer to crypto tfm was used for SMP SC ECDH calculation when OOB was in use. Solution is to use same crypto tfm than when generating OOB material on generate_oob() function. This bug was introduced in commit c0153b0 ("Bluetooth: let the crypto subsystem generate the ecc privkey"). Bug was found by fuzzing kernel SMP implementation using Synopsys Defensics. Signed-off-by: Matias Karhumaa <[email protected]> Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
This reverts commit 12eeeb4. The patch doesn't fix accessing memory with null pointer in skl_interrupt(). There are two problems: 1) skl_init_chip() is called twice, before and after dma buffer is allocate. The first call sets bus->chip_init which prevents the second from initializing bus->corb.buf and rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers. There is a small window which skl_interrupt() can be called if irq has been acquired. If so, it crashes when using null dma buffer pointers. Will fix the problems in the following patches. Also attaching the crash for future reference. [ 16.949148] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI <snipped> [ 16.950903] Call Trace: [ 16.950906] <IRQ> [ 16.950918] skl_interrupt+0x19e/0x2d6 [snd_soc_skl] [ 16.950926] ? dma_supported+0xb5/0xb5 [snd_soc_skl] [ 16.950933] __handle_irq_event_percpu+0x27a/0x6c8 [ 16.950937] ? __irq_wake_thread+0x1d1/0x1d1 [ 16.950942] ? __do_softirq+0x57a/0x69e [ 16.950944] handle_irq_event_percpu+0x95/0x1ba [ 16.950948] ? _raw_spin_unlock+0x65/0xdc [ 16.950951] ? __handle_irq_event_percpu+0x6c8/0x6c8 [ 16.950953] ? _raw_spin_unlock+0x65/0xdc [ 16.950957] ? time_cpufreq_notifier+0x483/0x483 [ 16.950959] handle_irq_event+0x89/0x123 [ 16.950962] handle_fasteoi_irq+0x16f/0x425 [ 16.950965] handle_irq+0x1fe/0x28e [ 16.950969] do_IRQ+0x6e/0x12e [ 16.950972] common_interrupt+0x7a/0x7a [ 16.950974] </IRQ> <snipped> [ 16.951031] RIP: snd_hdac_bus_update_rirb+0x19b/0x4cf [snd_hda_core] RSP: ffff88015c807c08 [ 16.951036] ---[ end trace 58bf9ece1775bc92 ]--- Fixes: 2eeeb4f4733b ("ASoC: Intel: Skylake: Acquire irq after RIRB allocation") Signed-off-by: Yu Zhao <[email protected]> Signed-off-by: Mark Brown <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
When netvsc device is removed it can call reschedule in RCU context. This happens because canceling the subchannel setup work could (in theory) cause a reschedule when manipulating the timer. To reproduce, run with lockdep enabled kernel and unbind a network device from hv_netvsc (via sysfs). [ 160.682011] WARNING: suspicious RCU usage [ 160.707466] 4.19.0-rc3-uio+ #2 Not tainted [ 160.709937] ----------------------------- [ 160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 160.723691] [ 160.723691] other info that might help us debug this: [ 160.723691] [ 160.730955] [ 160.730955] rcu_scheduler_active = 2, debug_locks = 1 [ 160.762813] 5 locks held by rebind-eth.sh/1812: [ 160.766851] #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0 [ 160.773416] #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0 [ 160.783766] #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0 [ 160.787465] #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250 [ 160.816987] #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc] [ 160.828629] [ 160.828629] stack backtrace: [ 160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2 [ 160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012 [ 160.832952] Call Trace: [ 160.832952] dump_stack+0x85/0xcb [ 160.832952] ___might_sleep+0x1a3/0x240 [ 160.832952] __flush_work+0x57/0x2e0 [ 160.832952] ? __mutex_lock+0x83/0x990 [ 160.832952] ? __kernfs_remove+0x24f/0x2e0 [ 160.832952] ? __kernfs_remove+0x1b2/0x2e0 [ 160.832952] ? mark_held_locks+0x50/0x80 [ 160.832952] ? get_work_pool+0x90/0x90 [ 160.832952] __cancel_work_timer+0x13c/0x1e0 [ 160.832952] ? netvsc_remove+0x1e/0x250 [hv_netvsc] [ 160.832952] ? __lock_is_held+0x55/0x90 [ 160.832952] netvsc_remove+0x9a/0x250 [hv_netvsc] [ 160.832952] vmbus_remove+0x26/0x30 [ 160.832952] device_release_driver_internal+0x18a/0x250 [ 160.832952] unbind_store+0xb4/0x180 [ 160.832952] kernfs_fop_write+0x113/0x1a0 [ 160.832952] __vfs_write+0x36/0x1a0 [ 160.832952] ? rcu_read_lock_sched_held+0x6b/0x80 [ 160.832952] ? rcu_sync_lockdep_assert+0x2e/0x60 [ 160.832952] ? __sb_start_write+0x141/0x1a0 [ 160.832952] ? vfs_write+0x184/0x1b0 [ 160.832952] vfs_write+0xbe/0x1b0 [ 160.832952] ksys_write+0x55/0xc0 [ 160.832952] do_syscall_64+0x60/0x1b0 [ 160.832952] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 160.832952] RIP: 0033:0x7fe48f4c8154 Resolve this by getting RTNL earlier. This is safe because the subchannel work queue does trylock on RTNL and will detect the race. Fixes: 7b2ee50 ("hv_netvsc: common detach logic") Signed-off-by: Stephen Hemminger <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
Chen Yu reported a divide-by-zero error when accessing the 'size' resctrl file when a MBA resource is enabled. divide error: 0000 [#1] SMP PTI CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25 RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0 Call Trace: rdtgroup_size_show+0x11a/0x1d0 seq_read+0xd8/0x3b0 Quoting Chen Yu's report: This is because for MB resource, the r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size() will trigger the exception. Fix this issue in the 'size' file by getting correct memory bandwidth value which is in MBps when MBA software controller is enabled or in percentage when MBA software controller is disabled. Fixes: d9b48c8 ("x86/intel_rdt: Display resource groups' allocations in bytes") Reported-by: Chen Yu <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Chen Yu <[email protected]> Cc: "H Peter Anvin" <[email protected]> Cc: "Tony Luck" <[email protected]> Cc: "Xiaochen Shen" <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected]
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
For a failing smc_listen_rdma_finish() smc_listen_decline() is called. If fallback is possible, the new socket is already enqueued to be accepted in smc_listen_decline(). Avoid enqueuing a second time afterwards in this case, otherwise the smc_create_lgr_pending lock is released twice: [ 373.463976] WARNING: bad unlock balance detected! [ 373.463978] 4.18.0-rc7+ #123 Tainted: G O [ 373.463979] ------------------------------------- [ 373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at: [ 373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.463991] but there are no more locks to release! [ 373.463991] other info that might help us debug this: [ 373.463993] 2 locks held by kworker/1:1/30: [ 373.463994] #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464000] #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464003] stack backtrace: [ 373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G O 4.18.0-rc7uschi+ #123 [ 373.464007] Hardware name: IBM 2827 H43 738 (LPAR) [ 373.464010] Workqueue: events smc_listen_work [smc] [ 373.464011] Call Trace: [ 373.464015] ([<0000000000114100>] show_stack+0x60/0xd8) [ 373.464019] [<0000000000a8c9bc>] dump_stack+0x9c/0xd8 [ 373.464021] [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108 [ 373.464022] [<00000000001e045c>] lock_release+0x114/0x4f8 [ 373.464025] [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300 [ 373.464027] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.464029] [<0000000000197a68>] process_one_work+0x2a8/0x6b0 [ 373.464030] [<0000000000197ec2>] worker_thread+0x52/0x410 [ 373.464033] [<000000000019fd0e>] kthread+0x15e/0x178 [ 373.464035] [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc [ 373.464052] [<0000000000aaf584>] kernel_thread_starter+0x0/0xc [ 373.464054] INFO: lockdep is turned off. Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
If a network interface is created prior to the SFP socket being available, ethtool can request module information. This unfortunately leads to an oops: Unable to handle kernel NULL pointer dereference at virtual address 00000008 pgd = (ptrval) [00000008] *pgd=7c400831, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1480 Comm: ethtool Not tainted 4.19.0-rc3 #138 Hardware name: Broadcom Northstar Plus SoC PC is at sfp_get_module_info+0x8/0x10 LR is at dev_ethtool+0x218c/0x2afc Fix this by not filling in the network device's SFP bus pointer until SFP is fully bound, thereby avoiding the core calling into the SFP bus code. Fixes: ce0aa27 ("sfp: add sfp-bus to bridge between network devices and sfp cages") Reported-by: Florian Fainelli <[email protected]> Tested-by: Florian Fainelli <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
Applying dynamic usbcore quirks in early booting when the slab is not yet ready would cause kernel panic of null pointer dereference because the quirk_count has been counted as 1 while the quirk_list was failed to allocate. i.e., [ 1.044970] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1.044995] IP: [<ffffffffb0953ec7>] usb_detect_quirks+0x88/0xd1 [ 1.045016] PGD 0 [ 1.045026] Oops: 0000 [#1] PREEMPT SMP [ 1.046986] gsmi: Log Shutdown Reason 0x03 [ 1.046995] Modules linked in: [ 1.047008] CPU: 0 PID: 81 Comm: kworker/0:3 Not tainted 4.4.154 #28 [ 1.047016] Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.27.0 12/04/2017 [ 1.047028] Workqueue: usb_hub_wq hub_event [ 1.047037] task: ffff88017a321c80 task.stack: ffff88017a384000 [ 1.047044] RIP: 0010:[<ffffffffb0953ec7>] [<ffffffffb0953ec7>] usb_detect_quirks+0x88/0xd1 To tackle this odd, let's balance the quirk_count to 0 when the kcalloc call fails, and defer the quirk setting into a lower level callback which ensures that the kernel memory management has been initialized. Fixes: 027bd6c ("usb: core: Add "quirks" parameter for usbcore") Signed-off-by: Harry Pan <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
keyonjie
pushed a commit
that referenced
this pull request
Oct 15, 2018
Deferred struct page init is needed only on systems with large amount of physical memory to improve boot performance. 32-bit systems do not benefit from this feature. Jiri reported a problem where deferred struct pages do not work well with x86-32: [ 0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) [ 0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) [ 0.036269] Initializing CPU#0 [ 0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0) [ 0.038459] page:f6780000 is uninitialized and poisoned [ 0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff [ 0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page)) [ 0.040038] ------------[ cut here ]------------ [ 0.040399] kernel BUG at include/linux/page-flags.h:293! [ 0.040823] invalid opcode: 0000 [#1] SMP PTI [ 0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9 [ 0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 0.042496] EIP: free_highmem_page+0x64/0x80 [ 0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71 [ 0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8 [ 0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc [ 0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086 [ 0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690 [ 0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 0.046913] DR6: fffe0ff0 DR7: 00000400 [ 0.047220] Call Trace: [ 0.047419] add_highpages_with_active_regions+0xbd/0x10d [ 0.047854] set_highmem_pages_init+0x5b/0x71 [ 0.048202] mem_init+0x2b/0x1e8 [ 0.048460] start_kernel+0x1d2/0x425 [ 0.048757] i386_start_kernel+0x93/0x97 [ 0.049073] startup_32_smp+0x164/0x168 [ 0.049379] Modules linked in: [ 0.049626] ---[ end trace 337949378db0abbb ]--- We free highmem pages before their struct pages are initialized: mem_init() set_highmem_pages_init() add_highpages_with_active_regions() free_highmem_page() .. Access uninitialized struct page here.. Because there is no reason to have this feature on 32-bit systems, just disable it. Link: http://lkml.kernel.org/r/[email protected] Fixes: 2e3ca40 ("mm: relax deferred struct page requirements") Signed-off-by: Pavel Tatashin <[email protected]> Reported-by: Jiri Slaby <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Access to genmask field in struct nft_set_ext results in unaligned atomic read: [ 72.130109] Unable to handle kernel paging request at virtual address ffff0000c2bb708c [ 72.131036] Mem abort info: [ 72.131213] ESR = 0x0000000096000021 [ 72.131446] EC = 0x25: DABT (current EL), IL = 32 bits [ 72.132209] SET = 0, FnV = 0 [ 72.133216] EA = 0, S1PTW = 0 [ 72.134080] FSC = 0x21: alignment fault [ 72.135593] Data abort info: [ 72.137194] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 [ 72.142351] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 72.145989] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 72.150115] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000237d27000 [ 72.154893] [ffff0000c2bb708c] pgd=0000000000000000, p4d=180000023ffff403, pud=180000023f84b403, pmd=180000023f835403, +pte=0068000102bb7707 [ 72.163021] Internal error: Oops: 0000000096000021 [#1] SMP [...] [ 72.170041] CPU: 7 UID: 0 PID: 54 Comm: kworker/7:0 Tainted: G E 6.13.0-rc3+ #2 [ 72.170509] Tainted: [E]=UNSIGNED_MODULE [ 72.170720] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202302-for-qemu 03/01/2023 [ 72.171192] Workqueue: events_power_efficient nft_rhash_gc [nf_tables] [ 72.171552] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 72.171915] pc : nft_rhash_gc+0x200/0x2d8 [nf_tables] [ 72.172166] lr : nft_rhash_gc+0x128/0x2d8 [nf_tables] [ 72.172546] sp : ffff800081f2bce0 [ 72.172724] x29: ffff800081f2bd40 x28: ffff0000c2bb708c x27: 0000000000000038 [ 72.173078] x26: ffff0000c6780ef0 x25: ffff0000c643df00 x24: ffff0000c6778f78 [ 72.173431] x23: 000000000000001a x22: ffff0000c4b1f000 x21: ffff0000c6780f78 [ 72.173782] x20: ffff0000c2bb70dc x19: ffff0000c2bb7080 x18: 0000000000000000 [ 72.174135] x17: ffff0000c0a4e1c0 x16: 0000000000003000 x15: 0000ac26d173b978 [ 72.174485] x14: ffffffffffffffff x13: 0000000000000030 x12: ffff0000c6780ef0 [ 72.174841] x11: 0000000000000000 x10: ffff800081f2bcf8 x9 : ffff0000c3000000 [ 72.175193] x8 : 00000000000004be x7 : 0000000000000000 x6 : 0000000000000000 [ 72.175544] x5 : 0000000000000040 x4 : ffff0000c3000010 x3 : 0000000000000000 [ 72.175871] x2 : 0000000000003a98 x1 : ffff0000c2bb708c x0 : 0000000000000004 [ 72.176207] Call trace: [ 72.176316] nft_rhash_gc+0x200/0x2d8 [nf_tables] (P) [ 72.176653] process_one_work+0x178/0x3d0 [ 72.176831] worker_thread+0x200/0x3f0 [ 72.176995] kthread+0xe8/0xf8 [ 72.177130] ret_from_fork+0x10/0x20 [ 72.177289] Code: 54fff984 d503201f d2800080 91003261 (f820303f) [ 72.177557] ---[ end trace 0000000000000000 ]--- Align struct nft_set_ext to word size to address this and documentation it. pahole reports that this increases the size of elements for rhash and pipapo in 8 bytes on x86_64. Fixes: 7ffc748 ("netfilter: nft_set_hash: skip duplicated elements pending gc run") Signed-off-by: Pablo Neira Ayuso <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Using mutex lock in IO hot path causes the kernel BUG sleeping while atomic. Shinichiro[1], first encountered this issue while running blktest nvme/052 shown below: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 996, name: (udev-worker) preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 2 locks held by (udev-worker)/996: #0: ffff8881004570c8 (mapping.invalidate_lock){.+.+}-{3:3}, at: page_cache_ra_unbounded+0x155/0x5c0 #1: ffffffff8607eaa0 (rcu_read_lock){....}-{1:2}, at: blk_mq_flush_plug_list+0xa75/0x1950 CPU: 2 UID: 0 PID: 996 Comm: (udev-worker) Not tainted 6.12.0-rc3+ #339 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6a/0x90 __might_resched.cold+0x1f7/0x23d ? __pfx___might_resched+0x10/0x10 ? vsnprintf+0xdeb/0x18f0 __mutex_lock+0xf4/0x1220 ? nvmet_subsys_nsid_exists+0xb9/0x150 [nvmet] ? __pfx_vsnprintf+0x10/0x10 ? __pfx___mutex_lock+0x10/0x10 ? snprintf+0xa5/0xe0 ? xas_load+0x1ce/0x3f0 ? nvmet_subsys_nsid_exists+0xb9/0x150 [nvmet] nvmet_subsys_nsid_exists+0xb9/0x150 [nvmet] ? __pfx_nvmet_subsys_nsid_exists+0x10/0x10 [nvmet] nvmet_req_find_ns+0x24e/0x300 [nvmet] nvmet_req_init+0x694/0xd40 [nvmet] ? blk_mq_start_request+0x11c/0x750 ? nvme_setup_cmd+0x369/0x990 [nvme_core] nvme_loop_queue_rq+0x2a7/0x7a0 [nvme_loop] ? __pfx___lock_acquire+0x10/0x10 ? __pfx_nvme_loop_queue_rq+0x10/0x10 [nvme_loop] __blk_mq_issue_directly+0xe2/0x1d0 ? __pfx___blk_mq_issue_directly+0x10/0x10 ? blk_mq_request_issue_directly+0xc2/0x140 blk_mq_plug_issue_direct+0x13f/0x630 ? lock_acquire+0x2d/0xc0 ? blk_mq_flush_plug_list+0xa75/0x1950 blk_mq_flush_plug_list+0xa9d/0x1950 ? __pfx_blk_mq_flush_plug_list+0x10/0x10 ? __pfx_mpage_readahead+0x10/0x10 __blk_flush_plug+0x278/0x4d0 ? __pfx___blk_flush_plug+0x10/0x10 ? lock_release+0x460/0x7a0 blk_finish_plug+0x4e/0x90 read_pages+0x51b/0xbc0 ? __pfx_read_pages+0x10/0x10 ? lock_release+0x460/0x7a0 page_cache_ra_unbounded+0x326/0x5c0 force_page_cache_ra+0x1ea/0x2f0 filemap_get_pages+0x59e/0x17b0 ? __pfx_filemap_get_pages+0x10/0x10 ? lock_is_held_type+0xd5/0x130 ? __pfx___might_resched+0x10/0x10 ? find_held_lock+0x2d/0x110 filemap_read+0x317/0xb70 ? up_write+0x1ba/0x510 ? __pfx_filemap_read+0x10/0x10 ? inode_security+0x54/0xf0 ? selinux_file_permission+0x36d/0x420 blkdev_read_iter+0x143/0x3b0 vfs_read+0x6ac/0xa20 ? __pfx_vfs_read+0x10/0x10 ? __pfx_vm_mmap_pgoff+0x10/0x10 ? __pfx___seccomp_filter+0x10/0x10 ksys_read+0xf7/0x1d0 ? __pfx_ksys_read+0x10/0x10 do_syscall_64+0x93/0x180 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on_prepare+0x16d/0x400 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f565bd1ce11 Code: 00 48 8b 15 09 90 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 d0 ad 01 00 f3 0f 1e fa 80 3d 35 12 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec RSP: 002b:00007ffd6e7a20c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f565bd1ce11 RDX: 0000000000001000 RSI: 00007f565babb000 RDI: 0000000000000014 RBP: 00007ffd6e7a2130 R08: 00000000ffffffff R09: 0000000000000000 R10: 0000556000bfa610 R11: 0000000000000246 R12: 000000003ffff000 R13: 0000556000bfa5b0 R14: 0000000000000e00 R15: 0000556000c07328 </TASK> Apparently, the above issue is caused due to using mutex lock while we're in IO hot path. It's a regression caused with commit 5053639 ("nvmet: fix nvme status code when namespace is disabled"). The mutex ->su_mutex is used to find whether a disabled nsid exists in the config group or not. This is to differentiate between a nsid that is disabled vs non-existent. To mitigate the above issue, we've worked upon a fix[2] where we now insert nsid in subsys Xarray as soon as it's created under config group and later when that nsid is enabled, we add an Xarray mark on it and set ns->enabled to true. The Xarray mark is useful while we need to loop through all enabled namepsaces under a subsystem using xa_for_each_marked() API. If later a nsid is disabled then we clear Xarray mark from it and also set ns->enabled to false. It's only when nsid is deleted from the config group we delete it from the Xarray. So with this change, now we could easily differentiate a nsid is disabled (i.e. Xarray entry for ns exists but ns->enabled is set to false) vs non- existent (i.e.Xarray entry for ns doesn't exist). Link: https://lore.kernel.org/linux-nvme/[email protected]/ [2] Reported-by: Shinichiro Kawasaki <[email protected]> Closes: https://lore.kernel.org/linux-nvme/tqcy3sveity7p56v7ywp7ssyviwcb3w4623cnxj3knoobfcanq@yxgt2mjkbkam/ [1] Fixes: 5053639 ("nvmet: fix nvme status code when namespace is disabled") Fix-suggested-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Nilay Shroff <[email protected]> Signed-off-by: Keith Busch <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
This reverts commit fd620fc. Boot failures reported by KernelCI: [ 4.395400] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8) [ 4.396155] mediatek-drm mediatek-drm.5.auto: bound 1c000000.ovl (ops 0xffffd35fd12977b8) [ 4.411951] mediatek-drm mediatek-drm.5.auto: bound 1c002000.rdma (ops 0xffffd35fd12989c0) [ 4.536837] mediatek-drm mediatek-drm.5.auto: bound 1c004000.ccorr (ops 0xffffd35fd1296cf0) [ 4.545181] mediatek-drm mediatek-drm.5.auto: bound 1c005000.aal (ops 0xffffd35fd1296a80) [ 4.553344] mediatek-drm mediatek-drm.5.auto: bound 1c006000.gamma (ops 0xffffd35fd12972b0) [ 4.561680] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8) [ 4.570025] ------------[ cut here ]------------ [ 4.574630] refcount_t: underflow; use-after-free. [ 4.579416] WARNING: CPU: 6 PID: 81 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148 [ 4.587670] Modules linked in: [ 4.590714] CPU: 6 UID: 0 PID: 81 Comm: kworker/u32:3 Tainted: G W 6.12.0 #1 cab58e2e59020ebd4be8ada89a65f465a316c742 [ 4.602695] Tainted: [W]=WARN [ 4.605649] Hardware name: Acer Tomato (rev2) board (DT) [ 4.610947] Workqueue: events_unbound deferred_probe_work_func [ 4.616768] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.623715] pc : refcount_warn_saturate+0xf4/0x148 [ 4.628493] lr : refcount_warn_saturate+0xf4/0x148 [ 4.633270] sp : ffff8000807639c0 [ 4.636571] x29: ffff8000807639c0 x28: ffff34ff4116c640 x27: ffff34ff4368e080 [ 4.643693] x26: ffffd35fd1299ac8 x25: ffff34ff46c8c410 x24: 0000000000000000 [ 4.650814] x23: ffff34ff4368e080 x22: 00000000fffffdfb x21: 0000000000000002 [ 4.657934] x20: ffff34ff470c6000 x19: ffff34ff410c7c10 x18: 0000000000000006 [ 4.665055] x17: 666678302073706f x16: 2820656772656d2e x15: ffff800080763440 [ 4.672176] x14: 0000000000000000 x13: 2e656572662d7265 x12: ffffd35fd2ed14f0 [ 4.679297] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffd35fd0342150 [ 4.686418] x8 : c0000000ffffdfff x7 : ffffd35fd2e21450 x6 : 00000000000affa8 [ 4.693539] x5 : ffffd35fd2ed1498 x4 : 0000000000000000 x3 : 0000000000000000 [ 4.700660] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff34ff40932580 [ 4.707781] Call trace: [ 4.710216] refcount_warn_saturate+0xf4/0x148 (P) [ 4.714993] refcount_warn_saturate+0xf4/0x148 (L) [ 4.719772] kobject_put+0x110/0x118 [ 4.723335] put_device+0x1c/0x38 [ 4.726638] mtk_drm_bind+0x294/0x5c0 [ 4.730289] try_to_bring_up_aggregate_device+0x16c/0x1e0 [ 4.735673] __component_add+0xbc/0x1c0 [ 4.739495] component_add+0x1c/0x30 [ 4.743058] mtk_disp_rdma_probe+0x140/0x210 [ 4.747314] platform_probe+0x70/0xd0 [ 4.750964] really_probe+0xc4/0x2a8 [ 4.754527] __driver_probe_device+0x80/0x140 [ 4.758870] driver_probe_device+0x44/0x120 [ 4.763040] __device_attach_driver+0xc0/0x108 [ 4.767470] bus_for_each_drv+0x8c/0xf0 [ 4.771294] __device_attach+0xa4/0x198 [ 4.775117] device_initial_probe+0x1c/0x30 [ 4.779286] bus_probe_device+0xb4/0xc0 [ 4.783109] deferred_probe_work_func+0xb0/0x100 [ 4.787714] process_one_work+0x18c/0x420 [ 4.791712] worker_thread+0x30c/0x418 [ 4.795449] kthread+0x128/0x138 [ 4.798665] ret_from_fork+0x10/0x20 [ 4.802229] ---[ end trace 0000000000000000 ]--- Fixes: fd620fc ("drm/mediatek: Switch to for_each_child_of_node_scoped()") Cc: [email protected] Cc: Javier Carrasco <[email protected]> Reported-by: Sasha Levin <[email protected]> Closes: https://lore.kernel.org/lkml/Z0lNHdwQ3rODHQ2c@sashalap/T/#mfaa6343cfd4d59aae5912b095c0693c0553e746c Link: https://patchwork.kernel.org/project/dri-devel/patch/[email protected]/ Signed-off-by: Chun-Kuang Hu <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
syzbot reports that a recent fix causes nesting issues between the (now) raw timeoutlock and the eventfd locking: ============================= [ BUG: Invalid wait context ] 6.13.0-rc4-00080-g9828a4c0901f #29 Not tainted ----------------------------- kworker/u32:0/68094 is trying to lock: ffff000014d7a520 (&ctx->wqh#2){..-.}-{3:3}, at: eventfd_signal_mask+0x64/0x180 other info that might help us debug this: context-{5:5} 6 locks held by kworker/u32:0/68094: #0: ffff0000c1d98148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x4e8/0xfc0 #1: ffff80008d927c78 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x53c/0xfc0 #2: ffff0000c59bc3d8 (&ctx->completion_lock){+.+.}-{3:3}, at: io_kill_timeouts+0x40/0x180 #3: ffff0000c59bc358 (&ctx->timeout_lock){-.-.}-{2:2}, at: io_kill_timeouts+0x48/0x180 #4: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38 #5: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38 stack backtrace: CPU: 7 UID: 0 PID: 68094 Comm: kworker/u32:0 Not tainted 6.13.0-rc4-00080-g9828a4c0901f #29 Hardware name: linux,dummy-virt (DT) Workqueue: iou_exit io_ring_exit_work Call trace: show_stack+0x1c/0x30 (C) __dump_stack+0x24/0x30 dump_stack_lvl+0x60/0x80 dump_stack+0x14/0x20 __lock_acquire+0x19f8/0x60c8 lock_acquire+0x1a4/0x540 _raw_spin_lock_irqsave+0x90/0xd0 eventfd_signal_mask+0x64/0x180 io_eventfd_signal+0x64/0x108 io_req_local_work_add+0x294/0x430 __io_req_task_work_add+0x1c0/0x270 io_kill_timeout+0x1f0/0x288 io_kill_timeouts+0xd4/0x180 io_uring_try_cancel_requests+0x2e8/0x388 io_ring_exit_work+0x150/0x550 process_one_work+0x5e8/0xfc0 worker_thread+0x7ec/0xc80 kthread+0x24c/0x300 ret_from_fork+0x10/0x20 because after the preempt-rt fix for the timeout lock nesting inside the io-wq lock, we now have the eventfd spinlock nesting inside the raw timeout spinlock. Rather than play whack-a-mole with other nesting on the timeout lock, split the deletion and killing of timeouts so queueing the task_work for the timeout cancelations can get done outside of the timeout lock. Reported-by: [email protected] Fixes: 020b40f ("io_uring: make ctx->timeout_lock a raw spinlock") Signed-off-by: Jens Axboe <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Syzbot reported the following splat: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 UID: 0 PID: 5836 Comm: sshd Not tainted 6.13.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/25/2024 RIP: 0010:_compound_head include/linux/page-flags.h:242 [inline] RIP: 0010:put_page+0x23/0x260 include/linux/mm.h:1552 Code: 90 90 90 90 90 90 90 55 41 57 41 56 53 49 89 fe 48 bd 00 00 00 00 00 fc ff df e8 f8 5e 12 f8 49 8d 5e 08 48 89 d8 48 c1 e8 03 <80> 3c 28 00 74 08 48 89 df e8 8f c7 78 f8 48 8b 1b 48 89 de 48 83 RSP: 0000:ffffc90003916c90 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000008 RCX: ffff888030458000 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff898ca81d R09: 1ffff110054414ac R10: dffffc0000000000 R11: ffffed10054414ad R12: 0000000000000007 R13: ffff88802a20a542 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f34f496e800(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9d6ec9ec28 CR3: 000000004d260000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_page_unref include/linux/skbuff_ref.h:43 [inline] __skb_frag_unref include/linux/skbuff_ref.h:56 [inline] skb_release_data+0x483/0x8a0 net/core/skbuff.c:1119 skb_release_all net/core/skbuff.c:1190 [inline] __kfree_skb+0x55/0x70 net/core/skbuff.c:1204 tcp_clean_rtx_queue net/ipv4/tcp_input.c:3436 [inline] tcp_ack+0x2442/0x6bc0 net/ipv4/tcp_input.c:4032 tcp_rcv_state_process+0x8eb/0x44e0 net/ipv4/tcp_input.c:6805 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1939 tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2351 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5672 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5785 process_backlog+0x662/0x15b0 net/core/dev.c:6117 __napi_poll+0xcb/0x490 net/core/dev.c:6883 napi_poll net/core/dev.c:6952 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:7074 handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561 __do_softirq kernel/softirq.c:595 [inline] invoke_softirq kernel/softirq.c:435 [inline] __irq_exit_rcu+0xf7/0x220 kernel/softirq.c:662 irq_exit_rcu+0x9/0x30 kernel/softirq.c:678 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline] sysvec_apic_timer_interrupt+0x57/0xc0 arch/x86/kernel/apic/apic.c:1049 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0033:0x7f34f4519ad5 Code: 85 d2 74 0d 0f 10 02 48 8d 54 24 20 0f 11 44 24 20 64 8b 04 25 18 00 00 00 85 c0 75 27 41 b8 08 00 00 00 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 76 75 48 8b 15 24 73 0d 00 f7 d8 64 89 02 48 83 RSP: 002b:00007ffec5b32ce0 EFLAGS: 00000246 RAX: 0000000000000001 RBX: 00000000000668a0 RCX: 00007f34f4519ad5 RDX: 00007ffec5b32d00 RSI: 0000000000000004 RDI: 0000564f4bc6cae0 RBP: 0000564f4bc6b5a0 R08: 0000000000000008 R09: 0000000000000000 R10: 00007ffec5b32de8 R11: 0000000000000246 R12: 0000564f48ea8aa4 R13: 0000000000000001 R14: 0000564f48ea93e8 R15: 00007ffec5b32d68 </TASK> Eric noted a probable shinfo->nr_frags corruption, which indeed occurs. The root cause is a buggy MPTCP option len computation in some circumstances: the ADD_ADDR option should be mutually exclusive with DSS since the blamed commit. Still, mptcp_established_options_add_addr() tries to set the relevant info in mptcp_out_options, if the remaining space is large enough even when DSS is present. Since the ADD_ADDR infos and the DSS share the same union fields, adding first corrupts the latter. In the worst-case scenario, such corruption increases the DSS binary layout, exceeding the computed length and possibly overwriting the skb shared info. Address the issue by enforcing mutual exclusion in mptcp_established_options_add_addr(), too. Cc: [email protected] Reported-by: [email protected] Closes: multipath-tcp/mptcp_net-next#538 Fixes: 1bff1e4 ("mptcp: optimize out option generation") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/025d9df8cde3c9a557befc47e9bc08fbbe3476e5.1734771049.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
…le_direct_reclaim() The task sometimes continues looping in throttle_direct_reclaim() because allow_direct_reclaim(pgdat) keeps returning false. #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c #2 [ffff80002cb6f990] schedule at ffff800008abc50c #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4 At this point, the pgdat contains the following two zones: NODE: 4 ZONE: 0 ADDR: ffff00817fffe540 NAME: "DMA32" SIZE: 20480 MIN/LOW/HIGH: 11/28/45 VM_STAT: NR_FREE_PAGES: 359 NR_ZONE_INACTIVE_ANON: 18813 NR_ZONE_ACTIVE_ANON: 0 NR_ZONE_INACTIVE_FILE: 50 NR_ZONE_ACTIVE_FILE: 0 NR_ZONE_UNEVICTABLE: 0 NR_ZONE_WRITE_PENDING: 0 NR_MLOCK: 0 NR_BOUNCE: 0 NR_ZSPAGES: 0 NR_FREE_CMA_PAGES: 0 NODE: 4 ZONE: 1 ADDR: ffff00817fffec00 NAME: "Normal" SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264 VM_STAT: NR_FREE_PAGES: 146 NR_ZONE_INACTIVE_ANON: 94668 NR_ZONE_ACTIVE_ANON: 3 NR_ZONE_INACTIVE_FILE: 735 NR_ZONE_ACTIVE_FILE: 78 NR_ZONE_UNEVICTABLE: 0 NR_ZONE_WRITE_PENDING: 0 NR_MLOCK: 0 NR_BOUNCE: 0 NR_ZSPAGES: 0 NR_FREE_CMA_PAGES: 0 In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of inactive/active file-backed pages calculated in zone_reclaimable_pages() based on the result of zone_page_state_snapshot() is zero. Additionally, since this system lacks swap, the calculation of inactive/ active anonymous pages is skipped. crash> p nr_swap_pages nr_swap_pages = $1937 = { counter = 0 } As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having free pages significantly exceeding the high watermark. The problem is that the pgdat->kswapd_failures hasn't been incremented. crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures $1935 = 0x0 This is because the node deemed balanced. The node balancing logic in balance_pgdat() evaluates all zones collectively. If one or more zones (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the entire node is deemed balanced. This causes balance_pgdat() to exit early before incrementing the kswapd_failures, as it considers the overall memory state acceptable, even though some zones (like ZONE_NORMAL) remain under significant pressure. The patch ensures that zone_reclaimable_pages() includes free pages (NR_FREE_PAGES) in its calculation when no other reclaimable pages are available (e.g., file-backed or anonymous pages). This change prevents zones like ZONE_DMA32, which have sufficient free pages, from being mistakenly deemed unreclaimable. By doing so, the patch ensures proper node balancing, avoids masking pressure on other zones like ZONE_NORMAL, and prevents infinite loops in throttle_direct_reclaim() caused by allow_direct_reclaim(pgdat) repeatedly returning false. The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused by a node being incorrectly deemed balanced despite pressure in certain zones, such as ZONE_NORMAL. This issue arises from zone_reclaimable_pages() returning 0 for zones without reclaimable file- backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient free pages to be skipped. The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored during reclaim, masking pressure in other zones. Consequently, pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback mechanisms in allow_direct_reclaim() from being triggered, leading to an infinite loop in throttle_direct_reclaim(). This patch modifies zone_reclaimable_pages() to account for free pages (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones with sufficient free pages are not skipped, enabling proper balancing and reclaim behavior. [[email protected]: coding-style cleanups] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 5a1c84b ("mm: remove reclaim and compaction retry approximations") Signed-off-by: Seiji Nishikawa <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
…nt message Address a bug in the kernel that triggers a "sleeping function called from invalid context" warning when /sys/kernel/debug/kmemleak is printed under specific conditions: - CONFIG_PREEMPT_RT=y - Set SELinux as the LSM for the system - Set kptr_restrict to 1 - kmemleak buffer contains at least one item BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 6 locks held by cat/136: #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30 #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128 #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0 #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0 #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0 irq event stamp: 136660 hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8 hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0 softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0 CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34 Tainted: [E]=UNSIGNED_MODULE Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0xa0/0x128 show_stack+0x1c/0x30 dump_stack_lvl+0xe8/0x198 dump_stack+0x18/0x20 rt_spin_lock+0x8c/0x1a8 avc_perm_nonode+0xa0/0x150 cred_has_capability.isra.0+0x118/0x218 selinux_capable+0x50/0x80 security_capable+0x7c/0xd0 has_ns_capability_noaudit+0x94/0x1b0 has_capability_noaudit+0x20/0x30 restricted_pointer+0x21c/0x4b0 pointer+0x298/0x760 vsnprintf+0x330/0xf70 seq_printf+0x178/0x218 print_unreferenced+0x1a4/0x2d0 kmemleak_seq_show+0xd0/0x1e0 seq_read_iter+0x354/0xe30 seq_read+0x250/0x378 full_proxy_read+0xd8/0x148 vfs_read+0x190/0x918 ksys_read+0xf0/0x1e0 __arm64_sys_read+0x70/0xa8 invoke_syscall.constprop.0+0xd4/0x1d8 el0_svc+0x50/0x158 el0t_64_sync+0x17c/0x180 %pS and %pK, in the same back trace line, are redundant, and %pS can void %pK service in certain contexts. %pS alone already provides the necessary information, and if it cannot resolve the symbol, it falls back to printing the raw address voiding the original intent behind the %pK. Additionally, %pK requires a privilege check CAP_SYSLOG enforced through the LSM, which can trigger a "sleeping function called from invalid context" warning under RT_PREEMPT kernels when the check occurs in an atomic context. This issue may also affect other LSMs. This change avoids the unnecessary privilege check and resolves the sleeping function warning without any loss of information. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3a6f33d ("mm/kmemleak: use %pK to display kernel pointers in backtrace") Signed-off-by: Alessandro Carminati <[email protected]> Acked-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Clément Léger <[email protected]> Cc: Alessandro Carminati <[email protected]> Cc: Eric Chanudet <[email protected]> Cc: Gabriele Paoloni <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Weißschuh <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
The intermediate variable in the PERCPU_PTR() macro results in a kernel panic on boot [1] due to a compiler bug seen when compiling the kernel (+ KASAN) with gcc 11.3.1, but not when compiling with latest gcc (v14.2)/clang(v18.1). To solve it, remove the intermediate variable (which is not needed) and keep the casting that resolves the address space checks. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 0 UID: 0 PID: 547 Comm: iptables Not tainted 6.13.0-rc1_external_tested-master #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:nf_ct_netns_do_get+0x139/0x540 Code: 03 00 00 48 81 c4 88 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4d 8d 75 08 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 27 03 00 00 41 8b 45 08 83 c0 RSP: 0018:ffff888116df75e8 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: 1ffff11022dbeebe RCX: ffffffff839a2382 RDX: 0000000000000003 RSI: 0000000000000008 RDI: ffff88842ec46d10 RBP: 0000000000000002 R08: 0000000000000000 R09: fffffbfff0b0860c R10: ffff888116df75e8 R11: 0000000000000001 R12: ffffffff879d6a80 R13: 0000000000000016 R14: 000000000000001e R15: ffff888116df7908 FS: 00007fba01646740(0000) GS:ffff88842ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bd901800d8 CR3: 00000001205f0003 CR4: 0000000000172eb0 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? __mutex_lock+0x2c2/0x1d70 ? nf_ct_netns_do_get+0x139/0x540 ? nf_ct_netns_do_get+0xb5/0x540 ? net_generic+0x1f0/0x1f0 ? __create_object+0x5e/0x80 xt_check_target+0x1f0/0x930 ? textify_hooks.constprop.0+0x110/0x110 ? pcpu_alloc_noprof+0x7cd/0xcf0 ? xt_find_target+0x148/0x1e0 find_check_entry.constprop.0+0x6c0/0x920 ? get_info+0x380/0x380 ? __virt_addr_valid+0x1df/0x3b0 ? kasan_quarantine_put+0xe3/0x200 ? kfree+0x13e/0x3d0 ? translate_table+0xaf5/0x1750 translate_table+0xbd8/0x1750 ? ipt_unregister_table_exit+0x30/0x30 ? __might_fault+0xbb/0x170 do_ipt_set_ctl+0x408/0x1340 ? nf_sockopt_find.constprop.0+0x17b/0x1f0 ? lock_downgrade+0x680/0x680 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? ipt_register_table+0x440/0x440 ? bit_wait_timeout+0x160/0x160 nf_setsockopt+0x6f/0xd0 raw_setsockopt+0x7e/0x200 ? raw_bind+0x590/0x590 ? do_user_addr_fault+0x812/0xd20 do_sock_setsockopt+0x1e2/0x3f0 ? move_addr_to_user+0x90/0x90 ? lock_downgrade+0x680/0x680 __sys_setsockopt+0x9e/0x100 __x64_sys_setsockopt+0xb9/0x150 ? do_syscall_64+0x33/0x140 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fba015134ce Code: 0f 1f 40 00 48 8b 15 59 69 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 00 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 21 RSP: 002b:00007ffd9de6f388 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 000055bd9017f490 RCX: 00007fba015134ce RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 0000000000000500 R08: 0000000000000560 R09: 0000000000000052 R10: 000055bd901800e0 R11: 0000000000000246 R12: 000055bd90180140 R13: 000055bd901800e0 R14: 000055bd9017f498 R15: 000055bd9017ff10 </TASK> Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc mlx4_ib mlx4_en mlx4_core rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi scsi_transport_iscsi fuse ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_core ---[ end trace 0000000000000000 ]--- [[email protected]: simplification, per Uros] Link: https://lkml.kernel.org/r/[email protected] Fixes: dabddd6 ("percpu: cast percpu pointer in PERCPU_PTR() via unsigned long") Signed-off-by: Gal Pressman <[email protected]> Closes: https://lore.kernel.org/all/[email protected] Cc: Uros Bizjak <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
When driver processes the internal state change command, it use an asynchronous thread to process the command operation. If the main thread detects that the task has timed out, the asynchronous thread will panic when executing the completion notification because the main thread completion object has been released. BUG: unable to handle page fault for address: fffffffffffffff8 PGD 1f283a067 P4D 1f283a067 PUD 1f283c067 PMD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:complete_all+0x3e/0xa0 [...] Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x379/0x3e0 ? exc_page_fault+0x69/0xa0 ? asm_exc_page_fault+0x22/0x30 ? complete_all+0x3e/0xa0 fsm_main_thread+0xa3/0x9c0 [mtk_t7xx (HASH:1400 5)] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0xd8/0x110 ? __pfx_fsm_main_thread+0x10/0x10 [mtk_t7xx (HASH:1400 5)] ? __pfx_kthread+0x10/0x10 ret_from_fork+0x38/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> [...] CR2: fffffffffffffff8 ---[ end trace 0000000000000000 ]--- Use the reference counter to ensure safe release as Sergey suggests: https://lore.kernel.org/all/[email protected]/ Fixes: 13e920d ("net: wwan: t7xx: Add core components") Signed-off-by: Jinjian Song <[email protected]> Acked-by: Sergey Ryazanov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found by syzbot. Rework vlan_get_tci() to not touch skb at all, so that it can be used from many cpus on the same skb. Add a const qualifier to skb argument. [1] skbuff: skb_under_panic: text:ffffffff8a8da482 len:32 put:14 head:ffff88807a1d5800 data:ffff88807a1d5810 tail:0x14 end:0x140 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:206 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 UID: 0 PID: 5880 Comm: syz-executor172 Not tainted 6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline] RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216 Code: 0b 8d 48 c7 c6 9e 6c 26 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 3a 5a 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 RSP: 0018:ffffc90003baf5b8 EFLAGS: 00010286 RAX: 0000000000000087 RBX: dffffc0000000000 RCX: 8565c1eec37aa000 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffff88802616fb50 R08: ffffffff817f0a4c R09: 1ffff92000775e50 R10: dffffc0000000000 R11: fffff52000775e51 R12: 0000000000000140 R13: ffff88807a1d5800 R14: ffff88807a1d5810 R15: 0000000000000014 FS: 00007fa03261f6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd65753000 CR3: 0000000031720000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_push+0xe5/0x100 net/core/skbuff.c:2636 vlan_get_tci+0x272/0x550 net/packet/af_packet.c:565 packet_recvmsg+0x13c9/0x1ef0 net/packet/af_packet.c:3616 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg+0x22f/0x280 net/socket.c:1066 ____sys_recvmsg+0x1c6/0x480 net/socket.c:2814 ___sys_recvmsg net/socket.c:2856 [inline] do_recvmmsg+0x426/0xab0 net/socket.c:2951 __sys_recvmmsg net/socket.c:3025 [inline] __do_sys_recvmmsg net/socket.c:3048 [inline] __se_sys_recvmmsg net/socket.c:3041 [inline] __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3041 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 Fixes: 79eecf6 ("af_packet: Handle outgoing VLAN packets without hardware offloading") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/T/#u Signed-off-by: Eric Dumazet <[email protected]> Cc: Chengen Du <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found by syzbot. Rework vlan_get_protocol_dgram() to not touch skb at all, so that it can be used from many cpus on the same skb. Add a const qualifier to skb argument. [1] skbuff: skb_under_panic: text:ffffffff8a8ccd05 len:29 put:14 head:ffff88807fc8e400 data:ffff88807fc8e3f4 tail:0x11 end:0x140 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:206 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 5892 Comm: syz-executor883 Not tainted 6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline] RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216 Code: 0b 8d 48 c7 c6 86 d5 25 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 5a 69 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 RSP: 0018:ffffc900038d7638 EFLAGS: 00010282 RAX: 0000000000000087 RBX: dffffc0000000000 RCX: 609ffd18ea660600 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffff88802483c8d0 R08: ffffffff817f0a8c R09: 1ffff9200071ae60 R10: dffffc0000000000 R11: fffff5200071ae61 R12: 0000000000000140 R13: ffff88807fc8e400 R14: ffff88807fc8e3f4 R15: 0000000000000011 FS: 00007fbac5e006c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fbac5e00d58 CR3: 000000001238e000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_push+0xe5/0x100 net/core/skbuff.c:2636 vlan_get_protocol_dgram+0x165/0x290 net/packet/af_packet.c:585 packet_recvmsg+0x948/0x1ef0 net/packet/af_packet.c:3552 sock_recvmsg_nosec net/socket.c:1033 [inline] sock_recvmsg+0x22f/0x280 net/socket.c:1055 ____sys_recvmsg+0x1c6/0x480 net/socket.c:2803 ___sys_recvmsg net/socket.c:2845 [inline] do_recvmmsg+0x426/0xab0 net/socket.c:2940 __sys_recvmmsg net/socket.c:3014 [inline] __do_sys_recvmmsg net/socket.c:3037 [inline] __se_sys_recvmmsg net/socket.c:3030 [inline] __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3030 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 79eecf6 ("af_packet: Handle outgoing VLAN packets without hardware offloading") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/T/#u Signed-off-by: Eric Dumazet <[email protected]> Cc: Chengen Du <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
[BUG] Syzbot reported a crash with the following call trace: BTRFS info (device loop0): scrub: started on devid 1 BUG: kernel NULL pointer dereference, address: 0000000000000208 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 106e70067 P4D 106e70067 PUD 107143067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 UID: 0 PID: 689 Comm: repro Kdump: loaded Tainted: G O 6.13.0-rc4-custom+ #206 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:find_first_extent_item+0x26/0x1f0 [btrfs] Call Trace: <TASK> scrub_find_fill_first_stripe+0x13d/0x3b0 [btrfs] scrub_simple_mirror+0x175/0x260 [btrfs] scrub_stripe+0x5d4/0x6c0 [btrfs] scrub_chunk+0xbb/0x170 [btrfs] scrub_enumerate_chunks+0x2f4/0x5f0 [btrfs] btrfs_scrub_dev+0x240/0x600 [btrfs] btrfs_ioctl+0x1dc8/0x2fa0 [btrfs] ? do_sys_openat2+0xa5/0xf0 __x64_sys_ioctl+0x97/0xc0 do_syscall_64+0x4f/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> [CAUSE] The reproducer is using a corrupted image where extent tree root is corrupted, thus forcing to use "rescue=all,ro" mount option to mount the image. Then it triggered a scrub, but since scrub relies on extent tree to find where the data/metadata extents are, scrub_find_fill_first_stripe() relies on an non-empty extent root. But unfortunately scrub_find_fill_first_stripe() doesn't really expect an NULL pointer for extent root, it use extent_root to grab fs_info and triggered a NULL pointer dereference. [FIX] Add an extra check for a valid extent root at the beginning of scrub_find_fill_first_stripe(). The new error path is introduced by 42437a6 ("btrfs: introduce mount option rescue=ignorebadroots"), but that's pretty old, and later commit b979547 ("btrfs: scrub: introduce helper to find and fill sector info for a scrub_stripe") changed how we do scrub. So for kernels older than 6.6, the fix will need manual backport. Reported-by: [email protected] Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: 42437a6 ("btrfs: introduce mount option rescue=ignorebadroots") Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Since the input data length passed to zlib_compress_folios() can be arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE may cause read-in bytes to exceed the input range. Currently this triggers an assert in btrfs_compress_folios() on the debug kernel (see below). Fix strm.avail_in calculation for S390 hardware acceleration path. assertion failed: *total_in <= orig_len, in fs/btrfs/compression.c:1041 ------------[ cut here ]------------ kernel BUG at fs/btrfs/compression.c:1041! monitor event: 0040 ilc:2 [#1] PREEMPT SMP CPU: 16 UID: 0 PID: 325 Comm: kworker/u273:3 Not tainted 6.13.0-20241204.rc1.git6.fae3b21430ca.300.fc41.s390x+debug #1 Hardware name: IBM 3931 A01 703 (z/VM 7.4.0) Workqueue: btrfs-delalloc btrfs_work_helper Krnl PSW : 0704d00180000000 0000021761df6538 (btrfs_compress_folios+0x198/0x1a0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: 0000000080000000 0000000000000001 0000000000000047 0000000000000000 0000000000000006 ffffff01757bb000 000001976232fcc0 000000000000130c 000001976232fcd0 000001976232fcc8 00000118ff4a0e30 0000000000000001 00000111821ab400 0000011100000000 0000021761df6534 000001976232fb58 Krnl Code: 0000021761df6528: c020006f5ef4 larl %r2,0000021762be2310 0000021761df652e: c0e5ffbd09d5 brasl %r14,00000217615978d8 #0000021761df6534: af000000 mc 0,0 >0000021761df6538: 0707 bcr 0,%r7 0000021761df653a: 0707 bcr 0,%r7 0000021761df653c: 0707 bcr 0,%r7 0000021761df653e: 0707 bcr 0,%r7 0000021761df6540: c004004bb7ec brcl 0,000002176276d518 Call Trace: [<0000021761df6538>] btrfs_compress_folios+0x198/0x1a0 ([<0000021761df6534>] btrfs_compress_folios+0x194/0x1a0) [<0000021761d97788>] compress_file_range+0x3b8/0x6d0 [<0000021761dcee7c>] btrfs_work_helper+0x10c/0x160 [<0000021761645760>] process_one_work+0x2b0/0x5d0 [<000002176164637e>] worker_thread+0x20e/0x3e0 [<000002176165221a>] kthread+0x15a/0x170 [<00000217615b859c>] __ret_from_fork+0x3c/0x60 [<00000217626e72d2>] ret_from_fork+0xa/0x38 INFO: lockdep is turned off. Last Breaking-Event-Address: [<0000021761597924>] _printk+0x4c/0x58 Kernel panic - not syncing: Fatal exception: panic_on_oops Fixes: fd1e75d ("btrfs: make compression path to be subpage compatible") CC: [email protected] # 6.12+ Acked-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Mikhail Zaslonko <[email protected]> Signed-off-by: David Sterba <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
DC driver is using two different values to define the maximum number of surfaces: MAX_SURFACES and MAX_SURFACE_NUM. Consolidate MAX_SURFACES as the unique definition for surface updates across DC. It fixes page fault faced by Cosmic users on AMD display versions that support two overlay planes, since the introduction of cursor overlay mode. [Nov26 21:33] BUG: unable to handle page fault for address: 0000000051d0f08b [ +0.000015] #PF: supervisor read access in kernel mode [ +0.000006] #PF: error_code(0x0000) - not-present page [ +0.000005] PGD 0 P4D 0 [ +0.000007] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000006] CPU: 4 PID: 71 Comm: kworker/u32:6 Not tainted 6.10.0+ #300 [ +0.000006] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024 [ +0.000007] Workqueue: events_unbound commit_work [drm_kms_helper] [ +0.000040] RIP: 0010:copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu] [ +0.000847] Code: 8b 10 49 89 94 24 f8 00 00 00 48 8b 50 08 49 89 94 24 00 01 00 00 8b 40 10 41 89 84 24 08 01 00 00 49 8b 45 78 48 85 c0 74 0b <0f> b6 00 41 88 84 24 90 64 00 00 49 8b 45 60 48 85 c0 74 3b 48 8b [ +0.000010] RSP: 0018:ffffc203802f79a0 EFLAGS: 00010206 [ +0.000009] RAX: 0000000051d0f08b RBX: 0000000000000004 RCX: ffff9f964f0a8070 [ +0.000004] RDX: ffff9f9710f90e40 RSI: ffff9f96600c8000 RDI: ffff9f964f000000 [ +0.000004] RBP: ffffc203802f79f8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000005] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f96600c8000 [ +0.000004] R13: ffff9f9710f90e40 R14: ffff9f964f000000 R15: ffff9f96600c8000 [ +0.000004] FS: 0000000000000000(0000) GS:ffff9f9970000000(0000) knlGS:0000000000000000 [ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000005] CR2: 0000000051d0f08b CR3: 00000002e6a20000 CR4: 0000000000350ef0 [ +0.000005] Call Trace: [ +0.000011] <TASK> [ +0.000010] ? __die_body.cold+0x19/0x27 [ +0.000012] ? page_fault_oops+0x15a/0x2d0 [ +0.000014] ? exc_page_fault+0x7e/0x180 [ +0.000009] ? asm_exc_page_fault+0x26/0x30 [ +0.000013] ? copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu] [ +0.000739] ? dc_commit_state_no_check+0xd6c/0xe70 [amdgpu] [ +0.000470] update_planes_and_stream_state+0x49b/0x4f0 [amdgpu] [ +0.000450] ? srso_return_thunk+0x5/0x5f [ +0.000009] ? commit_minimal_transition_state+0x239/0x3d0 [amdgpu] [ +0.000446] update_planes_and_stream_v2+0x24a/0x590 [amdgpu] [ +0.000464] ? srso_return_thunk+0x5/0x5f [ +0.000009] ? sort+0x31/0x50 [ +0.000007] ? amdgpu_dm_atomic_commit_tail+0x159f/0x3a30 [amdgpu] [ +0.000508] ? srso_return_thunk+0x5/0x5f [ +0.000009] ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu] [ +0.000377] ? srso_return_thunk+0x5/0x5f [ +0.000009] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x160/0x390 [drm] [ +0.000058] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? dma_fence_default_wait+0x8c/0x260 [ +0.000010] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? wait_for_completion_timeout+0x13b/0x170 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? dma_fence_wait_timeout+0x108/0x140 [ +0.000010] ? commit_tail+0x94/0x130 [drm_kms_helper] [ +0.000024] ? process_one_work+0x177/0x330 [ +0.000008] ? worker_thread+0x266/0x3a0 [ +0.000006] ? __pfx_worker_thread+0x10/0x10 [ +0.000004] ? kthread+0xd2/0x100 [ +0.000006] ? __pfx_kthread+0x10/0x10 [ +0.000006] ? ret_from_fork+0x34/0x50 [ +0.000004] ? __pfx_kthread+0x10/0x10 [ +0.000005] ? ret_from_fork_asm+0x1a/0x30 [ +0.000011] </TASK> Fixes: 1b04dcc ("drm/amd/display: Introduce overlay cursor mode") Suggested-by: Leo Li <[email protected]> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3693 Signed-off-by: Melissa Wen <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 1c86c81) Cc: [email protected]
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
dm_get_plane_scale doesn't take into account plane scaled size equal to zero, leading to a kernel oops due to division by zero. Fix by setting out-scale size as zero when the dst size is zero, similar to what is done by drm_calc_scale(). This issue started with the introduction of cursor ovelay mode that uses this function to assess cursor mode changes via dm_crtc_get_cursor_mode() before checking plane state. [Dec17 17:14] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI [ +0.000018] CPU: 5 PID: 1660 Comm: surface-DP-1 Not tainted 6.10.0+ #231 [ +0.000007] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024 [ +0.000004] RIP: 0010:dm_get_plane_scale+0x3f/0x60 [amdgpu] [ +0.000553] Code: 44 0f b7 41 3a 44 0f b7 49 3e 83 e0 0f 48 0f a3 c2 73 21 69 41 28 e8 03 00 00 31 d2 41 f7 f1 31 d2 89 06 69 41 2c e8 03 00 00 <41> f7 f0 89 07 e9 d7 d8 7e e9 44 89 c8 45 89 c1 41 89 c0 eb d4 66 [ +0.000005] RSP: 0018:ffffa8df0de6b8a0 EFLAGS: 00010246 [ +0.000006] RAX: 00000000000003e8 RBX: ffff9ac65c1f6e00 RCX: ffff9ac65d055500 [ +0.000003] RDX: 0000000000000000 RSI: ffffa8df0de6b8b0 RDI: ffffa8df0de6b8b4 [ +0.000004] RBP: ffff9ac64e7a5800 R08: 0000000000000000 R09: 0000000000000a00 [ +0.000003] R10: 00000000000000ff R11: 0000000000000054 R12: ffff9ac6d0700010 [ +0.000003] R13: ffff9ac65d054f00 R14: ffff9ac65d055500 R15: ffff9ac64e7a60a0 [ +0.000004] FS: 00007f869ea00640(0000) GS:ffff9ac970080000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000055ca701becd0 CR3: 000000010e7f2000 CR4: 0000000000350ef0 [ +0.000004] Call Trace: [ +0.000007] <TASK> [ +0.000006] ? __die_body.cold+0x19/0x27 [ +0.000009] ? die+0x2e/0x50 [ +0.000007] ? do_trap+0xca/0x110 [ +0.000007] ? do_error_trap+0x6a/0x90 [ +0.000006] ? dm_get_plane_scale+0x3f/0x60 [amdgpu] [ +0.000504] ? exc_divide_error+0x38/0x50 [ +0.000005] ? dm_get_plane_scale+0x3f/0x60 [amdgpu] [ +0.000488] ? asm_exc_divide_error+0x1a/0x20 [ +0.000011] ? dm_get_plane_scale+0x3f/0x60 [amdgpu] [ +0.000593] dm_crtc_get_cursor_mode+0x33f/0x430 [amdgpu] [ +0.000562] amdgpu_dm_atomic_check+0x2ef/0x1770 [amdgpu] [ +0.000501] drm_atomic_check_only+0x5e1/0xa30 [drm] [ +0.000047] drm_mode_atomic_ioctl+0x832/0xcb0 [drm] [ +0.000050] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [drm] [ +0.000047] drm_ioctl_kernel+0xb3/0x100 [drm] [ +0.000062] drm_ioctl+0x27a/0x4f0 [drm] [ +0.000049] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [drm] [ +0.000055] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu] [ +0.000360] __x64_sys_ioctl+0x97/0xd0 [ +0.000010] do_syscall_64+0x82/0x190 [ +0.000008] ? __pfx_drm_mode_createblob_ioctl+0x10/0x10 [drm] [ +0.000044] ? srso_return_thunk+0x5/0x5f [ +0.000006] ? drm_ioctl_kernel+0xb3/0x100 [drm] [ +0.000040] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __check_object_size+0x50/0x220 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? drm_ioctl+0x2a4/0x4f0 [drm] [ +0.000039] ? __pfx_drm_mode_createblob_ioctl+0x10/0x10 [drm] [ +0.000043] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __pm_runtime_suspend+0x69/0xc0 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? amdgpu_drm_ioctl+0x71/0x90 [amdgpu] [ +0.000366] ? srso_return_thunk+0x5/0x5f [ +0.000006] ? syscall_exit_to_user_mode+0x77/0x210 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? do_syscall_64+0x8e/0x190 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000006] ? do_syscall_64+0x8e/0x190 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000008] RIP: 0033:0x55bb7cd962bc [ +0.000007] Code: 4c 89 6c 24 18 4c 89 64 24 20 4c 89 74 24 28 0f 57 c0 0f 11 44 24 30 89 c7 48 8d 54 24 08 b8 10 00 00 00 be bc 64 38 c0 0f 05 <49> 89 c7 48 83 3b 00 74 09 4c 89 c7 ff 15 62 64 99 00 48 83 7b 18 [ +0.000005] RSP: 002b:00007f869e9f4da0 EFLAGS: 00000217 ORIG_RAX: 0000000000000010 [ +0.000007] RAX: ffffffffffffffda RBX: 00007f869e9f4fb8 RCX: 000055bb7cd962bc [ +0.000004] RDX: 00007f869e9f4da8 RSI: 00000000c03864bc RDI: 000000000000003b [ +0.000003] RBP: 000055bb9ddcbcc0 R08: 00007f86541b9920 R09: 0000000000000009 [ +0.000004] R10: 0000000000000004 R11: 0000000000000217 R12: 00007f865406c6b0 [ +0.000003] R13: 00007f86541b5290 R14: 00007f865410b700 R15: 000055bb9ddcbc18 [ +0.000009] </TASK> Fixes: 1b04dcc ("drm/amd/display: Introduce overlay cursor mode") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3729 Reported-by: Fabio Scaccabarozzi <[email protected]> Co-developed-by: Fabio Scaccabarozzi <[email protected]> Signed-off-by: Fabio Scaccabarozzi <[email protected]> Signed-off-by: Melissa Wen <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit ab75a0d) Cc: [email protected]
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
die() can be called in exception handler, and therefore cannot sleep. However, die() takes spinlock_t which can sleep with PREEMPT_RT enabled. That causes the following warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 285, name: mutex preempt_count: 110001, expected: 0 RCU nest depth: 0, expected: 0 CPU: 0 UID: 0 PID: 285 Comm: mutex Not tainted 6.12.0-rc7-00022-ge19049cf7d56-dirty #234 Hardware name: riscv-virtio,qemu (DT) Call Trace: dump_backtrace+0x1c/0x24 show_stack+0x2c/0x38 dump_stack_lvl+0x5a/0x72 dump_stack+0x14/0x1c __might_resched+0x130/0x13a rt_spin_lock+0x2a/0x5c die+0x24/0x112 do_trap_insn_illegal+0xa0/0xea _new_vmalloc_restore_context_a0+0xcc/0xd8 Oops - illegal instruction [#1] Switch to use raw_spinlock_t, which does not sleep even with PREEMPT_RT enabled. Fixes: 76d2a04 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao <[email protected]> Cc: [email protected] Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
If GuC fails to load, the driver wedges, but in the process it tries to do stuff that may not be initialized yet. This moves the xe_gt_tlb_invalidation_init() to be done earlier: as its own doc says, it's a software-only initialization and should had been named with the _early() suffix. Move it to be called by xe_gt_init_early(), so the locks and seqno are initialized, avoiding a NULL ptr deref when wedging: xe 0000:03:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01 xe 0000:03:00.0: [drm] *ERROR* GT0: firmware signature verification failed xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged. ... BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 UID: 0 PID: 3908 Comm: modprobe Tainted: G U W 6.13.0-rc4-xe+ #3 Tainted: [U]=USER, [W]=WARN Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DDR5 UDIMM CRB, BIOS ADLSFWI1.R00.3275.A00.2207010640 07/01/2022 RIP: 0010:xe_gt_tlb_invalidation_reset+0x75/0x110 [xe] This can be easily triggered by poking the GuC binary to force a signature failure. There will still be an extra message, xe 0000:03:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100 but that's better than a NULL ptr deref. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3956 Fixes: c9474b7 ("drm/xe: Wedge the entire device") Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Lucas De Marchi <[email protected]> (cherry picked from commit 5001ef3) Signed-off-by: Thomas Hellström <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
We found a timeout problem with the pldm command on our system. The reason is that the MCTP-I3C driver has a race condition when receiving multiple-packet messages in multi-thread, resulting in a wrong packet order problem. We identified this problem by adding a debug message to the mctp_i3c_read function. According to the MCTP spec, a multiple-packet message must be composed in sequence, and if there is a wrong sequence, the whole message will be discarded and wait for the next SOM. For example, SOM → Pkt Seq #2 → Pkt Seq #1 → Pkt Seq #3 → EOM. Therefore, we try to solve this problem by adding a mutex to the mctp_i3c_read function. Before the modification, when a command requesting a multiple-packet message response is sent consecutively, an error usually occurs within 100 loops. After the mutex, it can go through 40000 loops without any error, and it seems to run well. Fixes: c8755b2 ("mctp i3c: MCTP I3C driver") Signed-off-by: Leo Yang <[email protected]> Link: https://patch.msgid.link/[email protected] [[email protected]: dropped already answered question from changelog] Signed-off-by: Paolo Abeni <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
When cmd_alloc_index(), fails cmd_work_handler() needs to complete ent->slotted before returning early. Otherwise the task which issued the command may hang: mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry INFO: task kworker/13:2:4055883 blocked for more than 120 seconds. Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/13:2 D 0 4055883 2 0x00000228 Workqueue: events mlx5e_tx_dim_work [mlx5_core] Call trace: __switch_to+0xe8/0x150 __schedule+0x2a8/0x9b8 schedule+0x2c/0x88 schedule_timeout+0x204/0x478 wait_for_common+0x154/0x250 wait_for_completion+0x28/0x38 cmd_exec+0x7a0/0xa00 [mlx5_core] mlx5_cmd_exec+0x54/0x80 [mlx5_core] mlx5_core_modify_cq+0x6c/0x80 [mlx5_core] mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core] mlx5e_tx_dim_work+0x54/0x68 [mlx5_core] process_one_work+0x1b0/0x448 worker_thread+0x54/0x468 kthread+0x134/0x138 ret_from_fork+0x10/0x18 Fixes: 485d65e ("net/mlx5: Add a timeout to acquire the command queue semaphore") Signed-off-by: Chenguang Zhao <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Acked-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Using the 'net' structure via 'current' is not recommended for different reasons. First, if the goal is to use it to read or write per-netns data, this is inconsistent with how the "generic" sysctl entries are doing: directly by only using pointers set to the table entry, e.g. table->data. Linked to that, the per-netns data should always be obtained from the table linked to the netns it had been created for, which may not coincide with the reader's or writer's netns. Another reason is that access to current->nsproxy->netns can oops if attempted when current->nsproxy had been dropped when the current task is exiting. This is what syzbot found, when using acct(2): Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 __kernel_write+0xf6/0x140 fs/read_write.c:632 do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 acct_pin_kill+0x2d/0x100 kernel/acct.c:192 pin_kill+0x194/0x7c0 fs/fs_pin.c:44 mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 task_work_run+0x14e/0x250 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xad8/0x2d70 kernel/exit.c:938 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 get_signal+0x2576/0x2610 kernel/signal.c:3017 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fee3cb87a6a Code: Unable to access opcode bytes at 0x7fee3cb87a40. RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ---------------- Code disassembly (best guess), 1 bytes skipped: 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) 5: 0f 85 fe 02 00 00 jne 0x309 b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 12: 00 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 1a: fc ff df 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi 22: 48 89 fa mov %rdi,%rdx 25: 48 c1 ea 03 shr $0x3,%rdx * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction 2d: 0f 85 cc 02 00 00 jne 0x2ff 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 38: 48 rex.W 39: 8d .byte 0x8d 3a: 84 24 c8 test %ah,(%rax,%rcx,8) Here with 'net.mptcp.scheduler', the 'net' structure is not really needed, because the table->data already has a pointer to the current scheduler, the only thing needed from the per-netns data. Simply use 'data', instead of getting (most of the time) the same thing, but from a longer and indirect way. Fixes: 6963c50 ("mptcp: only allow set existing scheduler for net.mptcp.scheduler") Cc: [email protected] Reported-by: [email protected] Closes: https://lore.kernel.org/[email protected] Suggested-by: Al Viro <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Jan 14, 2025
Max Makarov reported kernel panic [1] in perf user callchain code. The reason for that is the race between uprobe_free_utask and bpf profiler code doing the perf user stack unwind and is triggered within uprobe_free_utask function: - after current->utask is freed and - before current->utask is set to NULL general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI RIP: 0010:is_uprobe_at_func_entry+0x28/0x80 ... ? die_addr+0x36/0x90 ? exc_general_protection+0x217/0x420 ? asm_exc_general_protection+0x26/0x30 ? is_uprobe_at_func_entry+0x28/0x80 perf_callchain_user+0x20a/0x360 get_perf_callchain+0x147/0x1d0 bpf_get_stackid+0x60/0x90 bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b ? __smp_call_single_queue+0xad/0x120 bpf_overflow_handler+0x75/0x110 ... asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:__kmem_cache_free+0x1cb/0x350 ... ? uprobe_free_utask+0x62/0x80 ? acct_collect+0x4c/0x220 uprobe_free_utask+0x62/0x80 mm_release+0x12/0xb0 do_exit+0x26b/0xaa0 __x64_sys_exit+0x1b/0x20 do_syscall_64+0x5a/0x80 It can be easily reproduced by running following commands in separate terminals: # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done # bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }' Fixing this by making sure current->utask pointer is set to NULL before we start to release the utask object. [1] grafana/pyroscope#3673 Fixes: cfa7f3d ("perf,x86: avoid missing caller address in stack traces captured in uprobe") Reported-by: Max Makarov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
The following failure was reported on HPE ProLiant D320: [ 10.693310][ T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id 0) [ 10.848132][ T1] ------------[ cut here ]------------ [ 10.853559][ T1] WARNING: CPU: 59 PID: 1 at mm/page_alloc.c:4727 __alloc_pages_noprof+0x2ca/0x330 [ 10.862827][ T1] Modules linked in: [ 10.866671][ T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed (unreleased) 588cd98293a7c9eba9013378d807364c088c9375 [ 10.882741][ T1] Hardware name: HPE ProLiant DL320 Gen12/ProLiant DL320 Gen12, BIOS 1.20 10/28/2024 [ 10.892170][ T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330 [ 10.898103][ T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 88 fe ff ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 c6 05 f8 e6 ce 01 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 00 75 42 89 d9 80 e1 [ 10.917750][ T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246 [ 10.923777][ T1] RAX: 0000000000000000 RBX: 0000000000040cc0 RCX: 0000000000000000 [ 10.931727][ T1] RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000040cc0 The above transcript shows that ACPI pointed a 16 MiB buffer for the log events because RSI maps to the 'order' parameter of __alloc_pages_noprof(). Address the bug by moving from devm_kmalloc() to devm_add_action() and kvmalloc() and devm_add_action(). Suggested-by: Ard Biesheuvel <[email protected]> Cc: [email protected] # v2.6.16+ Fixes: 55a82ab ("[PATCH] tpm: add bios measurement log") Reported-by: Andy Liang <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219495 Reviewed-by: Ard Biesheuvel <[email protected]> Reviewed-by: Stefan Berger <[email protected]> Reviewed-by: Takashi Iwai <[email protected]> Tested-by: Andy Liang <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
In commit e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL after job completion"), we introduced a change to assign the job pointer to NULL after completing a job, indicating job completion. However, this approach created a race condition between the DRM scheduler workqueue and the IRQ execution thread. As soon as the fence is signaled in the IRQ execution thread, a new job starts to be executed. This results in a race condition where the IRQ execution thread sets the job pointer to NULL simultaneously as the `run_job()` function assigns a new job to the pointer. This race condition can lead to a NULL pointer dereference if the IRQ execution thread sets the job pointer to NULL after `run_job()` assigns it to the new job. When the new job completes and the GPU emits an interrupt, `v3d_irq()` is triggered, potentially causing a crash. [ 466.310099] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0 [ 466.318928] Mem abort info: [ 466.321723] ESR = 0x0000000096000005 [ 466.325479] EC = 0x25: DABT (current EL), IL = 32 bits [ 466.330807] SET = 0, FnV = 0 [ 466.333864] EA = 0, S1PTW = 0 [ 466.337010] FSC = 0x05: level 1 translation fault [ 466.341900] Data abort info: [ 466.344783] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 466.350285] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 466.355350] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 466.360677] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000089772000 [ 466.367140] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 466.375875] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 466.382163] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep binfmt_misc vc4 snd_soc_hdmi_codec drm_display_helper cec brcmfmac_wcc spidev rpivid_hevc(C) drm_client_lib brcmfmac hci_uart drm_dma_helper pisp_be btbcm brcmutil snd_soc_core aes_ce_blk v4l2_mem2mem bluetooth aes_ce_cipher snd_compress videobuf2_dma_contig ghash_ce cfg80211 gf128mul snd_pcm_dmaengine videobuf2_memops ecdh_generic sha2_ce ecc videobuf2_v4l2 snd_pcm v3d sha256_arm64 rfkill videodev snd_timer sha1_ce libaes gpu_sched snd videobuf2_common sha1_generic drm_shmem_helper mc rp1_pio drm_kms_helper raspberrypi_hwmon spi_bcm2835 gpio_keys i2c_brcmstb rp1 raspberrypi_gpiomem rp1_mailbox rp1_adc nvmem_rmem uio_pdrv_genirq uio i2c_dev drm ledtrig_pattern drm_panel_orientation_quirks backlight fuse dm_mod ip_tables x_tables ipv6 [ 466.458429] CPU: 0 UID: 1000 PID: 2008 Comm: chromium Tainted: G C 6.13.0-v8+ #18 [ 466.467336] Tainted: [C]=CRAP [ 466.470306] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT) [ 466.476157] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 466.483143] pc : v3d_irq+0x118/0x2e0 [v3d] [ 466.487258] lr : __handle_irq_event_percpu+0x60/0x228 [ 466.492327] sp : ffffffc080003ea0 [ 466.495646] x29: ffffffc080003ea0 x28: ffffff80c0c94200 x27: 0000000000000000 [ 466.502807] x26: ffffffd08dd81d7b x25: ffffff80c0c94200 x24: ffffff8003bdc200 [ 466.509969] x23: 0000000000000001 x22: 00000000000000a7 x21: 0000000000000000 [ 466.517130] x20: ffffff8041bb0000 x19: 0000000000000001 x18: 0000000000000000 [ 466.524291] x17: ffffffafadfb0000 x16: ffffffc080000000 x15: 0000000000000000 [ 466.531452] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 466.538613] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffffd08c527eb0 [ 466.545777] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 466.552941] x5 : ffffffd08c4100d0 x4 : ffffffafadfb0000 x3 : ffffffc080003f70 [ 466.560102] x2 : ffffffc0829e8058 x1 : 0000000000000001 x0 : 0000000000000000 [ 466.567263] Call trace: [ 466.569711] v3d_irq+0x118/0x2e0 [v3d] (P) [ 466.573826] __handle_irq_event_percpu+0x60/0x228 [ 466.578546] handle_irq_event+0x54/0xb8 [ 466.582391] handle_fasteoi_irq+0xac/0x240 [ 466.586498] generic_handle_domain_irq+0x34/0x58 [ 466.591128] gic_handle_irq+0x48/0xd8 [ 466.594798] call_on_irq_stack+0x24/0x58 [ 466.598730] do_interrupt_handler+0x88/0x98 [ 466.602923] el0_interrupt+0x44/0xc0 [ 466.606508] __el0_irq_handler_common+0x18/0x28 [ 466.611050] el0t_64_irq_handler+0x10/0x20 [ 466.615156] el0t_64_irq+0x198/0x1a0 [ 466.618740] Code: 52800035 3607faf3 f9442e80 52800021 (f9406018) [ 466.624853] ---[ end trace 0000000000000000 ]--- [ 466.629483] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 466.636384] SMP: stopping secondary CPUs [ 466.640320] Kernel Offset: 0x100c400000 from 0xffffffc080000000 [ 466.646259] PHYS_OFFSET: 0x0 [ 466.649141] CPU features: 0x100,00000170,00901250,0200720b [ 466.654644] Memory Limit: none [ 466.657706] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- Fix the crash by assigning the job pointer to NULL before signaling the fence. This ensures that the job pointer is cleared before any new job starts execution, preventing the race condition and the NULL pointer dereference crash. Cc: [email protected] Fixes: e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL after job completion") Signed-off-by: Maíra Canal <[email protected]> Reviewed-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Tested-by: Phil Elwell <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
On some systems, the same CPU (with the same APIC ID) is assigned a different logical CPU id after commit ec9aedb ("x86/acpi: Ignore invalid x2APIC entries"). This means that Linux enumerates the CPUs in a different order, which violates ACPI specification[1] that states: "OSPM should initialize processors in the order that they appear in the MADT" The problematic commit parses all LAPIC entries before any x2APIC entries, aiming to ignore x2APIC entries with APIC ID < 255 when valid LAPIC entries exist. However, it disrupts the CPU enumeration order on systems where x2APIC entries precede LAPIC entries in the MADT. Fix this problem by: 1) Parsing LAPIC entries first without registering them in the topology to evaluate whether valid LAPIC entries exist. 2) Restoring the MADT in order parser which invokes either the LAPIC or the X2APIC parser function depending on the entry type. The X2APIC parser still ignores entries < 0xff in case that #1 found valid LAPIC entries independent of their position in the MADT table. Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order Cc: All applicable <[email protected]> Reported-by: Jim Mattson <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: ec9aedb ("x86/acpi: Ignore invalid x2APIC entries") Signed-off-by: Zhang Rui <[email protected]> Reviewed-by: Jim Mattson <[email protected]> Tested-by: Jim Mattson <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
Add read memory barrier to ensure the order of operations when accessing control queue descriptors. Specifically, we want to avoid cases where loads can be reordered: 1. Load #1 is dispatched to read descriptor flags. 2. Load #2 is dispatched to read some other field from the descriptor. 3. Load #2 completes, accessing memory/cache at a point in time when the DD flag is zero. 4. NIC DMA overwrites the descriptor, now the DD flag is one. 5. Any fields loaded before step 4 are now inconsistent with the actual descriptor state. Add read memory barrier between steps 1 and 2, so that load #2 is not executed until load #1 has completed. Fixes: 8077c72 ("idpf: add controlq init and reset checks") Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Suggested-by: Lance Richardson <[email protected]> Signed-off-by: Emil Tantilov <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
When testing the atomic write fix patches, the f2fs_bug_on was triggered as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:935! Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 3 UID: 0 PID: 257 Comm: bash Not tainted 6.13.0-rc1-00033-gc283a70d3497 #5 RIP: 0010:f2fs_evict_inode+0x50f/0x520 Call Trace: <TASK> ? __die_body+0x65/0xb0 ? die+0x9f/0xc0 ? do_trap+0xa1/0x170 ? f2fs_evict_inode+0x50f/0x520 ? f2fs_evict_inode+0x50f/0x520 ? handle_invalid_op+0x65/0x80 ? f2fs_evict_inode+0x50f/0x520 ? exc_invalid_op+0x39/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? __pfx_f2fs_get_dquots+0x10/0x10 ? f2fs_evict_inode+0x50f/0x520 ? f2fs_evict_inode+0x2e5/0x520 evict+0x186/0x2f0 prune_icache_sb+0x75/0xb0 super_cache_scan+0x1a8/0x200 do_shrink_slab+0x163/0x320 shrink_slab+0x2fc/0x470 drop_slab+0x82/0xf0 drop_caches_sysctl_handler+0x4e/0xb0 proc_sys_call_handler+0x183/0x280 vfs_write+0x36d/0x450 ksys_write+0x68/0xd0 do_syscall_64+0xc8/0x1a0 ? arch_exit_to_user_mode_prepare+0x11/0x60 ? irqentry_exit_to_user_mode+0x7e/0xa0 The root cause is: f2fs uses FI_ATOMIC_DIRTIED to indicate dirty atomic files during commit. If the inode is dirtied during commit, such as by f2fs_i_pino_write, the vfs inode keeps clean and the f2fs inode is set to FI_DIRTY_INODE. The FI_DIRTY_INODE flag cann't be cleared by write_inode later due to the clean vfs inode. Finally, f2fs_bug_on is triggered due to this inconsistent state when evict. To reproduce this situation: - fd = open("/mnt/test.db", O_WRONLY) - ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) - mv /mnt/test.db /mnt/test1.db - ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) - echo 3 > /proc/sys/vm/drop_caches To fix this problem, clear FI_DIRTY_INODE after commit, then f2fs_mark_inode_dirty_sync will ensure a consistent dirty state. Fixes: fccaa81 ("f2fs: prevent atomic file from being dirtied before commit") Signed-off-by: Yunlei He <[email protected]> Signed-off-by: Jianan Huang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
…g_info syz reported: (syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: bug expression: tl_count > ocfs2_truncate_recs_per_inode(osb->sb) || tl_count == 0 (syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: Truncate record count on #77 invalid wanted 39, actual 2087 ------------[ cut here ]------------ kernel BUG at fs/ocfs2/alloc.c:5874! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5313 Comm: syz-executor404 Not tainted 6.12.0-rc5-syzkaller-00299-g11066801dd4b #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:ocfs2_truncate_log_append+0x9a8/0x9c0 fs/ocfs2/alloc.c:5868 RSP: 0018:ffffc9000cf16f40 EFLAGS: 00010292 RAX: b4b54f1d10640800 RBX: 0000000000000027 RCX: b4b54f1d10640800 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffffc9000cf17070 R08: ffffffff8174a14c R09: 1ffff11003f8519a R10: dffffc0000000000 R11: ffffed1003f8519b R12: 1ffff110085f5f58 R13: ffffff3800000000 R14: 000000000000004d R15: ffff8880438f0008 FS: 00005555722df380(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000f000 CR3: 000000004010e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ocfs2_remove_btree_range+0x1303/0x1860 fs/ocfs2/alloc.c:5789 ocfs2_remove_inode_range+0xff3/0x29f0 fs/ocfs2/file.c:1907 ocfs2_reflink_remap_extent fs/ocfs2/refcounttree.c:4537 [inline] ocfs2_reflink_remap_blocks+0xcd4/0x1f30 fs/ocfs2/refcounttree.c:4684 ocfs2_remap_file_range+0x5fa/0x8d0 fs/ocfs2/file.c:2736 vfs_copy_file_range+0xc07/0x1510 fs/read_write.c:1615 __do_sys_copy_file_range fs/read_write.c:1705 [inline] __se_sys_copy_file_range+0x3f2/0x5d0 fs/read_write.c:1668 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fd327167af9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 61 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe6b8e22e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000146 RAX: ffffffffffffffda RBX: 00007fd3271b005e RCX: 00007fd327167af9 RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007fd3271de610 R08: 000000000000d8c2 R09: 0000000000000000 R10: 0000000020000640 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffe6b8e24b8 R14: 0000000000000001 R15: 0000000000000001 </TASK> The fuzz image has a truncate log inode whose tl_count is bigger than ocfs2_truncate_recs_per_inode() so it triggers the BUG in ocfs2_truncate_log_append(). As what the check in ocfs2_truncate_log_append() does, just do same check into ocfs2_get_truncate_log_info when truncate log inode is reading in so we can bail out earlier. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Su Yue <[email protected]> Reported-by: Liebes Wang <[email protected]> Link: https://lore.kernel.org/ocfs2-devel/CADCV8souQhdP0RdQF1U7KTWtuHDfpn+3LnTt-EEuMmB-pMRrgQ@mail.gmail.com/T/#u Reported-by: [email protected] Tested-by: [email protected] Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
In "one-shot" mode, turbostat 1. takes a counter snapshot 2. forks and waits for a child 3. takes the end counter snapshot and prints the result. But turbostat counter snapshots currently use affinity to travel around the system so that counter reads are "local", and this affinity must be cleared between #1 and #2 above. The offending commit removed that reset that allowed the child to run on cpu_present_set. Fix that issue, and improve upon the original by using cpu_possible_set for the child. This allows the child to also run on CPUs that hotplug online during its runtime. Reported-by: Zhang Rui <[email protected]> Fixes: 7bb3fe2 ("tools/power/turbostat: Obey allowed CPUs during startup") Signed-off-by: Len Brown <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
libtraceevent parses and returns an array of argument fields, sometimes larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr", idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6 elements max, creating an out-of-bounds access. This runtime error is found by UBsan. The error message: $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1 builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]' #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966 #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110 #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436 #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897 #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335 #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502 #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351 #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404 #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448 #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556 #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360 #12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6) 0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1) = 1 Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint") Signed-off-by: Howard Chu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>
bardliao
pushed a commit
that referenced
this pull request
Feb 6, 2025
This fixes the following hard lockup in isolate_lru_folios() during memory reclaim. If the LRU mostly contains ineligible folios this may trigger watchdog. watchdog: Watchdog detected hard LOCKUP on cpu 173 RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0 Call Trace: _raw_spin_lock_irqsave+0x31/0x40 folio_lruvec_lock_irqsave+0x5f/0x90 folio_batch_move_lru+0x91/0x150 lru_add_drain_per_cpu+0x1c/0x40 process_one_work+0x17d/0x350 worker_thread+0x27b/0x3a0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 lruvec->lru_lock owner: PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0" #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde [exception RIP: isolate_lru_folios+403] RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88 RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048 RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008 R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 <NMI exception stack> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238 crash> Scenario: User processe are requesting a large amount of memory and keep page active. Then a module continuously requests memory from ZONE_DMA32 area. Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached. However pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. Reproduce: Terminal 1: Construct to continuously increase pages active(anon). mkdir /tmp/memory mount -t tmpfs -o size=1024000M tmpfs /tmp/memory dd if=/dev/zero of=/tmp/memory/block bs=4M tail /tmp/memory/block Terminal 2: vmstat -a 1 active will increase. procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ... r b swpd free inact active si so bi bo 1 0 0 1445623076 45898836 83646008 0 0 0 1 0 0 1445623076 43450228 86094616 0 0 0 1 0 0 1445623076 41003480 88541364 0 0 0 1 0 0 1445623076 38557088 90987756 0 0 0 1 0 0 1445623076 36109688 93435156 0 0 0 1 0 0 1445619552 33663256 95881632 0 0 0 1 0 0 1445619804 31217140 98327792 0 0 0 1 0 0 1445619804 28769988 100774944 0 0 0 1 0 0 1445619804 26322348 103222584 0 0 0 1 0 0 1445619804 23875592 105669340 0 0 0 cat /proc/meminfo | head Active(anon) increase. MemTotal: 1579941036 kB MemFree: 1445618500 kB MemAvailable: 1453013224 kB Buffers: 6516 kB Cached: 128653956 kB SwapCached: 0 kB Active: 118110812 kB Inactive: 11436620 kB Active(anon): 115345744 kB Inactive(anon): 945292 kB When the Active(anon) is 115345744 kB, insmod module triggers the ZONE_DMA32 watermark. perf record -e vmscan:mm_vmscan_lru_isolate -aR perf script isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon See nr_scanned=28835844. 28835844 * 4k = 115343376KB approximately equal to 115345744 kB. If increase Active(anon) to 1000G then insmod module triggers the ZONE_DMA32 watermark. hard lockup will occur. In my device nr_scanned = 0000000003e3e937 when hard lockup. Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB. [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 ffffc90006fb7c30: 0000000000000020 0000000000000000 ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000 ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8 ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48 ffffc90006fb7c70: 0000000000000000 0000000000000000 ffffc90006fb7c80: 0000000000000000 0000000000000000 ffffc90006fb7c90: 0000000000000000 0000000000000000 ffffc90006fb7ca0: 0000000000000000 0000000003e3e937 ffffc90006fb7cb0: 0000000000000000 0000000000000000 ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000 About the Fixes: Why did it take eight years to be discovered? The problem requires the following conditions to occur: 1. The device memory should be large enough. 2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. 3. The memory in ZONE_DMA32 needs to reach the watermark. If the memory is not large enough, or if the usage design of ZONE_DMA32 area memory is reasonable, this problem is difficult to detect. notes: The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger the problem. Link: https://lkml.kernel.org/r/[email protected] Fixes: b2e1875 ("mm, vmscan: begin reclaiming pages on a per-node basis") Signed-off-by: liuye <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.