selftests: mptcp_join tests are creating warnings #86

matttbe · 2020-09-07T06:16:18Z

When launching mptcp_join.sh selftests with a debug kernel [1], I got these warnings 10 times:

# selftests: net/mptcp: mptcp_join.sh

(...)

# 06 multiple subflows, limited by server syn[ ok ] - synack[ ok ] - ack[ ok ]
[  266.062334] BUG: using __this_cpu_add() in preemptible [00000000] code: mptcp_connect/3294
[  266.063559] caller is mptcp_incoming_options (net/mptcp/mib.h:44) 
[  266.064323] CPU: 0 PID: 3294 Comm: mptcp_connect Not tainted 5.9.0-rc1+ #45
[  266.065241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[  266.066271] Call Trace:
[  266.066615] dump_stack (lib/dump_stack.c:120) 
[  266.067021] check_preemption_disabled (lib/smp_processor_id.c:48) 
[  266.067579] mptcp_incoming_options (net/mptcp/mib.h:44) 
[  266.068215] ? cipso_v4_getattr (net/ipv4/cipso_ipv4.c:263) 
[  266.068766] ? mptcp_update_rcv_data_fin (net/mptcp/options.c:864) 
[  266.069376] ? tcp_ack (net/ipv4/tcp_input.c:3415) 
[  266.069984] tcp_data_queue (net/ipv4/tcp_input.c:4911) 
[  266.070591] ? tcp_parse_options (net/ipv4/tcp_input.c:4048) 
[  266.071171] ? tcp_urg (net/ipv4/tcp_input.c:5525) 
[  266.071753] ? tcp_send_rcvq (net/ipv4/tcp_input.c:4905) 
[  266.072411] ? tcp_conn_request (net/ipv4/tcp_input.c:5521) 
[  266.073071] ? tcp_validate_incoming (./include/net/tcp.h:732) 
[  266.073734] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:94) 
[  266.074232] tcp_rcv_established (./include/linux/skbuff.h:1765) 
[  266.074822] ? tcp_data_queue (net/ipv4/tcp_input.c:5699) 
[  266.075336] ? mark_held_locks (kernel/locking/lockdep.c:3611) 
[  266.075829] tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1686) 
[  266.076284] __release_sock (net/core/sock.c:2530) 
[  266.076744] release_sock (net/core/sock.c:3056) 
[  266.077239] mptcp_sendmsg (./include/net/sock.h:947) 
[  266.077834] ? mptcp_clean_una (net/mptcp/protocol.c:1167) 
[  266.078489] ? touch_atime (fs/inode.c:1828) 
[  266.078985] ? inet_send_prepare (net/ipv4/af_inet.c:805) 
[  266.079492] ? inet_send_prepare (net/ipv4/af_inet.c:814) 
[  266.080007] sock_sendmsg (net/socket.c:651) 
[  266.080438] sock_write_iter (net/socket.c:999) 
[  266.081031] ? sock_sendmsg (net/socket.c:982) 
[  266.081540] ? iov_iter_init (lib/iov_iter.c:459) 
[  266.082064] new_sync_write (./include/linux/fs.h:1882) 
[  266.082593] ? new_sync_read (fs/read_write.c:493) 
[  266.083118] ? file_has_perm (security/selinux/hooks.c:1732) 
[  266.083786] vfs_write (fs/read_write.c:578) 
[  266.084265] ksys_write (fs/read_write.c:631) 
[  266.084738] ? __ia32_sys_read (fs/read_write.c:621) 
[  266.085285] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:398) 
[  266.086049] ? syscall_enter_from_user_mode (./arch/x86/include/asm/irqflags.h:54) 
[  266.086862] ? syscall_enter_from_user_mode (./arch/x86/include/asm/irqflags.h:54) 
[  266.087735] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:3762 (discriminator 3)) 
[  266.088431] do_syscall_64 (arch/x86/entry/common.c:46) 
[  266.088981] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:125) 
[  266.089796] RIP: 0033:0x7f42b381d057
[ 266.090390] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
All code
========
   0:	64 89 02             	mov    %eax,%fs:(%rdx)
   3:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
   a:	eb bb                	jmp    0xffffffffffffffc7
   c:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  13:	f3 0f 1e fa          	endbr64 
  17:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
  1e:	00 
  1f:	85 c0                	test   %eax,%eax
  21:	75 10                	jne    0x33
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 51                	ja     0x83
  32:	c3                   	retq   
  33:	48 83 ec 28          	sub    $0x28,%rsp
  37:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
  3c:	48                   	rex.W
  3d:	89                   	.byte 0x89
  3e:	74 24                	je     0x64

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 51                	ja     0x59
   8:	c3                   	retq   
   9:	48 83 ec 28          	sub    $0x28,%rsp
   d:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
  12:	48                   	rex.W
  13:	89                   	.byte 0x89
  14:	74 24                	je     0x3a
[  266.092953] RSP: 002b:00007ffec25b5c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  266.094061] RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 00007f42b381d057
[  266.095133] RDX: 0000000000000064 RSI: 00007ffec25b5c60 RDI: 0000000000000005
[  266.096148] RBP: 0000000000000005 R08: 0000000000000000 R09: 00007f42b38f7240
[  266.097122] R10: 00007f42b38c476e R11: 0000000000000246 R12: 000000000000041c
[  266.098123] R13: 00007ffec25b5c58 R14: 00007ffec25b5c60 R15: 0000000000000064
(...)
# 07 unused signal address                syn[ ok ] - synack[ ok ] - ack[ ok ]
#                                         add[ ok ] - echo  [ ok ]
(...)

[1] Compared to a non debug kernel, the kconfig contains:

-e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e PROVE_LOCKING -e DEBUG_LOCKDEP -e PREEMPT -e DEBUG_PREEMPT -e DEBUG_SLAVE -e DEBUG_PAGEALLOC -e DEBUG_MUTEXES -e DEBUG_SPINLOCK -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_OBJECTS_RCU_HEAD

The text was updated successfully, but these errors were encountered:

matttbe · 2020-09-07T09:53:08Z

Fixed by @pabeni : https://patchwork.ozlabs.org/project/mptcp/patch/0a4053db6ce391392514c394f83386ed75efa050.1599470677.git.pabeni@redhat.com/

The commit eb1f002 ("lockdep,trace: Expose tracepoints"), started to expose us for tracepoints. This lead to the following RCU splat on an ARM64 Qcom board. [ 5.529634] WARNING: suspicious RCU usage [ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper [ 5.541092] 5.9.0-rc3 #86 Not tainted [ 5.541098] ----------------------------- [ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage! [ 5.541110] [ 5.541110] other info that might help us debug this: [ 5.541110] [ 5.541116] [ 5.541116] rcu_scheduler_active = 2, debug_locks = 1 [ 5.541122] RCU used illegally from extended quiescent state! [ 5.541129] no locks held by swapper/0/0. [ 5.541134] [ 5.541134] stack backtrace: [ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86 [ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.541157] Call trace: [ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO [ 5.574186] dump_backtrace+0x0/0x1c8 [ 5.574206] show_stack+0x14/0x20 [ 5.574229] dump_stack+0xe8/0x154 [ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8 [ 5.574269] lock_acquire+0x3f0/0x460 [ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0 [ 5.574314] __pm_runtime_suspend+0x4c/0x188 [ 5.574341] psci_enter_domain_idle_state+0x40/0xa0 [ 5.574362] cpuidle_enter_state+0xc0/0x610 [ 5.646487] cpuidle_enter+0x38/0x50 [ 5.650651] call_cpuidle+0x18/0x40 [ 5.654467] do_idle+0x228/0x278 [ 5.657678] cpu_startup_entry+0x24/0x70 [ 5.661153] rest_init+0x1a4/0x278 [ 5.665061] arch_call_rest_init+0xc/0x14 [ 5.668272] start_kernel+0x508/0x540 Following the path in pm_runtime_put_sync_suspend() from psci_enter_domain_idle_state(), it seems like we end up using the RCU. Therefore, let's simply silence the splat by informing the RCU about it with RCU_NONIDLE. Note that, this is a temporary solution. Instead we should strive to avoid using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit() further down, closer to the arch specific code. However, as the CPU PM notifiers are also using the RCU, additional rework is needed. Reported-by: Naresh Kamboju <[email protected]> Signed-off-by: Ulf Hansson <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>

Liajian reported a bug_on hit on a ThunderX2 arm64 server with FastLinQ QL41000 ethernet controller: BUG: scheduling while atomic: kworker/0:4/531/0x00000200 [qed_probe:488()]hw prepare failed kernel BUG at mm/vmalloc.c:2355! Internal error: Oops - BUG: 0 [#1] SMP CPU: 0 PID: 531 Comm: kworker/0:4 Tainted: G W 5.4.0-77-generic #86-Ubuntu pstate: 00400009 (nzcv daif +PAN -UAO) Call trace: vunmap+0x4c/0x50 iounmap+0x48/0x58 qed_free_pci+0x60/0x80 [qed] qed_probe+0x35c/0x688 [qed] __qede_probe+0x88/0x5c8 [qede] qede_probe+0x60/0xe0 [qede] local_pci_probe+0x48/0xa0 work_for_cpu_fn+0x24/0x38 process_one_work+0x1d0/0x468 worker_thread+0x238/0x4e0 kthread+0xf0/0x118 ret_from_fork+0x10/0x18 In this case, qed_hw_prepare() returns error due to hw/fw error, but in theory work queue should be in process context instead of interrupt. The root cause might be the unpaired spin_{un}lock_bh() in _qed_mcp_cmd_and_union(), which causes botton half is disabled incorrectly. Reported-by: Lijian Zhang <[email protected]> Signed-off-by: Jia He <[email protected]> Signed-off-by: David S. Miller <[email protected]>

In some cases skb head could be locked and entire header data is pulled from skb. When skb_zerocopy() called in such cases, following BUG is triggered. This patch fixes it by copying entire skb in such cases. This could be optimized incase this is performance bottleneck. ---8<--- kernel BUG at net/core/skbuff.c:2961! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 0 Comm: swapper/2 Tainted: G OE 5.4.0-77-generic #86-Ubuntu Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:skb_zerocopy+0x37a/0x3a0 RSP: 0018:ffffbcc70013ca38 EFLAGS: 00010246 Call Trace: <IRQ> queue_userspace_packet+0x2af/0x5e0 [openvswitch] ovs_dp_upcall+0x3d/0x60 [openvswitch] ovs_dp_process_packet+0x125/0x150 [openvswitch] ovs_vport_receive+0x77/0xd0 [openvswitch] netdev_port_receive+0x87/0x130 [openvswitch] netdev_frame_hook+0x4b/0x60 [openvswitch] __netif_receive_skb_core+0x2b4/0xc90 __netif_receive_skb_one_core+0x3f/0xa0 __netif_receive_skb+0x18/0x60 process_backlog+0xa9/0x160 net_rx_action+0x142/0x390 __do_softirq+0xe1/0x2d6 irq_exit+0xae/0xb0 do_IRQ+0x5a/0xf0 common_interrupt+0xf/0xf Code that triggered BUG: int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen) { int i, j = 0; int plen = 0; /* length of skb->head fragment */ int ret; struct page *page; unsigned int offset; BUG_ON(!from->head_frag && !hlen); Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>

…ging The following bug is reported to be triggered when starting X on x86-32 system with i915: [ 225.777375] kernel BUG at mm/memory.c:2664! [ 225.777391] invalid opcode: 0000 [#1] PREEMPT SMP [ 225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86 [ 225.777415] Hardware name: /8I865G775-G, BIOS F1 08/29/2006 [ 225.777421] EIP: __apply_to_page_range+0x24d/0x31c [ 225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76 [ 225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00 [ 225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80 [ 225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202 [ 225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0 [ 225.777479] Call Trace: [ 225.777486] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777684] apply_to_page_range+0x21/0x27 [ 225.777694] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.777870] remap_io_mapping+0x49/0x75 [i915] [ 225.778046] ? i915_memcpy_init_early+0x63/0x63 [i915] [ 225.778220] ? mutex_unlock+0xb/0xd [ 225.778231] ? i915_vma_pin_fence+0x6d/0xf7 [i915] [ 225.778420] vm_fault_gtt+0x2a9/0x8f1 [i915] [ 225.778644] ? lock_is_held_type+0x56/0xe7 [ 225.778655] ? lock_is_held_type+0x7a/0xe7 [ 225.778663] ? 0xc1000000 [ 225.778670] __do_fault+0x21/0x6a [ 225.778679] handle_mm_fault+0x708/0xb21 [ 225.778686] ? mt_find+0x21e/0x5ae [ 225.778696] exc_page_fault+0x185/0x705 [ 225.778704] ? doublefault_shim+0x127/0x127 [ 225.778715] handle_exception+0x130/0x130 [ 225.778723] EIP: 0xb700468a Recently pud_huge() got aware of non-present entry by commit 3a194f3 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") to handle some special states of gigantic page. However, it's overlooked that pud_none() always returns false when running with 2-level paging, and as a result pud_huge() can return true pointlessly. Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3a194f3 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") Signed-off-by: Naoya Horiguchi <[email protected]> Reported-by: Ville Syrjälä <[email protected]> Tested-by: Ville Syrjälä <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Liu Shixin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Yang Shi <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

The inline assembly for arm64's cmpxchg_double*() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: fee960b ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double*() similarly at the same time. The same problem applies, as demonstrated with the following test: | struct big { | u64 lo, hi; | } __aligned(128); | | unsigned long foo(struct big *b) | { | u64 hi_old, hi_new; | | hi_old = b->hi; | cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); | hi_new = b->hi; | | return hi_old ^ hi_new; | } ... which GCC 12.1.0 compiles as: | 0000000000000000 <foo>: | 0: d503233f paciasp | 4: aa0003e4 mov x4, x0 | 8: 1400000e b 40 <foo+0x40> | c: d2800240 mov x0, #0x12 // #18 | 10: d2800681 mov x1, #0x34 // #52 | 14: aa0003e5 mov x5, x0 | 18: aa0103e6 mov x6, x1 | 1c: d2800ac2 mov x2, #0x56 // #86 | 20: d2800f03 mov x3, #0x78 // #120 | 24: 48207c82 casp x0, x1, x2, x3, [x4] | 28: ca050000 eor x0, x0, x5 | 2c: ca060021 eor x1, x1, x6 | 30: aa010000 orr x0, x0, x1 | 34: d2800000 mov x0, #0x0 // #0 <--- BANG | 38: d50323bf autiasp | 3c: d65f03c0 ret | 40: d2800240 mov x0, #0x12 // #18 | 44: d2800681 mov x1, #0x34 // #52 | 48: d2800ac2 mov x2, #0x56 // #86 | 4c: d2800f03 mov x3, #0x78 // #120 | 50: f9800091 prfm pstl1strm, [x4] | 54: c87f1885 ldxp x5, x6, [x4] | 58: ca0000a5 eor x5, x5, x0 | 5c: ca0100c6 eor x6, x6, x1 | 60: aa0600a6 orr x6, x5, x6 | 64: b5000066 cbnz x6, 70 <foo+0x70> | 68: c8250c82 stxp w5, x2, x3, [x4] | 6c: 35ffff45 cbnz w5, 54 <foo+0x54> | 70: d2800000 mov x0, #0x0 // #0 <--- BANG | 74: d50323bf autiasp | 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: | 0000000000000000 <foo>: | 0: f9400407 ldr x7, [x0, #8] | 4: d503233f paciasp | 8: aa0003e4 mov x4, x0 | c: 1400000f b 48 <foo+0x48> | 10: d2800240 mov x0, #0x12 // #18 | 14: d2800681 mov x1, #0x34 // #52 | 18: aa0003e5 mov x5, x0 | 1c: aa0103e6 mov x6, x1 | 20: d2800ac2 mov x2, #0x56 // #86 | 24: d2800f03 mov x3, #0x78 // #120 | 28: 48207c82 casp x0, x1, x2, x3, [x4] | 2c: ca050000 eor x0, x0, x5 | 30: ca060021 eor x1, x1, x6 | 34: aa010000 orr x0, x0, x1 | 38: f9400480 ldr x0, [x4, #8] | 3c: d50323bf autiasp | 40: ca0000e0 eor x0, x7, x0 | 44: d65f03c0 ret | 48: d2800240 mov x0, #0x12 // #18 | 4c: d2800681 mov x1, #0x34 // #52 | 50: d2800ac2 mov x2, #0x56 // #86 | 54: d2800f03 mov x3, #0x78 // #120 | 58: f9800091 prfm pstl1strm, [x4] | 5c: c87f1885 ldxp x5, x6, [x4] | 60: ca0000a5 eor x5, x5, x0 | 64: ca0100c6 eor x6, x6, x1 | 68: aa0600a6 orr x6, x5, x6 | 6c: b5000066 cbnz x6, 78 <foo+0x78> | 70: c8250c82 stxp w5, x2, x3, [x4] | 74: 35ffff45 cbnz w5, 5c <foo+0x5c> | 78: f9400480 ldr x0, [x4, #8] | 7c: d50323bf autiasp | 80: ca0000e0 eor x0, x7, x0 | 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <[email protected]> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Mark Rutland <[email protected]> Cc: [email protected] Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Steve Capper <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>

@2

Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: #344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: #345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: #346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu

[BUG] When testing with COW fixup marked as BUG_ON() (this is involved with the new pin_user_pages*() change, which should not result new out-of-band dirty pages), I hit a crash triggered by the BUG_ON() from hitting COW fixup path. This BUG_ON() happens just after a failed btrfs_run_delalloc_range(): BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1444! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86 Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : extent_writepage_io+0x2d4/0x308 [btrfs] lr : extent_writepage_io+0x2d4/0x308 [btrfs] Call trace: extent_writepage_io+0x2d4/0x308 [btrfs] extent_writepage+0x218/0x330 [btrfs] extent_write_cache_pages+0x1d4/0x4b0 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x180/0x3b0 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] That failure is mostly from cow_file_range(), where we can hit -ENOSPC. Although the -ENOSPC is already a bug related to our space reservation code, let's just focus on the error handling. For example, we have the following dirty range [0, 64K) of an inode, with 4K sector size and 4K page size: 0 16K 32K 48K 64K |///////////////////////////////////////| |#######################################| Where |///| means page are still dirty, and |###| means the extent io tree has EXTENT_DELALLOC flag. - Enter extent_writepage() for page 0 - Enter btrfs_run_delalloc_range() for range [0, 64K) - Enter cow_file_range() for range [0, 64K) - Function btrfs_reserve_extent() only reserved one 16K extent So we created extent map and ordered extent for range [0, 16K) 0 16K 32K 48K 64K |////////|//////////////////////////////| |<- OE ->|##############################| And range [0, 16K) has its delalloc flag cleared. But since we haven't yet submit any bio, involved 4 pages are still dirty. - Function btrfs_reserve_extent() returns with -ENOSPC Now we have to run error cleanup, which will clear all EXTENT_DELALLOC* flags and clear the dirty flags for the remaining ranges: 0 16K 32K 48K 64K |////////| | | | | Note that range [0, 16K) still has its pages dirty. - Some time later, writeback is triggered again for the range [0, 16K) since the page range still has dirty flags. - btrfs_run_delalloc_range() will do nothing because there is no EXTENT_DELALLOC flag. - extent_writepage_io() finds page 0 has no ordered flag Which falls into the COW fixup path, triggering the BUG_ON(). Unfortunately this error handling bug dates back to the introduction of btrfs. Thankfully with the abuse of COW fixup, at least it won't crash the kernel. [FIX] Instead of immediately unlocking the extent and folios, we keep the extent and folios locked until either erroring out or the whole delalloc range finished. When the whole delalloc range finished without error, we just unlock the whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked cases), with EXTENT_DELALLOC and EXTENT_LOCKED cleared. And the involved folios will be properly submitted, with their dirty flags cleared during submission. For the error path, it will be a little more complex: - The range with ordered extent allocated (range (1)) We only clear the EXTENT_DELALLOC and EXTENT_LOCKED, as the remaining flags are cleaned up by btrfs_mark_ordered_io_finished()->btrfs_finish_one_ordered(). For folios we finish the IO (clear dirty, start writeback and immediately finish the writeback) and unlock the folios. - The range with reserved extent but no ordered extent (range(2)) - The range we never touched (range(3)) For both range (2) and range(3) the behavior is not changed. Now even if cow_file_range() failed halfway with some successfully reserved extents/ordered extents, we will keep all folios clean, so there will be no future writeback triggered on them. CC: [email protected] Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>

matttbe added the bug label Sep 7, 2020

matttbe closed this as completed Sep 7, 2020

matttbe assigned pabeni Sep 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

selftests: mptcp_join tests are creating warnings #86

selftests: mptcp_join tests are creating warnings #86

matttbe commented Sep 7, 2020 •

edited

Loading

matttbe commented Sep 7, 2020

selftests: mptcp_join tests are creating warnings #86

selftests: mptcp_join tests are creating warnings #86

Comments

matttbe commented Sep 7, 2020 • edited Loading

matttbe commented Sep 7, 2020

matttbe commented Sep 7, 2020 •

edited

Loading