Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
mm/memory-failure.c: shift page lock from head page to tail page afte…
…r thp split After thp split in hwpoison_user_mappings(), we hold page lock on the raw error page only between try_to_unmap, hence we are in danger of race condition. I found in the RHEL7 MCE-relay testing that we have "bad page" error when a memory error happens on a thp tail page used by qemu-kvm: Triggering MCE exception on CPU 10 mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 10 MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption MCE 0x38c535: dirty LRU page recovery: Recovered qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000] BUG: Bad page state in process qemu-kvm pfn:38c400 page:ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00 page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked) Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ... CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1 Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011 000fffff00000000 ffff8802fc9239a0 ffffffff815b4cc0 ffff8802fc9239b8 ffffffff815b072e 0000000000000000 ffff8802fc9239f8 ffffffff8113b918 ffffea000e310000 ffffea000e310000 002fffff0008001d 0000000000000000 Call Trace: [<ffffffff815b4cc0>] dump_stack+0x19/0x1b [<ffffffff815b072e>] bad_page.part.59+0xcf/0xe8 [<ffffffff8113b918>] free_pages_prepare+0x148/0x160 [<ffffffff8113c231>] free_hot_cold_page+0x31/0x140 [<ffffffff8113c386>] free_hot_cold_page_list+0x46/0xa0 [<ffffffff81141361>] release_pages+0x1c1/0x200 [<ffffffff8116e47d>] free_pages_and_swap_cache+0xad/0xd0 [<ffffffff8115850c>] tlb_flush_mmu.part.46+0x4c/0x90 [<ffffffff81159045>] tlb_finish_mmu+0x55/0x60 [<ffffffff81163e6b>] exit_mmap+0xcb/0x170 [<ffffffff81055f87>] mmput+0x67/0xf0 [<ffffffffa05c7451>] vhost_dev_cleanup+0x231/0x260 [vhost_net] [<ffffffffa05ca0df>] vhost_net_release+0x3f/0x90 [vhost_net] [<ffffffff8119f649>] __fput+0xe9/0x270 [<ffffffff8119f8fe>] ____fput+0xe/0x10 [<ffffffff8107b754>] task_work_run+0xc4/0xe0 [<ffffffff8105e88b>] do_exit+0x2bb/0xa40 [<ffffffff8106b0cc>] ? __dequeue_signal+0x13c/0x220 [<ffffffff8105f08f>] do_group_exit+0x3f/0xa0 [<ffffffff8106dcc0>] get_signal_to_deliver+0x1d0/0x6e0 [<ffffffff81012408>] do_signal+0x48/0x5e0 [<ffffffff8106ee48>] ? do_sigaction+0x88/0x1f0 [<ffffffff81012a11>] do_notify_resume+0x71/0xc0 [<ffffffff815bc53c>] retint_signal+0x48/0x8c The reason of this bug is that a page fault happens before unlocking the head page at the end of memory_failure(). This strange page fault is trying to access to address 0x20 and I'm not sure why qemu-kvm does this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the way we catch the bad page bug/warning because we try to free a locked page (which was the former head page.) To fix this, this patch suggests to shift page lock from head page to tail page just after thp split. SIGSEGV still happens, but it affects only error affected VMs, not a whole system. Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: <[email protected]> [3.9+] # a3e0f9e "mm/memory-failure.c: transfer page count from head page to tail page after split thp" Signed-off-by: Andrew Morton <[email protected]>
- Loading branch information