Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update From Base #11

Merged
merged 84 commits into from
Jan 29, 2018
Merged

Update From Base #11

merged 84 commits into from
Jan 29, 2018

Conversation

Esteban-Rocha
Copy link
Owner

No description provided.

=?UTF-8?q?Christian=20K=C3=B6nig?= and others added 30 commits January 16, 2018 11:45
Reenable the 64-bit window during resume.

Fixes: fa564ad ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)")
Reported-by: Tom St Denis <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Current code configures the hardware with a new SA before the state has been
fully initialized. During this time interval, an incoming ESP packet can cause
a crash due to a NULL dereference. More specifically, xfrm_input() considers
the packet as valid, and yet, anti-replay mechanism is not initialized.

Move hardware configuration to the end of xfrm_state_construct(), and mark
the state as valid once the SA is fully initialized.

Fixes: d77e38e ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Aviad Yehezkel <[email protected]>
Signed-off-by: Aviv Heller <[email protected]>
Signed-off-by: Yossi Kuperman <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
If the frame samples from a render target that was just written, its
cache flush during the binning step may have occurred before the
previous frame's RCL was completed.  Flush the texture caches again
before starting each RCL job to make sure that the sampling of the
previous RCL's output is correct.

Fixes flickering in the top left of 3DMMES Taiji.

Signed-off-by: Eric Anholt <[email protected]>
Fixes: ca26d28 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Boris Brezillon <[email protected]>
When saving BOs in the hang state we skip one entry of the
kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL
pointer dereference when, later in this function, we iterate over all
BOs to check their ->madv state.

Fixes: ca26d28 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Cc: <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
If add state fails in case of device offload, netdev refcount
will be negative since gc task is attempting to dev_free this state.
This is fixed by putting NULL in state dev field.

Signed-off-by: Aviad Yehezkel <[email protected]>
Signed-off-by: Boris Pismeny <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Replace the original license statement with the SPDX identifier.

Update also the copyright owner adding myself as co-owner of the
copyright.

Signed-off-by: Andi Shyti <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Adds support for the current lineup of Xbox One controllers from PDP
(Performance Designed Products). These controllers are very picky with
their initialization sequence and require an additional 2 packets before
they send any input reports.

Signed-off-by: Mark Furneaux <[email protected]>
Reviewed-by: Cameron Gutman <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Lenovo introduced trackpoint compatible sticks with minimum PS/2 commands.
They supposed to reply with 0x02, 0x03, or 0x04 in response to the
"Read Extended ID" command, so we would know not to try certain extended
commands. Unfortunately even some trackpoints reporting the original IBM
version (0x01 firmware 0x0e) now respond with incorrect data to the "Get
Extended Buttons" command:

 thinkpad_acpi: ThinkPad BIOS R0DET87W (1.87 ), EC unknown
 thinkpad_acpi: Lenovo ThinkPad E470, model 20H1004SGE

 psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 0/0

Since there are no trackpoints without buttons, let's assume the trackpoint
has 3 buttons when we get 0 response to the extended buttons query.

Signed-off-by: Aaron Ma <[email protected]>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196253
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
… NXP

The newer trackpoints from ALPS, Elan and NXP implement a very limited
subset of extended commands and controls that the original trackpoints
implemented, so we should not be exposing not working controls in sysfs.
The newer trackpoints also do not implement "Power On Reset" or "Read
Extended Button Status", so we should not be using these commands during
initialization.

While we are at it, let's change "unsigned char" to u8 for byte data or
bool for booleans and use better suited error codes instead of -1.

Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
IPSec tunnel mode supports encapsulation of IPv4 over IPv6 and vice-versa.

The outer IP header is stripped and the inner IP inherits the original
Ethernet header. Tcpdump fails to properly decode the inner packet in
case that h_proto is different than the inner IP version.

Fix h_proto to reflect the inner IP version.

Signed-off-by: Yossi Kuperman <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Fixes: ffdb521 ("xfrm: Auto-load xfrm offload modules")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Steven Rostedt discovered that the ftrace stack tracer is broken when
it's used with the ORC unwinder.  The problem is that objtool is
instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't
generate any ORC data for it.

Fix it by making the asm code objtool-friendly:

- Objtool doesn't like the fact that save_mcount_regs pushes RBP at the
  beginning, but it's never restored (directly, at least).  So just skip
  the original RBP push, which is only needed for frame pointers anyway.

- Annotate some functions as normal callable functions with
  ENTRY/ENDPROC.

- Add an empty unwind hint to return_to_handler().  The return address
  isn't on the stack, so there's nothing ORC can do there.  It will just
  punt in the unlikely case it tries to unwind from that code.

With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile
annotation so objtool can read the file.

Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble

Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
…ernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Fix AMD regression due to not re-enabling the big window on resume
  (Christian König)"

* tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Enable AMD 64-bit window on resume
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
With the addition of ORC unwinder and FRAME POINTER unwinder, the stack
trace skipping requirements have changed.

I went through the tracing stack trace dumps with ORC and with frame
pointers and recalculated the proper values.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d2 ("net: Allow netdevices to specify needed head/tailroom").

But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.

This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.

skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  skb_under_panic net/core/skbuff.c:114 [inline]
  skb_push+0xce/0xf0 net/core/skbuff.c:1714
  ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
  dev_hard_header include/linux/netdevice.h:2723 [inline]
  pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  sock_write_iter+0x31a/0x5d0 net/socket.c:909
  call_write_iter include/linux/fs.h:1775 [inline]
  do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
  do_iter_write+0x154/0x540 fs/read_write.c:932
  vfs_writev+0x18a/0x340 fs/read_write.c:977
  do_writev+0xfc/0x2a0 fs/read_write.c:1012
  SYSC_writev fs/read_write.c:1085 [inline]
  SyS_writev+0x27/0x30 fs/read_write.c:1082
  entry_SYSCALL_64_fastpath+0x29/0xa0

Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.

Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().

Fixes: f5184d2 ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Commit 513674b ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
with the introduction of commit
b0eb57c, it appears that rq->buf_info
is improperly handled.  While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when  the mtu of a vmxnet3 interface is changed
frequently.

Fix is pretty straight forward, just move the NULL set to after the
free.

Tested by myself with successful results

Applies to net, and should likely be queued for stable, please

Signed-off-by: Neil Horman <[email protected]>
Reported-By: [email protected]
CC: [email protected]
CC: Shrikrishna Khare <[email protected]>
CC: "VMware, Inc." <[email protected]>
CC: David S. Miller <[email protected]>
Acked-by: Shrikrishna Khare <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Both Geert and DaveJ reported that the recent futex commit:

  c1e2f0e ("futex: Avoid violating the 10th rule of futex")

introduced a problem with setting OWNER_DEAD. We set the bit on an
uninitialized variable and then entirely optimize it away as a
dead-store.

Move the setting of the bit to where it is more useful.

Reported-by: Geert Uytterhoeven <[email protected]>
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: c1e2f0e ("futex: Avoid violating the 10th rule of futex")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
debug_show_all_locks() iterates all tasks and print held locks whole
holding tasklist_lock.  This can take a while on a slow console device
and may end up triggering NMI hardlockup detector if someone else ends
up waiting for tasklist_lock.

Touch the NMI watchdog while printing the held locks to avoid
spuriously triggering the hardlockup detector.

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion:

  tg_set_cfs_bandwidth()
    get_online_cpus()
      cpus_read_lock()

    cfs_bandwidth_usage_inc()
      static_key_slow_inc()
        cpus_read_lock()

Reported-by: Tejun Heo <[email protected]>
Tested-by: Tejun Heo <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop
Signed-off-by: Ingo Molnar <[email protected]>
It doesn't make sense to have an indirect call thunk with esp/rsp as
retpoline code won't work correctly with the stack pointer register.
Removing it will help compiler writers to catch error in case such
a thunk call is emitted incorrectly.

Fixes: 76b0438 ("x86/retpoline: Add initial retpoline support")
Suggested-by: Jeff Law <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: David Woodhouse <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Paul Turner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
The AMD power module can be loaded on non AMD platforms, but unload fails
with the following Oops:

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: __list_del_entry_valid+0x29/0x90
 Call Trace:
  perf_pmu_unregister+0x25/0xf0
  amd_power_pmu_exit+0x1c/0xd23 [power]
  SyS_delete_module+0x1a8/0x2b0
  ? exit_to_usermode_loop+0x8f/0xb0
  entry_SYSCALL_64_fastpath+0x20/0x83

Return -ENODEV instead of 0 from the module init function if the CPU does
not match.

Fixes: c7ab62b ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism")
Signed-off-by: Xiao Liang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Commit b94b737 ("x86/microcode/intel: Extend BDW late-loading with a
revision check") reduced the impact of erratum BDF90 for Broadwell model
79.

The impact can be reduced further by checking the size of the last level
cache portion per core.

Tony: "The erratum says the problem only occurs on the large-cache SKUs.
So we only need to avoid the update if we are on a big cache SKU that is
also running old microcode."

For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.

Fixes: b94b737 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
Signed-off-by: Jia Zhang <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Tony Luck <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Commit 24c2503 ("x86/microcode: Do not access the initrd after it has
been freed") fixed attempts to access initrd from the microcode loader
after it has been freed. However, a similar KASAN warning was reported
(stack trace edited):

  smpboot: Booting Node 0 Processor 1 APIC 0x11
  ==================================================================
  BUG: KASAN: use-after-free in find_cpio_data+0x9b5/0xa50
  Read of size 1 at addr ffff880035ffd000 by task swapper/1/0

  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.8-slack #7
  Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
  Call Trace:
   dump_stack
   print_address_description
   kasan_report
   ? find_cpio_data
   __asan_report_load1_noabort
   find_cpio_data
   find_microcode_in_initrd
   __load_ucode_amd
   load_ucode_amd_ap
      load_ucode_ap

After some investigation, it turned out that a merge was done using the
wrong side to resolve, leading to picking up the previous state, before
the 24c2503 fix. Therefore the Fixes tag below contains a merge
commit.

Revert the mismerge by catching the save_microcode_in_initrd_amd()
retval and thus letting the function exit with the last return statement
so that initrd_gone can be set to true.

Fixes: f26483e ("Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts")
Reported-by: <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198295
Link: https://lkml.kernel.org/r/[email protected]
Some parts of the cmma migration bitmap is already protected
with the kvm->lock (e.g. the migration start). On the other
hand the read of the cmma bits is not protected against a
concurrent free, neither is the emulation of the ESSA instruction.
Let's extend the locking to all related ioctls by using
the slots lock for
- kvm_s390_vm_start_migration
- kvm_s390_vm_stop_migration
- kvm_s390_set_cmma_bits
- kvm_s390_get_cmma_bits

In addition to that, we use synchronize_srcu before freeing
the migration structure as all users hold kvm->srcu for read.
(e.g. the ESSA handler).

Reported-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Cc: [email protected] # 4.13+
Fixes: 190df4a (KVM: s390: CMMA tracking, ESSA emulation, migration mode)
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
…nux/kernel/git/kvms390/linux

KVM: s390: another fix for cmma migration

This fixes races and potential use after free in the
cmma migration code.
…t/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2018-01-24

1) Only offloads SAs after they are fully initialized.
   Otherwise a NIC may receive packets on a SA we can
   not yet handle in the stack.
   From Yossi Kuperman.

2) Fix negative refcount in case of a failing offload.
   From Aviad Yehezkel.

3) Fix inner IP ptoro version when decapsulating
   from interaddress family tunnels.
   From Yossi Kuperman.

4) Use true or false for boolean variables instead of an
   integer value in xfrm_get_type_offload.
   From Gustavo A. R. Silva.
====================

Signed-off-by: David S. Miller <[email protected]>
Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c73 ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040c ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This reverts commit 6cfb521.

Turns out distros do not want to make retpoline as part of their "ABI",
so this patch should not have been merged.  Sorry Andi, this was my
fault, I suggested it when your original patch was the "correct" way of
doing this instead.

Reported-by: Jiri Kosina <[email protected]>
Fixes: 6cfb521 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
NicolasDichtel and others added 25 commits January 25, 2018 16:27
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at           (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0f ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <[email protected]>
CC: Xin Long <[email protected]>
Signed-off-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
For a while we've been having issues with seemingly random interrupts
coming from nvidia cards when resuming them. Originally the fix for this
was thought to be just re-arming the MSI interrupt registers right after
re-allocating our IRQs, however it seems a lot of what we do is both
wrong and not even nessecary.

This was made apparent by what appeared to be a regression in the
mainline kernel that started introducing suspend/resume issues for
nouveau:

        a0c9259 (irq/matrix: Spread interrupts on allocation)

After this commit was introduced, we started getting interrupts from the
GPU before we actually re-allocated our own IRQ (see references below)
and assigned the IRQ handler. Investigating this turned out that the
problem was not with the commit, but the fact that nouveau even
free/allocates it's irqs before and after suspend/resume.

For starters: drivers in the linux kernel haven't had to handle
freeing/re-allocating their IRQs during suspend/resume cycles for quite
a while now. Nouveau seems to be one of the few drivers left that still
does this, despite the fact there's no reason we actually need to since
disabling interrupts from the device side should be enough, as the
kernel is already smart enough to know to disable host-side interrupts
for us before going into suspend. Since we were tearing down our IRQs by
hand however, that means there was a short period during resume where
interrupts could be received before we re-allocated our IRQ which would
lead to us getting an unhandled IRQ. Since we never handle said IRQ and
re-arm the interrupt registers, this would cause us to miss all of the
interrupts from the GPU and cause our init process to start timing out
on anything requiring interrupts.

So, since this whole setup/teardown every suspend/resume cycle is
useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
functions instead so they're only called at driver load and driver
unload. This should fix most of the issues with pending interrupts on
resume, along with getting suspend/resume for nouveau to work again.

As well, this probably means we can also just remove the msi rearm call
inside nvkm_pci_init(). But since our main focus here is to fix
suspend/resume before 4.15, we'll save that for a later patch.

Signed-off-by: Lyude Paul <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: [email protected]
Signed-off-by: Ben Skeggs <[email protected]>
After do_readv_writev, the inode cache is invalidated anyway, so i_size
will never be read.  It will be fetched from the server which will also
know about updates from other machines.

Fixes deadlock on 32-bit SMP.

See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2

Signed-off-by: Martin Brandenburg <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Mike Marshall <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
…/git/dtor/input

Pull input fixes from Dmitry Torokhov:
 "The main item is that we try to better handle the newer trackpoints on
  Lenovo devices that are now being produced by Elan/ALPS/NXP and only
  implement a small subset of the original IBM trackpoint controls"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Revert "Input: synaptics_rmi4 - use devm_device_add_group() for attributes in F01"
  Input: trackpoint - only expose supported controls for Elan, ALPS and NXP
  Input: trackpoint - force 3 buttons if 0 button is reported
  Input: xpad - add support for PDP Xbox One controllers
  Input: stmfts,s6sy671 - add SPDX identifier
Hardware statistics retrieval hurts in tight invocation loops.

Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.

Signed-off-by: Francois Romieu <[email protected]>
Tested-by: Oliver Freyermuth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Sukumar reported that sends to the local broadcast address
(255.255.255.255) are broken. Check for the address in vrf driver
and do not redirect to the VRF device - similar to multicast
packets.

With this change sockets can use SO_BINDTODEVICE to specify an
egress interface and receive responses. Note: the egress interface
can not be a VRF device but needs to be the enslaved device.

https://bugzilla.kernel.org/show_bug.cgi?id=198521

Reported-by: Sukumar Gopalakrishnan <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
…fixes

Single irq regression fix
* 'linux-4.15' of git://github.com/skeggsb/linux:
  drm/nouveau: Move irq setup/teardown to pci ctor/dtor
…tems

Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses
large amounts of vmalloc space with PTI enabled.

The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd
folding code into account, so, on a 4-level kernel, the pgd synchronization
logic compiles away to exactly nothing.

Interestingly, the problem doesn't trigger with nopti.  I assume this is
because the kernel is mapped with global pages if we boot with nopti.  The
sequence of operations when we create a new task is that we first load its
mm while still running on the old stack (which crashes if the old stack is
unmapped in the new mm unless the TLB saves us), then we call
prepare_switch_to(), and then we switch to the new stack.
prepare_switch_to() pokes the new stack directly, which will populate the
mapping through vmalloc_fault().  I assume that we're getting lucky on
non-PTI systems -- the old stack's TLB entry stays alive long enough to
make it all the way through prepare_switch_to() and switch_to() so that we
make it to a valid stack.

Fixes: b50858c ("x86/mm/vmalloc: Add 5-level paging support")
Reported-and-tested-by: Neil Berrington <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: [email protected]
Cc: Dave Hansen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org
On a 5-level kernel, if a non-init mm has a top-level entry, it needs to
match init_mm's, but the vmalloc_fault() code skipped over the BUG_ON()
that would have checked it.

While we're at it, get rid of the rather confusing 4-level folded "pgd"
logic.

Cleans-up: b50858c ("x86/mm/vmalloc: Add 5-level paging support")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Neil Berrington <[email protected]>
Link: https://lkml.kernel.org/r/2ae598f8c279b0a29baf75df207e6f2fdddc0a1b.1516914529.git.luto@kernel.org
Now that we're upstream in Linux we've been able to make some
infrastructure changes so our port works a bit more like other ports.
Specifically:

* We now have a mailing list specific to the RISC-V Linux port, hosted
  at lists.infreadead.org.
* We now have a kernel.org git tree where work on our port is
  coordinated.

This patch changes the RISC-V maintainers entry to reflect these new
bits of infrastructure.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and after
commit 120e9da ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
The timer prevents releasing the socket, as a result, sk_destruct() won't
be called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 2a91aa3 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
Signed-off-by: Alexey Kodanev <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
select(2) with wfds but no rfds must return when the socket is shut down
by the peer.  This way userspace notices socket activity and gets -EPIPE
from the next write(2).

Currently select(2) does not return for virtio-vsock when a SEND+RCV
shutdown packet is received.  This is because vsock_poll() only sets
POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the
socket is in when the shutdown is received.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
…g/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A fairly urgent nouveau regression fix for broken irqs across
  suspend/resume came in. This was broken before but a patch in 4.15 has
  made it much more obviously broken and now s/r fails a lot more often.

  The fix removes freeing the irq across s/r which never should have
  been done anyways.

  Also two vc4 fixes for a NULL deference and some misrendering /
  flickering on screen"

* tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau: Move irq setup/teardown to pci ctor/dtor
  drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()
  drm/vc4: Flush the caches before the bin jobs, as well.
Pull networking fixes from David Miller:

 1) The per-network-namespace loopback device, and thus its namespace,
    can have its teardown deferred for a long time if a kernel created
    TCP socket closes and the namespace is exiting meanwhile. The kernel
    keeps trying to finish the close sequence until it times out (which
    takes quite some time).

    Fix this by forcing the socket closed in this situation, from Dan
    Streetman.

 2) Fix regression where we're trying to invoke the update_pmtu method
    on route types (in this case metadata tunnel routes) that don't
    implement the dst_ops method. Fix from Nicolas Dichtel.

 3) Fix long standing memory corruption issues in r8169 driver by
    performing the chip statistics DMA programming more correctly. From
    Francois Romieu.

 4) Handle local broadcast sends over VRF routes properly, from David
    Ahern.

 5) Don't refire the DCCP CCID2 timer endlessly, otherwise the socket
    can never be released. From Alexey Kodanev.

 6) Set poll flags properly in VSOCK protocol layer, from Stefan
    Hajnoczi.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING
  dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
  net: vrf: Add support for sends to local broadcast address
  r8169: fix memory corruption on retrieval of hardware statistics.
  net: don't call update_pmtu unconditionally
  net: tcp: close sock if net namespace is exiting
…pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V update from Palmer Dabbelt:
 "RISC-V: We have a new mailing list and git repo!

  Sorry to send something essentially as late as possible (Friday after
  an rc9), but we managed to get a mailing list for the RISC-V Linux
  port. We've been using [email protected] for a while, but that
  list has some problems (it's Google Groups and it's shared over all
  RISC-V software projects). The new infaread.org list is much better.
  We just got it on Wednesday but I used it a bit on Thursday to shake
  out all the configuration problems and it appears to be in working
  order.

  When I updated the mailing list I noticed that the MAINTAINERS file
  was pointing to our github repo, but now that we have a kernel.org
  repo I'd like to point to that instead so I changed that as well.
  We'll be centralizing all RISC-V Linux related development here as
  that seems to be the saner way to go about it.

  I can understand if it's too late to get this into 4.15, but given
  that it's not a code change I was hoping it'd still be OK. It would be
  nice to have the new mailing list and git repo in the release tarballs
  so when people start to find bugs they'll get to the right place"

* tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  Update the RISC-V MAINTAINERS file
Due to some unfortunate events, I have not been directly involved in
the x86 kernel patch flow for a while now.  I have also not been able
to ramp back up by now like I had hoped to, and after reviewing what I
will need to work on both internally at Intel and elsewhere in the near
term, it is clear that I am not going to be able to ramp back up until
late 2018 at the very earliest.

It is not acceptable to not recognize that this load is currently
taken by Ingo and Thomas without my direct participation, so I mark
myself as R: (designated reviewer) rather than M: (maintainer) until
further notice.  This is in fact recognizing the de facto situation
for the past few years.

I have obviously no intention of going away, and I will do everything
within my power to improve Linux on x86 and x86 for Linux.  This,
however, puts credit where it is due and reflects a change of focus.

This patch also removes stale entries for portions of the x86
architecture which have not been maintained separately from arch/x86
for a long time.  If there is a reason to re-introduce them then that
can happen later.

Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Bruce Schlobohm <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e49 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Sewior <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
When ORC support was added for the ftrace_64.S code, an ENDPROC
for function_hook() was missed. This results in the following warning:

  arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction

Fixes: e2ac83d ("x86/ftrace: Fix ORC unwinding from ftrace handlers")
Reported-by: Steven Rostedt <[email protected]>
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
…cm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two final locking fixes for 4.15:

   - Repair the OWNER_DIED logic in the futex code which got wreckaged
     with the recent fix for a subtle race condition.

   - Prevent the hard lockup detector from triggering when dumping all
     held locks in the system"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()
  futex: Fix OWNER_DEAD fixup
…linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Four patches which all address lock inversions and deadlocks in the
  perf core code and the Intel debug store"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix perf,x86,cpuhp deadlock
  perf/core: Fix ctx::mutex deadlock
  perf/core: Fix another perf,trace,cpuhp lock inversion
  perf/core: Fix lock inversion between perf,trace,cpuhp
…/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "A single bug fix to prevent a subtle deadlock in the scheduler core
  code vs cpu hotplug"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix cpu.max vs. cpuhotplug deadlock
…m/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for a ~10 years old problem which causes high resolution
  timers to stop after a CPU unplug/plug cycle due to a stale flag in
  the per CPU hrtimer base struct.

  Paul McKenney was hunting this for about a year, but the heisenbug
  nature made it resistant against debug attempts for quite some time"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Reset hrtimer cpu base proper on CPU hotplug
…inux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes for 4.15:

   - Fix vmapped stack synchronization on systems with 4-level paging
     and a large amount of memory caused by a missing 5-level folding
     which made the pgd synchronization logic to fail and causing double
     faults.

   - Add a missing sanity check in the vmalloc_fault() logic on 5-level
     paging systems.

   - Bring back protection against accessing a freed initrd in the
     microcode loader which was lost by a wrong merge conflict
     resolution.

   - Extend the Broadwell micro code loading sanity check.

   - Add a missing ENDPROC annotation in ftrace assembly code which
     makes ORC unhappy.

   - Prevent loading the AMD power module on !AMD platforms. The load
     itself is uncritical, but an unload attempt results in a kernel
     crash.

   - Update Peter Anvins role in the MAINTAINERS file"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ftrace: Add one more ENDPROC annotation
  x86: Mark hpa as a "Designated Reviewer" for the time being
  x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
  x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
  x86/microcode: Fix again accessing initrd after having been freed
  x86/microcode/intel: Extend BDW late-loading further with LLC size check
  perf/x86/amd/power: Do not load AMD power module on !AMD platforms
…x/kernel/git/tip/tip

Pull x86 retpoline fixlet from Thomas Gleixner:
 "Remove the ESP/RSP thunks for retpoline as they cannot ever work.

  Get rid of them before they show up in a release"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Remove the esp/rsp thunk
@Esteban-Rocha Esteban-Rocha merged commit 5173440 into Esteban-Rocha:master Jan 29, 2018
Esteban-Rocha pushed a commit that referenced this pull request Feb 8, 2018
Currently, rmmod ddbridge on a KASAN enabled kernel yields this report
for hardware that utilises the tda18212 tuner driver:

  [   50.355229] ==================================================================
  [   50.355271] BUG: KASAN: use-after-free in tda18212_remove+0x5c/0xb0 [tda18212]
  [   50.355290] Write of size 288 at addr ffff8800c235cf18 by task rmmod/285

  [   50.355316] CPU: 1 PID: 285 Comm: rmmod Not tainted 4.15.0-rc1-13744-g352a86ad536f #11
  [   50.355318] Hardware name: Gigabyte Technology Co., Ltd. P35-DS3/P35-DS3, BIOS F3 06/11/2007
  [   50.355319] Call Trace:
  [   50.355326]  dump_stack+0x46/0x61
  [   50.355332]  print_address_description+0x79/0x270
  [   50.355336]  ? tda18212_remove+0x5c/0xb0 [tda18212]
  [   50.355339]  kasan_report+0x229/0x340
  [   50.355342]  memset+0x1f/0x40
  [   50.355345]  tda18212_remove+0x5c/0xb0 [tda18212]
  [   50.355350]  i2c_device_remove+0x97/0xe0
  [   50.355355]  device_release_driver_internal+0x267/0x510
  [   50.355358]  bus_remove_device+0x296/0x470
  [   50.355360]  device_del+0x35c/0x890
  [   50.355363]  ? __device_links_no_driver+0x1c0/0x1c0
  [   50.355367]  ? cxd2841er_get_algo+0x10/0x10 [cxd2841er]
  [   50.355371]  ? cxd2841er_get_algo+0x10/0x10 [cxd2841er]
  [   50.355374]  ? __module_text_address+0xe/0x140
  [   50.355377]  device_unregister+0x9/0x20
  [   50.355382]  dvb_input_detach.isra.24+0x286/0x480 [ddbridge]
  [   50.355388]  ddb_ports_detach+0x15f/0x4f0 [ddbridge]
  [   50.355393]  ddb_remove+0x3c/0xb0 [ddbridge]
  [   50.355397]  pci_device_remove+0x93/0x1d0
  [   50.355400]  device_release_driver_internal+0x267/0x510
  [   50.355403]  driver_detach+0xb9/0x1b0
  [   50.355406]  bus_remove_driver+0xd0/0x1f0
  [   50.355410]  pci_unregister_driver+0x25/0x210
  [   50.355415]  module_exit_ddbridge+0xc/0x45 [ddbridge]
  [   50.355418]  SyS_delete_module+0x314/0x440
  [   50.355420]  ? free_module+0x5b0/0x5b0
  [   50.355423]  ? exit_to_usermode_loop+0xa9/0xc0
  [   50.355425]  ? free_module+0x5b0/0x5b0
  [   50.355428]  do_syscall_64+0x179/0x4c0
  [   50.355432]  ? do_page_fault+0x1b/0x60
  [   50.355435]  entry_SYSCALL64_slow_path+0x25/0x25
  [   50.355438] RIP: 0033:0x7fe65d08ade7
  [   50.355439] RSP: 002b:00007fff5a6a09a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
  [   50.355443] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe65d08ade7
  [   50.355445] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000000f4e268
  [   50.355447] RBP: 0000000000f4e200 R08: 0000000000000000 R09: 1999999999999999
  [   50.355449] R10: 0000000000000891 R11: 0000000000000202 R12: 00007fff5a6a14ef
  [   50.355451] R13: 0000000000000000 R14: 0000000000f4e200 R15: 0000000000f4d010

  [   50.355462] Allocated by task 164:
  [   50.355477]  cxd2841er_attach+0xc3/0x7f0 [cxd2841er]
  [   50.355482]  demod_attach_cxd28xx+0x14c/0x3f0 [ddbridge]
  [   50.355486]  dvb_input_attach+0x671/0x1e20 [ddbridge]
  [   50.355490]  ddb_ports_attach+0x3d7/0xbf0 [ddbridge]
  [   50.355495]  ddb_init+0x4b3/0xa30 [ddbridge]
  [   50.355499]  ddb_probe+0xa51/0xfe0 [ddbridge]
  [   50.355501]  pci_device_probe+0x279/0x480
  [   50.355504]  driver_probe_device+0x46f/0x7a0
  [   50.355506]  __driver_attach+0x133/0x170
  [   50.355509]  bus_for_each_dev+0x10a/0x190
  [   50.355511]  bus_add_driver+0x2a3/0x5a0
  [   50.355513]  driver_register+0x182/0x3a0
  [   50.355516]  arc4_set_key+0x8f/0x2a0 [arc4]
  [   50.355518]  do_one_initcall+0x77/0x1d0
  [   50.355521]  do_init_module+0x1c2/0x548
  [   50.355523]  load_module+0x5e61/0x8df0
  [   50.355525]  SyS_finit_module+0x142/0x150
  [   50.355527]  do_syscall_64+0x179/0x4c0
  [   50.355529]  return_from_SYSCALL_64+0x0/0x65

  [   50.355539] Freed by task 285:
  [   50.355551]  kfree+0x6c/0xa0
  [   50.355558]  __dvb_frontend_free+0x81/0xb0 [dvb_core]
  [   50.355562]  dvb_input_detach.isra.24+0x17c/0x480 [ddbridge]
  [   50.355566]  ddb_ports_detach+0x15f/0x4f0 [ddbridge]
  [   50.355570]  ddb_remove+0x3c/0xb0 [ddbridge]
  [   50.355573]  pci_device_remove+0x93/0x1d0
  [   50.355576]  device_release_driver_internal+0x267/0x510
  [   50.355578]  driver_detach+0xb9/0x1b0
  [   50.355580]  bus_remove_driver+0xd0/0x1f0
  [   50.355583]  pci_unregister_driver+0x25/0x210
  [   50.355587]  module_exit_ddbridge+0xc/0x45 [ddbridge]
  [   50.355590]  SyS_delete_module+0x314/0x440
  [   50.355592]  do_syscall_64+0x179/0x4c0
  [   50.355594]  return_from_SYSCALL_64+0x0/0x65

  [   50.355604] The buggy address belongs to the object at ffff8800c235cd80
                  which belongs to the cache kmalloc-2048 of size 2048
  [   50.355630] The buggy address is located 408 bytes inside of
                  2048-byte region [ffff8800c235cd80, ffff8800c235d580)
  [   50.355652] The buggy address belongs to the page:
  [   50.355666] page:ffffea0002a7bc20 count:1 mapcount:0 mapping:ffff8800c235c500 index:0x0 compound_mapcount: 0
  [   50.355688] flags: 0x4000000000008100(slab|head)
  [   50.355703] raw: 4000000000008100 ffff8800c235c500 0000000000000000 0000000100000003
  [   50.355720] raw: ffffea000382b4b0 ffffea0002b91550 ffff88010b000800
  [   50.355734] page dumped because: kasan: bad access detected

  [   50.355754] Memory state around the buggy address:
  [   50.355767]  ffff8800c235ce00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [   50.355783]  ffff8800c235ce80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [   50.355800] >ffff8800c235cf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [   50.355815]                             ^
  [   50.355827]  ffff8800c235cf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [   50.355843]  ffff8800c235d000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [   50.355858] ==================================================================

This is due to dvb_frontend_detach() being called before
i2c_unregister_device() on the TDA18212 tuner client instance, as
dvb_frontend_detach() causes the demod drivers to release all their
resources, and the tuner driver's _remove method does further cleanup on
the now invalid (freed) resources. Fix this by putting the I2C client
deregistration in dvb_input_detach() to state/case 0x30, right before the
call to dvb_frontend_detach(). This also makes sure that any further
(tuner) hardware driven by I2C client drivers unload cleanly.

Fixes: 1502efd ("media: ddbridge: fix teardown/deregistration order in ddb_input_detach()")

Signed-off-by: Daniel Scheller <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Esteban-Rocha pushed a commit that referenced this pull request Feb 8, 2018
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Signed-off-by: Guanglei Li <[email protected]>
Signed-off-by: Honglei Wang <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Reviewed-by: Yanjun Zhu <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Esteban-Rocha pushed a commit that referenced this pull request Mar 3, 2018
It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follow,

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is bd4c82c ("mm,
THP, swap: delay splitting THP after swapped out").

The root cause is as follows:

When the pages are written to swap device during swapping out in
swap_writepage(), zswap (fontswap) is tried to compress the pages to
improve performance.  But zswap (frontswap) will treat THP as a normal
page, so only the head page is saved.  After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.

This is fixed by refusing to save page in the frontswap store functions
if the page is a THP.  So that the THP will be swapped out to swap
device.

Another choice is to split THP if frontswap is enabled.  But it is found
that the frontswap enabling isn't flexible.  For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.

Frontswap has multiple backends, to make it easy for one backend to
enable THP support, the THP checking is put in backend frontswap store
functions instead of the general interfaces.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <[email protected]>
Reported-by: Sergey Senozhatsky <[email protected]>
Tested-by: Sergey Senozhatsky <[email protected]>
Suggested-by: Minchan Kim <[email protected]>	[put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: <[email protected]>	[4.14]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Esteban-Rocha pushed a commit that referenced this pull request Apr 22, 2018
Patch series "kexec_file: Clean up purgatory load", v2.

Following the discussion with Dave and AKASHI, here are the common code
patches extracted from my recent patch set (Add kexec_file_load support
to s390) [1].  The patches were extracted to allow upstream integration
together with AKASHI's common code patches before the arch code gets
adjusted to the new base.

The reason for this series is to prepare common code for adding
kexec_file_load to s390 as well as cleaning up the mis-use of the
sh_offset field during purgatory load.  In detail this series contains:

Patch #1&2: Minor cleanups/fixes.

Patch #3-9: Clean up the purgatory load/relocation code.  Especially
remove the mis-use of the purgatory_info->sechdrs->sh_offset field,
currently holding a pointer into either kexec_purgatory (ro) or
purgatory_buf (rw) depending on the section.  With these patches the
section address will be calculated verbosely and sh_offset will contain
the offset of the section in the stripped purgatory binary
(purgatory_buf).

Patch #10: Allows architectures to set the purgatory load address.  This
patch is important for s390 as the kernel and purgatory have to be
loaded to fixed addresses.  In current code this is impossible as the
purgatory load is opaque to the architecture.

Patch #11: Moves x86 purgatories sha implementation to common lib/
directory to allow reuse in other architectures.

This patch (of 11)

When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a
compile warning multiple times.

  In file included from <path>/linux/init/initramfs.c:526:0:
  <path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default]
           unsigned long cmdline_len);
           ^

This is because the typedefs for kexec_file_load uses struct kimage
before it is declared.  Fix this by simply forward declaring struct
kimage.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Philipp Rudo <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: AKASHI Takahiro <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.