Add zero-copy driver #37

andr2000 · 2018-04-23T13:51:37Z

This pull request adds Xen zero-copy driver: for that, backport of DRM xen-front driver is also required
(to keep consistency).

With fixes for 4.14 It is possible that drm_simple_kms_plane_atomic_check called with no CRTC set, e.g. when user-space application sets CRTC_ID/FB_ID to 0 before doing any actual drawing. This leads to NULL pointer dereference because in this case new CRTC state is NULL and must be checked before accessing. Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Use srcu to protect drm_device.unplugged in a race free manner. Drivers can use drm_dev_enter()/drm_dev_exit() to protect and mark sections preventing access to device resources that are not available after the device is gone. Suggested-by: Daniel Vetter <[email protected]> Signed-off-by: Noralf Trønnes <[email protected]> Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Oleksandr Andrushchenko <[email protected]> Tested-by: Oleksandr Andrushchenko <[email protected]> Cc: [email protected] Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

The PL111 needs to filter valid modes based on memory bandwidth. I guess it is a pretty simple operation, so we can still claim the DRM KMS helper pipeline is simple after adding this (optional) vtable callback. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

@r1

tinydrm enable hook wants to play around with the new fb in .atomic_enable(), thus we'll need access to the plane state. Performed with coccinelle: @r1@ identifier F =~ ".*enable$"; identifier P, CS; @@ F( struct drm_simple_display_pipe *P ,struct drm_crtc_state *CS + ,struct drm_plane_state *plane_state ) { ... } @@ struct drm_simple_display_pipe *P; expression E; @@ { + struct drm_plane *plane; ... + plane = &P->plane; P->funcs->enable(P ,E + ,plane->state ); ... } @@ identifier P, CS; @@ struct drm_simple_display_pipe_funcs { ... void (*enable)(struct drm_simple_display_pipe *P ,struct drm_crtc_state *CS + ,struct drm_plane_state *plane_state ); ... }; v2: Pimp the commit message (David) Cc: Marek Vasut <[email protected]> Cc: Eric Anholt <[email protected]> Cc: David Lechner <[email protected]> Cc: "Noralf Trønnes" <[email protected]> Cc: Linus Walleij <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Noralf Trønnes <[email protected]>

We do not want the common dma_configure() pathway to apply indiscriminately to all devices, since there are plenty of buses which do not have DMA capability, and if their child devices were used for DMA API calls it would only be indicative of a driver bug. However, there are a number of buses for which DMA is implicitly expected even when not described by firmware - those we whitelist with an automatic opt-in to dma_configure(), assuming that the DMA address space and the physical address space are equivalent if not otherwise specified. Commit 7232888 ("of: restrict DMA configuration") introduced a short-term fix by comparing explicit bus types, but this approach is far from pretty, doesn't scale well, and fails to cope at all with bus drivers which may be built as modules, like host1x. Let's refine things by making that opt-in a property of the bus type, which neatly addresses those problems and lets the decision of whether firmware description of DMA capability should be optional or mandatory stay internal to the bus drivers themselves. Signed-off-by: Robin Murphy <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Thierry Reding <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>

Add support for Xen para-virtualized frontend display driver. Accompanying backend [1] is implemented as a user-space application and its helper library [2], capable of running as a Weston client or DRM master. Configuration of both backend and frontend is done via Xen guest domain configuration options [3]. Driver limitations: 1. Only primary plane without additional properties is supported. 2. Only one video mode supported which resolution is configured via XenStore. 3. All CRTCs operate at fixed frequency of 60Hz. 1. Implement Xen bus state machine for the frontend driver according to the state diagram and recovery flow from display para-virtualized protocol: xen/interface/io/displif.h. 2. Read configuration values from Xen store according to xen/interface/io/displif.h protocol: - read connector(s) configuration - read buffer allocation mode (backend/frontend) 3. Handle Xen event channels: - create for all configured connectors and publish corresponding ring references and event channels in Xen store, so backend can connect - implement event channels interrupt handlers - create and destroy event channels with respect to Xen bus state 4. Implement shared buffer handling according to the para-virtualized display device protocol at xen/interface/io/displif.h: - handle page directories according to displif protocol: - allocate and share page directories - grant references to the required set of pages for the page directory - allocate xen balllooned pages via Xen balloon driver with alloc_xenballooned_pages/free_xenballooned_pages - grant references to the required set of pages for the shared buffer itself - implement pages map/unmap for the buffers allocated by the backend (gnttab_map_refs/gnttab_unmap_refs) 5. Implement kernel modesetiing/connector handling using DRM simple KMS helper pipeline: - implement KMS part of the driver with the help of DRM simple pipepline helper which is possible due to the fact that the para-virtualized driver only supports a single (primary) plane: - initialize connectors according to XenStore configuration - handle frame done events from the backend - create and destroy frame buffers and propagate those to the backend - propagate set/reset mode configuration to the backend on display enable/disable callbacks - send page flip request to the backend and implement logic for reporting backend IO errors on prepare fb callback - implement virtual connector handling: - support only pixel formats suitable for single plane modes - make sure the connector is always connected - support a single video mode as per para-virtualized driver configuration 6. Implement GEM handling depending on driver mode of operation: depending on the requirements for the para-virtualized environment, namely requirements dictated by the accompanying DRM/(v)GPU drivers running in both host and guest environments, number of operating modes of para-virtualized display driver are supported: - display buffers can be allocated by either frontend driver or backend - display buffers can be allocated to be contiguous in memory or not Note! Frontend driver itself has no dependency on contiguous memory for its operation. 6.1. Buffers allocated by the frontend driver. The below modes of operation are configured at compile-time via frontend driver's kernel configuration. 6.1.1. Front driver configured to use GEM CMA helpers This use-case is useful when used with accompanying DRM/vGPU driver in guest domain which was designed to only work with contiguous buffers, e.g. DRM driver based on GEM CMA helpers: such drivers can only import contiguous PRIME buffers, thus requiring frontend driver to provide such. In order to implement this mode of operation para-virtualized frontend driver can be configured to use GEM CMA helpers. 6.1.2. Front driver doesn't use GEM CMA If accompanying drivers can cope with non-contiguous memory then, to lower pressure on CMA subsystem of the kernel, driver can allocate buffers from system memory. Note! If used with accompanying DRM/(v)GPU drivers this mode of operation may require IOMMU support on the platform, so accompanying DRM/vGPU hardware can still reach display buffer memory while importing PRIME buffers from the frontend driver. 6.2. Buffers allocated by the backend This mode of operation is run-time configured via guest domain configuration through XenStore entries. For systems which do not provide IOMMU support, but having specific requirements for display buffers it is possible to allocate such buffers at backend side and share those with the frontend. For example, if host domain is 1:1 mapped and has DRM/GPU hardware expecting physically contiguous memory, this allows implementing zero-copying use-cases. Note, while using this scenario the following should be considered: a) If guest domain dies then pages/grants received from the backend cannot be claimed back b) Misbehaving guest may send too many requests to the backend exhausting its grant references and memory (consider this from security POV). Note! Configuration options 1.1 (contiguous display buffers) and 2 (backend allocated buffers) are not supported at the same time. 7. Handle communication with the backend: - send requests and wait for the responses according to the displif protocol - serialize access to the communication channel - time-out used for backend communication is set to 3000 ms - manage display buffers shared with the backend [1] https://github.com/xen-troops/displ_be [2] https://github.com/xen-troops/libxenbe [3] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/man/xl.cfg.pod.5.in;h=a699367779e2ae1212ff8f638eff0206ec1a1cc9;hb=refs/heads/master#l1257 Signed-off-by: Oleksandr Andrushchenko <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

It turns out this was only needed to paper over a bug in the CMA helpers, which was addressed in commit 998fb1a Author: Liviu Dudau <[email protected]> Date: Fri Nov 10 13:33:10 2017 +0000 drm: gem_cma_helper.c: Allow importing of contiguous scatterlists with nents > 1 Without this the following pipeline didn't work: domU: 1. xen-front allocates a non-contig buffer 2. creates grants out of it dom0: 3. converts the grants into a dma-buf. Since they're non-contig, the scatter-list is huge. 4. imports it into rcar-du, which requires dma-contig memory for scanout. -> On this given platform there's an IOMMU, so in theory this should work. But in practice this failed, because of the huge number of sg entries, even though the IOMMU driver mapped it all into a dma-contig range. With a guest-contig buffer allocated in step 1, this problem doesn't exist. But there's technically no reason to require guest-contig memory for xen buffer sharing using grants. Given all that, the xen-front cma support is not needed and should be removed. Signed-off-by: Oleksandr Andrushchenko <[email protected]> Suggested-by: Daniel Vetter <[email protected]> Acked-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Introduce Xen zero-copy helper DRM driver, add user-space API of the driver: 1. DRM_IOCTL_XEN_ZCOPY_DUMB_FROM_REFS This will create a DRM dumb buffer from grant references provided by the frontend. The intended usage is: - Frontend - creates a dumb/display buffer and allocates memory - grants foreign access to the buffer pages - passes granted references to the backend - Backend - issues DRM_XEN_ZCOPY_DUMB_FROM_REFS ioctl to map granted references and create a dumb buffer - requests handle to fd conversion via DRM_IOCTL_PRIME_HANDLE_TO_FD - requests real HW driver/consumer to import the PRIME buffer with DRM_IOCTL_PRIME_FD_TO_HANDLE - uses handle returned by the real HW driver - at the end: o closes real HW driver's handle with DRM_IOCTL_GEM_CLOSE o closes zero-copy driver's handle with DRM_IOCTL_GEM_CLOSE o closes file descriptor of the exported buffer 2. DRM_IOCTL_XEN_ZCOPY_DUMB_TO_REFS This will grant references to a dumb/display buffer's memory provided by the backend. The intended usage is: - Frontend - requests backend to allocate dumb/display buffer and grant references to its pages - Backend - requests real HW driver to create a dumb with DRM_IOCTL_MODE_CREATE_DUMB - requests handle to fd conversion via DRM_IOCTL_PRIME_HANDLE_TO_FD - requests zero-copy driver to import the PRIME buffer with DRM_IOCTL_PRIME_FD_TO_HANDLE - issues DRM_XEN_ZCOPY_DUMB_TO_REFS ioctl to grant references to the buffer's memory. - passes grant references to the frontend - at the end: - closes zero-copy driver's handle with DRM_IOCTL_GEM_CLOSE - closes real HW driver's handle with DRM_IOCTL_GEM_CLOSE - closes file descriptor of the imported buffer Implement GEM/IOCTL handling depending on driver mode of operation: - if GEM is created from grant references, then prepare to create a dumb from mapped pages - if GEM grant references are about to be provided for the imported PRIME buffer, then prepare for granting references and providing those to user-space Implement handling of display buffers from backend to/from front interaction point ov view: - when importing a buffer from the frontend: - allocate/free xen ballooned pages via Xen balloon driver or by manually allocating a DMA buffer - if DMA buffer is used, then increase/decrease its pages reservation accordingly - map/unmap foreign pages to the ballooned pages - when exporting a buffer to the frontend: - grant references for the pages of the imported PRIME buffer - pass the grants back to user-space, so those can be shared with the frontend Add an option to allocate DMA buffers as backing storage while importing a frontend's buffer into host's memory: for those use-cases when exported PRIME buffer will be used by a device expecting CMA buffers only, it is possible to map frontend's pages onto contiguous buffer, e.g. allocated via DMA API. Implement synchronous buffer deletion: for buffers, created from front's grant references, synchronization between backend and frontend is needed on buffer deletion as front expects us to unmap these references after XENDISPL_OP_DBUF_DESTROY response. For that reason introduce DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL: this will block until dumb buffer, with the wait handle provided, be freed. The rationale behind implementing own wait handle: - dumb buffer handle cannot be used as when the PRIME buffer gets exported there are at least 2 handles: one is for the backend and another one for the importing application, so when backend closes its handle and the other application still holds the buffer then there is no way for the backend to tell which buffer we want to wait for while calling xen_ioctl_wait_free - flink cannot be used as well as it is gone when DRM core calls .gem_free_object_unlocked Signed-off-by: Oleksandr Andrushchenko <[email protected]>

syzbot caught an infinite recursion in nsh_gso_segment(). Problem here is that we need to make sure the NSH header is of reasonable length. BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by syz-executor0/10189: #0: (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517 #1: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #1: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #2: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #2: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 #3: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] #3: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#4: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#4: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#5: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#5: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#6: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#6: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#7: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#7: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#8: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#8: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#9: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#9: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#10: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#10: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#11: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#11: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#12: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#12: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#13: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#13: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#14: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#14: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#15: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#15: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#16: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#16: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#17: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#17: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#18: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#18: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#19: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#19: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#20: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#20: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#21: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#21: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#22: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#22: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#23: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#23: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#24: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#24: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#25: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#25: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#26: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#26: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#27: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#27: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#28: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#28: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#29: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#29: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#30: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#30: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#31: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#31: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 dccp_close: ABORT with 65423 bytes unread xen-troops#32: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#32: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#33: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#33: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#34: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#34: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#35: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#35: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#36: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#36: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#37: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#37: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#38: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#38: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#39: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#39: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#40: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#40: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#41: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#41: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#42: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#42: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#43: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#43: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#44: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#44: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#45: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#45: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#46: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#46: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 xen-troops#47: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] xen-troops#47: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 INFO: lockdep is turned off. CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ xen-troops#26 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920 rcu_lock_acquire include/linux/rcupdate.h:246 [inline] rcu_read_lock include/linux/rcupdate.h:632 [inline] skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4025 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312 qdisc_restart net/sched/sch_generic.c:399 [inline] __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410 __dev_xmit_skb net/core/dev.c:3243 [inline] __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616 packet_snd net/packet/af_packet.c:2951 [inline] packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 __sys_sendto+0x3d7/0x670 net/socket.c:1789 __do_sys_sendto net/socket.c:1801 [inline] __se_sys_sendto net/socket.c:1797 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: c411ed8 ("nsh: add GSO support") Signed-off-by: Eric Dumazet <[email protected]> Cc: Jiri Benc <[email protected]> Reported-by: syzbot <[email protected]> Acked-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Increase kasan instrumented kernel stack size from 32k to 64k. Other architectures seems to get away with just doubling kernel stack size under kasan, but on s390 this appears to be not enough due to bigger frame size. The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting stack overflow is fs sync on xfs filesystem: #0 [9a0681e8] 704 bytes check_usage at 34b1fc #1 [9a0684a8] 432 bytes check_usage at 34c710 #2 [9a068658] 1048 bytes validate_chain at 35044a #3 [9a068a70] 312 bytes __lock_acquire at 3559fe xen-troops#4 [9a068ba8] 440 bytes lock_acquire at 3576ee xen-troops#5 [9a068d60] 104 bytes _raw_spin_lock at 21b44e0 xen-troops#6 [9a068dc8] 1992 bytes enqueue_entity at 2dbf72 xen-troops#7 [9a069590] 1496 bytes enqueue_task_fair at 2df5f0 xen-troops#8 [9a069b68] 64 bytes ttwu_do_activate at 28f438 xen-troops#9 [9a069ba8] 552 bytes try_to_wake_up at 298c4c xen-troops#10 [9a069dd0] 168 bytes wake_up_worker at 23f97c xen-troops#11 [9a069e78] 200 bytes insert_work at 23fc2e xen-troops#12 [9a069f40] 648 bytes __queue_work at 2487c0 xen-troops#13 [9a06a1c8] 200 bytes __queue_delayed_work at 24db28 xen-troops#14 [9a06a290] 248 bytes mod_delayed_work_on at 24de84 xen-troops#15 [9a06a388] 24 bytes kblockd_mod_delayed_work_on at 153e2a0 xen-troops#16 [9a06a3a0] 288 bytes __blk_mq_delay_run_hw_queue at 158168c xen-troops#17 [9a06a4c0] 192 bytes blk_mq_run_hw_queue at 1581a3c xen-troops#18 [9a06a580] 184 bytes blk_mq_sched_insert_requests at 15a2192 xen-troops#19 [9a06a638] 1024 bytes blk_mq_flush_plug_list at 1590f3a xen-troops#20 [9a06aa38] 704 bytes blk_flush_plug_list at 1555028 xen-troops#21 [9a06acf8] 320 bytes schedule at 219e476 xen-troops#22 [9a06ae38] 760 bytes schedule_timeout at 21b0aac xen-troops#23 [9a06b130] 408 bytes wait_for_common at 21a1706 xen-troops#24 [9a06b2c8] 360 bytes xfs_buf_iowait at fa1540 xen-troops#25 [9a06b430] 256 bytes __xfs_buf_submit at fadae6 xen-troops#26 [9a06b530] 264 bytes xfs_buf_read_map at fae3f6 xen-troops#27 [9a06b638] 656 bytes xfs_trans_read_buf_map at 10ac9a8 xen-troops#28 [9a06b8c8] 304 bytes xfs_btree_kill_root at e72426 xen-troops#29 [9a06b9f8] 288 bytes xfs_btree_lookup_get_block at e7bc5e xen-troops#30 [9a06bb18] 624 bytes xfs_btree_lookup at e7e1a6 xen-troops#31 [9a06bd88] 2664 bytes xfs_alloc_ag_vextent_near at dfa070 xen-troops#32 [9a06c7f0] 144 bytes xfs_alloc_ag_vextent at dff3ca xen-troops#33 [9a06c880] 1128 bytes xfs_alloc_vextent at e05fce xen-troops#34 [9a06cce8] 584 bytes xfs_bmap_btalloc at e58342 xen-troops#35 [9a06cf30] 1336 bytes xfs_bmapi_write at e618de xen-troops#36 [9a06d468] 776 bytes xfs_iomap_write_allocate at ff678e xen-troops#37 [9a06d770] 720 bytes xfs_map_blocks at f82af8 xen-troops#38 [9a06da40] 928 bytes xfs_writepage_map at f83cd6 xen-troops#39 [9a06dde0] 320 bytes xfs_do_writepage at f85872 xen-troops#40 [9a06df20] 1320 bytes write_cache_pages at 73dfe8 xen-troops#41 [9a06e448] 208 bytes xfs_vm_writepages at f7f892 xen-troops#42 [9a06e518] 88 bytes do_writepages at 73fe6a xen-troops#43 [9a06e570] 872 bytes __writeback_single_inode at a20cb6 xen-troops#44 [9a06e8d8] 664 bytes writeback_sb_inodes at a23be2 xen-troops#45 [9a06eb70] 296 bytes __writeback_inodes_wb at a242e0 xen-troops#46 [9a06ec98] 928 bytes wb_writeback at a2500e xen-troops#47 [9a06f038] 848 bytes wb_do_writeback at a260ae xen-troops#48 [9a06f388] 536 bytes wb_workfn at a28228 xen-troops#49 [9a06f5a0] 1088 bytes process_one_work at 24a234 xen-troops#50 [9a06f9e0] 1120 bytes worker_thread at 24ba26 xen-troops#51 [9a06fe40] 104 bytes kthread at 26545a xen-troops#52 [9a06fea8] kernel_thread_starter at 21b6b62 To be able to increase the stack size to 64k reuse LLILL instruction in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE (65192) value as unsigned. Reported-by: Benjamin Block <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>

We get the following warning: [ 47.926140] 32-bit node address hash set to 2010a0a [ 47.927202] [ 47.927433] ================================ [ 47.928050] WARNING: inconsistent lock state [ 47.928661] 4.19.0+ xen-troops#37 Tainted: G E [ 47.929346] -------------------------------- [ 47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes: [ 47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0 [ 47.930116] {SOFTIRQ-ON-W} state was registered at: [ 47.930116] _raw_spin_lock+0x29/0x60 [ 47.930116] rht_deferred_worker+0x556/0x810 [ 47.930116] process_one_work+0x1f5/0x540 [ 47.930116] worker_thread+0x64/0x3e0 [ 47.930116] kthread+0x112/0x150 [ 47.930116] ret_from_fork+0x3a/0x50 [ 47.930116] irq event stamp: 14044 [ 47.930116] hardirqs last enabled at (14044): [<ffffffff9a07fbba>] __local_bh_enable_ip+0x7a/0xf0 [ 47.938117] hardirqs last disabled at (14043): [<ffffffff9a07fb81>] __local_bh_enable_ip+0x41/0xf0 [ 47.938117] softirqs last enabled at (14028): [<ffffffff9a0803ee>] irq_enter+0x5e/0x60 [ 47.938117] softirqs last disabled at (14029): [<ffffffff9a0804a5>] irq_exit+0xb5/0xc0 [ 47.938117] [ 47.938117] other info that might help us debug this: [ 47.938117] Possible unsafe locking scenario: [ 47.938117] [ 47.938117] CPU0 [ 47.938117] ---- [ 47.938117] lock(&(&ht->lock)->rlock); [ 47.938117] <Interrupt> [ 47.938117] lock(&(&ht->lock)->rlock); [ 47.938117] [ 47.938117] *** DEADLOCK *** [ 47.938117] [ 47.938117] 2 locks held by swapper/3/0: [ 47.938117] #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280 [ 47.938117] #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc] [ 47.938117] [ 47.938117] stack backtrace: [ 47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 4.19.0+ xen-troops#37 [ 47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 47.938117] Call Trace: [ 47.938117] <IRQ> [ 47.938117] dump_stack+0x5e/0x8b [ 47.938117] print_usage_bug+0x1ed/0x1ff [ 47.938117] mark_lock+0x5b5/0x630 [ 47.938117] __lock_acquire+0x4c0/0x18f0 [ 47.938117] ? lock_acquire+0xa6/0x180 [ 47.938117] lock_acquire+0xa6/0x180 [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0 [ 47.938117] _raw_spin_lock+0x29/0x60 [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0 [ 47.938117] rhashtable_walk_enter+0x36/0xb0 [ 47.938117] tipc_sk_reinit+0xb0/0x410 [tipc] [ 47.938117] ? mark_held_locks+0x6f/0x90 [ 47.938117] ? __local_bh_enable_ip+0x7a/0xf0 [ 47.938117] ? lockdep_hardirqs_on+0x20/0x1a0 [ 47.938117] tipc_net_finalize+0xbf/0x180 [tipc] [ 47.938117] tipc_disc_timeout+0x509/0x540 [tipc] [ 47.938117] ? call_timer_fn+0x5/0x280 [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] call_timer_fn+0xa1/0x280 [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc] [ 47.938117] run_timer_softirq+0x1f2/0x4d0 [ 47.938117] __do_softirq+0xfc/0x413 [ 47.938117] irq_exit+0xb5/0xc0 [ 47.938117] smp_apic_timer_interrupt+0xac/0x210 [ 47.938117] apic_timer_interrupt+0xf/0x20 [ 47.938117] </IRQ> [ 47.938117] RIP: 0010:default_idle+0x1c/0x140 [ 47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 <65> 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b [ 47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001 [ 47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200 [ 47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000 [ 47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200 [ 47.938117] ? default_idle+0x1a/0x140 [ 47.938117] do_idle+0x1bc/0x280 [ 47.938117] cpu_startup_entry+0x19/0x20 [ 47.938117] start_secondary+0x187/0x1c0 [ 47.938117] secondary_startup_64+0xa4/0xb0 The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is calling the function rhashtable_walk_enter() within a timer interrupt. We fix this by executing tipc_net_finalize() in work queue context. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Similarly to commit 276bdb8 ("dccp: check ccid before dereferencing") it is wise to test for a NULL ccid. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ xen-troops#37 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline] RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233 Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000 RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001 RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80 R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0 kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5' DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654 dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688 sk_backlog_rcv include/net/sock.h:936 [inline] __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473 dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880 ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234 NF_HOOK include/linux/netfilter.h:289 [inline] NF_HOOK include/linux/netfilter.h:283 [inline] ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414 NF_HOOK include/linux/netfilter.h:289 [inline] NF_HOOK include/linux/netfilter.h:283 [inline] ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524 __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083 process_backlog+0x206/0x750 net/core/dev.c:5923 napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x76d/0x1930 net/core/dev.c:6412 __do_softirq+0x30b/0xb11 kernel/softirq.c:292 run_ksoftirqd kernel/softirq.c:654 [inline] run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646 smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Modules linked in: ---[ end trace 58a0ba03bea2c376 ]--- RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline] RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233 Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000 RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001 RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80 R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Gerrit Renker <[email protected]> Signed-off-by: David S. Miller <[email protected]>

We received a report[1] of kernel crashes when Cilium is used in XDP mode with virtio_net after updating to newer kernels. After investigating the reason it turned out that when using mergeable bufs with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf() calculates the build_skb address wrong because the offset can become less than the headroom so it gets the address of the previous page (-X bytes depending on how lower offset is): page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256 This is a pr_err() I added in the beginning of page_to_skb which clearly shows offset that is less than headroom by adding 4 bytes of metadata via an xdp prog. The calculations done are: receive_mergeable(): headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes offset = xdp.data - page_address(xdp_page) - vi->hdr_len - metasize; page_to_skb(): p = page_address(page) + offset; ... buf = p - headroom; Now buf goes -4 bytes from the page's starting address as can be seen above which is set as skb->head and skb->data by build_skb later. Depending on what's done with the skb (when it's freed most often) we get all kinds of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate the new headroom after the xdp program has run, similar to how offset and len are recalculated. Headroom is directly related to data_hard_start, data and data_meta, so we use them to get the new size. The result is correct (similar pr_err() in page_to_skb, one case of xdp_page and one case of virtnet buf): a) Case with 4 bytes of metadata [ 115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252 [ 121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252 b) Case of pushing data +32 bytes [ 153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288 [ 158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288 c) Case of pushing data -33 bytes [ 835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223 [ 840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223 Offset and headroom are equal because offset points to the start of reserved bytes for the virtio_net header which are at buf start + headroom, while data points at buf start + vnet hdr size + headroom so when data or data_meta are adjusted by the xdp prog both the headroom size and the offset change equally. We can use data_hard_start to compute the new headroom after the xdp prog (linearized / page start case, the virtnet buf case is similar just with bigger base offset): xdp.data_hard_start = page_address + vnet_hdr xdp.data = page_address + vnet_hdr + headroom new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize An example reproducer xdp prog[3] is below. [1] cilium/cilium#19453 [2] Two of the many traces: [ 40.437400] BUG: Bad page state in process swapper/0 pfn:14940 [ 40.916726] BUG: Bad page state in process systemd-resolve pfn:053b7 [ 41.300891] kernel BUG at include/linux/mm.h:720! [ 41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G B W 5.18.0-rc1+ xen-troops#37 [ 41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 41.306018] RIP: 0010:page_frag_free+0x79/0xe0 [ 41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6 [ 41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292 [ 41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000 [ 41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff [ 41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff [ 41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600 [ 41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c [ 41.317700] FS: 00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000 [ 41.319150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0 [ 41.321387] Call Trace: [ 41.321819] <TASK> [ 41.322193] skb_release_data+0x13f/0x1c0 [ 41.322902] __kfree_skb+0x20/0x30 [ 41.343870] tcp_recvmsg_locked+0x671/0x880 [ 41.363764] tcp_recvmsg+0x5e/0x1c0 [ 41.384102] inet_recvmsg+0x42/0x100 [ 41.406783] ? sock_recvmsg+0x1d/0x70 [ 41.428201] sock_read_iter+0x84/0xd0 [ 41.445592] ? 0xffffffffa3000000 [ 41.462442] new_sync_read+0x148/0x160 [ 41.479314] ? 0xffffffffa3000000 [ 41.496937] vfs_read+0x138/0x190 [ 41.517198] ksys_read+0x87/0xc0 [ 41.535336] do_syscall_64+0x3b/0x90 [ 41.551637] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.568050] RIP: 0033:0x48765b [ 41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 [ 41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000 [ 41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b [ 41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016 [ 41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4 [ 41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9 [ 41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff [ 41.744254] </TASK> [ 41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net and [ 33.524802] BUG: Bad page state in process systemd-network pfn:11e60 [ 33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e [ 33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60 [ 33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000 [ 33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000 [ 33.532471] page dumped because: nonzero mapcount [ 33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net [ 33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ xen-troops#37 [ 33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 33.532484] Call Trace: [ 33.532496] <TASK> [ 33.532500] dump_stack_lvl+0x45/0x5a [ 33.532506] bad_page.cold+0x63/0x94 [ 33.532510] free_pcp_prepare+0x290/0x420 [ 33.532515] free_unref_page+0x1b/0x100 [ 33.532518] skb_release_data+0x13f/0x1c0 [ 33.532524] kfree_skb_reason+0x3e/0xc0 [ 33.532527] ip6_mc_input+0x23c/0x2b0 [ 33.532531] ip6_sublist_rcv_finish+0x83/0x90 [ 33.532534] ip6_sublist_rcv+0x22b/0x2b0 [3] XDP program to reproduce(xdp_pass.c): #include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("xdp_pass") int xdp_pkt_pass(struct xdp_md *ctx) { bpf_xdp_adjust_head(ctx, -(int)32); return XDP_PASS; } char _license[] SEC("license") = "GPL"; compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass CC: [email protected] CC: Jason Wang <[email protected]> CC: Xuan Zhuo <[email protected]> CC: Daniel Borkmann <[email protected]> CC: "Michael S. Tsirkin" <[email protected]> CC: [email protected] Fixes: 8fb7da9 ("virtio_net: get build_skb() buf by data ptr") Signed-off-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>

This change fixes the following kernel NULL pointer dereference which is reproduced by blktests srp/007 occasionally. BUG: kernel NULL pointer dereference, address: 0000000000000170 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 9 Comm: kworker/0:1H Kdump: loaded Not tainted 6.0.0-rc1+ xen-troops#37 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014 Workqueue: 0x0 (kblockd) RIP: 0010:srp_recv_done+0x176/0x500 [ib_srp] Code: 00 4d 85 ff 0f 84 52 02 00 00 48 c7 82 80 02 00 00 00 00 00 00 4c 89 df 4c 89 14 24 e8 53 d3 4a f6 4c 8b 14 24 41 0f b6 42 13 <41> 89 87 70 01 00 00 41 0f b6 52 12 f6 c2 02 74 44 41 8b 42 1c b9 RSP: 0018:ffffaef7c0003e28 EFLAGS: 00000282 RAX: 0000000000000000 RBX: ffff9bc9486dea60 RCX: 0000000000000000 RDX: 0000000000000102 RSI: ffffffffb76bbd0e RDI: 00000000ffffffff RBP: ffff9bc980099a00 R08: 0000000000000001 R09: 0000000000000001 R10: ffff9bca53ef0000 R11: ffff9bc980099a10 R12: ffff9bc956e14000 R13: ffff9bc9836b9cb0 R14: ffff9bc9557b4480 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9bc97ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000170 CR3: 0000000007e04000 CR4: 00000000000006f0 Call Trace: <IRQ> __ib_process_cq+0xb7/0x280 [ib_core] ib_poll_handler+0x2b/0x130 [ib_core] irq_poll_softirq+0x93/0x150 __do_softirq+0xee/0x4b8 irq_exit_rcu+0xf7/0x130 sysvec_apic_timer_interrupt+0x8e/0xc0 </IRQ> Fixes: ad215aa ("RDMA/srp: Make struct scsi_cmnd and struct srp_request adjacent") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xiao Yang <[email protected]> Acked-by: Bart Van Assche <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>

Oleksandr Andrushchenko and others added 8 commits April 23, 2018 16:38

andr2000 requested review from aanisov and iartemenko April 23, 2018 13:51

aanisov removed their request for review April 23, 2018 14:01

aanisov force-pushed the master branch from a95d91d to a5fe5a5 Compare April 24, 2018 15:01

andr2000 changed the base branch from master to yocto-3.7-migration April 25, 2018 10:12

iartemenko merged commit 3df9604 into xen-troops:yocto-3.7-migration Apr 25, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add zero-copy driver #37

Add zero-copy driver #37

andr2000 commented Apr 23, 2018

Add zero-copy driver #37

Add zero-copy driver #37

Conversation

andr2000 commented Apr 23, 2018