  1. Nov 25, 2022
    • Revert "net: broadcom: Fix BCMGENET Kconfig" · 7be134eb
      Greg Kroah-Hartman authored
      
      This reverts commit fbb4e8e6 which is
      commit 8d820bc9 upstream.
      
      It causes runtime failures, as reported by Naresh. Arnd writes:
      
      	Greg, please just revert fbb4e8e6 ("net: broadcom: Fix BCMGENET Kconfig")
      	in stable/linux-5.10.y: it depends on e5f31552 ("ethernet: fix
      	PTP_1588_CLOCK dependencies"), which we probably don't want backported
      	from 5.15 to 5.10.
      
      So it should be reverted.
      
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/CA+G9fYsXomPXcecPDzDydO3=i2qHDM2RTtGxr0p2YOS6=YcWng@mail.gmail.com
      
      
      Cc: YueHaibing <yuehaibing@huawei.com>
      Cc: Florian Fainelli <f.fainelli@broadcom.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ntfs: check overflow when iterating ATTR_RECORDs · 957732a0
      Hawkins Jiawei authored
      commit 63095f4f upstream.
      
      The kernel iterates over the ATTR_RECORDs in an mft record in
      ntfs_attr_find().  Because the ATTR_RECORDs are adjacent, the kernel
      finds the next ATTR_RECORD by adding the current ATTR_RECORD's length
      field to its start address.
      
      The problem is that when the kernel calculates the end address of the
      current ATTR_RECORD during iteration, it may trigger an integer
      overflow in `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`.
      The result may wrap, leading to an infinite loop on 32-bit systems.
      
      This patch solves it by adding checks on the calculated end address
      of the current ATTR_RECORD during iteration.
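
The wrap-around described above can be avoided by comparing each length
against the space that remains, instead of forming the end pointer first.
The following is a minimal userspace sketch of that idea, not the ntfs
code: the 8-byte header layout, the names `count_records`, `REC_HDR_SIZE`
and `REC_LENGTH_AT`, and the host-byte-order length field are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte record header: 4-byte type, then 4-byte length. */
#define REC_HDR_SIZE  8u
#define REC_LENGTH_AT 4u

/*
 * Count the variable-length records packed into buf[0..size).
 * Returns -1 if a length field is too small or runs past the buffer.
 * The walk uses offsets, and each length is compared against the bytes
 * that remain, so "current position + length" is never computed before
 * it is known to fit.
 */
static int count_records(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);

	while (size - off >= REC_HDR_SIZE) {
		uint32_t len;

		/* memcpy avoids an unaligned load of the length field. */
		memcpy(&len, buf + off + REC_LENGTH_AT, sizeof(len));

		/* A length below the header size could stall the loop. */
		if (len < REC_HDR_SIZE || len > size - off)
			return -1;

		off += len;
		n++;
	}
	return n;
}
```

Because `off` never exceeds `size`, the subtraction `size - off` cannot
underflow, and a huge length such as 0xFFFFFFF0 is rejected rather than
wrapped.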
      
      Link: https://lkml.kernel.org/r/20220831160935.3409-4-yin31149@gmail.com
      Link: https://lore.kernel.org/all/20220827105842.GM2030@kadam/
      
      
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: chenxiaosong (A) <chenxiaosong2@huawei.com>
      Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ntfs: fix out-of-bounds read in ntfs_attr_find() · 6322dda4
      Hawkins Jiawei authored
      commit 36a4d82d upstream.
      
      The kernel iterates over the ATTR_RECORDs in an mft record in
      ntfs_attr_find().  To ensure that accesses to these ATTR_RECORDs are
      within bounds, the kernel performs some checks during iteration.
      
      The problem is that, when checking whether an ATTR_RECORD's name is
      within bounds, the kernel dereferences the ATTR_RECORD's name_offset
      field before checking that the ATTR_RECORD structure itself is within
      bounds.  This may result in an out-of-bounds read in ntfs_attr_find(),
      as reported by Syzkaller:
      
      ==================================================================
      BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
      Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
      
      [...]
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:317 [inline]
       print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
       kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
       ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
       ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193
       ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845
       ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854
       mount_bdev+0x34d/0x410 fs/super.c:1400
       legacy_get_tree+0x105/0x220 fs/fs_context.c:610
       vfs_get_tree+0x89/0x2f0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x1326/0x1e20 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
       [...]
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350
      head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140
      raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      Memory state around the buggy address:
       ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      This patch solves it by moving the ATTR_RECORD structure's bounds check
      earlier, before the check that the ATTR_RECORD's name is within bounds.
      This patch also adds some comments to improve maintainability.
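
The ordering matters because reading name_offset is itself a memory
access. Below is a minimal userspace sketch of "check the structure
first, then its fields". The header layout, the offsets and the function
name `attr_name_in_bounds` are hypothetical and are not taken from
fs/ntfs/attrib.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header layout, for illustration only. */
#define ATTR_HDR_SIZE  8u   /* bytes occupied by the fixed header   */
#define NAME_OFFSET_AT 4u   /* 2-byte name offset, from attr start  */
#define NAME_LENGTH_AT 6u   /* 1-byte name length, in 2-byte chars  */

/* Returns 1 if the attribute at rec + off and its name lie inside rec. */
static int attr_name_in_bounds(const uint8_t *rec, size_t rec_size, size_t off)
{
	uint16_t name_offset;
	uint8_t name_length;
	size_t name_end;

	assert(rec != NULL);

	/* Step 1: the whole header must fit before any field is read. */
	if (off > rec_size || rec_size - off < ATTR_HDR_SIZE)
		return 0;

	/* Step 2: only now is it safe to read header fields. */
	memcpy(&name_offset, rec + off + NAME_OFFSET_AT, sizeof(name_offset));
	name_length = rec[off + NAME_LENGTH_AT];

	/* Step 3: the name itself must end inside the record buffer. */
	name_end = (size_t)name_offset + (size_t)name_length * 2u;
	return name_end <= rec_size - off;
}
```

Swapping steps 1 and 2 would reproduce the shape of the bug: a field
would be loaded from an address that has not yet been validated.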
      
      Link: https://lkml.kernel.org/r/20220831160935.3409-3-yin31149@gmail.com
      Link: https://lore.kernel.org/all/1636796c-c85e-7f47-e96f-e074fee3c7d3@huawei.com/
      Link: https://groups.google.com/g/syzkaller-bugs/c/t_XdeKPGTR4/m/LECAuIGcBgAJ
      
      
      Signed-off-by: chenxiaosong (A) <chenxiaosong2@huawei.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Reported-by: <syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com>
      Tested-by: <syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ntfs: fix use-after-free in ntfs_attr_find() · b825bfbb
      Hawkins Jiawei authored
      commit d85a1bec upstream.
      
      Patch series "ntfs: fix bugs about Attribute", v2.
      
      This patchset fixes three bugs related to attributes in the mft record:
      
      Patch 1 adds a sanity check to ensure that the attrs_offset field in the
      first mft record loaded from disk is within bounds.
      
      Patch 2 moves the ATTR_RECORD's bounds check earlier, to avoid
      dereferencing an ATTR_RECORD before checking that it is within bounds.
      
      Patch 3 adds an overflow check to avoid a possible infinite loop in
      ntfs_attr_find().
      
      Without patch 1 and patch 2, the kernel triggers a KASAN use-after-free
      detection as reported by Syzkaller.
      
      Although either patch 1 or patch 2 alone can fix this, we still need
      both, because patch 1 fixes the root cause, while patch 2 fixes not
      only the direct cause but also the potential out-of-bounds bug.
      
      
      This patch (of 3):
      
      Syzkaller reported use-after-free read as follows:
      ==================================================================
      BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
      Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607
      
      [...]
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:317 [inline]
       print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
       kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
       ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597
       ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193
       ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845
       ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854
       mount_bdev+0x34d/0x410 fs/super.c:1400
       legacy_get_tree+0x105/0x220 fs/fs_context.c:610
       vfs_get_tree+0x89/0x2f0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x1326/0x1e20 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
       [...]
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350
      head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140
      raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      Memory state around the buggy address:
       ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      The kernel loads $MFT/$DATA's first mft record in
      ntfs_read_inode_mount().
      
      The problem is that, after loading, the kernel doesn't check whether
      the attrs_offset field holds a valid value.
      
      More specifically, if the attrs_offset field is larger than the
      bytes_allocated field, it may trigger an out-of-bounds read (reported
      as a use-after-free) in ntfs_attr_find() when the kernel tries to
      access the corresponding mft record's attribute.
      
      This patch solves it by adding a sanity check between the attrs_offset
      field and the bytes_allocated field after loading the first mft record.
      
      Link: https://lkml.kernel.org/r/20220831160935.3409-1-yin31149@gmail.com
      Link: https://lkml.kernel.org/r/20220831160935.3409-2-yin31149@gmail.com
      
      
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
      Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fs: initialize fsdata passed to write_begin/write_end interface · 294ef12d
      Alexander Potapenko authored
      commit 1468c6f4 upstream.
      
      Functions implementing the a_ops->write_end() interface accept the `void
      *fsdata` parameter that is supposed to be initialized by the corresponding
      a_ops->write_begin() (which accepts `void **fsdata`).
      
      However not all a_ops->write_begin() implementations initialize `fsdata`
      unconditionally, so it may get passed uninitialized to a_ops->write_end(),
      resulting in undefined behavior.
      
      Fix this by initializing fsdata with NULL before the call to
      write_begin(), rather than doing so in all possible a_ops implementations.
      
      This patch covers only the following cases found by running x86 KMSAN
      under syzkaller:
      
       - generic_perform_write()
       - cont_expand_zero() and generic_cont_expand_simple()
       - page_symlink()
      
      Other cases of passing uninitialized fsdata may persist in the codebase.
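
The caller-side fix can be pictured with a small userspace model. This
is only a sketch: the function-pointer types and the names `do_write`,
`begin_ignores_fsdata`, `begin_sets_fsdata` and `end_reports_fsdata` are
hypothetical stand-ins, not the kernel's address_space_operations
signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a_ops->write_begin() and a_ops->write_end(). */
typedef int (*write_begin_fn)(void **fsdata);
typedef int (*write_end_fn)(void *fsdata);

/* A write_begin() that never stores anything through fsdata. */
static int begin_ignores_fsdata(void **fsdata)
{
	(void)fsdata;
	return 0;
}

static int cookie;

/* A write_begin() that does hand private data to write_end(). */
static int begin_sets_fsdata(void **fsdata)
{
	*fsdata = &cookie;
	return 0;
}

/* Reports whether it was handed a non-NULL fsdata. */
static int end_reports_fsdata(void *fsdata)
{
	return fsdata != NULL;
}

/* Caller: initialize fsdata once here, whatever begin() decides to do. */
static int do_write(write_begin_fn begin, write_end_fn end)
{
	void *fsdata = NULL;
	int ret;

	assert(begin != NULL && end != NULL);

	ret = begin(&fsdata);
	if (ret < 0)
		return ret;
	return end(fsdata);
}
```

With the initialization in the caller, `end()` sees either NULL or the
value `begin()` stored, and never an indeterminate pointer.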
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com
      
      
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • 9p/trans_fd: always use O_NONBLOCK read/write · a8e2fc8f
      Tetsuo Handa authored
      commit ef575281 upstream.
      
      syzbot is reporting a hung task at p9_fd_close() [1], because
      p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() fails to
      interrupt an already started kernel_read() from p9_fd_read() from
      p9_read_work() and/or kernel_write() from p9_fd_write() from
      p9_write_work().
      
      Since p9_socket_open() sets the O_NONBLOCK flag, p9_mux_poll_stop() does
      not need to interrupt kernel_read()/kernel_write() on sockets. However,
      since p9_fd_open() does not set the O_NONBLOCK flag, and a pipe blocks
      unless a signal is pending, p9_mux_poll_stop() needs to interrupt
      kernel_read()/kernel_write() when the file descriptor refers to a pipe.
      In other words, a pipe file descriptor needs to be handled in the same
      way as a socket file descriptor.
      
      We somehow need to interrupt kernel_read()/kernel_write() on pipes.
      
      A minimal change, which this patch makes, is to set the O_NONBLOCK flag
      from p9_fd_open(), since the O_NONBLOCK flag does not affect reading or
      writing of regular files. But this approach changes the O_NONBLOCK flag
      on userspace-supplied file descriptors (which might break userspace
      programs), and the O_NONBLOCK flag could still be changed by userspace.
      It would be possible to set the O_NONBLOCK flag every time
      p9_fd_read()/p9_fd_write() is invoked, but a small race window for
      clearing the O_NONBLOCK flag would still remain.
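
The effect of O_NONBLOCK on a pipe can be observed from userspace. The
sketch below is not the 9p code. It only uses the POSIX fcntl(), pipe()
and read() calls, and the helper names `set_nonblock` and
`empty_pipe_read_would_block` are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd while preserving its other status flags. */
static int set_nonblock(int fd)
{
	int flags;

	assert(fd >= 0);

	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Returns 1 if read() on an empty non-blocking pipe fails with EAGAIN
 * instead of sleeping, 0 if it behaved otherwise, -1 on setup failure.
 */
static int empty_pipe_read_would_block(void)
{
	int fds[2];
	int result = -1;
	char c;

	if (pipe(fds) < 0)
		return -1;

	if (set_nonblock(fds[0]) == 0) {
		ssize_t n = read(fds[0], &c, 1);

		result = (n < 0 && errno == EAGAIN);
	}

	close(fds[0]);
	close(fds[1]);
	return result;
}
```

Without the flag, the same read() would sleep until a writer supplied
data or closed the pipe, which is the blocking behaviour the message
describes for pipes.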
      
      If we don't want to manipulate the O_NONBLOCK flag, we might be able to
      surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
      and recalc_sigpending(). Since the p9_read_work()/p9_write_work() work
      items are processed by kernel threads that serve the global system_wq
      workqueue, signals cannot be delivered from remote threads when
      p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is called.
      Therefore, calling set_thread_flag(TIF_SIGPENDING)/recalc_sigpending()
      every time would be needed if we relied on signals to make
      kernel_read()/kernel_write() non-blocking.
      
      Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp
      Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
      Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
      Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
      [Dominique: add comment at Christian's suggestion]
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gfs2: Switch from strlcpy to strscpy · a5da76df
      Andreas Gruenbacher authored
      
      commit 204c0300 upstream.
      
      Switch from strlcpy to strscpy and make sure that @count is the size of
      the smaller of the source and destination buffers.  This prevents
      reading beyond the end of the source buffer when the source string isn't
      null terminated.
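
To see why the count has to be the smaller of the two buffer sizes,
here is a userspace analogue. It is a sketch only: `copy_bounded` is a
hypothetical helper, not the kernel's strscpy(), and it returns -1 on
truncation where strscpy() returns -E2BIG.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy a string from src to dst without reading more than src_size
 * bytes or writing more than dst_size bytes.  dst is always
 * NUL-terminated when dst_size > 0.  Returns the copied length, or -1
 * if the source did not fit or had no NUL within the limit.
 */
static long copy_bounded(char *dst, size_t dst_size,
			 const char *src, size_t src_size)
{
	/* The limit is the smaller of the two buffers. */
	size_t count = dst_size < src_size ? dst_size : src_size;
	size_t i;

	assert(dst != NULL && src != NULL);

	if (dst_size == 0)
		return -1;
	if (count == 0) {
		dst[0] = '\0';
		return -1;
	}

	/* Copy at most count - 1 characters, leaving room for the NUL. */
	for (i = 0; i < count - 1; i++) {
		if (src[i] == '\0')
			break;
		dst[i] = src[i];
	}
	dst[i] = '\0';

	/* src[i] is still inside the source buffer because i < count. */
	return src[i] == '\0' ? (long)i : -1;
}
```

If only the destination size were used as the limit, a 4-byte source
without a terminator copied into an 8-byte destination would be read
past its end. Clamping to the smaller size keeps every read inside the
source.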
      
      Found by a modified version of syzkaller.
      
      Suggested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gfs2: Check sb_bsize_shift after reading superblock · 5fa30be7
      Andrew Price authored
      
      commit 670f8ce5 upstream.
      
      Fuzzers like to scribble over sb_bsize_shift but in reality it's very
      unlikely that this field would be corrupted on its own. Nevertheless it
      should be checked to avoid the possibility of messy mount errors due to
      bad calculations. It's always a fixed value based on the block size so
      we can just check that it's the expected value.
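
Since the shift is fully determined by the block size, the check
reduces to recomputing it and comparing. A minimal userspace sketch
follows. The names `log2_pow2` and `bsize_shift_valid` are
hypothetical, and the actual gfs2 check may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/* Base-2 logarithm of a value the caller knows to be a power of two. */
static unsigned int log2_pow2(uint32_t v)
{
	unsigned int shift = 0;

	assert(v != 0 && (v & (v - 1)) == 0);

	while ((v >> shift) != 1u)
		shift++;
	return shift;
}

/* Returns 1 if shift is exactly log2(bsize) for a power-of-two bsize. */
static int bsize_shift_valid(uint32_t bsize, uint32_t shift)
{
	if (bsize == 0 || (bsize & (bsize - 1)) != 0)
		return 0;
	return shift == log2_pow2(bsize);
}
```

A shift such as 4294967287, the value in the UBSAN report quoted in
this message, fails the comparison before it can be used as a shift
count.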
      
      Tested with:
      
          mkfs.gfs2 -O -p lock_nolock /dev/vdb
          for i in 0 -1 64 65 32 33; do
              gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb
              mount /dev/vdb /mnt/test && umount /mnt/test
          done
      
      Before this patch we get a withdraw after
      
      [   76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block
      [   76.413681]   bh = 19 (type: exp=5, found=4)
      [   76.413681]   function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492
      
      and with UBSAN configured we also get complaints like
      
      [   76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19
      [   76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
      
      After the patch, these complaints don't appear, mount fails immediately
      and we get an explanation in dmesg.
      
      Reported-by: <syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com>
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • 9p: trans_fd/p9_conn_cancel: drop client lock earlier · f14858bc
      Dominique Martinet authored
      commit 52f1c45d upstream.
      
      syzbot reported a double-lock here and we no longer need this
      lock after requests have been moved off to local list:
      just drop the lock earlier.
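
The pattern "move the requests to a local list, then drop the lock
before completing them" can be modelled in a few lines. This is a
single-threaded sketch under stated assumptions: `toy_lock` is a
made-up non-recursive lock that asserts on double acquisition, and the
premise that the completion path takes the same client lock is an
illustration, not a statement about the exact 9p call chain.

```c
#include <assert.h>
#include <stddef.h>

/* Toy non-recursive lock: acquiring it twice trips the assertion. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct req { struct req *next; int done; };
struct client { struct toy_lock lock; struct req *pending; };

/* Completion path that (by assumption) takes the client lock itself. */
static void complete_req(struct client *c, struct req *r)
{
	toy_lock_acquire(&c->lock);
	r->done = 1;
	toy_lock_release(&c->lock);
}

/* Cancel every pending request; returns how many were completed. */
static int cancel_all(struct client *c)
{
	struct req *local, *next;
	int n = 0;

	toy_lock_acquire(&c->lock);
	local = c->pending;          /* detach the shared list ...       */
	c->pending = NULL;
	toy_lock_release(&c->lock);  /* ... and drop the lock right away */

	/* The local list is private now, so no lock is needed to walk it. */
	for (; local != NULL; local = next) {
		next = local->next;
		complete_req(c, local);
		n++;
	}
	return n;
}
```

Holding the lock across the loop instead would make `complete_req()`
acquire it a second time, which is the double-lock shape described in
the message.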
      
      Link: https://lkml.kernel.org/r/20220904064028.1305220-1-asmadeus@codewreck.org
      
      
      Reported-by: <syzbot+50f7e8d06c3768dd97f3@syzkaller.appspotmail.com>
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Tested-by: Schspa Shi <schspa@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kcm: close race conditions on sk_receive_queue · 4154b6af
      Cong Wang authored
      
      commit 5121197e upstream.
      
      sk->sk_receive_queue is protected by skb queue lock, but for KCM
      sockets its RX path takes mux->rx_lock to protect more than just
      skb queue. However, kcm_recvmsg() still only grabs the skb queue
      lock, so race conditions still exist.
      
      We can teach kcm_recvmsg() to grab mux->rx_lock too but this would
      introduce a potential performance regression as struct kcm_mux can
      be shared by multiple KCM sockets.
      
      So we have to enforce skb queue lock in requeue_rx_msgs() and handle
      skb peek case carefully in kcm_wait_data(). Fortunately,
      skb_recv_datagram() already handles it nicely and is widely used by
      other sockets, so we can just switch to skb_recv_datagram() after
      getting rid of the unnecessary sock lock in kcm_recvmsg() and
      kcm_splice_read(). Side note: SOCK_DONE is not used by KCM sockets,
      so it is safe to get rid of this check too.
      
      I ran the original syzbot reproducer for 30 min without seeing any
      issue.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-by: <syzbot+278279efdd2730dd14bf@syzkaller.appspotmail.com>
      Reported-by: shaozhengchao <shaozhengchao@huawei.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/20221114005119.597905-1-xiyou.wangcong@gmail.com
      
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kcm: avoid potential race in kcm_tx_work · 7deb7a9d
      Eric Dumazet authored
      
      commit ec7eede3 upstream.
      
      syzbot found that kcm_tx_work() could crash [1] in:
      
      	/* Primarily for SOCK_SEQPACKET sockets */
      	if (likely(sk->sk_socket) &&
      	    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
      <<*>>	clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
      		sk->sk_write_space(sk);
      	}
      
      I think the reason is that another thread might concurrently
      run in kcm_release() and call sock_orphan(sk) while sk is not
      locked. kcm_tx_work() then finds sk->sk_socket being NULL.
      
      [1]
      BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
      BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
      BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
      Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53
      
      CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kkcmd kcm_tx_work
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      kasan_report+0xbe/0x1f0 mm/kasan/report.c:495
      check_region_inline mm/kasan/generic.c:183 [inline]
      kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
      instrument_atomic_write include/linux/instrumented.h:86 [inline]
      clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
      kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
      process_one_work+0x996/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e9/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
      </TASK>
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Link: https://lore.kernel.org/r/20221012133412.519394-1-edumazet@google.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: cdg: allow tcp_cdg_release() to be called multiple times · 35309be0
      Eric Dumazet authored
      
      commit 72e560cb upstream.
      
      Apparently, mptcp is able to call tcp_disconnect() on an already
      disconnected flow. This is generally fine, unless the current congestion
      control is CDG, because it might trigger a double-free [1].
      
      Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
      more resilient.
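
One common way to make a release routine safe to call more than once is
to clear the pointer after freeing it. The sketch below shows that
pattern in userspace. The struct and function names are hypothetical,
and it is not a copy of tcp_cdg_release().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-connection state owning one heap allocation. */
struct cdg_state {
	unsigned int *gradients;
};

/* Allocate a zeroed history window; returns 0 on success, -1 on failure. */
static int cdg_init(struct cdg_state *ca, size_t window)
{
	assert(ca != NULL);

	ca->gradients = calloc(window, sizeof(*ca->gradients));
	return ca->gradients != NULL ? 0 : -1;
}

/* Idempotent release: a second call frees NULL, which is a no-op. */
static void cdg_release(struct cdg_state *ca)
{
	assert(ca != NULL);

	free(ca->gradients);
	ca->gradients = NULL;
}
```

Like kfree(NULL) in the kernel, free(NULL) does nothing, so resetting
the pointer is enough to turn a repeated release into a harmless call.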
      
      [1]
      BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
      BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
      
      CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Workqueue: events mptcp_worker
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
      ____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1759 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
      slab_free mm/slub.c:3539 [inline]
      kfree+0xe2/0x580 mm/slub.c:4567
      tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
      __mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
      mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
      mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
      process_one_work+0x991/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e4/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      </TASK>
      
      Allocated by task 3671:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track mm/kasan/common.c:45 [inline]
      set_alloc_info mm/kasan/common.c:437 [inline]
      ____kasan_kmalloc mm/kasan/common.c:516 [inline]
      ____kasan_kmalloc mm/kasan/common.c:475 [inline]
      __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
      kmalloc_array include/linux/slab.h:640 [inline]
      kcalloc include/linux/slab.h:671 [inline]
      tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
      tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
      tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
      tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
      do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
      tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
      mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
      __sys_setsockopt+0x2d6/0x690 net/socket.c:2252
      __do_sys_setsockopt net/socket.c:2263 [inline]
      __se_sys_setsockopt net/socket.c:2260 [inline]
      __x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 16:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track+0x21/0x30 mm/kasan/common.c:45
      kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
      ____kasan_slab_free mm/kasan/common.c:367 [inline]
      ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1759 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
      slab_free mm/slub.c:3539 [inline]
      kfree+0xe2/0x580 mm/slub.c:4567
      tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
      tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
      tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
      inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
      tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
      tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
      tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
      tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
      ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
      ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
      NF_HOOK include/linux/netfilter.h:302 [inline]
      NF_HOOK include/linux/netfilter.h:296 [inline]
      ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
      dst_input include/net/dst.h:455 [inline]
      ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
      ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
      ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
      nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
      nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
      nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
      NF_HOOK include/linux/netfilter.h:300 [inline]
      ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      netif_receive_skb_internal net/core/dev.c:5685 [inline]
      netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
      NF_HOOK include/linux/netfilter.h:302 [inline]
      NF_HOOK include/linux/netfilter.h:296 [inline]
      br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
      br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
      br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
      br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
      NF_HOOK include/linux/netfilter.h:302 [inline]
      br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
      br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
      nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
      nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
      br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
      __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
      __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
      __napi_poll+0xb3/0x6d0 net/core/dev.c:6494
      napi_poll net/core/dev.c:6561 [inline]
      net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
      __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
      
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      35309be0
    • macvlan: enforce a consistent minimal mtu · e929ec98
      Eric Dumazet authored
      
      commit b64085b0 upstream.
      
      macvlan should enforce a minimal mtu of 68, even at link creation.
      
      This patch avoids the current behavior (which could lead to crashes
      in the ipv6 stack if the link is brought up):
      
      $ ip link add macvlan1 link eno1 mtu 8 type macvlan  # This should fail !
      $ ip link sh dev macvlan1
      5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop
          state DOWN mode DEFAULT group default qlen 1000
          link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff
      $ ip link set macvlan1 mtu 67
      Error: mtu less than device minimum.
      $ ip link set macvlan1 mtu 68
      $ ip link set macvlan1 mtu 8
      Error: mtu less than device minimum.
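
      A minimal user-space sketch of the rule described above: the same
      lower bound is checked when the link is created and when its MTU is
      changed later. MIN_MTU, struct link, link_create() and link_set_mtu()
      are hypothetical names for illustration, not the macvlan driver's
      code; only the value 68 comes from the commit message.

```c
#include <errno.h>

/* Minimum MTU named in the commit message above. */
#define MIN_MTU 68

/* Hypothetical stand-in for a network device. */
struct link {
	unsigned int mtu;
	unsigned int max_mtu;
};

/* One range check shared by every path that can set the MTU. */
static int mtu_in_range(unsigned int mtu, unsigned int max_mtu)
{
	return mtu >= MIN_MTU && mtu <= max_mtu;
}

/* Creation path: reject an out-of-range MTU instead of accepting it. */
static int link_create(struct link *l, unsigned int mtu, unsigned int lower_mtu)
{
	if (!mtu_in_range(mtu, lower_mtu))
		return -EINVAL;
	l->mtu = mtu;
	l->max_mtu = lower_mtu;
	return 0;
}

/* Change path: the same check, so both paths agree. */
static int link_set_mtu(struct link *l, unsigned int mtu)
{
	if (!mtu_in_range(mtu, l->max_mtu))
		return -EINVAL;
	l->mtu = mtu;
	return 0;
}
```

      With this shape, creating a link with mtu 8 fails in the same way
      as a later attempt to set mtu 8.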
      
      Fixes: 91572088 ("net: use core MTU range checking in core net infra")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e929ec98
    • uapi/linux/stddef.h: Add include guards · 95ebea5a
      Tadeusz Struk authored
      
      commit 55037ed7 upstream.
      
      Add include guard wrapper define to uapi/linux/stddef.h to prevent macro
      redefinition errors when stddef.h is included more than once. This was not
      needed before since the only contents already used a redefinition test.
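
      As an illustration of what an include guard buys, the sketch below
      pastes the body of a hypothetical header twice into one translation
      unit, which is what a double #include amounts to. EXAMPLE_STDDEF_H,
      struct example_pair and EXAMPLE_PAIR_SUM are made-up names, not the
      contents of uapi/linux/stddef.h.

```c
/* First "inclusion": the guard macro is undefined, so the body is compiled. */
#ifndef EXAMPLE_STDDEF_H
#define EXAMPLE_STDDEF_H
struct example_pair {
	int a;
	int b;
};
#define EXAMPLE_PAIR_SUM(p) ((p).a + (p).b)
#endif /* EXAMPLE_STDDEF_H */

/*
 * Second "inclusion": the guard macro is now defined, so the whole body is
 * skipped. Without the guard, struct example_pair would be defined twice.
 */
#ifndef EXAMPLE_STDDEF_H
#define EXAMPLE_STDDEF_H
struct example_pair {
	int a;
	int b;
};
#define EXAMPLE_PAIR_SUM(p) ((p).a + (p).b)
#endif /* EXAMPLE_STDDEF_H */

/* Uses the definitions that survived from the first inclusion. */
static int example_sum(void)
{
	struct example_pair p = { 2, 3 };

	return EXAMPLE_PAIR_SUM(p);
}
```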
      
      Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Link: https://lore.kernel.org/r/20220329171252.57279-1-tadeusz.struk@linaro.org
      Fixes: 50d7bd38 ("stddef: Introduce struct_group() helper macro")
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      95ebea5a
    • Input: i8042 - fix leaking of platform device on module removal · 3f25add5
      Chen Jun authored
      
      [ Upstream commit 81cd7e84 ]
      
      Avoid resetting the module-wide i8042_platform_device pointer in
      i8042_probe() or i8042_remove(), so that the device can be properly
      destroyed by i8042_exit() on module unload.
      
      Fixes: 9222ba68 ("Input: i8042 - add deferred probe support")
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Link: https://lore.kernel.org/r/20221109034148.23821-1-chenjun102@huawei.com
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3f25add5
    • kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case · 7d606ae1
      Li Huafei authored
      [ Upstream commit 5dd7caf0 ]
      
      In __unregister_kprobe_top(), if the probe being unregistered has a
      post_handler but the other child probes of the aggrprobe do not, the
      post_handler of the aggrprobe is cleared. If this is an ftrace-based
      probe, there is a problem. In later calls to disarm_kprobe(), we will
      use kprobe_ftrace_ops because post_handler is NULL. But we're armed
      with kprobe_ipmodify_ops. This triggers a WARN in
      __disarm_kprobe_ftrace() and may even cause use-after-free:
      
        Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2)
        WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0
        Modules linked in: testKprobe_007(-)
        CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18
        [...]
        Call Trace:
         <TASK>
         __disable_kprobe+0xcd/0xe0
         __unregister_kprobe_top+0x12/0x150
         ? mutex_lock+0xe/0x30
         unregister_kprobes.part.23+0x31/0xa0
         unregister_kprobe+0x32/0x40
         __x64_sys_delete_module+0x15e/0x260
         ? do_user_addr_fault+0x2cd/0x6b0
         do_syscall_64+0x3a/0x90
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
         [...]
      
      For the kprobe-on-ftrace case, we keep the post_handler setting to
      identify this aggrprobe armed with kprobe_ipmodify_ops. This way we
      can disarm it correctly.
      
      Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/
      Fixes: 0bc11ed5 ("kprobes: Allow kprobes coexist with livepatch")
      Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
      Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7d606ae1
    • scsi: scsi_debug: Fix possible UAF in sdebug_add_host_helper() · 89ece5ff
      Yuan Can authored
      
      [ Upstream commit e208a1d7 ]
      
      If device_register() fails in sdebug_add_host_helper(), control jumps to
      the clean label and sdbg_host is freed, but sdbg_host->host_list is not
      removed from sdebug_host_list, so a later list traversal may hit a
      use-after-free (UAF). Fix it.
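
      The pattern can be sketched in user-space C: an object that has
      already been linked into a list must be unlinked before it is freed
      on the error path. struct host, host_list, fake_register() and
      add_host() are hypothetical stand-ins, not the scsi_debug code.

```c
#include <stdlib.h>

/* Hypothetical list node standing in for an entry on sdebug_host_list. */
struct host {
	struct host *prev, *next;
	int id;
};

/* Circular list head; an empty list points at itself. */
static struct host host_list = { &host_list, &host_list, -1 };

static void host_link(struct host *h)
{
	h->next = host_list.next;
	h->prev = &host_list;
	host_list.next->prev = h;
	host_list.next = h;
}

static void host_unlink(struct host *h)
{
	h->prev->next = h->next;
	h->next->prev = h->prev;
}

/* Made-up registration step that fails for negative ids. */
static int fake_register(const struct host *h)
{
	return h->id < 0 ? -1 : 0;
}

static int add_host(int id)
{
	struct host *h = malloc(sizeof(*h));

	if (!h)
		return -1;
	h->id = id;
	host_link(h);
	if (fake_register(h) != 0) {
		host_unlink(h);	/* without this, the list keeps a dangling entry */
		free(h);
		return -1;
	}
	return 0;
}

/* Walks the list; only safe if every entry still points at live memory. */
static int count_hosts(void)
{
	int n = 0;

	for (const struct host *h = host_list.next; h != &host_list; h = h->next)
		n++;
	return n;
}
```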
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Yuan Can <yuancan@huawei.com>
      Link: https://lore.kernel.org/r/20221117084421.58918-1-yuancan@huawei.com
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      89ece5ff
    • scsi: target: tcm_loop: Fix possible name leak in tcm_loop_setup_hba_bus() · 75205f1b
      Yang Yingliang authored
      
      [ Upstream commit bc68e428 ]
      
      If device_register() fails in tcm_loop_setup_hba_bus(), the name allocated
      by dev_set_name() needs to be freed. As the comment on device_register()
      says, the error path should use put_device() to give up the reference. Fix
      this by calling put_device(), so that the name is freed in
      kobject_cleanup(). 'tl_hba' is then freed in tcm_loop_release_adapter(),
      so this case does not need to go to the error label.
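
      A rough user-space sketch of the ownership rule above: once an
      object is reference counted, the error path drops the reference and
      lets the release callback free everything, rather than freeing
      pieces by hand. struct obj, obj_put(), obj_register() and setup()
      are hypothetical names, not the driver-core API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted object owning a separately allocated name. */
struct obj {
	int refcount;
	char *name;
	void (*release)(struct obj *);
};

static int released;	/* counts release calls, for demonstration only */

/* Release callback: the single place that frees the name and the object. */
static void obj_release(struct obj *o)
{
	free(o->name);
	free(o);
	released++;
}

static struct obj *obj_alloc(const char *name)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->name = malloc(strlen(name) + 1);
	if (!o->name) {
		free(o);
		return NULL;
	}
	strcpy(o->name, name);
	o->refcount = 1;
	o->release = obj_release;
	return o;
}

/* Drop one reference; the last put triggers the release callback. */
static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->release(o);
}

/* Made-up registration step that fails on request. */
static int obj_register(struct obj *o, int should_fail)
{
	(void)o;
	return should_fail ? -1 : 0;
}

static struct obj *setup(const char *name, int should_fail)
{
	struct obj *o = obj_alloc(name);

	if (!o)
		return NULL;
	if (obj_register(o, should_fail) != 0) {
		obj_put(o);	/* frees name and object via release */
		return NULL;	/* no extra free(o) here: it would be a double free */
	}
	return o;
}
```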
      
      Fixes: 3703b2c5 ("[SCSI] tcm_loop: Add multi-fabric Linux/SCSI LLD fabric module")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20221115015042.3652261-1-yangyingliang@huawei.com
      Reviewed-by: Mike Christie <michael.chritie@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      75205f1b
    • net: use struct_group to copy ip/ipv6 header addresses · 6e933443
      Hangbin Liu authored
      
      [ Upstream commit 58e0be1e ]
      
      The kernel test robot reported warnings when building the bonding module with
      make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:
      
                       from ../drivers/net/bonding/bond_main.c:35:
      In function ‘fortify_memcpy_chk’,
          inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
          inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
      ../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
        413 |                         __read_overflow2_field(q_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      In function ‘fortify_memcpy_chk’,
          inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
          inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
      ../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
        413 |                         __read_overflow2_field(q_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      This is because we try to copy the whole ip/ip6 address to the flow_key,
      while we only point to the ip/ip6 saddr. Note that since these are UAPI
      headers, __struct_group() is used to avoid the compiler warnings.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: c3f83241 ("net: Add full IPv6 addresses to flow_keys")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6e933443
    • stddef: Introduce struct_group() helper macro · 9fd7bdaf
      Kees Cook authored
      
      [ Upstream commit 50d7bd38 ]
      
      Kernel code has a regular need to describe groups of members within a
      structure usually when they need to be copied or initialized separately
      from the rest of the surrounding structure. The generally accepted design
      pattern in C is to use a named sub-struct:
      
      	struct foo {
      		int one;
      		struct {
      			int two;
      			int three, four;
      		} thing;
      		int five;
      	};
      
      This would allow for traditional references and sizing:
      
      	memcpy(&dst.thing, &src.thing, sizeof(dst.thing));
      
      However, doing this would mean that referencing struct members enclosed
      by such named structs would always require including the sub-struct name
      in identifiers:
      
      	do_something(dst.thing.three);
      
      This has tended to be quite inflexible, especially when such groupings
      need to be added to established code, which causes huge naming churn.
      Three workarounds exist in the kernel for this problem, and each has
      other negative properties.
      
      To avoid the naming churn, there is a design pattern of adding macro
      aliases for the named struct:
      
      	#define f_three thing.three
      
      This ends up polluting the global namespace, and makes it difficult to
      search for identifiers.
      
      Another common work-around in kernel code avoids the pollution by avoiding
      the named struct entirely, instead identifying the group's boundaries using
      either a pair of empty anonymous structs or a pair of zero-element arrays:
      
      	struct foo {
      		int one;
      		struct { } start;
      		int two;
      		int three, four;
      		struct { } finish;
      		int five;
      	};
      
      	struct foo {
      		int one;
      		int start[0];
      		int two;
      		int three, four;
      		int finish[0];
      		int five;
      	};
      
      This allows code to avoid needing to use a sub-struct named for member
      references within the surrounding structure, but loses the benefits of
      being able to actually use such a struct, making it rather fragile. Using
      these requires open-coded calculation of sizes and offsets. The efforts
      made to avoid common mistakes include lots of comments, or adding various
      BUILD_BUG_ON()s. Such code is left with no way for the compiler to reason
      about the boundaries (e.g. the "start" object looks like it's 0 bytes
      in length), making bounds checking depend on open-coded calculations:
      
      	if (length > offsetof(struct foo, finish) -
      		     offsetof(struct foo, start))
      		return -EINVAL;
      	memcpy(&dst.start, &src.start, offsetof(struct foo, finish) -
      				       offsetof(struct foo, start));
      
      However, the vast majority of places in the kernel that operate on
      groups of members do so without any identification of the grouping,
      relying either on comments or implicit knowledge of the struct contents,
      which is even harder for the compiler to reason about, and results in
      even more fragile manual sizing, usually depending on member locations
      outside of the region (e.g. to copy "two" and "three", use the start of
      "four" to find the size):
      
      	BUILD_BUG_ON((offsetof(struct foo, four) <
      		      offsetof(struct foo, two)) ||
      		     (offsetof(struct foo, four) <
      		      offsetof(struct foo, three)));
      	if (length > offsetof(struct foo, four) -
      		     offsetof(struct foo, two))
      		return -EINVAL;
      	memcpy(&dst.two, &src.two, length);
      
      In order to have a regular programmatic way to describe a struct
      region that can be used for references and sizing, can be examined for
      bounds checking, avoids forcing the use of intermediate identifiers,
      and avoids polluting the global namespace, introduce the struct_group()
      macro. This macro wraps the member declarations to create an anonymous
      union of an anonymous struct (no intermediate name) and a named struct
      (for references and sizing):
      
      	struct foo {
      		int one;
      		struct_group(thing,
      			int two;
      			int three, four;
      		);
      		int five;
      	};
      
      	if (length > sizeof(src.thing))
      		return -EINVAL;
      	memcpy(&dst.thing, &src.thing, length);
      	do_something(dst.three);
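
      The "anonymous union of an anonymous struct and a named struct"
      described above can be sketched in plain C11 as follows. This is a
      simplified illustration written for this note, not the kernel's
      definition (which also supports tags and attributes); copy_thing()
      is a hypothetical helper.

```c
#include <string.h>

/*
 * Simplified stand-in for the macro: the members are declared twice inside
 * an anonymous union, once in an anonymous struct (so foo.two works) and
 * once in a struct named NAME (so foo.thing can be sized and addressed).
 */
#define struct_group(NAME, ...)			\
	union {					\
		struct { __VA_ARGS__ };		\
		struct { __VA_ARGS__ } NAME;	\
	}

struct foo {
	int one;
	struct_group(thing,
		int two;
		int three, four;
	);
	int five;
};

/* Copy at most the grouped region; returns 0 on success, -1 if too long. */
static int copy_thing(struct foo *dst, const struct foo *src, size_t length)
{
	if (length > sizeof(src->thing))
		return -1;
	memcpy(&dst->thing, &src->thing, length);
	return 0;
}
```

      Because both views share storage, dst.three and dst.thing.three
      name the same int, and sizeof(dst.thing) gives the compiler an
      exact bound for the copy.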
      
      There are some rare cases where the resulting struct_group() needs
      attributes added, so struct_group_attr() is also introduced to allow
      for specifying struct attributes (e.g. __aligned(x) or __packed).
      Additionally, there are places where such declarations would like to
      have the struct be tagged, so struct_group_tagged() is added.
      
      Given there is a need for a handful of UAPI uses too, the underlying
      __struct_group() macro has been defined in UAPI so it can be used there
      too.
      
      To avoid confusing scripts/kernel-doc, hide the macro from its struct
      parsing.
      
      Co-developed-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Link: https://lore.kernel.org/lkml/20210728023217.GC35706@embeddedor
      Enhanced-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Link: https://lore.kernel.org/lkml/41183a98-bdb9-4ad6-7eab-5a7292a6df84@rasmusvillemoes.dk
      Enhanced-by: Dan Williams <dan.j.williams@intel.com>
      Link: https://lore.kernel.org/lkml/1d9a2e6df2a9a35b2cdd50a9a68cac5991e7e5f0.camel@intel.com
      Enhanced-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://lore.kernel.org/lkml/YQKa76A6XuFqgM03@phenom.ffwll.local
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Stable-dep-of: 58e0be1e ("net: use struct_group to copy ip/ipv6 header addresses")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9fd7bdaf
    • usbnet: smsc95xx: Fix deadlock on runtime resume · 47c3bdd9
      Lukas Wunner authored
      
      [ Upstream commit 7b960c96 ]
      
      Commit 05b35e7e ("smsc95xx: add phylib support") amended
      smsc95xx_resume() to call phy_init_hw().  That function waits for the
      device to runtime resume even though it is placed in the runtime resume
      path, causing a deadlock.
      
      The problem is that phy_init_hw() calls down to smsc95xx_mdiobus_read(),
      which never uses the _nopm variant of usbnet_read_cmd().
      
      Commit b4df480f ("usbnet: smsc95xx: add reset_resume function with
      reset operation") causes a similar deadlock on resume if the device was
      already runtime suspended when entering system sleep.
      
      That's because the commit introduced smsc95xx_reset_resume(), which
      calls down to smsc95xx_reset(), which neglects to use _nopm accessors.
      
      Fix by auto-detecting whether a device access is performed by the
      suspend/resume task_struct and use the _nopm variant if so.  This works
      because the PM core guarantees that suspend/resume callbacks are run in
      task context.
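
      The auto-detection idea can be sketched as: remember which task is
      running the suspend/resume callback and compare it with the caller
      on every device access. All names below (struct task, struct
      dev_priv, pick_accessor(), resume()) are hypothetical and the
      current task is passed in explicitly; this is not the driver's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task_struct. */
struct task {
	int id;
};

/* Hypothetical per-device data. */
struct dev_priv {
	const struct task *pm_task;	/* set while a PM callback is running */
};

enum accessor {
	ACCESS_NORMAL,	/* may wait for the device to runtime resume */
	ACCESS_NOPM,	/* never touches runtime PM, safe inside PM callbacks */
};

/* Use the _nopm style accessor only when called from the PM task itself. */
static enum accessor pick_accessor(const struct dev_priv *p,
				   const struct task *current_task)
{
	return p->pm_task == current_task ? ACCESS_NOPM : ACCESS_NORMAL;
}

/* Resume callback: mark ourselves as the PM task for its whole duration. */
static enum accessor resume(struct dev_priv *p, const struct task *current_task)
{
	enum accessor used;

	p->pm_task = current_task;
	used = pick_accessor(p, current_task);	/* stands in for register I/O */
	p->pm_task = NULL;
	return used;
}
```

      Any other task accessing the device at the same time still takes
      the normal path and waits for the resume to finish.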
      
      Stacktrace for posterity:
      
        INFO: task kworker/2:1:49 blocked for more than 122 seconds.
        Workqueue: usb_hub_wq hub_event
        schedule
        rpm_resume
        __pm_runtime_resume
        usb_autopm_get_interface
        usbnet_read_cmd
        __smsc95xx_read_reg
        __smsc95xx_phy_wait_not_busy
        __smsc95xx_mdio_read
        smsc95xx_mdiobus_read
        __mdiobus_read
        mdiobus_read
        smsc_phy_reset
        phy_init_hw
        smsc95xx_resume
        usb_resume_interface
        usb_resume_both
        usb_runtime_resume
        __rpm_callback
        rpm_callback
        rpm_resume
        __pm_runtime_resume
        usb_autoresume_device
        hub_event
        process_one_work
      
      Fixes: b4df480f ("usbnet: smsc95xx: add reset_resume function with reset operation")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v3.16+
      Cc: Andre Edich <andre.edich@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      47c3bdd9
    • ring-buffer: Include dropped pages in counting dirty patches · 8208c266
      Steven Rostedt (Google) authored
      [ Upstream commit 31029a8b ]
      
      The function ring_buffer_nr_dirty_pages() was created to find out how many
      pages are filled in the ring buffer. There are two running counters. One is
      incremented whenever a new page is touched (pages_touched) and the other
      whenever a page is read (pages_read). The dirty count is the number
      touched minus the number read. This is used to determine if a blocked task
      should be woken up if the percentage of the ring buffer it is waiting for
      is hit.
      
      The problem is that it does not take into account dropped pages (when the
      new writes overwrite pages that were not read). As a result, the dirty
      page count will always be greater than the percentage.
      
      This makes the "buffer_percent" file inaccurate, as the number of dirty
      pages ends up always being larger than the percentage, even when it's
      not, and this causes user space to be woken up more than it wants to be.
      
      Add a new counter to keep track of lost pages, and include that in the
      accounting of dirty pages so that it is actually accurate.
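
      The corrected accounting amounts to dirty = touched - lost - read.
      A small sketch under that reading, with a hypothetical struct
      rb_counters and nr_dirty_pages() helper (the pages_lost name is an
      assumption for the new counter):

```c
#include <stddef.h>

/* Hypothetical snapshot of the per-CPU running counters. */
struct rb_counters {
	size_t pages_touched;	/* pages the writer has started */
	size_t pages_lost;	/* unread pages overwritten by the writer */
	size_t pages_read;	/* pages consumed by the reader */
};

/* Number of pages that hold unread data; clamps to 0 instead of wrapping. */
static size_t nr_dirty_pages(const struct rb_counters *c)
{
	size_t cnt = c->pages_touched;

	if (cnt < c->pages_lost)
		return 0;
	cnt -= c->pages_lost;
	if (cnt < c->pages_read)
		return 0;
	return cnt - c->pages_read;
}
```

      For example, with 10 pages touched, 3 lost and 4 read, the old
      formula reports 6 dirty pages while this one reports 3.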
      
      Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home
      Fixes: 2c2b0a78 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8208c266
    • net: fix a concurrency bug in l2tp_tunnel_register() · 36b5095b
      Gong, Sishuai authored
      
      [ Upstream commit 69e16d01 ]
      
      l2tp_tunnel_register() registers a tunnel without fully
      initializing its attribute. This can allow another kernel thread
      running l2tp_xmit_core() to access the uninitialized data and
      then cause a kernel NULL pointer dereference error, as shown below.
      
      Thread 1    Thread 2
      //l2tp_tunnel_register()
      list_add_rcu(&tunnel->list, &pn->l2tp_tunnel_list);
                 //pppol2tp_connect()
                 tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id);
                 // Fetch the new tunnel
                 ...
                 //l2tp_xmit_core()
                 struct sock *sk = tunnel->sock;
                 ...
                 bh_lock_sock(sk);
                 //Null pointer error happens
      tunnel->sock = sk;
      
      Fix this bug by initializing tunnel->sock before adding the
      tunnel into l2tp_tunnel_list.
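
      The ordering rule is "initialize, then publish". The sketch below
      swaps in C11 release/acquire atomics for the RCU list insertion, so
      it illustrates the ordering only; struct tunnel, struct sock,
      tunnel_register() and tunnel_lookup_sock() here are simplified,
      hypothetical versions.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Simplified, hypothetical versions of the kernel structures. */
struct sock {
	int fd;
};

struct tunnel {
	struct sock *sock;
};

/* Single published slot standing in for the tunnel list. */
static _Atomic(struct tunnel *) published;

/* Fill in every field first, then make the tunnel visible to readers. */
static void tunnel_register(struct tunnel *t, struct sock *sk)
{
	t->sock = sk;
	atomic_store_explicit(&published, t, memory_order_release);
}

/* A reader that finds the tunnel is guaranteed to see its sock pointer. */
static struct sock *tunnel_lookup_sock(void)
{
	struct tunnel *t = atomic_load_explicit(&published, memory_order_acquire);

	return t ? t->sock : NULL;
}
```

      Publishing first and assigning t->sock afterwards would reopen the
      window shown in the two-thread trace above.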
      
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Sishuai Gong <sishuai@purdue.edu>
      Reported-by: Sishuai Gong <sishuai@purdue.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: b68777d5 ("l2tp: Serialize access to sk_user_data with sk_callback_lock")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      36b5095b
    • nvme: ensure subsystem reset is single threaded · 023435a0
      Keith Busch authored
      commit 1e866afd upstream.
      
      The subsystem reset writes to a register, so we have to ensure the
      device state is capable of handling that; otherwise the driver may access
      unmapped registers. Use the state machine to ensure the subsystem reset
      doesn't try to write registers on a device already undergoing this type
      of reset.
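
      A minimal sketch of gating the register write on a state
      transition, assuming a three-state controller. enum ctrl_state,
      struct ctrl and both functions are hypothetical; a real driver
      would also serialize the transition with a lock, which this
      single-threaded sketch leaves out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced controller state machine. */
enum ctrl_state {
	CTRL_LIVE,
	CTRL_RESETTING,
	CTRL_DEAD,
};

struct ctrl {
	enum ctrl_state state;
	unsigned int reg_writes;	/* counts simulated register writes */
};

/* Only a live controller may enter the resetting state. */
static bool change_state_to_resetting(struct ctrl *c)
{
	if (c->state != CTRL_LIVE)
		return false;
	c->state = CTRL_RESETTING;
	return true;
}

/* Refuse the reset, and skip the write, unless the transition succeeded. */
static int subsystem_reset(struct ctrl *c)
{
	if (!change_state_to_resetting(c))
		return -EBUSY;
	c->reg_writes++;	/* stands in for the actual register write */
	return 0;
}
```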
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      023435a0
    • nvme: restrict management ioctls to admin · b9a5ecf2
      Keith Busch authored
      
      commit 23e085b2 upstream.
      
      The passthrough commands already have this restriction, but the other
      operations do not. Require the same capabilities for all users as all of
      these operations, which include resets and rescans, can be disruptive.
      
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9a5ecf2
    • perf/x86/intel/pt: Fix sampling using single range output · 5e2f14d7
      Adrian Hunter authored
      
      commit ce0d998b upstream.
      
      Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
      Data When Configured With Single Range Output Larger Than 4KB" by
      disabling single range output whenever larger than 4KB.
      
      Fixes: 67063847 ("perf/x86/intel/pt: Opportunistically use single range output mode")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20221112151508.13768-1-adrian.hunter@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e2f14d7
    • misc/vmw_vmci: fix an infoleak in vmci_host_do_receive_datagram() · 62634b43
      Alexander Potapenko authored
      
      commit e5b0d06d upstream.
      
      `struct vmci_event_qp` allocated by qp_notify_peer() contains padding,
      which may carry uninitialized data to the userspace, as observed by
      KMSAN:
      
        BUG: KMSAN: kernel-infoleak in instrument_copy_to_user ./include/linux/instrumented.h:121
         instrument_copy_to_user ./include/linux/instrumented.h:121
         _copy_to_user+0x5f/0xb0 lib/usercopy.c:33
         copy_to_user ./include/linux/uaccess.h:169
         vmci_host_do_receive_datagram drivers/misc/vmw_vmci/vmci_host.c:431
         vmci_host_unlocked_ioctl+0x33d/0x43d0 drivers/misc/vmw_vmci/vmci_host.c:925
         vfs_ioctl fs/ioctl.c:51
        ...
      
        Uninit was stored to memory at:
         kmemdup+0x74/0xb0 mm/util.c:131
         dg_dispatch_as_host drivers/misc/vmw_vmci/vmci_datagram.c:271
         vmci_datagram_dispatch+0x4f8/0xfc0 drivers/misc/vmw_vmci/vmci_datagram.c:339
         qp_notify_peer+0x19a/0x290 drivers/misc/vmw_vmci/vmci_queue_pair.c:1479
         qp_broker_attach drivers/misc/vmw_vmci/vmci_queue_pair.c:1662
         qp_broker_alloc+0x2977/0x2f30 drivers/misc/vmw_vmci/vmci_queue_pair.c:1750
         vmci_qp_broker_alloc+0x96/0xd0 drivers/misc/vmw_vmci/vmci_queue_pair.c:1940
         vmci_host_do_alloc_queuepair drivers/misc/vmw_vmci/vmci_host.c:488
         vmci_host_unlocked_ioctl+0x24fd/0x43d0 drivers/misc/vmw_vmci/vmci_host.c:927
        ...
      
        Local variable ev created at:
         qp_notify_peer+0x54/0x290 drivers/misc/vmw_vmci/vmci_queue_pair.c:1456
         qp_broker_attach drivers/misc/vmw_vmci/vmci_queue_pair.c:1662
         qp_broker_alloc+0x2977/0x2f30 drivers/misc/vmw_vmci/vmci_queue_pair.c:1750
      
        Bytes 28-31 of 48 are uninitialized
        Memory access of size 48 starts at ffff888035155e00
        Data copied to user address 0000000020000100
      
      Use memset() to prevent the infoleaks.
      
      Also speculatively fix qp_notify_peer_local(), which may suffer from the
      same problem.
      
      Reported-by: <syzbot+39be4da489ed2493ba25@syzkaller.appspotmail.com>
      Cc: stable <stable@kernel.org>
      Fixes: 06164d2b ("VMCI: queue pairs implementation.")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
      Link: https://lore.kernel.org/r/20221104175849.2782567-1-glider@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      62634b43
    • docs: update mediator contact information in CoC doc · c1eb46a6
      Shuah Khan authored
      
      commit 5fddf896 upstream.
      
      Update mediator contact information in CoC interpretation document.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20221011171417.34286-1-skhan@linuxfoundation.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1eb46a6
    • mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put() · 4423866d
      Xiongfeng Wang authored
      
      commit 222cfa01 upstream.
      
      pci_get_device() will increase the reference count for the returned
      pci_dev. We need to use pci_dev_put() to decrease the reference count
      before amd_probe() returns. There is no problem for the 'smbus_dev ==
      NULL' branch because pci_dev_put() can also handle the NULL input
      parameter case.
      
      Fixes: 659c9bc1 ("mmc: sdhci-pci: Build o2micro support in the same module")
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221114083100.149200-1-wangxiongfeng2@huawei.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4423866d
    • mmc: sdhci-pci-o2micro: fix card detect fail issue caused by CD# debounce timeout · 440653a1
      Chevron Li authored
      
      commit 096cc0cd upstream.
      
      SD card detection sometimes fails on resume from suspend. The CD#
      debounce time is too long, so card presence is reported incorrectly
      and, as a result, the card is not recognized.
      
      Signed-off-by: Chevron Li <chevron.li@bayhubtech.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221104095512.4068-1-chevron.li@bayhubtech.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      440653a1
    • mmc: core: properly select voltage range without power cycle · 8e70b141
      Yann Gautier authored
      
      commit 39a72dbf upstream.
      
      In mmc_select_voltage(), if there is no full power cycle, the voltage
      range selected at the end of the function will be on a single range
      (e.g. 3.3V/3.4V). To keep a range around the selected voltage (3.2V/3.4V),
      the mask shift should be reduced by 1.
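
      A worked example of the mask arithmetic, as a sketch. The bit
      positions for the 3.2-3.3V and 3.3-3.4V windows are assumptions
      chosen for illustration, and fls32(), select_old() and select_new()
      are hypothetical helpers, not the mmc core functions.

```c
#include <stdint.h>

/* Assumed OCR bit positions for two adjacent voltage windows. */
#define VDD_32_33 (UINT32_C(1) << 20)	/* 3.2V - 3.3V */
#define VDD_33_34 (UINT32_C(1) << 21)	/* 3.3V - 3.4V */

/* 1-based index of the highest set bit, or 0 if no bit is set. */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Old behaviour: mask 3 starts at the highest bit, keeping one window. */
static uint32_t select_old(uint32_t ocr)
{
	int bit = fls32(ocr) - 1;

	if (bit < 1)
		return 0;	/* sketch assumes the highest bit is above bit 0 */
	return ocr & (UINT32_C(3) << bit);
}

/* New behaviour: shift reduced by 1, keeping the highest two windows. */
static uint32_t select_new(uint32_t ocr)
{
	int bit = fls32(ocr) - 1;

	if (bit < 1)
		return 0;	/* sketch assumes the highest bit is above bit 0 */
	return ocr & (UINT32_C(3) << (bit - 1));
}
```

      With both windows offered, the highest set bit is 21: the old mask
      3 << 21 keeps only the 3.3-3.4V window, while 3 << 20 keeps both.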
      
      This issue was triggered by using a specific SD-card (Verbatim Premium
      16GB UHS-1) on an STM32MP157C-DK2 board. This board cannot do UHS modes
      and there is no power cycle. And the card was failing to switch to
      high-speed mode. When adding the range 3.2V/3.3V for this card with the
      proposed shift change, the card can switch to high-speed mode.
      
      Fixes: ce69d37b ("mmc: core: Prevent violation of specs while initializing cards")
      Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221028073740.7259-1-yann.gautier@foss.st.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e70b141
    • firmware: coreboot: Register bus in module init · 05b0f662
      Brian Norris authored
      
      commit 65946690 upstream.
      
      The coreboot_table driver registers a coreboot bus while probing a
      "coreboot_table" device representing the coreboot table memory region.
      Probing this device (i.e., registering the bus) is a dependency for the
      module_init() functions of any driver for this bus (e.g.,
      memconsole-coreboot.c / memconsole_driver_init()).
      
      With synchronous probe, this dependency works OK, as the link order in
      the Makefile ensures coreboot_table_driver_init() (and thus,
      coreboot_table_probe()) completes before a coreboot device driver tries
      to add itself to the bus.
      
      With asynchronous probe, however, coreboot_table_probe() may race with
      memconsole_driver_init(), and so we're liable to hit one of these two:
      
      1. coreboot_driver_register() eventually hits "[...] the bus was not
         initialized.", and the memconsole driver fails to register; or
      2. coreboot_driver_register() gets past #1, but still races with
         bus_register() and hits some other undefined/crashing behavior (e.g.,
         in driver_find() [1])
      
      We can resolve this by registering the bus in our initcall, and only
      deferring "device" work (scanning the coreboot memory region and
      creating sub-devices) to probe().
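
      As an illustration only (not part of the upstream commit), the split
      between "bus" work and "device" work can be modelled in user space.
      Every name below (model_bus, model_module_init(), model_probe(),
      model_driver_register()) is hypothetical and stands in for the real
      driver-core calls; the sketch only shows why driver registration no
      longer depends on when probe() runs.

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's driver-core API. */
struct model_bus {
	bool registered;      /* set from module init */
	bool devices_scanned; /* set from probe(), possibly much later */
};

/* Models the initcall: the bus is registered here, before any client
 * driver's init function can run (link order still applies to initcalls). */
static int model_module_init(struct model_bus *bus)
{
	bus->registered = true;
	return 0;
}

/* Models probe(): only "device" work is left, so it may run asynchronously. */
static int model_probe(struct model_bus *bus)
{
	if (!bus->registered)
		return -1;
	bus->devices_scanned = true;
	return 0;
}

/* Models a client driver adding itself to the bus. */
static int model_driver_register(const struct model_bus *bus)
{
	return bus->registered ? 0 : -1;
}
```

      In this model, model_driver_register() succeeds as soon as
      model_module_init() has run, whether or not model_probe() has
      completed yet.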
      
      [1] Example failure, using 'driver_async_probe=*' kernel command line:
      
      [    0.114217] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
      ...
      [    0.114307] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1 #63
      [    0.114316] Hardware name: Google Scarlet (DT)
      ...
      [    0.114488] Call trace:
      [    0.114494]  _raw_spin_lock+0x34/0x60
      [    0.114502]  kset_find_obj+0x28/0x84
      [    0.114511]  driver_find+0x30/0x50
      [    0.114520]  driver_register+0x64/0x10c
      [    0.114528]  coreboot_driver_register+0x30/0x3c
      [    0.114540]  memconsole_driver_init+0x24/0x30
      [    0.114550]  do_one_initcall+0x154/0x2e0
      [    0.114560]  do_initcall_level+0x134/0x160
      [    0.114571]  do_initcalls+0x60/0xa0
      [    0.114579]  do_basic_setup+0x28/0x34
      [    0.114588]  kernel_init_freeable+0xf8/0x150
      [    0.114596]  kernel_init+0x2c/0x12c
      [    0.114607]  ret_from_fork+0x10/0x20
      [    0.114624] Code: 5280002b 1100054a b900092a f9800011 (885ffc01)
      [    0.114631] ---[ end trace 0000000000000000 ]---
      
      Fixes: b81e3140 ("firmware: coreboot: Make bus registration symmetric")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Brian Norris <briannorris@chromium.org>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Link: https://lore.kernel.org/r/20221019180934.1.If29e167d8a4771b0bf4a39c89c6946ed764817b9@changeid
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05b0f662
    • iommu/vt-d: Set SRE bit only when hardware has SRS cap · deda86a0
      Tina Zhang authored
      
      commit 7fc961cf upstream.
      
      The SRS cap is the hardware capability that tells whether the hardware
      IOMMU can support requests seeking supervisor privilege. The SRE bit in
      a scalable-mode PASID table entry is treated as Reserved(0) on
      implementations that do not support the SRS cap.
      
      Checking the SRS cap before setting the SRE bit avoids the
      non-recoverable fault of "Non-zero reserved field set in PASID Table
      Entry" caused by setting the SRE bit when there is no SRS cap support.
      The fault messages look like this:
      
       DMAR: DRHD: handling fault status reg 2
       DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
             [fault reason 0x5a]
             SM: Non-zero reserved field set in PASID Table Entry
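
      As an illustration only (not part of the upstream commit), the rule
      "set SRE only when SRS is advertised" can be sketched as a pure
      function. The macro names and bit positions below are hypothetical
      placeholders, not the real VT-d register layout.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define MODEL_ECAP_SRS (1ULL << 31) /* capability: supervisor requests supported */
#define MODEL_PTE_SRE  (1ULL << 0)  /* PASID entry: supervisor request enable */

/* Return the PASID entry value with SRE set only if the capability allows it.
 * Without the capability the bit is reserved and must stay zero. */
static uint64_t model_setup_pte(uint64_t ecap, uint64_t pte)
{
	if (ecap & MODEL_ECAP_SRS)
		pte |= MODEL_PTE_SRE;
	return pte;
}
```

      With no capability bit set, model_setup_pte() leaves the entry
      untouched, which is the case that previously produced the fault above.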
      
      Fixes: 6f7db75e ("iommu/vt-d: Add second level page table interface")
      Cc: stable@vger.kernel.org
      Signed-off-by: Tina Zhang <tina.zhang@intel.com>
      Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com
      
      
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com
      
      
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      deda86a0
    • scsi: zfcp: Fix double free of FSF request when qdio send fails · d2c7d8f5
      Benjamin Block authored
      
      commit 0954256e upstream.
      
      We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
      the FSF request ID when sending a new FSF request. This is used in case the
      sending fails and we need to remove the request from our internal hash
      table again (so we don't keep an invalid reference and use it when we free
      the request again).
      
      In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
      bit wide), but the rest of the zfcp code (and the firmware specification)
      handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
      ELF ABI]).  For one this has the obvious problem that when the ID grows
      past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
      when storing it in the cache variable and so doesn't match the original ID
      anymore.  The second less obvious problem is that even when the original ID
      has not yet grown past 32 bit, as soon as the 32nd bit is set in the
      original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
      cast it back to 'unsigned long'. As the cached variable is of a signed
      type, the compiler will choose a sign-extending instruction to load the 32
      bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
      we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
      request again all the leading zeros will be flipped to ones to extend the
      sign and won't match the original ID anymore (this has been observed in
      practice).
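
      As an illustration only (not part of the upstream commit), both
      effects can be reproduced in user space. The helper names are
      hypothetical, and uint64_t stands in for the 64-bit 'unsigned long'
      of the s390x ABI. Narrowing an out-of-range value to a signed type is
      implementation-defined in C; the sketch assumes the usual
      two's-complement wrap that GCC and Clang provide.

```c
#include <stdint.h>

/* Hypothetical model of the bug: cache a 64-bit request ID in a signed
 * 32-bit variable, then widen it again for the hash-table lookup. */
static uint64_t roundtrip_via_int(uint64_t req_id)
{
	int32_t cached = (int32_t)(uint32_t)req_id; /* drops the upper 32 bits */

	return (uint64_t)cached; /* sign-extends when bit 31 is set */
}

/* Model of the fix: the cache variable has the same width and signedness
 * as the ID itself, so nothing is lost. */
static uint64_t roundtrip_via_u64(uint64_t req_id)
{
	uint64_t cached = req_id;

	return cached;
}
```

      Here roundtrip_via_int(0x80000000) yields 0xffffffff80000000 (sign
      extension) and roundtrip_via_int(0x100000001) yields 1 (truncation),
      so neither value matches the ID that was stored in the hash table.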
      
      If we can't successfully remove the request from the hash table again after
      'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
      the adapter about new work because the adapter is already gone during
      e.g. a ChpID toggle) we will end up with a double free.  We unconditionally
      free the request in the calling function when 'zfcp_fsf_req_send()' fails,
      but because the request is still in the hash table we end up with a stale
      memory reference, and once the zfcp adapter is either reset during recovery
      or shut down we end up freeing the same memory twice.
      
      The resulting stack traces vary depending on the kernel and have no direct
      correlation to the place where the bug occurs. Here are three examples that
      have been seen in practice:
      
        list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
        ------------[ cut here ]------------
        kernel BUG at lib/list_debug.c:62!
        monitor event: 0040 ilc:2 [#1] PREEMPT SMP
        Modules linked in: ...
        CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
        Hardware name: ...
        Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
                   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
        Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
                   0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
                   00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
                   00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
        Krnl Code: 00000003cbeea1e8: c020004f68a7        larl    %r2,00000003cc8d7336
                   00000003cbeea1ee: c0e50027fd65        brasl   %r14,00000003cc3e9cb8
                  #00000003cbeea1f4: af000000            mc      0,0
                  >00000003cbeea1f8: c02000920440        larl    %r2,00000003cd12aa78
                   00000003cbeea1fe: c0e500289c25        brasl   %r14,00000003cc3fda48
                   00000003cbeea204: b9040043            lgr     %r4,%r3
                   00000003cbeea208: b9040051            lgr     %r5,%r1
                   00000003cbeea20c: b9040032            lgr     %r3,%r2
        Call Trace:
         [<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
        ([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
         [<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
         [<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
         [<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
         [<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
         [<00000003cb5eece8>] kthread+0x138/0x150
         [<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
         [<00000003cc4172ea>] ret_from_fork+0xa/0x40
        INFO: lockdep is turned off.
        Last Breaking-Event-Address:
         [<00000003cc3e9d04>] _printk+0x4c/0x58
        Kernel panic - not syncing: Fatal exception: panic_on_oops
      
      or:
      
        Unable to handle kernel pointer dereference in virtual kernel address space
        Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
        Fault in home space mode while using kernel ASCE.
        AS:0000000063b10007 R3:0000000000000024
        Oops: 0038 ilc:3 [#1] SMP
        Modules linked in: ...
        CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
        Hardware name: ...
        Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
                   R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
        Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
                   0000000000000000 0000000000000055 0000000000000000 00000000a8515800
                   0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
                   000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
        Krnl Code: 000003ff7febaf7e: a7f4003d            brc     15,000003ff7febaff8
                   000003ff7febaf82: e32020000004        lg      %r2,0(%r2)
                  #000003ff7febaf88: ec2100388064        cgrj    %r2,%r1,8,000003ff7febaff8
                  >000003ff7febaf8e: e3b020100020        cg      %r11,16(%r2)
                   000003ff7febaf94: a774fff7            brc     7,000003ff7febaf82
                   000003ff7febaf98: ec280030007c        cgij    %r2,0,8,000003ff7febaff8
                   000003ff7febaf9e: e31020080004        lg      %r1,8(%r2)
                   000003ff7febafa4: e33020000004        lg      %r3,0(%r2)
        Call Trace:
         [<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
         [<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
         [<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
         [<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
         [<000000006292f300>] __do_softirq+0x130/0x3c0
         [<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
         [<000000006291e818>] do_io_irq+0xc8/0x168
         [<000000006292d516>] io_int_handler+0xd6/0x110
         [<000000006292d596>] psw_idle_exit+0x0/0xa
        ([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
         [<000000006292ceea>] default_idle_call+0x52/0xf8
         [<0000000061de4fa4>] do_idle+0xd4/0x168
         [<0000000061de51fe>] cpu_startup_entry+0x36/0x40
         [<0000000061d4faac>] smp_start_secondary+0x12c/0x138
         [<000000006292d88e>] restart_int_handler+0x6e/0x90
        Last Breaking-Event-Address:
         [<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
        Kernel panic - not syncing: Fatal exception in interrupt
      
      or:
      
        Unable to handle kernel pointer dereference in virtual kernel address space
        Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
        Fault in home space mode while using kernel ASCE.
        AS:0000000077c40007 R3:0000000000000024
        Oops: 0038 ilc:3 [#1] SMP
        Modules linked in: ...
        CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
        Hardware name: ...
        Workqueue: kblockd blk_mq_run_work_fn
        Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
                   R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
        Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
                   0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
                   00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
                   00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
        Krnl Code: 0000000076fc0302: c004000000d4	brcl	0,76fc04aa
                   0000000076fc0308: b904001b		lgr	%r1,%r11
                  #0000000076fc030c: e3106020001a	algf	%r1,32(%r6)
                  >0000000076fc0312: e31010000082	xg	%r1,0(%r1)
                   0000000076fc0318: b9040001		lgr	%r0,%r1
                   0000000076fc031c: e30061700082	xg	%r0,368(%r6)
                   0000000076fc0322: ec59000100d9	aghik	%r5,%r9,1
                   0000000076fc0328: e34003b80004	lg	%r4,952
        Call Trace:
         [<0000000076fc0312>] __kmalloc+0xd2/0x398
         [<0000000076f318f2>] mempool_alloc+0x72/0x1f8
         [<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
         [<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
         [<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
         [<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
         [<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
         [<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
         [<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
         [<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
         [<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
         [<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
         [<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
         [<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
         [<0000000076da6d74>] process_one_work+0x274/0x4d0
         [<0000000076da7018>] worker_thread+0x48/0x560
         [<0000000076daef18>] kthread+0x140/0x160
         [<000000007751d144>] ret_from_fork+0x28/0x30
        Last Breaking-Event-Address:
         [<0000000076fc0474>] __kmalloc+0x234/0x398
        Kernel panic - not syncing: Fatal exception: panic_on_oops
      
      To fix this, simply change the type of the cache variable to 'unsigned
      long', like the rest of zfcp and also the argument for
      'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
      and so can successfully remove the request from the hash table.
      
      Fixes: e60a6d69 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
      Cc: <stable@vger.kernel.org> #v2.6.34+
      Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
      Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com
      
      
      Reviewed-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2c7d8f5
    • maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() · db744288
      Alban Crequy authored
      
      commit 8678ea06 upstream.
      
      If a page fault occurs while copying the first byte, this function resets one
      byte before dst.
      As a consequence, an address could be modified, leading to kernel crashes if
      the modified address was accessed later.
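
      As an illustration only (not part of the upstream commit), the
      off-by-one can be modelled in user space. copy_model() and its
      fault_at and use_fix parameters are hypothetical: fault_at simulates a
      page fault at that source index, and use_fix selects which byte the
      fault path terminates.

```c
/* Hypothetical user-space model of a byte-wise copy that can "fault".
 * Returns the number of bytes copied including the NUL, or -1 on fault. */
static long copy_model(char *dst, const char *src, long count,
		       long fault_at, int use_fix)
{
	long i = 0;

	if (count <= 0)
		return 0;

	do {
		if (i == fault_at)
			goto fault;
		*dst++ = src[i++];
	} while (dst[-1] && i < count);

	dst[-1] = '\0'; /* dst has advanced at least once here, so this is safe */
	return i;

fault:
	if (use_fix)
		dst[0] = '\0';  /* terminate at the byte that failed to copy */
	else
		dst[-1] = '\0'; /* old behaviour: lands before the buffer if
				 * the very first byte faulted */
	return -1;
}
```

      When the fault hits at index 0, dst has not advanced yet, so the old
      path writes to the byte just before the caller's buffer, while the
      fixed path writes to the first byte of the buffer.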
      
      Fixes: b58294ea ("maccess: allow architectures to provide kernel probing directly")
      Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Tested-by: Francis Laniel <flaniel@linux.microsoft.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org> [5.8]
      Link: https://lore.kernel.org/bpf/20221110085614.111213-2-albancrequy@linux.microsoft.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      db744288
    • Input: iforce - invert valid length check when fetching device IDs · 24cc679a
      Tetsuo Handa authored
      
      commit b8ebf250 upstream.
      
      syzbot is reporting an uninitialized value at iforce_init_device() [1],
      because commit 6ac0aec6 ("Input: iforce - allow callers supply data buffer
      when fetching device IDs") checks that the valid length is shorter than
      the bytes to read. Since iforce_get_id_packet() stores the valid length
      when returning 0, the caller needs to check that the valid length is
      greater than or equal to the bytes to read.
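
      As an illustration only (not part of the upstream commit), the
      corrected direction of the check can be sketched in user space.
      fake_get_id_packet() is a hypothetical stand-in for
      iforce_get_id_packet(), and the assumption that the ID occupies bytes
      1 and 2 of the reply (little-endian) is made only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: returns 0 on success and stores the number of
 * valid bytes in *len. Only the first 'valid' bytes of buf are written. */
static int fake_get_id_packet(uint8_t *buf, size_t cap, size_t *len,
			      size_t valid)
{
	if (valid > cap)
		valid = cap;
	for (size_t i = 0; i < valid; i++)
		buf[i] = (uint8_t)(i + 1);
	*len = valid;
	return 0;
}

/* Returns the 16-bit ID, or -1 if the reply was too short to contain it. */
static int read_id(size_t valid)
{
	uint8_t buf[8];
	size_t len;

	/* Read buf[1] and buf[2] only if at least 3 bytes are valid. */
	if (!fake_get_id_packet(buf, sizeof(buf), &len, valid) && len >= 3)
		return buf[1] | (buf[2] << 8);

	return -1;
}
```

      With the inverted check ("|| len < 3"), a two-byte reply would have
      taken the read path and consumed a byte that was never written.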
      
      Reported-by: syzbot <syzbot+4dd880c1184280378821@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Fixes: 6ac0aec6 ("Input: iforce - allow callers supply data buffer when fetching device IDs")
      Link: https://lore.kernel.org/r/531fb432-7396-ad37-ecba-3e42e7f56d5c@I-love.SAKURA.ne.jp
      
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      24cc679a
    • serial: 8250_lpss: Configure DMA also w/o DMA filter · 5f4611fe
      Ilpo Järvinen authored
      
      commit 1bfcbe58 upstream.
      
      If the platform doesn't use a DMA device filter (as is the case with
      Elkhart Lake), the whole of lpss8250_dma_setup() is skipped. This
      also skips the *_maxburst setup, which is undesirable.
      Refactor lpss8250_dma_setup() to configure DMA even if the filter is not
      set up.
      
      Cc: stable <stable@kernel.org>
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20221108121952.5497-3-ilpo.jarvinen@linux.intel.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f4611fe
    • serial: 8250: Flush DMA Rx on RLSI · 8679087e
      Ilpo Järvinen authored
      
      commit 1980860e upstream.
      
      Returning true from handle_rx_dma() without flushing DMA first creates
      a data ordering hazard. If DMA Rx has handled any character at the
      point when RLSI occurs, the non-DMA path handles any pending characters,
      jumping them ahead of the characters that are still pending under DMA.
      
      Fixes: 75df022b ("serial: 8250_dma: Fix RX handling")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8679087e
    • serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs · a5eaad87
      Ilpo Järvinen authored
      
      commit a931237c upstream.
      
      DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT
      should have been triggered instead. Since IIR_RDI has higher priority
      than IIR_RX_TIMEOUT, this causes the Rx to hang in an interrupt loop.
      The problem seems to occur at least with some combinations of
      small-sized transfers (I've reproduced the problem on Elkhart Lake PSE
      UARTs).
      
      If there's already an ongoing Rx DMA and IIR_RDI triggers, fall
      back gracefully to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had
      occurred.
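
      As an illustration only (not part of the upstream commit), the
      decision can be modelled as a small pure function. The enum, the
      MODEL_* macros and model_handle_rx() are hypothetical; the macro
      values mirror the usual 8250 IIR encoding but are defined locally for
      this sketch.

```c
#include <stdbool.h>

/* Interrupt IDs for this model only. */
#define MODEL_IIR_RDI        0x04 /* receive data available */
#define MODEL_IIR_RX_TIMEOUT 0x0c /* character timeout */

enum model_rx_action {
	MODEL_RX_START_DMA,    /* let DMA start (or continue) receiving */
	MODEL_RX_FLUSH_TO_PIO  /* flush the DMA and read the FIFO directly */
};

static enum model_rx_action model_handle_rx(unsigned int iir, bool dma_running)
{
	switch (iir) {
	case MODEL_IIR_RDI:
		if (!dma_running)
			break;
		/* Rx DMA is already in flight: behave as for a timeout. */
		return MODEL_RX_FLUSH_TO_PIO;
	case MODEL_IIR_RX_TIMEOUT:
		return MODEL_RX_FLUSH_TO_PIO;
	}

	return MODEL_RX_START_DMA;
}
```

      In this model an IIR_RDI with no DMA in flight still starts a DMA
      transfer; only an IIR_RDI that arrives while DMA is running takes the
      non-DMA path.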
      
      8250_omap already treats IIR_RDI in a similar way, so this is
      nothing unheard of.
      
      Fixes: 75df022b ("serial: 8250_dma: Fix RX handling")
      Cc: <stable@vger.kernel.org>
      Co-developed-by: Srikanth Thokala <srikanth.thokala@intel.com>
      Signed-off-by: Srikanth Thokala <srikanth.thokala@intel.com>
      Co-developed-by: Aman Kumar <aman.kumar@intel.com>
      Signed-off-by: Aman Kumar <aman.kumar@intel.com>
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20221108121952.5497-2-ilpo.jarvinen@linux.intel.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5eaad87