  1. Oct 29, 2022
    • mm: /proc/pid/smaps_rollup: fix no vma's null-deref · 97898139
      Seth Jenkins authored
      
      Commit 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value
      seq_file") introduced a null-deref if there are no vma's in the task in
      show_smaps_rollup.
      
      Fixes: 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value seq_file")
      Signed-off-by: Seth Jenkins <sethjenkins@google.com>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Tested-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI: video: Force backlight native for more TongFang devices · 6f9134dd
      Werner Sembach authored
      commit 3dbc80a3 upstream.
      
      This commit is very different from the upstream commit! It fixes the same
      issue by adding more quirks, rather than the general fix from the 6.1
      kernel, because the general fix from the 6.1 kernel is part of a larger
      refactoring of the backlight code which is not suitable for the stable
      series.
      
      As described in "ACPI: video: Drop NL5x?U, PF4NU1F and PF5?U??
      acpi_backlight=native quirks" (10212754) the upstream commit "ACPI:
      video: Make backlight class device registration a separate step (v2)"
      (3dbc80a3) makes these quirks unnecessary. However, as mentioned in this
      bugtracker ticket
      https://bugzilla.kernel.org/show_bug.cgi?id=215683#c17
      the upstream fix is part of a larger patchset that is overall too complex
      for stable.
      
      The TongFang GKxNRxx, GMxNGxx, GMxZGxx, and GMxRGxx / TUXEDO
      Stellaris/Polaris Gen 1-4, have the same problem as the Clevo NL5xRU and
      NL5xNU / TUXEDO Aura 15 Gen1 and Gen2:
      They have a working native and video interface for screen backlight.
      However the default detection mechanism first registers the video interface
      before unregistering it again and switching to the native interface during
      boot. This results in a dangling SBIOS request for backlight change for
      some reason, causing the backlight to switch to ~2% once per boot on the
      first power cord connect or disconnect event. Setting the native interface
      explicitly circumvents this buggy behaviour by avoiding the unregistering
      process.
      
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix potential out of bound read in ext4_fc_replay_scan() · f2342948
      Ye Bin authored
      
      [ Upstream commit 1b45cc5c ]
      
      The scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN bytes of
      space remain. If less than EXT4_FC_TAG_BASE_LEN remains, the scan
      performs an out-of-bounds read when mounting a corrupt file system image.
      The ADD_RANGE, HEAD, and TAIL tags need an extra check during the journal
      scan, because these three tags read data during the scan: the tag length
      must not be less than the length of the data that is read.
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
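The bounds checking described in the ext4_fc_replay_scan() fix above can be illustrated with a small user-space sketch. This is not the kernel code: `struct tl`, `TAG_BASE_LEN`, and `scan_records()` are hypothetical stand-ins for struct ext4_fc_tl, EXT4_FC_TAG_BASE_LEN, and the replay scan loop, and the buffer is assumed to be in host byte order already. It models the "enough room for a header" check plus a general check that the declared value length fits in the buffer; the per-tag minimum lengths for ADD_RANGE, HEAD, and TAIL are not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag/length header; stands in for struct ext4_fc_tl. */
struct tl {
    uint16_t tag;
    uint16_t len;
};

/* Analogous to EXT4_FC_TAG_BASE_LEN: the size of the header alone. */
#define TAG_BASE_LEN sizeof(struct tl)

/*
 * Walk a buffer of tag/length/value records.
 * Returns the number of complete records, or -1 if a record claims
 * more value bytes than the buffer still holds.
 */
static int scan_records(const uint8_t *buf, size_t size)
{
    size_t off = 0;
    int count = 0;

    /* Stop as soon as fewer than TAG_BASE_LEN bytes remain. */
    while (size - off >= TAG_BASE_LEN) {
        struct tl tl;

        /* Copy the header out instead of casting, to avoid unaligned reads. */
        memcpy(&tl, buf + off, TAG_BASE_LEN);
        off += TAG_BASE_LEN;

        /* The value must fit in what is left of the buffer. */
        if (tl.len > size - off)
            return -1;

        off += tl.len;
        count++;
    }
    return count;
}
```

Because `off` never exceeds `size`, the unsigned subtraction `size - off` cannot wrap, which is what makes the loop condition safe for truncated input.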
    • ext4: factor out ext4_fc_get_tl() · 3d6873c4
      Ye Bin authored
      
      [ Upstream commit dcc58274 ]
      
      Factor out ext4_fc_get_tl() to fill 'tl' with host byte order.
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20220924075233.2315259-3-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: 1b45cc5c ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ext4: introduce EXT4_FC_TAG_BASE_LEN helper · 85ddefaa
      Ye Bin authored
      
      [ Upstream commit fdc2a3c7 ]
      
      Introduce the EXT4_FC_TAG_BASE_LEN helper to calculate the length of
      struct ext4_fc_tl.
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20220924075233.2315259-2-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Stable-dep-of: 1b45cc5c ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL · 5a7d9406
      Jens Axboe authored
      
      [ Upstream commit 46a525e1 ]
      
      This isn't a reliable mechanism to tell if we have task_work pending, we
      really should be looking at whether we have any items queued. This is
      problematic if forward progress is gated on running said task_work. One
      such example is reading from a pipe, where the write side has been closed
      right before the read is started. The fput() of the file queues TWA_RESUME
      task_work, and we need that task_work to be run before ->release() is
      called for the pipe. If ->release() isn't called, then the read will sit
      forever waiting on data that will never arrive.

      Fix this by changing io_run_task_work() so that it checks whether we have
      task_work pending rather than relying on TIF_NOTIFY_SIGNAL for that. The
      latter obviously doesn't work for task_work that is queued without
      TWA_SIGNAL.
      
      Reported-by: Christiano Haesbaert <haesbaert@haesbaert.org>
      Cc: stable@vger.kernel.org
      Link: https://github.com/axboe/liburing/issues/665
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • wifi: mt76: mt7921e: fix random fw download fail · e4c2e8f7
      Deren Wu authored
      
      [ Upstream commit 29e247ec ]
      
      If a PCIe interoperability problem shows up, the firmware payload may
      be corrupted during the download stage. Turn off L0s so that the
      firmware download completes reliably.
      
      [ 1093.528363] mt7921e 0000:3b:00.0: Message 00000007 (seq 7) timeout
      [ 1093.528414] mt7921e 0000:3b:00.0: Failed to start patch
      [ 1096.600156] mt7921e 0000:3b:00.0: Message 00000010 (seq 8) timeout
      [ 1096.600207] mt7921e 0000:3b:00.0: Failed to release patch semaphore
      [ 1097.699031] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1098.758427] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1099.834408] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1100.915264] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1101.990625] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1103.077587] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1104.173258] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1105.248466] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1106.336969] mt7921e 0000:3b:00.0: Timeout for driver own
      [ 1106.397542] mt7921e 0000:3b:00.0: hardware init failed
      
      Cc: stable@vger.kernel.org
      Fixes: bf3747ae ("mt76: mt7921: enable aspm by default")
      Signed-off-by: Deren Wu <deren.wu@mediatek.com>
      Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iommu/vt-d: Clean up si_domain in the init_dmars() error path · c4ad3ae4
      Jerry Snitselaar authored
      
      [ Upstream commit 620bf9f9 ]
      
      A splat from kmem_cache_destroy() was seen with a kernel prior to
      commit ee2653bb ("iommu/vt-d: Remove domain and devinfo mempool")
      when there was a failure in init_dmars(), because the iommu_domain
      cache still had objects. While the mempool code is now gone, there
      still is a leak of the si_domain memory if init_dmars() fails. So
      clean up si_domain in the init_dmars() error path.
      
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Fixes: 86080ccc ("iommu/vt-d: Allocate si_domain in init_dmars()")
      Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() · beea336e
      Charlotte Tan authored
      [ Upstream commit 5566e68d ]
      
      arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI
      Reserved region, but it seems like it should accept an NVS region as
      well. The ACPI spec
      https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html
      uses similar wording for "Reserved" and "NVS" region types; for NVS
      regions it says "This range of addresses is in use or reserved by the
      system and must not be used by the operating system."
      
      There is an old comment on this mailing list that also suggests NVS
      regions should pass the arch_rmrr_sanity_check() test:
      
       The warnings come from arch_rmrr_sanity_check() since it checks whether
       the region is E820_TYPE_RESERVED. However, if the purpose of the check
       is to detect RMRR has regions that may be used by OS as free memory,
       isn't  E820_TYPE_NVS safe, too?
      
      This patch overlaps with another proposed patch that would add the region
      type to the log since sometimes the bug reporter sees this log on the
      console but doesn't know to include the kernel log:
      
      https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/
      
      Here's an example of the "Firmware Bug" apparent false positive (wrapped
      for line length):
      
       DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR
             [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for
             fixes
       DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR
             [0x000000006f760000-0x000000006f762fff]
      
      This is the snippet from the e820 table:
      
       BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved
       BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS
       BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data
      
      Fixes: f036c7fa ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
      Cc: Will Mortensen <will@extrahop.com>
      Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443
      Signed-off-by: Charlotte Tan <charlotte@extrahop.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rv/dot2c: Make automaton definition static · 671e8222
      Daniel Bristot de Oliveira authored
      [ Upstream commit 21a1994b ]
      
      Monitor's automata definition is only used locally, so make dot2c generate
      a static definition.
      
      Link: https://lore.kernel.org/all/202208210332.gtHXje45-lkp@intel.com
      Link: https://lore.kernel.org/all/202208210358.6HH3OrVs-lkp@intel.com
      Link: https://lkml.kernel.org/r/ffbb92010f643307766c9307fd42f416e5b85fa0.1661266564.git.bristot@kernel.org
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Fixes: e3c9fc78 ("tools/rv: Add dot2c")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drbd: only clone bio if we have a backing device · 05580a3b
      Christoph Böhmwalder authored
      
      [ Upstream commit 6d42ddf7 ]
      
      Commit c347a787 (drbd: set ->bi_bdev in drbd_req_new) moved a
      bio_set_dev call (which has since been removed) to "earlier", from
      drbd_request_prepare to drbd_req_new.
      
      The problem is that this accesses device->ldev->backing_bdev, which is
      not NULL-checked at this point. When we don't have an ldev (i.e. when
      the DRBD device is diskless), this leads to a null pointer deref.
      
      So, only allocate the private_bio if we actually have a disk. This is
      also a small optimization, since we don't clone the bio only to
      immediately free it again in the diskless case.
      
      Fixes: c347a787 ("drbd: set ->bi_bdev in drbd_req_new")
      Co-developed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Co-developed-by: Joel Colledge <joel.colledge@linbit.com>
      Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20221020085205.129090-1-christoph.boehmwalder@linbit.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: phy: dp83822: disable MDI crossover status change interrupt · e0f8ac08
      Felix Riemann authored
      
      [ Upstream commit 7f378c03 ]
      
      If the cable is disconnected the PHY seems to toggle between MDI and
      MDI-X modes. With the MDI crossover status interrupt active this causes
      roughly 10 interrupts per second.
      
      As the crossover status isn't checked by the driver, the interrupt can
      be disabled to reduce the interrupt load.
      
      Fixes: 87461f7a ("net: phy: DP83822 initial driver submission")
      Signed-off-by: Felix Riemann <felix.riemann@sma.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sched: fix race condition in qdisc_graft() · caee0b9d
      Eric Dumazet authored
      
      [ Upstream commit ebda44da ]
      
      We had one syzbot report [1] in syzbot queue for a while.
      I was waiting for more occurrences and/or a repro but
      Dmitry Vyukov spotted the issue right away.
      
      <quoting Dmitry>
      qdisc_graft() drops reference to qdisc in notify_and_destroy
      while it's still assigned to dev->qdisc
      </quoting>
      
      Indeed, RCU rules are clear when replacing a data structure.
      The visible pointer (dev->qdisc in this case) must be updated
      to the new object _before_ RCU grace period is started
      (qdisc_put(old) in this case).
      
      [1]
      BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
      Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
      
      CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
      __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
      __tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
      tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
      rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5efaa89279
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
      RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
      </TASK>
      
      Allocated by task 21027:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track mm/kasan/common.c:45 [inline]
      set_alloc_info mm/kasan/common.c:437 [inline]
      ____kasan_kmalloc mm/kasan/common.c:516 [inline]
      ____kasan_kmalloc mm/kasan/common.c:475 [inline]
      __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
      kmalloc_node include/linux/slab.h:623 [inline]
      kzalloc_node include/linux/slab.h:744 [inline]
      qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
      qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
      attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
      netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
      attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
      dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
      __dev_open+0x393/0x4d0 net/core/dev.c:1441
      __dev_change_flags+0x583/0x750 net/core/dev.c:8556
      rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
      rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
      __rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
      rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
      rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 21020:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track+0x21/0x30 mm/kasan/common.c:45
      kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
      ____kasan_slab_free mm/kasan/common.c:367 [inline]
      ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1754 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
      slab_free mm/slub.c:3534 [inline]
      kfree+0xe2/0x580 mm/slub.c:4562
      rcu_do_batch kernel/rcu/tree.c:2245 [inline]
      rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
      __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
      
      Last potentially related work creation:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
      call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
      qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
      notify_and_destroy net/sched/sch_api.c:1012 [inline]
      qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
      tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
      rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Second to last potentially related work creation:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
      kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
      neigh_destroy+0x431/0x630 net/core/neighbour.c:912
      neigh_release include/net/neighbour.h:454 [inline]
      neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
      neigh_del net/core/neighbour.c:225 [inline]
      neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
      neigh_forced_gc net/core/neighbour.c:276 [inline]
      neigh_alloc net/core/neighbour.c:447 [inline]
      ___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
      ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:296 [inline]
      ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:451 [inline]
      NF_HOOK include/linux/netfilter.h:307 [inline]
      NF_HOOK include/linux/netfilter.h:301 [inline]
      mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
      mld_send_cr net/ipv6/mcast.c:2121 [inline]
      mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
      process_one_work+0x991/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e4/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      
      The buggy address belongs to the object at ffff88802065e000
      which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 56 bytes inside of
      1024-byte region [ffff88802065e000, ffff88802065e400)
      
      The buggy address belongs to the physical page:
      page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
      head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
      prep_new_page mm/page_alloc.c:2532 [inline]
      get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
      __alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
      alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
      alloc_slab_page mm/slub.c:1824 [inline]
      allocate_slab+0x27e/0x3d0 mm/slub.c:1969
      new_slab mm/slub.c:2029 [inline]
      ___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
      __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
      slab_alloc_node mm/slub.c:3209 [inline]
      __kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
      kmalloc_reserve net/core/skbuff.c:358 [inline]
      __alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
      alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
      tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
      tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
      tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
      inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      sock_write_iter+0x291/0x3d0 net/socket.c:1108
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:578
      ksys_write+0x1e8/0x250 fs/read_write.c:631
      page last free stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1449 [inline]
      free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
      free_unref_page_prepare mm/page_alloc.c:3380 [inline]
      free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
      __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
      qlink_free mm/kasan/quarantine.c:168 [inline]
      qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
      kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
      __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
      kasan_slab_alloc include/linux/kasan.h:224 [inline]
      slab_post_alloc_hook mm/slab.h:727 [inline]
      slab_alloc_node mm/slub.c:3243 [inline]
      slab_alloc mm/slub.c:3251 [inline]
      __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
      kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
      kmem_cache_zalloc include/linux/slab.h:723 [inline]
      alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
      alloc_page_buffers+0x280/0x790 fs/buffer.c:829
      create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
      ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
      ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
      generic_perform_write+0x246/0x560 mm/filemap.c:3738
      ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
      ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:578
      
      Fixes: af356afa ("net_sched: reintroduce dev->qdisc for use by sch_api")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221018203258.2793282-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
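The ordering rule stated in the qdisc_graft() fix above (update the visible pointer to the new object before the old one is released) can be shown with a single-threaded user-space sketch. It does not implement RCU: the types and functions below are hypothetical stand-ins, and the old object is freed immediately where the kernel would defer the free past a grace period. Only the order of the two steps is the point.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct Qdisc and struct net_device. */
struct qdisc_sketch {
    int refcnt;
    int id;
};

struct netdev_sketch {
    _Atomic(struct qdisc_sketch *) qdisc;   /* pointer that readers follow */
};

static int released;   /* counts objects whose last reference was dropped */

static struct qdisc_sketch *qdisc_new(int id)
{
    struct qdisc_sketch *q = malloc(sizeof(*q));

    if (q) {
        q->refcnt = 1;
        q->id = id;
    }
    return q;
}

/* Drop one reference; the kernel would defer the free past a grace period. */
static void qdisc_put_sketch(struct qdisc_sketch *q)
{
    if (q && --q->refcnt == 0) {
        released++;
        free(q);
    }
}

/* Publish the new object first, and only then release the old one. */
static void graft_sketch(struct netdev_sketch *dev, struct qdisc_sketch *new_q)
{
    struct qdisc_sketch *old = atomic_exchange(&dev->qdisc, new_q);

    qdisc_put_sketch(old);
}
```

The bug described in the commit message corresponds to calling the put on the old object while `dev->qdisc` still points at it, so a lookup that starts afterwards can still find an object that is already on its way to being freed.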
    • net: hns: fix possible memory leak in hnae_ae_register() · 02dc0db1
      Yang Yingliang authored
      
      [ Upstream commit ff2f5ec5 ]
      
      When a fault is injected while probing the module and device_register()
      fails, the refcount of the kobject is not decreased to 0, so the name
      allocated in dev_set_name() is leaked. Fix this by calling
      put_device(), so that the name can be freed in the callback function
      kobject_cleanup().
      
      unreferenced object 0xffff00c01aba2100 (size 128):
        comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s)
        hex dump (first 32 bytes):
          68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff  hnae0....!......
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0
          [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0
          [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390
          [<000000006c0ffb13>] kvasprintf+0x8c/0x118
          [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8
          [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0
          [<000000000b87affc>] dev_set_name+0x7c/0xa0
          [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae]
          [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf]
          [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf]
      
      Fixes: 6fe6611f ("net: add Hisilicon Network Subsystem hnae framework support")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
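The leak pattern described in the hnae_ae_register() fix above can be modeled with a toy refcounted object. This is a user-space sketch under stated assumptions, not the driver core: `struct toy_device` and every `toy_*` function are hypothetical, and `toy_register()` only mimics the one property that matters here, namely that the object holds a reference even when registration fails, so the error path has to drop it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object; stands in for struct device and its kobject. */
struct toy_device {
    int refcount;
    char *name;   /* heap-allocated, like the name set by dev_set_name() */
};

static int names_freed;   /* counts release-callback runs, for illustration */

/* Release callback: runs when the last reference is dropped. */
static void toy_release(struct toy_device *d)
{
    free(d->name);
    d->name = NULL;
    names_freed++;
}

static void toy_put(struct toy_device *d)
{
    if (--d->refcount == 0)
        toy_release(d);
}

static int toy_set_name(struct toy_device *d, const char *name)
{
    d->name = malloc(strlen(name) + 1);
    if (!d->name)
        return -1;
    strcpy(d->name, name);
    return 0;
}

/* Takes the initial reference, then optionally fails, like a failed add. */
static int toy_register(struct toy_device *d, int inject_failure)
{
    d->refcount = 1;
    return inject_failure ? -1 : 0;
}

static int toy_probe(struct toy_device *d, int inject_failure)
{
    if (toy_set_name(d, "hnae0"))
        return -1;
    if (toy_register(d, inject_failure)) {
        /* Drop the reference so the release callback frees the name. */
        toy_put(d);
        return -1;
    }
    return 0;
}
```

Returning from the failure branch without the `toy_put()` call would leave `name` allocated with nothing left to free it, which is the shape of the kmemleak report quoted in the commit message.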
    • wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new() · d8797331
      Yang Yingliang authored
      
      [ Upstream commit 258ad2fe ]
      
      When a fault is injected while probing the module and device_register()
      fails, the refcount of the kobject is not decreased to 0, so the name
      allocated in dev_set_name() is leaked. Fix this by calling
      put_device(), so that the name can be freed in the callback function
      kobject_cleanup().
      
      unreferenced object 0xffff88810152ad20 (size 8):
        comm "modprobe", pid 252, jiffies 4294849206 (age 22.713s)
        hex dump (first 8 bytes):
          68 77 73 69 6d 30 00 ff                          hwsim0..
        backtrace:
          [<000000009c3504ed>] __kmalloc_node_track_caller+0x44/0x1b0
          [<00000000c0228a5e>] kvasprintf+0xb5/0x140
          [<00000000cff8c21f>] kvasprintf_const+0x55/0x180
          [<0000000055a1e073>] kobject_set_name_vargs+0x56/0x150
          [<000000000a80b139>] dev_set_name+0xab/0xe0
      
      Fixes: f36a111a ("wwan_hwsim: WWAN device simulator")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
      Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Link: https://lore.kernel.org/r/20221018131607.1901641-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Pieter Jansen van Vuuren's avatar
      sfc: include vport_id in filter spec hash and equal() · fb56ab8e
      Pieter Jansen van Vuuren authored
      
      [ Upstream commit c2bf23e4 ]
      
      Filters on different vports are qualified by different implicit MACs and/or
      VLANs, so shouldn't be considered equal even if their other match fields
      are identical.
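
As a rough illustration of the idea, the sketch below keys a simplified filter spec on `vport_id` in both the hash and the equality test. The struct `toy_filter_spec`, its fields other than `vport_id`, and the FNV-style mixing are assumptions made for this example; the real efx_filter_spec has more fields and the driver uses its own hashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified filter spec; the real structure has more fields. */
struct toy_filter_spec {
    uint32_t match_flags;
    uint16_t vport_id;
    uint16_t ether_type;
    uint32_t loc_host;
};

/* One FNV-1a style mixing step over a 32-bit word. */
static uint32_t toy_mix(uint32_t h, uint32_t v)
{
    return (h ^ v) * 16777619u;
}

static uint32_t toy_filter_hash(const struct toy_filter_spec *s)
{
    uint32_t h = 2166136261u;

    assert(s != NULL);
    h = toy_mix(h, s->match_flags);
    h = toy_mix(h, s->vport_id);     /* vport is part of the key */
    h = toy_mix(h, s->ether_type);
    h = toy_mix(h, s->loc_host);
    return h;
}

static bool toy_filter_equal(const struct toy_filter_spec *a,
                             const struct toy_filter_spec *b)
{
    assert(a != NULL && b != NULL);
    return a->match_flags == b->match_flags &&
           a->vport_id == b->vport_id &&     /* differing vports never match */
           a->ether_type == b->ether_type &&
           a->loc_host == b->loc_host;
}
```

Keeping the hash and the equality test in agreement matters: if only one of them considered `vport_id`, specs could land in the same bucket yet never compare equal, or compare equal while hashing differently.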
      
      Fixes: 7c460d9b ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10")
      Co-developed-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
      Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      fb56ab8e
    • Harshit Mogalapalli's avatar
      io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() · 0163e04e
      Harshit Mogalapalli authored
      
      [ Upstream commit 16bbdfe5 ]
      
      Syzkaller produced the below call trace:
      
       BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0
       Write of size 8 at addr 0000000000000070 by task repro/16399
      
       CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7
       Call Trace:
        <TASK>
        dump_stack_lvl+0xcd/0x134
        ? io_msg_ring+0x3cb/0x9f0
        kasan_report+0xbc/0xf0
        ? io_msg_ring+0x3cb/0x9f0
        kasan_check_range+0x140/0x190
        io_msg_ring+0x3cb/0x9f0
        ? io_msg_ring_prep+0x300/0x300
        io_issue_sqe+0x698/0xca0
        io_submit_sqes+0x92f/0x1c30
        __do_sys_io_uring_enter+0xae4/0x24b0
      ....
       RIP: 0033:0x7f2eaf8f8289
       RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289
       RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004
       RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0
       R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
       Kernel panic - not syncing: panic_on_warn set ...
      
      There is no NULL check on file_ptr in the io_msg_send_fd() function,
      so when file_ptr is NULL, src_file is also NULL, and get_file()
      dereferences a NULL pointer, leading to the crash above.
      
      Add a NULL check to fix this issue.
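
A minimal userspace sketch of such a check follows. The names `toy_file`, `toy_file_table`, `toy_file_from_index()` and `toy_msg_send_fd()` are hypothetical, and returning -EBADF for an empty slot is an assumption made for this example rather than a statement about the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file with a reference count. */
struct toy_file { long f_count; };

#define TOY_TABLE_SIZE 4

/* Sparse fixed-file table: unused slots hold NULL. */
struct toy_file_table { struct toy_file *slots[TOY_TABLE_SIZE]; };

static struct toy_file *toy_file_from_index(struct toy_file_table *t,
                                            unsigned int idx)
{
    assert(t != NULL);
    return idx < TOY_TABLE_SIZE ? t->slots[idx] : NULL;
}

/* Returns 0 on success or -EBADF if the slot is empty. */
static int toy_msg_send_fd(struct toy_file_table *t, unsigned int idx)
{
    struct toy_file *src_file = toy_file_from_index(t, idx);

    if (!src_file)          /* reject before taking a reference */
        return -EBADF;
    src_file->f_count++;    /* models get_file(); safe only after the check */
    return 0;
}
```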
      
      Fixes: e6130eba ("io_uring: add support for passing fixed file descriptors")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Link: https://lore.kernel.org/r/20221019171218.1337614-1-harshit.m.mogalapalli@oracle.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0163e04e
    • Paul Blakey's avatar
      net: Fix return value of qdisc ingress handling on success · 9d7f7277
      Paul Blakey authored
      
      [ Upstream commit 672e97ef ]
      
      Currently qdisc ingress handling (sch_handle_ingress()) doesn't
      set a return value and it is left to the old return value of
      the caller (__netif_receive_skb_core()) which is RX drop, so if
      the packet is consumed, caller will stop and return this value
      as if the packet was dropped.
      
      This causes a problem in the kernel tcp stack when having an
      egress tc rule forwarding to an ingress tc rule.
      The tcp stack sending packets on the device having the egress rule
      will see the packets as not successfully transmitted (although they
      actually were), will not advance its internal state of sent data,
      and packets returning on such tcp stream will be dropped by the tcp
      stack with reason ack-of-unsent-data. See reproduction in [0] below.
      
      Fix that by setting the return value to RX success if
      the packet was handled successfully.
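
The effect of the stale return value can be modelled in a few lines of userspace C. Everything here is a simplified assumption: `toy_handle_ingress()` and `toy_receive()` are hypothetical stand-ins for sch_handle_ingress() and its caller, the verdicts are reduced to three cases, and the `set_success` flag exists only to compare the old and new behaviour side by side.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_RX_SUCCESS = 0, TOY_RX_DROP = 1 };
enum toy_verdict { TOY_OK, TOY_SHOT, TOY_REDIRECT };

struct toy_skb { int id; };

/* Returns the skb to keep processing, or NULL if it was consumed.
 * set_success selects the old behaviour (0) or the fixed behaviour (1). */
static struct toy_skb *toy_handle_ingress(struct toy_skb *skb,
                                          enum toy_verdict v,
                                          int *ret, int set_success)
{
    assert(ret != NULL);
    switch (v) {
    case TOY_SHOT:
        *ret = TOY_RX_DROP;
        return NULL;
    case TOY_REDIRECT:
        if (set_success)
            *ret = TOY_RX_SUCCESS;   /* consumed, but not dropped */
        return NULL;
    default:
        return skb;
    }
}

static int toy_receive(enum toy_verdict v, int set_success)
{
    struct toy_skb skb = { 1 };
    int ret = TOY_RX_DROP;           /* the caller's initial value */

    if (!toy_handle_ingress(&skb, v, &ret, set_success))
        return ret;                  /* consumed: report whatever ret holds */
    return TOY_RX_SUCCESS;           /* normal delivery, greatly simplified */
}
```

With `set_success` at 0, a redirected packet is reported as dropped even though it was handled; with 1, the caller sees success, while a genuinely dropped packet still reports a drop.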
      
      [0] Reproduction steps:
       $ ip link add veth1 type veth peer name peer1
       $ ip link add veth2 type veth peer name peer2
       $ ifconfig peer1 5.5.5.6/24 up
       $ ip netns add ns0
       $ ip link set dev peer2 netns ns0
       $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
       $ ifconfig veth2 0 up
       $ ifconfig veth1 0 up
      
       #ingress forwarding veth1 <-> veth2
       $ tc qdisc add dev veth2 ingress
       $ tc qdisc add dev veth1 ingress
       $ tc filter add dev veth2 ingress prio 1 proto all flower \
         action mirred egress redirect dev veth1
       $ tc filter add dev veth1 ingress prio 1 proto all flower \
         action mirred egress redirect dev veth2
      
       #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
       $ tc qdisc add dev peer1 clsact
       $ tc filter add dev peer1 egress prio 20 proto ip flower \
         action mirred ingress redirect dev veth1
      
       #run iperf and see connection not running
       $ iperf3 -s&
       $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
      
       #delete egress rule, and run again, now should work
       $ tc filter del dev peer1 egress
       $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
      
      Fixes: f697c3e8 ("[NET]: Avoid unnecessary cloning for ingress filtering")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9d7f7277
    • Zhengchao Shao's avatar
      net: sched: sfb: fix null pointer access issue when sfb_init() fails · 723399af
      Zhengchao Shao authored
      
      [ Upstream commit 2a3fc782 ]
      
      When the default qdisc is sfb, if the qdisc of dev_queue fails to be
      initialized during mqprio_init(), sfb_reset() is invoked to clear
      resources. In this case, q->qdisc is NULL, which causes a general
      protection fault.
      
      The process is as follows:
      qdisc_create_dflt()
      	sfb_init()
      		tcf_block_get()          --->failed, q->qdisc is NULL
      	...
      	qdisc_put()
      		...
      		sfb_reset()
      			qdisc_reset(q->qdisc)    --->q->qdisc is NULL
      				ops = qdisc->ops
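
One way to make the reset path tolerate a partially initialized qdisc is to guard the child pointer before resetting it. The sketch below is a userspace model under that assumption; `toy_qdisc`, `toy_sfb_reset()` and `toy_qdisc_reset()` are hypothetical names, and the single `child` field stands in for q->qdisc.

```c
#include <assert.h>
#include <stddef.h>

struct toy_qdisc {
    struct toy_qdisc *child;   /* models q->qdisc; NULL if init failed early */
    int qlen;
    int resets;
};

static void toy_qdisc_reset(struct toy_qdisc *q);

/* Type-specific reset hook: must cope with a child that was never set. */
static void toy_sfb_reset(struct toy_qdisc *q)
{
    if (q->child)
        toy_qdisc_reset(q->child);
}

/* Generic reset: runs the hook first, then clears the common counters. */
static void toy_qdisc_reset(struct toy_qdisc *q)
{
    assert(q != NULL);
    toy_sfb_reset(q);
    q->qlen = 0;
    q->resets++;
}
```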
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
      RIP: 0010:qdisc_reset+0x2b/0x6f0
      Call Trace:
      <TASK>
      sfb_reset+0x37/0xd0
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f2164122d04
      </TASK>
      
      Fixes: e13e02a3 ("net_sched: SFB flow scheduler")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      723399af
    • Zhengchao Shao's avatar
      net: sched: delete duplicate cleanup of backlog and qlen · 3c23f9ad
      Zhengchao Shao authored
      
      [ Upstream commit c19d893f ]
      
      qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog
      _after_ calling qdisc->ops->reset. There is no need to clear them
      again in the specific reset function.
      
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Stable-dep-of: 2a3fc782 ("net: sched: sfb: fix null pointer access issue when sfb_init() fails")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3c23f9ad
    • Zhengchao Shao's avatar
      net: sched: cake: fix null pointer access issue when cake_init() fails · 1dc0a019
      Zhengchao Shao authored
      
      [ Upstream commit 51f9a892 ]
      
      When the default qdisc is cake, if the qdisc of dev_queue fails to be
      initialized during mqprio_init(), cake_reset() is invoked to clear
      resources. In this case, q->tins is NULL, which causes a general
      protection fault.
      
      The process is as follows:
      qdisc_create_dflt()
      	cake_init()
      		q->tins = kvcalloc(...)        --->failed, q->tins is NULL
      	...
      	qdisc_put()
      		...
      		cake_reset()
      			...
      			cake_dequeue_one()
      				b = &q->tins[...]   --->q->tins is NULL
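
The call chain above can be reproduced in miniature with an allocation that is allowed to fail. This is a hedged userspace sketch, not the driver code: `toy_cake`, `toy_cake_init()` and `toy_cake_reset()` are hypothetical, and the early return on a NULL array is an assumed way of making the reset path safe.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_tin { int backlog; };

struct toy_cake {
    struct toy_tin *tins;   /* NULL if the allocation in init failed */
    int tin_cnt;
};

/* fail_alloc simulates the allocation failure from the call chain. */
static int toy_cake_init(struct toy_cake *q, int tin_cnt, int fail_alloc)
{
    assert(tin_cnt > 0);
    q->tin_cnt = tin_cnt;
    q->tins = fail_alloc ? NULL : calloc((size_t)tin_cnt, sizeof(*q->tins));
    return q->tins ? 0 : -1;
}

/* Returns the number of tins cleared; 0 when there is nothing to walk. */
static int toy_cake_reset(struct toy_cake *q)
{
    int i, cleared = 0;

    if (!q->tins)           /* bail out instead of indexing a NULL array */
        return 0;
    for (i = 0; i < q->tin_cnt; i++) {
        q->tins[i].backlog = 0;
        cleared++;
    }
    return cleared;
}
```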
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      RIP: 0010:cake_dequeue_one+0xc9/0x3c0
      Call Trace:
      <TASK>
      cake_reset+0xb1/0x140
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f89e5122d04
      </TASK>
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1dc0a019
    • Sagi Grimberg's avatar
      nvmet: fix workqueue MEM_RECLAIM flushing dependency · 4c707a6c
      Sagi Grimberg authored
      
      [ Upstream commit ddd2b8de ]
      
      The keep-alive timer needs to stay on nvmet_wq and must not be
      modified to reschedule on system_wq.
      
      This fixes a warning:
      ------------[ cut here ]------------
      workqueue: WQ_MEM_RECLAIM
      nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing
      !WQ_MEM_RECLAIM events:nvmet_keep_alive_timer [nvmet]
      WARNING: CPU: 3 PID: 1086 at kernel/workqueue.c:2628
      check_flush_dependency+0x16c/0x1e0
      
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Fixes: 8832cf92 ("nvmet: use a private workqueue instead of the system workqueue")
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4c707a6c
    • Serge Semin's avatar
      nvme-hwmon: kmalloc the NVME SMART log buffer · 704d5f5d
      Serge Semin authored
      
      [ Upstream commit c94b7f9b ]
      
      Recent commit 52fde2c0 ("nvme: set dma alignment to dword") has
      caused a regression on our platform.
      
      It turned out that the nvme_get_log() method invocation caused the
      nvme_hwmon_data structure instance corruption.  In particular the
      nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
      garbage.  After some research we discovered that the problem happened
      even before the actual NVME DMA execution, but during the buffer mapping.
      Since our platform is DMA-noncoherent, the mapping implied the cache-line
      invalidations or write-backs depending on the DMA-direction parameter.
      In case of the NVME SMART log getting the DMA was performed
      from-device-to-memory, thus the cache-invalidation was activated during
      the buffer mapping.  Since the log-buffer isn't cache-line aligned, the
      cache-invalidation caused the neighbouring data to be discarded.  The
      neighbouring data turned out to be the fields surrounding the buffer
      within the nvme_hwmon_data structure.
      
      In order to fix that we need to make sure that the whole log-buffer is
      defined within a cache-line-aligned memory region so that the
      cache-invalidation procedure doesn't involve the adjacent data. One of
      the options to guarantee that is to kmalloc the DMA-buffer [1]. Since
      the rest of the NVME core driver prefers that method, it has been chosen
      to fix this problem too.
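
Why an embedded buffer is at risk can be checked with simple offset arithmetic. The sketch below is illustrative only: `toy_hwmon_data` is a hypothetical layout, the 64-byte line size is an assumption (it is platform dependent), and `shares_cache_line()` presumes the containing object itself starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64u   /* assumed line size; platform dependent */

/* Hypothetical layout: a log buffer embedded between other fields. */
struct toy_hwmon_data {
    void *ctrl;
    unsigned char log[512];
    long lock;
};

/* Returns nonzero if the byte range [off, off + len) shares a cache line
 * with bytes outside the range, assuming the object is line-aligned. */
static int shares_cache_line(size_t off, size_t len)
{
    assert(len > 0);
    return (off % TOY_CACHE_LINE) != 0 ||
           ((off + len) % TOY_CACHE_LINE) != 0;
}
```

For the embedded `log` field the offset equals the size of the preceding pointer, so the first cache line of the buffer also holds `ctrl`, and invalidating that line can discard it. A separately allocated, line-aligned buffer whose length is a multiple of the line size shares no line with other data.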
      
      Note that after deeper research we found out that the denoted commit
      wasn't the root cause of the problem. It just revealed the bug by
      activating the DMA-based NVME SMART log retrieval performed by the
      NVME hwmon driver. The problem has been there since the initial commit
      of the driver.
      
      [1] Documentation/core-api/dma-api-howto.rst
      
      Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
      Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      704d5f5d
    • Christoph Hellwig's avatar
      nvme-hwmon: consistently ignore errors from nvme_hwmon_init · dece37b2
      Christoph Hellwig authored
      
      [ Upstream commit 6b8cf940 ]
      
      An NVMe controller works perfectly fine even when the hwmon
      initialization fails.  Stop returning errors that do not come from a
      controller reset from nvme_hwmon_init to handle this case consistently.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Stable-dep-of: c94b7f9b ("nvme-hwmon: kmalloc the NVME SMART log buffer")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dece37b2
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements · 47772604
      Pablo Neira Ayuso authored
      
      [ Upstream commit 96df8360 ]
      
      Otherwise EINVAL is bogusly reported to userspace when deleting a set
      element. NFTA_SET_ELEM_KEY_END does not need to be set in case of:
      
      - insertion: if not present, start key is used as end key.
      - deletion: only start key needs to be specified, end key is ignored.
      
      Hence, relax the sanity check.
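
The two cases listed above can be written down as a tiny model. This is a sketch of the described semantics only, using made-up names (`toy_range`, `toy_insert_range()`, `toy_delete_matches()`) and plain integers as keys; it does not reflect the nf_tables data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_range { int start; int end; };

/* Insertion: a missing end key falls back to the start key. */
static struct toy_range toy_insert_range(int start, bool has_end, int end)
{
    struct toy_range r = { start, has_end ? end : start };
    return r;
}

/* Deletion: lookup is by start key only; any end key is ignored. */
static bool toy_delete_matches(const struct toy_range *elem, int start)
{
    assert(elem != NULL);
    return elem->start == start;
}
```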
      
      Fixes: 88cccd90 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      47772604
    • Guillaume Nault's avatar
      netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. · de16491c
      Guillaume Nault authored
      
      [ Upstream commit 1fcc064b ]
      
      Currently netfilter's rpfilter and fib modules implicitly initialise
      ->flowic_uid with 0. This is normally the root UID. However, this isn't
      the case in user namespaces, where user ID 0 is mapped to a different
      kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
      the root UID of the user namespace, thus keeping the same behaviour
      whether or not we're running in a user namespace.
      
      Note, this is similar to commit 8bcfd092 ("ipv4: add missing
      initialization for flowi4_uid"), which fixed the rp_filter sysctl.
      
      Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      de16491c
    • Phil Sutter's avatar
      netfilter: rpfilter/fib: Populate flowic_l3mdev field · 14051ae7
      Phil Sutter authored
      
      [ Upstream commit acc641ab ]
      
      Use the introduced field for correct operation with VRF devices instead
      of conditionally overwriting flowic_oif. This is a partial revert of
      commit b575b24b ("netfilter: Fix rpfilter dropping vrf packets by
      mistake"), implementing a simpler solution.
      
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Stable-dep-of: 1fcc064b ("netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      14051ae7
    • Brett Creeley's avatar
      ionic: catch NULL pointer issue on reconfig · 0e0bf291
      Brett Creeley authored
      
      [ Upstream commit aa1d7e12 ]
      
      It's possible that the driver will dereference a qcq that doesn't exist
      when calling ionic_reconfigure_queues(), which causes a page fault BUG.
      
      If a reduction in the number of queues is followed by a different
      reconfig such as changing the ring size, the driver can hit a NULL
      pointer when trying to clean up non-existent queues.
      
      Fix this by checking that both the qcqs array and the qcq entry
      exist before trying to use and free the entry.
      
      Fixes: 101b40a0 ("ionic: change queue count with no reset")
      Signed-off-by: Brett Creeley <brett@pensando.io>
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Link: https://lore.kernel.org/r/20221017233123.15869-1-snelson@pensando.io
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0e0bf291
    • Eric Dumazet's avatar
      net: hsr: avoid possible NULL deref in skb_clone() · c46f2e0f
      Eric Dumazet authored
      
      [ Upstream commit d8b57135 ]
      
      syzbot got a crash [1] in skb_clone(), caused by a bug
      in hsr_get_untagged_frame().
      
      When/if create_stripped_skb_hsr() returns NULL, we must
      not attempt to call skb_clone().
      
      While we are at it, replace a WARN_ONCE() by netdev_warn_once().
      
      [1]
      general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f]
      CPU: 1 PID: 754 Comm: syz-executor.0 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      RIP: 0010:skb_clone+0x108/0x3c0 net/core/skbuff.c:1641
      Code: 93 02 00 00 49 83 7c 24 28 00 0f 85 e9 00 00 00 e8 5d 4a 29 fa 4c 8d 75 7e 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 9e 01 00 00
      RSP: 0018:ffffc90003ccf4e0 EFLAGS: 00010207
      
      RAX: dffffc0000000000 RBX: ffffc90003ccf5f8 RCX: ffffc9000c24b000
      RDX: 000000000000000f RSI: ffffffff8751cb13 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 00000000000000f0 R09: 0000000000000140
      R10: fffffbfff181d972 R11: 0000000000000000 R12: ffff888161fc3640
      R13: 0000000000000a20 R14: 000000000000007e R15: ffffffff8dc5f620
      FS: 00007feb621e4700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007feb621e3ff8 CR3: 00000001643a9000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      hsr_get_untagged_frame+0x4e/0x610 net/hsr/hsr_forward.c:164
      hsr_forward_do net/hsr/hsr_forward.c:461 [inline]
      hsr_forward_skb+0xcca/0x1d50 net/hsr/hsr_forward.c:623
      hsr_handle_frame+0x588/0x7c0 net/hsr/hsr_slave.c:69
      __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
      __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      netif_receive_skb_internal net/core/dev.c:5685 [inline]
      netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
      tun_rx_batched+0x4ab/0x7a0 drivers/net/tun.c:1544
      tun_get_user+0x2686/0x3a00 drivers/net/tun.c:1995
      tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2025
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:584
      ksys_write+0x127/0x250 fs/read_write.c:637
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: f266a683 ("net/hsr: Better frame dispatch")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221017165928.2150130-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c46f2e0f
    • Vikas Gupta's avatar
      bnxt_en: fix memory leak in bnxt_nvm_test() · be083d97
      Vikas Gupta authored
      
      [ Upstream commit ba077d68 ]
      
      Free the kzalloc'ed buffer before returning in the success path.
      
      Fixes: 5b6ff128 ("bnxt_en: implement callbacks for devlink selftests")
      Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/1666020742-25834-1-git-send-email-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      be083d97
    • Guenter Roeck's avatar
      drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o · 84ea92c6
      Guenter Roeck authored
      
      [ Upstream commit 8a70b2d8 ]
      
      Building 32-bit images may fail with the following error.
      
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:
      	In function ‘dml32_UseMinimumDCFCLK’:
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1:
      	error: the frame size of 1096 bytes is larger than 1024 bytes
      
      This is seen when building i386:allmodconfig with any of the following
      compilers.
      
      	gcc (Debian 12.2.0-3) 12.2.0
      	gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
      
      The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY
      because in that case CONFIG_FRAME_WARN is already set to 2048 even for
      32-bit builds.
      
      dml32_UseMinimumDCFCLK() was introduced with commit dda4fb85
      ("drm/amd/display: DML changes for DCN32/321"). It declares a large
      number of local variables. Increase the frame size for the affected
      file to 2048, similar to other files in the same directory, to enable
      32-bit build tests with affected compilers.
      
      Fixes: dda4fb85 ("drm/amd/display: DML changes for DCN32/321")
      Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Reported-by: Łukasz Bartosik <ukaszb@google.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      84ea92c6
    • Genjian Zhang's avatar
      dm: remove unnecessary assignment statement in alloc_dev() · d9b4cfa7
      Genjian Zhang authored
      
      [ Upstream commit 99f4f5bc ]
      
      Fixes: 74fe6ba9 ("dm: convert to blk_alloc_disk/blk_cleanup_disk")
      Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d9b4cfa7
    • Zhang Xiaoxu's avatar
      cifs: Fix memory leak when build ntlmssp negotiate blob failed · fa5a70bd
      Zhang Xiaoxu authored
      
      [ Upstream commit 30b2d7f8 ]
      
      There is a memory leak when mounting cifs:
        unreferenced object 0xffff888166059600 (size 448):
          comm "mount.cifs", pid 51391, jiffies 4295596373 (age 330.596s)
          hex dump (first 32 bytes):
            fe 53 4d 42 40 00 00 00 00 00 00 00 01 00 82 00  .SMB@...........
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<0000000060609a61>] mempool_alloc+0xe1/0x260
            [<00000000adfa6c63>] cifs_small_buf_get+0x24/0x60
            [<00000000ebb404c7>] __smb2_plain_req_init+0x32/0x460
            [<00000000bcf875b4>] SMB2_sess_alloc_buffer+0xa4/0x3f0
            [<00000000753a2987>] SMB2_sess_auth_rawntlmssp_negotiate+0xf5/0x480
            [<00000000f0c1f4f9>] SMB2_sess_setup+0x253/0x410
            [<00000000a8b83303>] cifs_setup_session+0x18f/0x4c0
            [<00000000854bd16d>] cifs_get_smb_ses+0xae7/0x13c0
            [<000000006cbc43d9>] mount_get_conns+0x7a/0x730
            [<000000005922d816>] cifs_mount+0x103/0xd10
            [<00000000e33def3b>] cifs_smb3_do_mount+0x1dd/0xc90
            [<0000000078034979>] smb3_get_tree+0x1d5/0x300
            [<000000004371f980>] vfs_get_tree+0x41/0xf0
            [<00000000b670d8a7>] path_mount+0x9b3/0xdd0
            [<000000005e839a7d>] __x64_sys_mount+0x190/0x1d0
            [<000000009404c3b9>] do_syscall_64+0x35/0x80
      
      When building the ntlmssp negotiate blob fails, the session setup
      request should be freed.
      
      Fixes: 49bd49f9 ("cifs: send workstation name during ntlmssp session setup")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      fa5a70bd
    • Zhang Xiaoxu's avatar
      cifs: Fix xid leak in cifs_ses_add_channel() · db2a8b6c
      Zhang Xiaoxu authored
      
      [ Upstream commit e909d054 ]
      
      The xid should be freed before returning; otherwise it will be
      leaked.
      
      Fixes: d70e9fa5 ("cifs: try opening channels after mounting")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      db2a8b6c
    • Zhang Xiaoxu's avatar
      cifs: Fix xid leak in cifs_flock() · f8c9b4a9
      Zhang Xiaoxu authored
      
      [ Upstream commit 575e079c ]
      
      If the request is not a flock, the xid should be freed before
      returning -ENOLCK; otherwise it will be leaked.
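
The leak pattern is easy to model with a counter of outstanding ids. The sketch below is a userspace approximation: `toy_get_xid()`, `toy_free_xid()` and `toy_flock()` are hypothetical, and routing the early exit through a shared `out` label is just one possible shape for the fix, not necessarily the one the patch uses.

```c
#include <assert.h>
#include <errno.h>

static int active_xids;   /* models the number of ids handed out and not freed */

static unsigned int toy_get_xid(void)
{
    static unsigned int next;

    active_xids++;
    return ++next;
}

static void toy_free_xid(unsigned int xid)
{
    (void)xid;
    assert(active_xids > 0);
    active_xids--;
}

/* is_flock == 0 models a request that this handler rejects. */
static int toy_flock(int is_flock)
{
    unsigned int xid = toy_get_xid();
    int rc = 0;

    if (!is_flock) {
        rc = -ENOLCK;
        goto out;         /* a bare return here would skip toy_free_xid() */
    }
    /* ... lock handling would go here ... */
out:
    toy_free_xid(xid);
    return rc;
}
```

After any call to `toy_flock()`, `active_xids` is back at its previous value, which is the property the leaking path violated.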
      
      Fixes: d0677992 ("cifs: add support for flock")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f8c9b4a9
    • Zhang Xiaoxu's avatar
      cifs: Fix xid leak in cifs_copy_file_range() · dc283313
      Zhang Xiaoxu authored
      
      [ Upstream commit 9a97df40 ]
      
      If the file is used by swap, the xid should be freed before
      returning -EOPNOTSUPP; otherwise it will be leaked.
      
      Fixes: 4e8aea30 ("smb3: enable swap on SMB3 mounts")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dc283313
    • Zhang Xiaoxu's avatar
      cifs: Fix xid leak in cifs_create() · 92aa09c8
      Zhang Xiaoxu authored
      
      [ Upstream commit fee0fb1f ]
      
      If cifs has already been shut down, we should free the xid before
      returning; otherwise it will be leaked.
      
      Fixes: 087f757b ("cifs: add shutdown support")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      92aa09c8
    • Zhengchao Shao's avatar
      ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed · 22a68c3b
      Zhengchao Shao authored
      
      [ Upstream commit 1ca69520 ]
      
      If initialization fails in addrconf_init_net(), devconf_all is left
      pointing at memory that has already been freed. When ip6mr_sk_done() is
      then called to release the net, it accesses devconf->mc_forwarding
      directly, which is an invalid pointer access.
      
      The process is as follows:
      setup_net()
      	ops_init()
      		addrconf_init_net()
      		all = kmemdup(...)           ---> alloc "all"
      		...
      		net->ipv6.devconf_all = all;
      		__addrconf_sysctl_register() ---> failed
      		...
      		kfree(all);                  ---> ipv6.devconf_all invalid
      		...
      	ops_exit_list()
      		...
      		ip6mr_sk_done()
      			devconf = net->ipv6.devconf_all;
      			//devconf is invalid pointer
      			if (!devconf || !atomic_read(&devconf->mc_forwarding))
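
A common remedy for this kind of dangling pointer is to reset the published pointer on the error path, so that later NULL checks work as intended. The userspace sketch below assumes that approach; `toy_net`, `toy_devconf`, `toy_addrconf_init_net()` and `toy_mc_forwarding()` are hypothetical names, and the init function is hard-wired to fail after publishing the pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_devconf { int mc_forwarding; };
struct toy_net { struct toy_devconf *devconf_all; };

/* Init that publishes the pointer, then fails and unwinds. */
static int toy_addrconf_init_net(struct toy_net *net)
{
    struct toy_devconf *all;

    assert(net != NULL);
    all = calloc(1, sizeof(*all));
    if (!all)
        return -1;
    net->devconf_all = all;
    /* ... a later registration step fails here ... */
    free(all);
    net->devconf_all = NULL;   /* do not leave a dangling pointer behind */
    return -1;
}

/* Exit-time reader, shaped like the check quoted in the call chain. */
static int toy_mc_forwarding(const struct toy_net *net)
{
    const struct toy_devconf *devconf = net->devconf_all;

    if (!devconf || !devconf->mc_forwarding)
        return 0;
    return 1;
}
```

Without the reset, the `!devconf` test in the reader cannot help, because a freed pointer is still non-NULL.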
      
      The following is the Call Trace information:
      BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0
      Read of size 4 at addr ffff888075508e88 by task ip/14554
      Call Trace:
      <TASK>
      dump_stack_lvl+0x8e/0xd1
      print_report+0x155/0x454
      kasan_report+0xba/0x1f0
      kasan_check_range+0x35/0x1b0
      ip6mr_sk_done+0x112/0x3a0
      rawv6_close+0x48/0x70
      inet_release+0x109/0x230
      inet6_release+0x4c/0x70
      sock_release+0x87/0x1b0
      igmp6_net_exit+0x6b/0x170
      ops_exit_list+0xb0/0x170
      setup_net+0x7ac/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f7963322547
      
      </TASK>
      Allocated by task 14554:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      __kasan_kmalloc+0xa1/0xb0
      __kmalloc_node_track_caller+0x4a/0xb0
      kmemdup+0x28/0x60
      addrconf_init_net+0x1be/0x840
      ops_init+0xa5/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 14554:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      kasan_save_free_info+0x2a/0x40
      ____kasan_slab_free+0x155/0x1b0
      slab_free_freelist_hook+0x11b/0x220
      __kmem_cache_free+0xa4/0x360
      addrconf_init_net+0x623/0x840
      ops_init+0xa5/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: 7d9b1b57 ("ip6mr: fix use-after-free in ip6mr_sk_done()")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221017080331.16878-1-shaozhengchao@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      22a68c3b
    • Kuniyuki Iwashima's avatar
      udp: Update reuse->has_conns under reuseport_lock. · 1fb3a672
      Kuniyuki Iwashima authored
      [ Upstream commit 69421bf9 ]
      
      When we call connect() for a UDP socket in a reuseport group, we have
      to update sk->sk_reuseport_cb->has_conns to 1.  Otherwise, the kernel
      could wrongly select an unconnected socket for packets sent to the
      connected socket.
      
      However, the current way of setting has_conns is invalid and can
      trigger that problem.  reuseport_has_conns() changes has_conns under
      rcu_read_lock(), which turns the RCU reader into an updater.  As an
      updater, it must make the change under the updater's lock,
      reuseport_lock, but it currently does not.
      
      For this reason, the race below can lose the has_conns update,
      resulting in the wrong socket selection.  To avoid the race, let's split
      the reader and updater with proper locking.
      
       cpu1                               cpu2
      +----+                             +----+
      
      __ip[46]_datagram_connect()        reuseport_grow()
      .                                  .
      |- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
      |  .                               |
      |  |- rcu_read_lock()
      |  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
      |  |
      |  |                               |  /* reuse->has_conns == 0 here */
      |  |                               |- more_reuse->has_conns = reuse->has_conns
      |  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
      |  |                               |
      |  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
      |  |                               |                     more_reuse)
      |  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
      |
      |- sk->sk_state = TCP_ESTABLISHED
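
      The split between reader and updater can be sketched in userspace. This
      is a simplified model under stated assumptions, not the kernel
      implementation: struct reuse_model, struct sock_model, reuse_lock_model,
      has_conns_set_model() and grow_model() are hypothetical names, and a
      pthread mutex stands in for reuseport_lock. Because the setter looks up
      the current group and writes the flag under the same lock that the grow
      path holds while copying and swapping, the update cannot land on a group
      that has already been copied.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel's data structures. */
struct reuse_model { unsigned int has_conns; };
struct sock_model { struct reuse_model *reuse_cb; };

/* Stands in for reuseport_lock in this model. */
static pthread_mutex_t reuse_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Updater: find the current group and set the flag under the lock. */
static void has_conns_set_model(struct sock_model *sk)
{
	pthread_mutex_lock(&reuse_lock_model);
	if (sk->reuse_cb)
		sk->reuse_cb->has_conns = 1;
	pthread_mutex_unlock(&reuse_lock_model);
}

/* Grow: copy the flag into a new group and swap it in under the same lock. */
static int grow_model(struct sock_model *sk)
{
	struct reuse_model *more, *old;

	more = malloc(sizeof(*more));
	if (!more)
		return -1;

	pthread_mutex_lock(&reuse_lock_model);
	old = sk->reuse_cb;
	assert(old != NULL);
	more->has_conns = old->has_conns;	/* cannot race with the setter */
	sk->reuse_cb = more;
	pthread_mutex_unlock(&reuse_lock_model);

	/* The kernel defers this with kfree_rcu(); the model has no readers. */
	free(old);
	return 0;
}
```

      Whichever of the two calls takes the lock first, the group that the
      socket ends up pointing at has has_conns set to 1.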
      
      Note the likely(reuse) in reuseport_has_conns_set() is always true,
      but we put the test there for ease of review.  [0]
      
      For the record, sk_reuseport_cb is usually changed under lock_sock().
      The only exception is the reuseport_grow() & TCP reqsk migration case:
      
        1) shutdown() a TCP listener, which is moved into the latter part of
           reuse->socks[] to migrate reqsk.
      
        2) A new listen() overflows reuse->socks[] and calls reuseport_grow().
      
        3) reuse->max_socks overflows u16 with the new listener.
      
        4) reuseport_grow() pops the old shutdown()ed listener from the array
           and updates its sk->sk_reuseport_cb to NULL without lock_sock().
      
      The sk->sk_reuseport_cb of a shutdown()ed TCP socket can be changed
      without lock_sock(), but reuseport_has_conns_set() is called only for
      UDP under lock_sock(), so likely(reuse) is never false in
      reuseport_has_conns_set().
      
      [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
      
      
      
      Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1fb3a672