Forum | Documentation | Website | Blog

Skip to content
Snippets Groups Projects
  1. Jul 02, 2022
    • Greg Kroah-Hartman's avatar
    • Vladimir Oltean's avatar
      net: mscc: ocelot: allow unregistered IP multicast flooding · 2d10984d
      Vladimir Oltean authored
      Flooding of unregistered IP multicast has been broken (both to other
      switch ports and to the CPU) since the ocelot driver introduction, and
      up until commit 4cf35a2b ("net: mscc: ocelot: fix broken IP
      multicast flooding"), a bug fix for commit 421741ea ("net: mscc:
      ocelot: offload bridge port flags to device") from v5.12.
      
      The driver used to set PGID_MCIPV4 and PGID_MCIPV6 to the empty port
      mask (0), which made unregistered IPv4/IPv6 multicast go nowhere, and
      without ever modifying that port mask at runtime.
      
      The expectation is that such packets are treated as broadcast, and
      flooded according to the forwarding domain (to the CPU if the port is
      standalone, or to the CPU and other bridged ports, if under a bridge).
      
      Since the aforementioned commit, the limitation has been lifted by
      responding to SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events emitted by the
      bridge. As for host flooding, DSA synthesizes another call to
      ocelot_port_bridge_flags() on the NPI port which ensures that the CPU
      gets the unregistered multicast traffic it might need, for example for
      smcroute to work between standalone ports.
      
      But between v4.18 and v5.12, IP multicast flooding has remained unfixed.
      
      Delete the inexplicable premature optimization of clearing PGID_MCIPV4
      and PGID_MCIPV6 as part of the init sequence, and allow unregistered IP
      multicast to be flooded freely according to the forwarding domain
      established by PGID_SRC, by explicitly programming PGID_MCIPV4 and
      PGID_MCIPV6 towards all physical ports plus the CPU port module.
      
      Fixes: a556c76a
      
       ("net: mscc: Add initial Ocelot switch support")
      Cc: stable@kernel.org
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d10984d
    • Naveen N. Rao's avatar
      powerpc/ftrace: Remove ftrace init tramp once kernel init is complete · 6a656280
      Naveen N. Rao authored
      commit 84ade0a6 upstream.
      
      Stop using the ftrace trampoline for init section once kernel init is
      complete.
      
      Fixes: 67361cf8
      
       ("powerpc/ftrace: Handle large kernel configs")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220516071422.463738-1-naveen.n.rao@linux.vnet.ibm.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a656280
    • Dave Chinner's avatar
      xfs: check sb_meta_uuid for dabuf buffer recovery · 6b734f7b
      Dave Chinner authored
      commit 09654ed8 upstream.
      
      Got a report that a repeated crash test of a container host would
      eventually fail with a log recovery error preventing the system from
      mounting the root filesystem. It manifested as a directory leaf node
      corruption on writeback like so:
      
       XFS (loop0): Mounting V5 Filesystem
       XFS (loop0): Starting recovery (logdev: internal)
       XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158
       XFS (loop0): Unmount and run xfs_repair
       XFS (loop0): First 128 bytes of corrupted metadata buffer:
       00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b  ........=.......
       00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc  .......X...)....
       00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23  ..x..~J}.S...G.#
       00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00  .........C......
       00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a  ................
       00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50  .5y....0.......P
       00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4  .@.......A......
       00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c  .b.......P!A....
       XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514).  Shutting down.
       XFS (loop0): Please unmount the filesystem and rectify the problem(s)
       XFS (loop0): log mount/recovery failed: error -117
       XFS (loop0): log mount failed
      
      Tracing indicated that we were recovering changes from a transaction
      at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57.
      That is, log recovery was overwriting a buffer with newer changes on
      disk than was in the transaction. Tracing indicated that we were
      hitting the "recovery immediately" case in
      xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the
      buffer.
      
      The code was extracting the LSN correctly, then ignoring it because
      the UUID in the buffer did not match the superblock UUID. The
      problem arises because the UUID check uses the wrong UUID - it
      should be checking the sb_meta_uuid, not sb_uuid. This filesystem
      has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the
      correct matching sb_meta_uuid in it, it's just the code checked it
      against the wrong superblock uuid.
      
      The is no corruption in the filesystem, and failing to recover the
      buffer due to a write verifier failure means the recovery bug did
      not propagate the corruption to disk. Hence there is no corruption
      before or after this bug has manifested, the impact is limited
      simply to an unmountable filesystem....
      
      This was missed back in 2015 during an audit of incorrect sb_uuid
      usage that resulted in commit fcfbe2c4 ("xfs: log recovery needs
      to validate against sb_meta_uuid") that fixed the magic32 buffers to
      validate against sb_meta_uuid instead of sb_uuid. It missed the
      magicda buffers....
      
      Fixes: ce748eaa
      
       ("xfs: create new metadata UUID field and incompat flag")
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b734f7b
    • Darrick J. Wong's avatar
      xfs: remove all COW fork extents when remounting readonly · 071e750f
      Darrick J. Wong authored
      commit 089558bc upstream.
      
      [backport xfs_icwalk -> xfs_eofblocks for 5.10.y]
      
      As part of multiple customer escalations due to file data corruption
      after copy on write operations, I wrote some fstests that use fsstress
      to hammer on COW to shake things loose.  Regrettably, I caught some
      filesystem shutdowns due to incorrect rmap operations with the following
      loop:
      
      mount <filesystem>				# (0)
      fsstress <run only readonly ops> &		# (1)
      while true; do
      	fsstress <run all ops>
      	mount -o remount,ro			# (2)
      	fsstress <run only readonly ops>
      	mount -o remount,rw			# (3)
      done
      
      When (2) happens, notice that (1) is still running.  xfs_remount_ro will
      call xfs_blockgc_stop to walk the inode cache to free all the COW
      extents, but the blockgc mechanism races with (1)'s reader threads to
      take IOLOCKs and loses, which means that it doesn't clean them all out.
      Call such a file (A).
      
      When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
      walks the ondisk refcount btree and frees any COW extent that it finds.
      This function does not check the inode cache, which means that incore
      COW forks of inode (A) is now inconsistent with the ondisk metadata.  If
      one of those former COW extents are allocated and mapped into another
      file (B) and someone triggers a COW to the stale reservation in (A), A's
      dirty data will be written into (B) and once that's done, those blocks
      will be transferred to (A)'s data fork without bumping the refcount.
      
      The results are catastrophic -- file (B) and the refcount btree are now
      corrupt.  Solve this race by forcing the xfs_blockgc_free_space to run
      synchronously, which causes xfs_icwalk to return to inodes that were
      skipped because the blockgc code couldn't take the IOLOCK.  This is safe
      to do here because the VFS has already prohibited new writer threads.
      
      Fixes: 10ddf64e
      
       ("xfs: remove leftover CoW reservations when remounting ro")
      Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChandan Babu R <chandan.babu@oracle.com>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      071e750f
    • Yang Xu's avatar
      xfs: Fix the free logic of state in xfs_attr_node_hasname · 1e76bd4c
      Yang Xu authored
      commit a1de97fe upstream.
      
      When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine.
      Adding a getxattr operation after xattr corrupted, I can reproduce it 100%.
      
      The deadlock as below:
      [983.923403] task:setfattr        state:D stack:    0 pid:17639 ppid: 14687 flags:0x00000080
      [  983.923405] Call Trace:
      [  983.923410]  __schedule+0x2c4/0x700
      [  983.923412]  schedule+0x37/0xa0
      [  983.923414]  schedule_timeout+0x274/0x300
      [  983.923416]  __down+0x9b/0xf0
      [  983.923451]  ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
      [  983.923453]  down+0x3b/0x50
      [  983.923471]  xfs_buf_lock+0x33/0xf0 [xfs]
      [  983.923490]  xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
      [  983.923508]  xfs_buf_get_map+0x4c/0x320 [xfs]
      [  983.923525]  xfs_buf_read_map+0x53/0x310 [xfs]
      [  983.923541]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
      [  983.923560]  xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
      [  983.923575]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
      [  983.923590]  xfs_da_read_buf+0xcf/0x120 [xfs]
      [  983.923606]  xfs_da3_node_read+0x1f/0x40 [xfs]
      [  983.923621]  xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
      [  983.923624]  ? kmem_cache_alloc+0x12e/0x270
      [  983.923637]  xfs_attr_node_hasname+0x6e/0xa0 [xfs]
      [  983.923651]  xfs_has_attr+0x6e/0xd0 [xfs]
      [  983.923664]  xfs_attr_set+0x273/0x320 [xfs]
      [  983.923683]  xfs_xattr_set+0x87/0xd0 [xfs]
      [  983.923686]  __vfs_removexattr+0x4d/0x60
      [  983.923688]  __vfs_removexattr_locked+0xac/0x130
      [  983.923689]  vfs_removexattr+0x4e/0xf0
      [  983.923690]  removexattr+0x4d/0x80
      [  983.923693]  ? __check_object_size+0xa8/0x16b
      [  983.923695]  ? strncpy_from_user+0x47/0x1a0
      [  983.923696]  ? getname_flags+0x6a/0x1e0
      [  983.923697]  ? _cond_resched+0x15/0x30
      [  983.923699]  ? __sb_start_write+0x1e/0x70
      [  983.923700]  ? mnt_want_write+0x28/0x50
      [  983.923701]  path_removexattr+0x9b/0xb0
      [  983.923702]  __x64_sys_removexattr+0x17/0x20
      [  983.923704]  do_syscall_64+0x5b/0x1a0
      [  983.923705]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [  983.923707] RIP: 0033:0x7f080f10ee1b
      
      When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
      xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it
      free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job.
      
      Then subsequent removexattr will hang because of it.
      
      This bug was introduced by kernel commit 07120f1a ("xfs: Add xfs_has_attr and subroutines").
      It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state
      in this case. But xfs_attr_node_hasname will free state itself instead of caller if
      xfs_da3_node_lookup_int fails.
      
      Fix this bug by moving the step of free state into caller.
      
      [amir: this text from original commit is not relevant for 5.10 backport:
      Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and
      xfs_attr_node_removename_setup function because we should free state ourselves.
      ]
      
      Fixes: 07120f1a
      
       ("xfs: Add xfs_has_attr and subroutines")
      Signed-off-by: default avatarYang Xu <xuyang2018.jy@fujitsu.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e76bd4c
    • Brian Foster's avatar
      xfs: punch out data fork delalloc blocks on COW writeback failure · 0cdccc05
      Brian Foster authored
      commit 5ca5916b upstream.
      
      If writeback I/O to a COW extent fails, the COW fork blocks are
      punched out and the data fork blocks left alone. It is possible for
      COW fork blocks to overlap non-shared data fork blocks (due to
      cowextsz hint prealloc), however, and writeback unconditionally maps
      to the COW fork whenever blocks exist at the corresponding offset of
      the page undergoing writeback. This means it's quite possible for a
      COW fork extent to overlap delalloc data fork blocks, writeback to
      convert and map to the COW fork blocks, writeback to fail, and
      finally for ioend completion to cancel the COW fork blocks and leave
      stale data fork delalloc blocks around in the inode. The blocks are
      effectively stale because writeback failure also discards dirty page
      state.
      
      If this occurs, it is likely to trigger assert failures, free space
      accounting corruption and failures in unrelated file operations. For
      example, a subsequent reflink attempt of the affected file to a new
      target file will trip over the stale delalloc in the source file and
      fail. Several of these issues are occasionally reproduced by
      generic/648, but are reproducible on demand with the right sequence
      of operations and timely I/O error injection.
      
      To fix this problem, update the ioend failure path to also punch out
      underlying data fork delalloc blocks on I/O error. This is analogous
      to the writeback submission failure path in xfs_discard_page() where
      we might fail to map data fork delalloc blocks and consistent with
      the successful COW writeback completion path, which is responsible
      for unmapping from the data fork and remapping in COW fork blocks.
      
      Fixes: 787eb485
      
       ("xfs: fix and streamline error handling in xfs_end_io")
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cdccc05
    • Rustam Kovhaev's avatar
      xfs: use kmem_cache_free() for kmem_cache objects · db3f8110
      Rustam Kovhaev authored
      commit c30a0cbd upstream.
      
      For kmalloc() allocations SLOB prepends the blocks with a 4-byte header,
      and it puts the size of the allocated blocks in that header.
      Blocks allocated with kmem_cache_alloc() allocations do not have that
      header.
      
      SLOB explodes when you allocate memory with kmem_cache_alloc() and then
      try to free it with kfree() instead of kmem_cache_free().
      SLOB will assume that there is a header when there is none, read some
      garbage to size variable and corrupt the adjacent objects, which
      eventually leads to hang or panic.
      
      Let's make XFS work with SLOB by using proper free function.
      
      Fixes: 9749fee8
      
       ("xfs: enable the xfs_defer mechanism to process extents to free")
      Signed-off-by: default avatarRustam Kovhaev <rkovhaev@gmail.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarDarrick J. Wong <djwong@kernel.org>
      ...
      db3f8110
    • Coly Li's avatar
      bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() · 09c9902c
      Coly Li authored
      commit 7d6b902e
      
       upstream.
      
      The local variables check_state (in bch_btree_check()) and state (in
      bch_sectors_dirty_init()) should be fully filled by 0, because before
      allocating them on stack, they were dynamically allocated by kzalloc().
      
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de
      
      
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09c9902c
    • Masahiro Yamada's avatar
      tick/nohz: unexport __init-annotated tick_nohz_full_setup() · c4ff3ffe
      Masahiro Yamada authored
      commit 23900951 upstream.
      
      EXPORT_SYMBOL and __init is a bad combination because the .init.text
      section is freed up after the initialization. Hence, modules cannot
      use symbols annotated __init. The access to a freed symbol may end up
      with kernel panic.
      
      modpost used to detect it, but it had been broken for a decade.
      
      Commit 28438794 ("modpost: fix section mismatch check for exported
      init/exit sections") fixed it so modpost started to warn it again, then
      this showed up:
      
          MODPOST vmlinux.symvers
        WARNING: modpost: vmlinux.o(___ksymtab_gpl+tick_nohz_full_setup+0x0): Section mismatch in reference from the variable __ksymtab_tick_nohz_full_setup to the function .init.text:tick_nohz_full_setup()
        The symbol tick_nohz_full_setup is exported and annotated __init
        Fix this by removing the __init annotation of tick_nohz_full_setup or drop the export.
      
      Drop the export because tick_nohz_full_setup() is only called from the
      built-in code in kernel/sched/isolation.c.
      
      Fixes: ae9e557b
      
       ("time: Export tick start/stop functions for rcutorture")
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Tested-by: default avatarPaul E. McKenney <paulmck@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Backlund <tmb@tmb.nu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4ff3ffe
    • Christoph Hellwig's avatar
      drm: remove drm_fb_helper_modinit · 069fff50
      Christoph Hellwig authored
      commit bf22c9ec
      
       upstream.
      
      drm_fb_helper_modinit has a lot of boilerplate for what is not very
      simple functionality.  Just open code it in the only caller using
      IS_ENABLED and IS_MODULE, and skip the find_module check as a
      request_module is harmless if the module is already loaded (and not
      other caller has this find_module check either).
      
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJessica Yu <jeyu@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      069fff50
    • Amir Goldstein's avatar
      MAINTAINERS: add Amir as xfs maintainer for 5.10.y · 52dc7f3f
      Amir Goldstein authored
      
      This is an attempt to direct the bots and human that are testing
      LTS 5.10.y towards the maintainer of xfs in the 5.10.y tree.
      
      This is not an upstream MAINTAINERS entry and 5.15.y and 5.4.y will
      have their own LTS xfs maintainer entries.
      
      Update Darrick's email address from upstream and add Amir as xfs
      maintaier for the 5.10.y tree.
      
      Suggested-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/linux-xfs/Yrx6%2F0UmYyuBPjEr@magnolia/
      
      
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52dc7f3f
  2. Jun 29, 2022