  1. May 12, 2022
    • mmc: rtsx: add 74 Clocks in power on flow · 8a7f9205
      Ricky WU authored
      commit 1f311c94 upstream.
      
      SD spec definition:
      "Host provides at least 74 Clocks before issuing first command"
      After waiting 1 ms for the voltage to stabilize, start issuing the clock signals.

      On a power state change:
      MMC_POWER_OFF to MMC_POWER_UP: start issuing the clock signal to the card
      MMC_POWER_UP to MMC_POWER_ON: stop issuing the clock signal to the card
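
The transition rule above can be sketched as a small state function. This is a user-space illustration only, not the rtsx driver code: the enum names mirror the MMC_POWER_* states from the commit message, while `enum clock_action` and `clock_action_for_transition()` are hypothetical names introduced here, and the 1 ms settle delay is not modeled.

```c
#include <assert.h>

/* Power states named in the commit message (numeric values are illustrative). */
enum power_mode { MMC_POWER_OFF, MMC_POWER_UP, MMC_POWER_ON };

/* Hypothetical result type: what to do with the initial clock signal. */
enum clock_action { CLOCK_NO_CHANGE, CLOCK_START, CLOCK_STOP };

/*
 * Decide what happens to the initial clock signal on a power state change:
 * OFF -> UP starts it, UP -> ON stops it, anything else leaves it alone.
 */
enum clock_action clock_action_for_transition(enum power_mode prev,
					      enum power_mode next)
{
	assert(prev <= MMC_POWER_ON && next <= MMC_POWER_ON);

	if (prev == MMC_POWER_OFF && next == MMC_POWER_UP)
		return CLOCK_START;
	if (prev == MMC_POWER_UP && next == MMC_POWER_ON)
		return CLOCK_STOP;
	return CLOCK_NO_CHANGE;
}
```

Keeping the decision in a pure function of (previous state, next state) makes it easy to see that the clocks are only running during the MMC_POWER_UP phase.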
      
      Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
      Link: https://lore.kernel.org/r/1badf10aba764191a1a752edcbf90389@realtek.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: aardvark: Fix reading MSI interrupt number · d789b989
      Pali Rohár authored
      commit 805dfc18 upstream.
      
      In advk_pcie_handle_msi() it is expected that when bit i in the W1C
      register PCIE_MSI_STATUS_REG is cleared, the PCIE_MSI_PAYLOAD_REG is
      updated to contain the MSI number corresponding to index i.
      
      Experiments show that this is not so, and instead PCIE_MSI_PAYLOAD_REG
      always contains the number of the last received MSI, overall.
      
      Do not read PCIE_MSI_PAYLOAD_REG register for determining MSI interrupt
      number. Since Aardvark already forbids more than 32 interrupts and uses
      own allocated hwirq numbers, the msi_idx already corresponds to the
      received MSI number.
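
A minimal user-space sketch of the idea: walk the bits of a status snapshot and use the bit index itself as the MSI number, without consulting any payload register. The function name `collect_pending_msis()` and the constant `MSI_IRQ_NUM` are assumptions made for this illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSI_IRQ_NUM 32  /* the commit message says at most 32 MSIs are allowed */

/*
 * Collect the MSI numbers to dispatch from a snapshot of the status register.
 * The bit index is used directly as the MSI number. Returns how many entries
 * were written to out[].
 */
size_t collect_pending_msis(uint32_t msi_status, unsigned int out[MSI_IRQ_NUM])
{
	size_t n = 0;

	assert(out != NULL);
	for (unsigned int msi_idx = 0; msi_idx < MSI_IRQ_NUM; msi_idx++) {
		if (msi_status & (UINT32_C(1) << msi_idx))
			out[n++] = msi_idx;
	}
	return n;
}
```

For a status value with bits 0, 2 and 31 set, the function reports exactly those three indices, regardless of which MSI arrived last.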
      
      Link: https://lore.kernel.org/r/20220110015018.26359-3-kabel@kernel.org
      Fixes: 8c39d710 ("PCI: aardvark: Add Aardvark PCI host controller driver")
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: aardvark: Clear all MSIs at setup · 253bc43c
      Pali Rohár authored
      commit 7d8dc1f7 upstream.
      
      We already clear all the other interrupts (ISR0, ISR1, HOST_CTRL_INT).
      
      Define a new macro PCIE_MSI_ALL_MASK and do the same clearing for MSIs,
      to ensure that we don't start receiving spurious interrupts.
      
      Use this new mask in advk_pcie_handle_msi();
      
      Link: https://lore.kernel.org/r/20211130172913.9727-5-kabel@kernel.org
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm: interlock pending dm_io and dm_wait_for_bios_completion · 786dc86c
      Mike Snitzer authored
      commit 9f6dc633 upstream.
      
      Commit d208b894 ("dm: fix mempool NULL pointer race when
      completing IO") didn't go far enough.
      
      When bio_end_io_acct ends the count of in-flight I/Os may reach zero
      and the DM device may be suspended. There is a possibility that the
      suspend races with dm_stats_account_io.
      
      Fix this by adding percpu "pending_io" counters to track outstanding
      dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also,
      rename md_in_flight_bios() to dm_in_flight_bios() and update it to
      iterate all pending_io counters.
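
The per-CPU counting scheme can be illustrated in user space with a plain array standing in for the percpu counters. Everything named here (`struct pending_io`, `NR_CPUS_SKETCH`, the helper functions) is hypothetical and single-threaded; the point is only that an I/O may be counted up on one CPU and down on another, so an individual slot can go negative and only the sum over all slots is meaningful.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for the illustration */

/* Stand-in for a per-CPU "pending_io" counter; one slot per CPU. */
struct pending_io {
	long count[NR_CPUS_SKETCH];
};

/* An I/O was started on the given CPU. */
void pending_io_inc(struct pending_io *p, size_t cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	p->count[cpu]++;
}

/* An I/O completed on the given CPU (possibly not the CPU that started it). */
void pending_io_dec(struct pending_io *p, size_t cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	p->count[cpu]--;
}

/* In-flight total: the sum over every CPU's slot. */
long pending_io_sum(const struct pending_io *p)
{
	long sum = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += p->count[cpu];
	return sum;
}
```

A waiter that wants to know whether any dm_io is still outstanding would compare the sum against zero, which is why the in-flight check has to iterate all counters.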
      
      Fixes: d208b894 ("dm: fix mempool NULL pointer race when completing IO")
      Cc: stable@vger.kernel.org
      Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm: fix mempool NULL pointer race when completing IO · ad1393b9
      Jiazi Li authored
      commit d208b894 upstream.
      
      dm_io_dec_pending() calls end_io_acct() first and will then dec md
      in-flight pending count. But if a task is swapping DM table at same
      time this can result in a crash due to mempool->elements being NULL:
      
      task1                             task2
      do_resume
       ->do_suspend
        ->dm_wait_for_completion
                                        bio_endio
                                         ->clone_endio
                                          ->dm_io_dec_pending
                                           ->end_io_acct
                                            ->wakeup task1
       ->dm_swap_table
        ->__bind
         ->__bind_mempools
          ->bioset_exit
           ->mempool_exit
                                           ->free_io
      
      [ 67.330330] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000000
      ......
      [ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
      [ 67.330510] pc : mempool_free+0x70/0xa0
      [ 67.330515] lr : mempool_free+0x4c/0xa0
      [ 67.330520] sp : ffffff8008013b20
      [ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
      [ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
      [ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
      [ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
      [ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
      [ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
      [ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
      [ 67.330563] x15: 000000000093b41e x14: 0000000000000010
      [ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
      [ 67.330574] x11: 0000000000000001 x10: 0000000000000001
      [ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
      [ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
      [ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
      [ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
      [ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
      [ 67.330609] Call trace:
      [ 67.330616] mempool_free+0x70/0xa0
      [ 67.330627] bio_put+0xf8/0x110
      [ 67.330638] dec_pending+0x13c/0x230
      [ 67.330644] clone_endio+0x90/0x180
      [ 67.330649] bio_endio+0x198/0x1b8
      [ 67.330655] dec_pending+0x190/0x230
      [ 67.330660] clone_endio+0x90/0x180
      [ 67.330665] bio_endio+0x198/0x1b8
      [ 67.330673] blk_update_request+0x214/0x428
      [ 67.330683] scsi_end_request+0x2c/0x300
      [ 67.330688] scsi_io_completion+0xa0/0x710
      [ 67.330695] scsi_finish_command+0xd8/0x110
      [ 67.330700] scsi_softirq_done+0x114/0x148
      [ 67.330708] blk_done_softirq+0x74/0xd0
      [ 67.330716] __do_softirq+0x18c/0x374
      [ 67.330724] irq_exit+0xb4/0xb8
      [ 67.330732] __handle_domain_irq+0x84/0xc0
      [ 67.330737] gic_handle_irq+0x148/0x1b0
      [ 67.330744] el1_irq+0xe8/0x190
      [ 67.330753] lpm_cpuidle_enter+0x4f8/0x538
      [ 67.330759] cpuidle_enter_state+0x1fc/0x398
      [ 67.330764] cpuidle_enter+0x18/0x20
      [ 67.330772] do_idle+0x1b4/0x290
      [ 67.330778] cpu_startup_entry+0x20/0x28
      [ 67.330786] secondary_start_kernel+0x160/0x170
      
      Fix this by:
      1) Establishing pointers to 'struct dm_io' members in
      dm_io_dec_pending() so that they may be passed into end_io_acct()
      _after_ free_io() is called.
      2) Moving end_io_acct() after free_io().
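
The reordering in the two steps above follows a general pattern: save what the accounting step needs, release the object, then run the step that may wake a waiter. The sketch below shows that pattern in user space under stated assumptions: `struct io_sketch`, `struct acct_record` and both functions are hypothetical, `free()` stands in for free_io(), and values are copied to locals rather than mirroring the exact members the real patch saves.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for struct dm_io. */
struct io_sketch {
	unsigned long start_time;
	int stats_aux;
};

/* Records what the accounting step saw, so the ordering can be checked. */
struct acct_record {
	unsigned long start_time;
	int stats_aux;
	int calls;
};

/* Stand-in for the accounting step; it never touches the io object. */
void end_io_acct_sketch(struct acct_record *rec, unsigned long start_time,
			int stats_aux)
{
	rec->start_time = start_time;
	rec->stats_aux = stats_aux;
	rec->calls++;
}

/*
 * Completion path in the fixed order: copy the needed members to locals,
 * free the io, and only then run the accounting.
 */
void io_complete_sketch(struct io_sketch *io, struct acct_record *rec)
{
	assert(io != NULL && rec != NULL);

	unsigned long start_time = io->start_time;
	int stats_aux = io->stats_aux;

	free(io);	/* io must not be dereferenced after this point */
	end_io_acct_sketch(rec, start_time, stats_aux);
}
```

Because the accounting runs last and works only on saved values, a waiter woken by it can tear down the backing pool without racing against the free.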
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jiazi Li <lijiazi@xiaomi.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: make sure treq->af_specific is initialized · 40bcd39a
      Eric Dumazet authored
      commit ba5a4fdd upstream.
      
      syzbot complained about a recent change in the TCP stack,
      hitting a NULL pointer [1]
      
      TCP request sockets have an af_specific pointer, which
      was used before the blamed change only for SYNACK generation
      in non-SYNCOOKIE mode.
      
      TCP request sockets created momentarily when the third packet
      arrives from the client in SYNCOOKIE mode were not using
      treq->af_specific.
      
      Make sure this field is populated, in the same way normal
      TCP request sockets do in tcp_conn_request().
      
      [1]
      TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
      general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48
      RSP: 0018:ffffc90000de0588 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100
      RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008
      RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0
      R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267
       tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207
       cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline]
       tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486
       tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725
       ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422
       ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473
       dst_input include/net/dst.h:461 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       process_backlog+0x3a0/0x7c0 net/core/dev.c:5847
       __napi_poll+0xb3/0x6e0 net/core/dev.c:6413
       napi_poll net/core/dev.c:6480 [inline]
       net_rx_action+0x8ec/0xc60 net/core/dev.c:6567
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
      
      Fixes: 5b0b9e4c ("tcp: md5: incorrect tcp_header_len for incoming connections")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [fruggeri: Account for backport conflicts from 35b2c321 and 6fc8c827]
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock · 9661bf67
      Takashi Iwai authored
      commit bc55cfd5 upstream.
      
      syzbot caught a potential deadlock between the PCM
      runtime->buffer_mutex and the mm->mmap_lock.  It was introduced by the
      recent fix to cover the racy read/write and other ioctls, and in that
      commit, I overlooked a (hopefully only) corner case that may take the
      locks in reverse order, namely, the OSS mmap.  The OSS mmap operation
      exceptionally allows re-configuring the parameters inside the OSS
      mmap syscall, where mm->mmap_lock is already held.  Meanwhile, the
      copy_from/to_user calls at read/write operations also take the
      mm->mmap_lock internally, hence it may lead to an AB/BA deadlock.
      
      A similar problem was already seen in the past and we fixed it with a
      refcount (in commit b2483716).  The former fix covered only the
      call paths with OSS read/write and OSS ioctls, while we need to cover
      the concurrent access via both ALSA and OSS APIs now.
      
      This patch addresses the problem above by replacing the buffer_mutex
      lock in the read/write operations with a refcount similar to the one
      we've used for OSS.  The new field, runtime->buffer_accessing, keeps the
      number of concurrent read/write operations.  Unlike the former
      buffer_mutex protection, this protects only around the
      copy_from/to_user() calls; the other code is basically protected by
      the PCM stream lock.  The refcount can be negative, meaning blocked
      by the ioctls.  If a negative value is seen, the read/write aborts
      with -EBUSY.  On the ioctl side, OTOH, they check this refcount, too,
      and set it to a negative value for blocking unless it's already being
      accessed.
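
The signed-refcount scheme described above can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the ALSA code: `struct runtime_sketch` and the four entry points are hypothetical names, and the two helpers implement "increment unless negative" and "decrement unless positive" with compare-and-swap loops.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for runtime->buffer_accessing: >0 accessors active, <0 blocked by an ioctl. */
struct runtime_sketch {
	atomic_int buffer_accessing;
};

/* Increment unless the value is negative; returns false if blocked. */
static bool inc_unless_negative(atomic_int *v)
{
	int cur = atomic_load(v);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Decrement unless the value is positive; returns false if accessors are active. */
static bool dec_unless_positive(atomic_int *v)
{
	int cur = atomic_load(v);

	while (cur <= 0) {
		if (atomic_compare_exchange_weak(v, &cur, cur - 1))
			return true;
	}
	return false;
}

/* Read/write side: bracket only the user-copy section. */
int rw_begin(struct runtime_sketch *rt)
{
	assert(rt != NULL);
	return inc_unless_negative(&rt->buffer_accessing) ? 0 : -EBUSY;
}

void rw_end(struct runtime_sketch *rt)
{
	atomic_fetch_sub(&rt->buffer_accessing, 1);
}

/* Ioctl side: block new accessors, or fail if one is already inside. */
int ioctl_block(struct runtime_sketch *rt)
{
	assert(rt != NULL);
	return dec_unless_positive(&rt->buffer_accessing) ? 0 : -EBUSY;
}

void ioctl_unblock(struct runtime_sketch *rt)
{
	atomic_fetch_add(&rt->buffer_accessing, 1);
}
```

Neither side ever sleeps while holding the counter, which is what removes the lock-ordering dependency on the mmap lock in this sketch.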
      
      Reported-by: <syzbot+6e5c88838328e99c7e1c@syzkaller.appspotmail.com>
      Fixes: dca947d4 ("ALSA: pcm: Fix races among concurrent read/write and buffer changes")
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/000000000000381a0d05db622a81@google.com
      Link: https://lore.kernel.org/r/20220330120903.4738-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [OP: backport to 5.4: adjusted context]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: pcm: Fix races among concurrent prealloc proc writes · 37b12c16
      Takashi Iwai authored
      commit 69534c48 upstream.
      
      We have no protection against concurrent PCM buffer preallocation
      changes via proc files, and it may potentially lead to UAF or some
      weird problem.  This patch applies the PCM open_mutex to the proc
      write operation for avoiding the racy proc writes and the PCM stream
      open (and further operations).
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20220322170720.3529-5-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [OP: backport to 5.4: adjusted context]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls · 2a559eec
      Takashi Iwai authored
      commit 3c3201f8 upstream.
      
      Like the previous fixes to hw_params and hw_free ioctl races, we need
      to paper over the concurrent prepare ioctl calls against hw_params and
      hw_free, too.
      
      This patch implements the locking with the existing
      runtime->buffer_mutex for prepare ioctls.  Unlike the previous case
      for snd_pcm_hw_params() and snd_pcm_hw_free(), snd_pcm_prepare() is
      performed on the linked streams, hence the lock can't be applied
      simply at the top.  For tracking the lock in each linked substream, we
      modify snd_pcm_action_group() slightly and apply the buffer_mutex for
      the case stream_lock=false (formerly there was no lock applied)
      there.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20220322170720.3529-4-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [OP: backport to 5.4: adjusted context]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: pcm: Fix races among concurrent read/write and buffer changes · 08d1807f
      Takashi Iwai authored
      commit dca947d4 upstream.
      
      In the current PCM design, the read/write syscalls (as well as the
      equivalent ioctls) are allowed before the PCM stream is running, that
      is, at PCM PREPARED state.  Meanwhile, we also allow re-issuing
      hw_params and hw_free ioctl calls at the PREPARED state that may
      change or free the buffers, too.  The problem is that there is no
      protection against those mix-ups.
      
      This patch applies the previously introduced runtime->buffer_mutex to
      the read/write operations so that the concurrent hw_params or hw_free
      call can no longer interfere during the operation.  The mutex is
      unlocked before scheduling, so we don't hold it too long.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20220322170720.3529-3-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: pcm: Fix races among concurrent hw_params and hw_free calls · fbeb4926
      Takashi Iwai authored
      commit 92ee3c60 upstream.
      
      Currently we have neither proper check nor protection against the
      concurrent calls of PCM hw_params and hw_free ioctls, which may result
      in a UAF.  Since the existing PCM stream lock can't be used for
      protecting the whole ioctl operations, we need a new mutex to protect
      those racy calls.
      
      This patch introduces a new mutex, runtime->buffer_mutex, and applies
      it to both hw_params and hw_free ioctl code paths.  Along with it,
      both functions are slightly modified (the mmap_count check is moved
      into the state-check block) for code simplicity.
      
      Reported-by: Hu Jiahui <kirin.say@gmail.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20220322170720.3529-2-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [OP: backport to 5.4: adjusted context]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix unexpected zeroed page mapping with zram swap · f098f8b9
      Minchan Kim authored
      commit e914d8f0 upstream.
      
      When two processes share an address space via CLONE_VM, a user process
      can be corrupted by unexpectedly seeing a zeroed page.
      
            CPU A                        CPU B
      
        do_swap_page                do_swap_page
        SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
        swap_readpage valid data
          swap_slot_free_notify
            delete zram entry
                                    swap_readpage zeroed(invalid) data
                                    pte_lock
                                    map the *zero data* to userspace
                                    pte_unlock
        pte_lock
        if (!pte_same)
          goto out_nomap;
        pte_unlock
        return and next refault will
        read zeroed data
      
      The swap_slot_free_notify is bogus for the CLONE_VM case since the swap
      slot's refcount is not increased at copy_mm, so it can't tell
      whether it's safe to discard data from the backing device.  In that
      case, the only lock it could rely on to synchronize swap slot freeing is
      the page table lock.  Thus, this patch gets rid of the
      swap_slot_free_notify function.  With this patch, CPU A will see correct data.
      
            CPU A                        CPU B
      
        do_swap_page                do_swap_page
        SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                                    swap_readpage original data
                                    pte_lock
                                    map the original data
                                    swap_free
                                      swap_range_free
                                        bd_disk->fops->swap_slot_free_notify
        swap_readpage read zeroed data
                                    pte_unlock
        pte_lock
        if (!pte_same)
          goto out_nomap;
        pte_unlock
        return
        on next refault will see mapped data by CPU B
      
      The concern with this patch is increased memory consumption, since it
      could keep memory wasted in compressed form in zram as well as in
      uncompressed form in the address space.  However, most zram use cases
      have no readahead, and do_swap_page is followed by swap_free, so the
      compressed form will be freed from zram quickly.
      
      Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
      Fixes: 0bcac06f ("mm, swap: skip swapcache for swapin of synchronous device")
      Reported-by: Ivan Babrou <ivan@cloudflare.com>
      Tested-by: Ivan Babrou <ivan@cloudflare.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern · c7337efd
      Haimin Zhang authored
      commit cc8f7fe1 upstream.
      
      Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize
      the buffer of a bio.
      
      Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220216084038.15635-1-tcs.kernel@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      [nobelbarakat: Backported to 5.4: Manually added __GFP_ZERO flag]
      Signed-off-by: Nobel Barakat <nobelbarakat@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv6: ensure we call ipv6_mc_down() at most once · 9588ac2e
      j.nixdorf@avm.de authored
      commit 9995b408 upstream.
      
      There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
      either the network device is actually going down, or IPv6 was disabled
      on the interface.
      
      If either of them stays down while the other is toggled, we repeatedly
      call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
      calling the corresponding ipv6_mc_up() in between. This will cause a
      new entry in idev->mc_tomb to be allocated for each multicast group
      the interface is subscribed to, which in turn leaks one struct ifmcaddr6
      per nontrivial multicast group the interface is subscribed to.
      
      The following reproducer will leak at least $n objects:
      
      ip addr add ff2e::4242/32 dev eth0 autojoin
      sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
      for i in $(seq 1 $n); do
      	ip link set up eth0; ip link set down eth0
      done
      
      Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
      sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
      can also be used to create a nontrivial idev->mc_list, which will then
      leak objects with the right up-down-sequence.
      
      Based on both sources for NETDEV_DOWN events the interface IPv6 state
      should be considered:
      
       - not ready if the network interface is not ready OR IPv6 is disabled
         for it
       - ready if the network interface is ready AND IPv6 is enabled for it
      
      The functions ipv6_mc_up() and ipv6_mc_down() should only be run when this
      state changes.
      
      Implement this by remembering when the IPv6 state is ready, and only
      run ipv6_mc_down() if it actually changed from ready to not ready.
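
The "remember the ready state and act only on a ready to not-ready edge" rule can be sketched as follows. This is a simplified user-space model: `struct idev_sketch` and `idev_update()` are hypothetical, and a counter stands in for the real ipv6_mc_down() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface state for the illustration. */
struct idev_sketch {
	bool link_ready;     /* network interface is ready */
	bool ipv6_disabled;  /* IPv6 is disabled on the interface */
	bool ipv6_ready;     /* remembered combined state */
	int mc_down_calls;   /* counts simulated ipv6_mc_down() calls */
};

/* Combined state: ready only if the link is ready AND IPv6 is enabled. */
static bool compute_ready(const struct idev_sketch *idev)
{
	return idev->link_ready && !idev->ipv6_disabled;
}

/* Apply an input change; run the "down" work only on a ready -> not ready edge. */
void idev_update(struct idev_sketch *idev, bool link_ready, bool ipv6_disabled)
{
	assert(idev != NULL);
	idev->link_ready = link_ready;
	idev->ipv6_disabled = ipv6_disabled;

	bool now_ready = compute_ready(idev);

	if (idev->ipv6_ready && !now_ready)
		idev->mc_down_calls++;	/* stands in for ipv6_mc_down() */
	idev->ipv6_ready = now_ready;
}
```

With IPv6 disabled, toggling the link up and down any number of times leaves the counter unchanged, which corresponds to the reproducer no longer leaking.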
      
      The other direction (not ready -> ready) already works correctly, as:
      
       - the interface notification triggered codepath for NETDEV_UP /
         NETDEV_CHANGE returns early if ipv6 is disabled, and
       - the disable_ipv6=0 triggered codepath skips fully initializing the
         interface as long as addrconf_link_ready(dev) returns false
       - calling ipv6_mc_up() repeatedly does not leak anything
      
      Fixes: 3ce62a84 ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
      Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [jnixdorf: context updated for bpo to v4.19/v5.4]
      Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised · 367b4908
      Wanpeng Li authored
      [ Upstream commit 1714a4eb ]
      
      As commit 0c5f81da ("KVM: LAPIC: Inject timer interrupt via posted
      interrupt") mentioned, the host admin should tune the guest setup
      carefully, so that vCPUs are placed on isolated pCPUs, with several pCPUs
      surplus for *busy* housekeeping.  In this setup, it is preferable to
      disable mwait/hlt/pause vmexits to keep the vCPUs in non-root mode.
      
      However, if only some guests are isolated and others are not, the latter
      would not have any benefit from posted timer interrupts, and at the same
      time lose VMX preemption timer fast paths because
      kvm_can_post_timer_interrupt() returns true and therefore forces
      kvm_can_use_hv_timer() to false.
      
      By guaranteeing that posted-interrupt timer is only used if MWAIT or
      HLT are done without vmexit, KVM can make a better choice and use the
      VMX preemption timer and the corresponding fast paths.
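
The decision described above reduces to a boolean predicate. The sketch below is a simplified model, not the KVM source: `struct guest_cfg_sketch` and its field names are assumptions (including the `apicv_active` precondition), and the real hv-timer check involves more conditions than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags describing one guest's configuration. */
struct guest_cfg_sketch {
	bool pi_inject_timer;  /* posted-interrupt timer injection requested */
	bool apicv_active;     /* assumed precondition: posted interrupts usable */
	bool mwait_in_guest;   /* MWAIT does not cause a vmexit */
	bool hlt_in_guest;     /* HLT does not cause a vmexit */
};

/* Post timer interrupts only when MWAIT or HLT run without a vmexit. */
bool can_post_timer_interrupt(const struct guest_cfg_sketch *g)
{
	assert(g != NULL);
	return g->pi_inject_timer && g->apicv_active &&
	       (g->mwait_in_guest || g->hlt_in_guest);
}

/* Per the commit message, posting timer interrupts rules out the hv timer. */
bool can_use_hv_timer(const struct guest_cfg_sketch *g)
{
	return !can_post_timer_interrupt(g);
}
```

A guest that exits on both MWAIT and HLT therefore keeps the preemption timer path even when posted timer injection is enabled globally.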
      
      Reported-by: Aili Yao <yaoaili@kingsoft.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1643112538-36743-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume · c2fadf2d
      Wanpeng Li authored
      [ Upstream commit 0361bdfd ]
      
      MSR_KVM_POLL_CONTROL is cleared on reset, thus reverting guests to
      host-side polling after suspend/resume.  Non-bootstrap CPUs are
      restored correctly by the haltpoll driver because they are hot-unplugged
      during suspend and hot-plugged during resume; however, the BSP
      is not hotpluggable and remains in host-side polling mode after
      the guest resumes.  This makes the guest pay for the cost of vmexits
      every time the guest enters idle.
      
      Fix it by recording BSP's haltpoll state and resuming it during guest
      resume.
      
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1650267752-46796-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU · 8b78939f
      Sandipan Das authored
      [ Upstream commit 5a1bde46 ]
      
      On some x86 processors, CPUID leaf 0xA provides information
      on Architectural Performance Monitoring features. It
      advertises a PMU version which Qemu uses to determine the
      availability of additional MSRs to manage the PMCs.
      
      Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
      the same, the kernel constructs return values based on the
      x86_pmu_capability irrespective of the vendor.
      
      This leaf and the additional MSRs are not supported on AMD
      and Hygon processors. If AMD PerfMonV2 is detected, the PMU
      version is set to 2 and guest startup breaks because of an
      attempt to access a non-existent MSR. Return zeros to avoid
      this.
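
The "return zeros" behaviour can be sketched as a guard in front of the leaf construction. This is a user-space illustration with hypothetical names (`struct cpuid_regs_sketch`, `fill_leaf_0xa()`); how the kernel detects an architectural PMU and derives the non-zero values is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register set for one CPUID leaf. */
struct cpuid_regs_sketch {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Fill leaf 0xA for the guest. When the host has no architectural PMU,
 * report all zeros instead of values derived from the host PMU capability.
 */
void fill_leaf_0xa(struct cpuid_regs_sketch *out, bool host_has_arch_pmu,
		   const struct cpuid_regs_sketch *host_caps)
{
	assert(out != NULL && host_caps != NULL);

	if (!host_has_arch_pmu) {
		out->eax = out->ebx = out->ecx = out->edx = 0;
		return;
	}
	*out = *host_caps;
}
```

With the guard in place, a non-zero PMU version in the host capability never reaches the guest on hosts that lack the architectural PMU.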
      
      Fixes: a6c06ed1 ("KVM: Expose the architectural performance monitoring CPUID leaf")
      Reported-by: Vasant Hegde <vasant.hegde@amd.com>
      Signed-off-by: Sandipan Das <sandipan.das@amd.com>
      Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • NFSv4: Don't invalidate inode attributes on delegation return · f455c8e6
      Trond Myklebust authored
      [ Upstream commit 00c94ebe ]
      
      There is no need to declare attributes such as the ctime, mtime and
      block size invalid when we're just returning a delegation, so it is
      inappropriate to call nfs_post_op_update_inode_force_wcc().
      Instead, just call nfs_refresh_inode() after faking up the change
      attribute. We know that the GETATTR op occurs before the DELEGRETURN, so
      we are safe when doing this.
      
      Fixes: 0bc2c9b4 ("NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu · 89e7a625
      Felix Kuehling authored
      commit b40a6ab2 upstream.
      
      amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap
      to access the BO through the corresponding file descriptor. The VM can
      also be extracted from drm_priv, so drm_priv can replace the vm parameter
      in the kfd2kgd interface.
      
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Philip Yang <philip.yang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      [This is a partial cherry-pick of the upstream commit.]
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() · 1d14c1c7
      Eric Dumazet authored
      commit dba5bdd5 upstream.
      
      syzbot reported a UAF in ip_mc_sf_allow() [1]
      
      Whenever an RCU-protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu().
      
      Because kfree_rcu(ptr, rcu) got support for NULL ptr
      only recently in commit 12edff04 ("rcu: Make kfree_rcu()
      ignore NULL pointers"), I chose to use the conditional
      to make sure stable backports won't miss this detail.
      
      if (psl)
          kfree_rcu(psl, rcu);
      
      net/ipv6/mcast.c has similar issues, addressed in a separate patch.
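
      The ordering rule above can be illustrated with a single-threaded
      user-space sketch. It is not the kernel code: struct sf_list,
      struct owner, defer_free() and grace_period_elapsed() are
      hypothetical stand-ins for the source filter list, the socket state,
      kfree_rcu() and the end of a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-socket source filter list. */
struct sf_list {
    int count;
};

/* Hypothetical owner holding the published pointer. */
struct owner {
    struct sf_list *sflist;   /* what readers would dereference */
    struct sf_list *deferred; /* single slot: object awaiting its grace period */
};

/* Stand-in for kfree_rcu(): remember the object and free it later. */
void defer_free(struct owner *o, struct sf_list *old)
{
    o->deferred = old;
}

/* Stand-in for the end of a grace period: now the old object may go. */
void grace_period_elapsed(struct owner *o)
{
    free(o->deferred);
    o->deferred = NULL;
}

/*
 * Replace the published list. The new pointer is stored first, so a
 * reader that arrives after the free is queued can no longer find `old`.
 */
void replace_sflist(struct owner *o, struct sf_list *newpsl)
{
    struct sf_list *old = o->sflist;

    o->sflist = newpsl;      /* 1. publish (rcu_assign_pointer() in the kernel) */
    if (old)                 /* 2. explicit NULL check, as in the commit text */
        defer_free(o, old);  /* 3. only now queue the old object for freeing */
}
```

      The sketch keeps one deferred slot for brevity, so a caller would let
      grace_period_elapsed() run before replacing the list again.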
      
      [1]
      BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
      Read of size 4 at addr ffff88807d37b904 by task syz-executor.5/908
      
      CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted 5.18.0-rc4-syzkaller-00064-g8f4dd16603ce #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
       raw_v4_input net/ipv4/raw.c:190 [inline]
       raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218
       ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193
       ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:461 [inline]
       ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       netif_receive_skb_internal net/core/dev.c:5605 [inline]
       netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664
       tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534
       tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985
       tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015
       call_write_iter include/linux/fs.h:2050 [inline]
       new_sync_write+0x38a/0x560 fs/read_write.c:504
       vfs_write+0x7c0/0xac0 fs/read_write.c:591
       ksys_write+0x127/0x250 fs/read_write.c:644
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f3f12c3bbff
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
      RSP: 002b:00007f3f13ea9130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f3f12d9bf60 RCX: 00007f3f12c3bbff
      RDX: 0000000000000036 RSI: 0000000020002ac0 RDI: 00000000000000c8
      RBP: 00007f3f12ce308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000036 R11: 0000000000000293 R12: 0000000000000000
      R13: 00007fffb68dd79f R14: 00007f3f13ea9300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 908:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:436 [inline]
       ____kasan_kmalloc mm/kasan/common.c:515 [inline]
       ____kasan_kmalloc mm/kasan/common.c:474 [inline]
       __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524
       kasan_kmalloc include/linux/kasan.h:234 [inline]
       __do_kmalloc mm/slab.c:3710 [inline]
       __kmalloc+0x209/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       sock_kmalloc net/core/sock.c:2501 [inline]
       sock_kmalloc+0xb5/0x100 net/core/sock.c:2492
       ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline]
       ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 753:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328
       kasan_slab_free include/linux/kasan.h:200 [inline]
       __cache_free mm/slab.c:3439 [inline]
       kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774
       kfree_bulk include/linux/slab.h:437 [inline]
       kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595
       ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline]
       ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Second to last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       call_rcu+0x99/0x790 kernel/rcu/tree.c:3074
       mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938
       call_netdevice_notifiers_extack net/core/dev.c:1976 [inline]
       call_netdevice_notifiers net/core/dev.c:1990 [inline]
       unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751
       default_device_exit_batch+0x449/0x590 net/core/dev.c:11245
       ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
       cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      The buggy address belongs to the object at ffff88807d37b900
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 4 bytes inside of
       64-byte region [ffff88807d37b900, ffff88807d37b940)
      
      The buggy address belongs to the physical page:
      page:ffffea0001f4dec0 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200
      raw: ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts 139732238007, free_ts 139730893262
       prep_new_page mm/page_alloc.c:2441 [inline]
       get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
       __alloc_pages_node include/linux/gfp.h:587 [inline]
       kmem_getpages mm/slab.c:1378 [inline]
       cache_grow_begin+0x75/0x350 mm/slab.c:2584
       cache_alloc_refill+0x27f/0x380 mm/slab.c:2957
       ____cache_alloc mm/slab.c:3040 [inline]
       ____cache_alloc mm/slab.c:3023 [inline]
       __do_cache_alloc mm/slab.c:3267 [inline]
       slab_alloc mm/slab.c:3309 [inline]
       __do_kmalloc mm/slab.c:3708 [inline]
       __kmalloc+0x3b3/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:714 [inline]
       tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45
       tomoyo_encode2 security/tomoyo/realpath.c:31 [inline]
       tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80
       tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288
       tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
       tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822
       security_inode_getattr+0xcf/0x140 security/security.c:1350
       vfs_getattr fs/stat.c:157 [inline]
       vfs_statx+0x16a/0x390 fs/stat.c:232
       vfs_fstatat+0x8c/0xb0 fs/stat.c:255
       __do_sys_newfstatat+0x91/0x110 fs/stat.c:425
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1356 [inline]
       free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406
       free_unref_page_prepare mm/page_alloc.c:3328 [inline]
       free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423
       __vunmap+0x85d/0xd30 mm/vmalloc.c:2667
       __vfree+0x3c/0xd0 mm/vmalloc.c:2715
       vfree+0x5a/0x90 mm/vmalloc.c:2746
       __do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117
       do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
       do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
       ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026
       tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
       ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      >ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: c85bb41e ("igmp: fix ip_mc_sf_allow race [v5]")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Flavio Leitner <fbl@sysclose.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: always log symlinks in full mode · 2b99ff4c
      Filipe Manana authored
      commit d0e64a98 upstream.
      
      On Linux, empty symlinks are invalid, and attempting to create one with
      the system call symlink(2) results in an -ENOENT error; this is
      explicitly documented in the man page.
      
      If we rename a symlink that was created in the current transaction and its
      parent directory was logged before, we actually end up logging the symlink
      without logging its content, which is stored in an inline extent. That
      means that after a power failure we can end up with an empty symlink,
      having no content and an i_size of 0 bytes.
      
      It can be easily reproduced like this:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
      
        $ mkdir /mnt/testdir
        $ sync
      
        # Create a file inside the directory and fsync the directory.
        $ touch /mnt/testdir/foo
        $ xfs_io -c "fsync" /mnt/testdir
      
        # Create a symlink inside the directory and then rename the symlink.
        $ ln -s /mnt/testdir/foo /mnt/testdir/bar
        $ mv /mnt/testdir/bar /mnt/testdir/baz
      
        # Now fsync the directory again; this persists the log tree.
        $ xfs_io -c "fsync" /mnt/testdir
      
        <power failure>
      
        $ mount /dev/sdc /mnt
        $ stat -c %s /mnt/testdir/baz
        0
        $ readlink /mnt/testdir/baz
        $
      
      Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that
      their content is also logged.
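
      As a rough sketch of that rule, assuming a hypothetical helper
      pick_log_mode() and treating LOG_INODE_EXISTS as an example of a
      lighter mode (the enum values here are arbitrary and this is not the
      btrfs code):

```c
#include <assert.h>
#include <stdbool.h>

/* Mode names follow btrfs naming; the numeric values are arbitrary here. */
enum log_mode {
    LOG_INODE_ALL,
    LOG_INODE_EXISTS,
};

/* Hypothetical helper: a symlink is always upgraded to a full log. */
enum log_mode pick_log_mode(bool is_symlink, enum log_mode requested)
{
    return is_symlink ? LOG_INODE_ALL : requested;
}
```

      Forcing the full mode costs little for symlinks, because their whole
      content is one small inline extent.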
      
      A test case for fstests will follow.
      
      CC: stable@vger.kernel.org # 4.9+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • smsc911x: allow using IRQ0 · dc478448
      Sergey Shtylyov authored
      commit 5ef9b803 upstream.
      
      The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC
      LAN911x Ethernet chip, so the networking on them must have been broken by
      commit 965b2aa7 ("net/smsc911x: fix irq resource allocation failure")
      which filtered out 0 as well as the negative error codes -- it was kinda
      correct at the time, as platform_get_irq() could return 0 on of_irq_get()
      failure and on the actual 0 in an IRQ resource.  This issue was fixed by
      me (back in 2016!), so we should be able to fix this driver to allow IRQ0
      usage again...
      
      When merging this to the stable kernels, make sure you also merge commit
      e330b9a6 ("platform: don't return 0 from platform_get_irq[_byname]()
      on error") -- that's my fix to platform_get_irq() for the DT platforms...
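
      The difference between the two checks can be shown with a small
      sketch. The helper names irq_lookup_ok() and irq_lookup_ok_old() are
      hypothetical; the sketch assumes, as the text above says, that the
      lookup reports failure only through negative values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the value returned by a
 * platform_get_irq()-style lookup is usable. Only negative values are
 * errors; 0 is a legitimate interrupt number on some boards.
 */
bool irq_lookup_ok(int irq)
{
    return irq >= 0;
}

/* The older, stricter test that also rejected IRQ0. */
bool irq_lookup_ok_old(int irq)
{
    return irq > 0;
}
```

      The relaxed check is only safe once the lookup itself no longer
      returns 0 on failure, which is why the companion commit matters.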
      
      Fixes: 965b2aa7 ("net/smsc911x: fix irq resource allocation failure")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag · cff6cb16
      Somnath Kotur authored
      commit 13ba7943 upstream.
      
      bnxt_open() can fail in this code path, especially on a VF when
      it fails to reserve default rings:
      
      bnxt_open()
        __bnxt_open_nic()
          bnxt_clear_int_mode()
          bnxt_init_dflt_ring_mode()
      
      RX rings would be set to 0 when we hit this error path.
      
      It is possible for a subsequent bnxt_open() call to potentially succeed
      with a code path like this:
      
      bnxt_open()
        bnxt_hwrm_if_change()
          bnxt_fw_init_one()
            bnxt_fw_init_one_p3()
              bnxt_set_dflt_rfs()
                bnxt_rfs_capable()
                  bnxt_hwrm_reserve_rings()
      
      On older chips, RFS is capable if we can reserve the number of vnics that
      is equal to RX rings + 1.  But since RX rings is still set to 0 in this
      code path, we may mistakenly think that RFS is supported for 0 RX rings.
      
      Later, when the default RX rings are reserved and we try to enable
      RFS, it would fail and cause bnxt_open() to fail unnecessarily.
      
      We fix this in 2 places.  bnxt_rfs_capable() will always return false if
      RX rings is not yet set.  bnxt_init_dflt_ring_mode() will call
      bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
      supported.
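
      A simplified model of the two changes, with hypothetical names
      (struct nic, rfs_capable(), set_dflt_rfs()) that stand in for the
      driver's state and functions rather than reproduce them:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down device state; not the real struct bnxt. */
struct nic {
    int  rx_rings;     /* 0 until the default rings are reserved */
    int  max_vnics;    /* how many vnics can be reserved */
    bool rfs_enabled;  /* stands in for the RFS flags */
};

/* RFS needs rx_rings + 1 vnics and is never capable with 0 RX rings. */
bool rfs_capable(const struct nic *n)
{
    if (n->rx_rings == 0)
        return false;
    return n->max_vnics >= n->rx_rings + 1;
}

/* Set or clear the flag, so a stale "enabled" value never survives. */
void set_dflt_rfs(struct nic *n)
{
    n->rfs_enabled = rfs_capable(n);
}
```

      Without the early return, 0 RX rings would need only one vnic, which
      is how the capability check passed by mistake.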
      
      Fixes: 20d7d1c5 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational · 64ece01a
      Ido Schimmel authored
      commit 3122257c upstream.
      
      In emulated environments, the bridge ports enslaved to br1 get a carrier
      before changing br1's PVID. This means that by the time the PVID is
      changed, br1 is already operational and configured with an IPv6
      link-local address.
      
      When the test is run with netdevs registered by mlxsw, changing the PVID
      is vetoed, as changing the VID associated with an existing L3 interface
      is forbidden. This restriction is similar to the 8021q driver's
      restriction of changing the VID of an existing interface.
      
      Fix this by taking br1 down and bringing it back up when it is fully
      configured.
      
      With this fix, the test reliably passes on top of both the SW and HW
      data paths (emulated or not).
      
      Fixes: 239e754a ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: emaclite: Add error handling for of_address_to_resource() · 52401926
      Shravya Kumbham authored
      commit 7a6bc33a upstream.
      
      Check the return value of of_address_to_resource() and also add
      missing of_node_put() for np and npp nodes.
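
      The pattern can be sketched in user space as below. Everything here
      is a hypothetical stand-in: struct node models a refcounted device
      node, node_get() and node_put() model taking and dropping a
      reference, and node_to_address() models a lookup that returns 0 on
      success and a negative value on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node standing in for a device tree node. */
struct node {
    int refcount;
    int has_address;  /* 0 makes the address lookup fail */
};

struct node *node_get(struct node *n)
{
    n->refcount++;
    return n;
}

void node_put(struct node *n)
{
    n->refcount--;
}

/* Stand-in for the address lookup: 0 on success, negative on error. */
int node_to_address(const struct node *n, unsigned long *addr)
{
    if (!n->has_address)
        return -22;  /* -EINVAL */
    *addr = 0x1000;
    return 0;
}

/* Look up an address; the reference is dropped on both paths. */
int probe_address(struct node *parent, unsigned long *addr)
{
    struct node *np = node_get(parent);
    int rc = node_to_address(np, addr);

    node_put(np);  /* balance the get whether or not the lookup worked */
    return rc;     /* the caller must not use *addr when rc is negative */
}
```

      Dropping the reference before the error check keeps the two paths
      from diverging, which is the usual source of this kind of leak.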
      
      Fixes: e0a3bc65 ("net: emaclite: Support multiple phys connected to one MDIO bus")
      Addresses-Coverity: Event check_return value.
      Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
      Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() · 354cac1e
      Yang Yingliang authored
      commit 1a15267b upstream.
      
      The node pointer returned by of_get_child_by_name() has its refcount
      incremented, so add of_node_put() after using it.
      
      Fixes: 634db83b ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220428095716.540452-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() · 0510b6cc
      Yang Yingliang authored
      commit ff5265d4 upstream.
      
      The node pointer returned by of_parse_phandle() has its refcount
      incremented, so add of_node_put() after using it in mtk_sgmii_init().
      
      Fixes: 9ffee4a8 ("net: ethernet: mediatek: Extend SGMII related functions")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220428062543.64883-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • RDMA/siw: Fix a condition race issue in MPA request processing · 10298659
      Cheng Xu authored
      commit ef91271c upstream.
      
      The call to siw_cm_upcall and the detaching of new_cep from its listen_cep
      should be atomic. Otherwise siw_reject may be called in a
      temporary state, e.g. siw_cm_upcall has been called but new_cep->listen_cep
      has not been cleared.
      
      This fixes a WARN:
      
        WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
        CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G            E     5.17.0-rc7 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        Workqueue: iw_cm_wq cm_work_handler [iw_cm]
        RIP: 0010:siw_cep_put+0x125/0x130 [siw]
        Call Trace:
         <TASK>
         siw_reject+0xac/0x180 [siw]
         iw_cm_reject+0x68/0xc0 [iw_cm]
         cm_work_handler+0x59d/0xe20 [iw_cm]
         process_one_work+0x1e2/0x3b0
         worker_thread+0x50/0x3a0
         ? rescuer_thread+0x390/0x390
         kthread+0xe5/0x110
         ? kthread_complete_and_exit+0x20/0x20
         ret_from_fork+0x1f/0x30
         </TASK>
      
      Fixes: 6c52fdc2 ("rdma/siw: connection management")
      Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com
      Reported-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
      Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ASoC: dmaengine: Restore NULL prepare_slave_config() callback · e6ae21eb
      Codrin Ciubotariu authored
      commit 660564fc upstream.
      
      As pointed out by Sascha Hauer, this patch changes:
      if (pcm->config && !pcm->config->prepare_slave_config)
              <do nothing>
      to:
      if (pcm->config && !pcm->config->prepare_slave_config)
              snd_dmaengine_pcm_prepare_slave_config()
      
      This breaks the drivers that do not need a call to
      dmaengine_slave_config(). Drivers that still need to call
      snd_dmaengine_pcm_prepare_slave_config(), but have a NULL
      pcm->config->prepare_slave_config should use
      snd_dmaengine_pcm_prepare_slave_config() as their prepare_slave_config
      callback.
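
      The restored behaviour can be modelled roughly as follows. The types
      and the functions generic_prepare() and hw_params() are hypothetical
      simplifications, and the handling of a missing config (falling back
      to the generic helper) is an assumption not stated in the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slave configuration; `prepared` records that a callback ran. */
struct slave_config {
    int prepared;
};

/* Hypothetical callback type, modelled on prepare_slave_config(). */
typedef int (*prepare_fn)(struct slave_config *cfg);

struct pcm_config {
    prepare_fn prepare_slave_config;  /* may be NULL */
};

/* Stand-in for the generic prepare helper. */
int generic_prepare(struct slave_config *cfg)
{
    cfg->prepared = 1;
    return 0;
}

/* Returns 0 on success or when there is nothing to do. */
int hw_params(const struct pcm_config *config, struct slave_config *cfg)
{
    prepare_fn prepare;

    if (!config)
        prepare = generic_prepare;  /* assumption: no config uses the helper */
    else
        prepare = config->prepare_slave_config;

    if (!prepare)
        return 0;  /* config present but callback NULL: do nothing */
    return prepare(cfg);
}
```

      A driver that wants the generic behaviour opts in by storing the
      helper in its config, as the last paragraph above recommends.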
      
      Fixes: 9a1e1344 ("ASoC: dmaengine: do not use a NULL prepare_slave_config() callback")
      Reported-by: Sascha Hauer <sha@pengutronix.de>
      Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
      Link: https://lore.kernel.org/r/20220421125403.2180824-1-codrin.ciubotariu@microchip.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hwmon: (adt7470) Fix warning on module removal · df3ea6cc
      Armin Wolf authored
      commit 7b2666ce upstream.
      
      When removing the adt7470 module, a warning might be printed:
      
      do not call blocking ops when !TASK_RUNNING; state=1
      set at [<ffffffffa006052b>] adt7470_update_thread+0x7b/0x130 [adt7470]
      
      This happens because adt7470_update_thread() can leave the kthread in
      TASK_INTERRUPTIBLE state when the kthread is being stopped before
      the call of set_current_state(). Since kthread_exit() might sleep in
      exit_signals(), the warning is printed.
      Fix that by using schedule_timeout_interruptible() and removing
      the call of set_current_state().
      This causes TASK_INTERRUPTIBLE to be set after kthread_should_stop()
      which might cause the kthread to exit.
      
      Reported-by: Zheyu Ma <zheyuma97@gmail.com>
      Fixes: 93cacfd4 (hwmon: (adt7470) Allow faster removal)
      Signed-off-by: Armin Wolf <W_Armin@gmx.de>
      Tested-by: Zheyu Ma <zheyuma97@gmail.com>
      Link: https://lore.kernel.org/r/20220407101312.13331-1-W_Armin@gmx.de
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFC: netlink: fix sleep in atomic bug when firmware download timeout · 01d4363d
      Duoming Zhou authored
      commit 4071bf12 upstream.
      
      There is a sleep-in-atomic bug that could cause a kernel panic during the
      firmware download process. The root cause is that nlmsg_new with the
      GFP_KERNEL parameter is called in fw_dnld_timeout, which is a timer
      handler. The call trace is shown below:
      
      BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
      Call Trace:
      kmem_cache_alloc_node
      __alloc_skb
      nfc_genl_fw_download_done
      call_timer_fn
      __run_timers.part.0
      run_timer_softirq
      __do_softirq
      ...
      
      The nlmsg_new call with the GFP_KERNEL parameter may sleep during memory
      allocation, and the timer handler runs as the result of
      a "software interrupt" that should not call any function
      that could sleep.
      
      This patch changes the allocation mode of the netlink message from GFP_KERNEL
      to GFP_ATOMIC in order to prevent the sleep-in-atomic bug. The GFP_ATOMIC
      flag allows the memory allocation to be used in atomic context.
      
      Fixes: 9674da87 ("NFC: Add firmware upload netlink command")
      Fixes: 9ea7187c ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs · 33d3e76f
      Duoming Zhou authored
      commit d270453a upstream.
      
      There are destructive operations such as nfcmrvl_fw_dnld_abort and
      gpio_free in nfcmrvl_nci_unregister_dev. Resources such as the firmware
      and gpio could be destroyed while upper layer functions such as
      nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame are executing, which leads
      to double-free, use-after-free and null-ptr-deref bugs.
      
      There are three situations that could lead to double-free bugs.
      
      The first situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |  nfcmrvl_nci_unregister_dev
       release_firmware()           |   nfcmrvl_fw_dnld_abort
        kfree(fw) //(1)             |    fw_dnld_over
                                    |     release_firmware
        ...                         |      kfree(fw) //(2)
                                    |     ...
      
      The second situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |
       mod_timer                    |
       (wait a time)                |
       fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
         fw_dnld_over               |   nfcmrvl_fw_dnld_abort
          release_firmware          |    fw_dnld_over
           kfree(fw) //(1)          |     release_firmware
           ...                      |      kfree(fw) //(2)
      
      The third situation is shown below:
      
             (Thread 1)               |       (Thread 2)
      nfcmrvl_nci_recv_frame          |
       if(..->fw_download_in_progress)|
        nfcmrvl_fw_dnld_recv_frame    |
         queue_work                   |
                                      |
      fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
       fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
        release_firmware              |   fw_dnld_over
         kfree(fw) //(1)              |    release_firmware
                                      |     kfree(fw) //(2)
      
      The firmware struct is deallocated at position (1) and deallocated
      again at position (2).
      
      The crash trace triggered by the PoC looks like this:
      
      BUG: KASAN: double-free or invalid-free in fw_dnld_over
      Call Trace:
        kfree
        fw_dnld_over
        nfcmrvl_nci_unregister_dev
        nci_uart_tty_close
        tty_ldisc_kill
        tty_ldisc_hangup
        __tty_hangup.part.0
        tty_release
        ...
      
      In addition, there are use-after-free and null-ptr-deref bugs in
      nfcmrvl_fw_dnld_start. If nfcmrvl_nci_unregister_dev deallocates the
      firmware struct or gpio, or sets the members of priv->fw_dnld to NULL,
      and nfcmrvl_fw_dnld_start then dereferences the firmware, gpio or the
      members of priv->fw_dnld, the UAF or NPD bugs will happen.
      
      This patch moves the destructive operations after
      nci_unregister_device in order to synchronize the cleanup routine
      with the firmware download routine.
      
      nci_unregister_device is well synchronized. If the device is
      detaching, the firmware download routine will go to its error path.
      If the firmware download routine is executing, nci_unregister_device
      will wait until it has finished.
      
      Fixes: 3194c687 ("NFC: nfcmrvl: add firmware download support")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      33d3e76f
    • Duoming Zhou's avatar
      nfc: replace improper check device_is_registered() in netlink related functions · 85aecdef
      Duoming Zhou authored
      commit da5c0f11 upstream.
      
      The device_is_registered() in nfc core is used to check whether
      nfc device is registered in netlink related functions such as
      nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
      is protected by device_lock, there is still a race condition between
      device_del() and device_is_registered(). The root cause is that
      kobject_del() in device_del() is not protected by device_lock.
      
         (cleanup task)         |     (netlink task)
                                |
      nfc_unregister_device     | nfc_fw_download
       device_del               |  device_lock
        ...                     |   if (!device_is_registered)//(1)
        kobject_del//(2)        |   ...
       ...                      |  device_unlock
      
      device_is_registered() returns the value of state_in_sysfs, and
      state_in_sysfs is set to zero in kobject_del(). If the check in
      position (1) passes and state_in_sysfs is then set to zero in
      position (2), the check in position (1) is useless.
      
      This patch uses a bool variable instead of device_is_registered() to
      judge whether the nfc device is registered. The variable is well
      synchronized.
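The idea of checking a dedicated flag under the lock can be illustrated with a small userspace sketch. This is not the kernel code: `struct toy_dev`, the `shutting_down` field name, and all `toy_*` functions are assumptions made for illustration, and a pthread mutex stands in for device_lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the nfc device; not the kernel's struct nfc_dev. */
struct toy_dev {
    pthread_mutex_t lock;   /* plays the role of device_lock */
    bool shutting_down;     /* written and read only while holding lock */
    int downloads;          /* counts operations that were accepted */
};

void toy_dev_init(struct toy_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->shutting_down = false;
    d->downloads = 0;
}

/* Cleanup path: set the flag under the same lock the netlink path takes. */
void toy_unregister(struct toy_dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->shutting_down = true;
    pthread_mutex_unlock(&d->lock);
    /* Teardown that is not covered by the lock would follow here. */
}

/* Netlink-style path: the check and the work share one critical section. */
int toy_fw_download(struct toy_dev *d)
{
    int rc = 0;

    pthread_mutex_lock(&d->lock);
    if (d->shutting_down)
        rc = -ENODEV;       /* device is going away, refuse the request */
    else
        d->downloads++;     /* safe: cleanup cannot flip the flag right now */
    pthread_mutex_unlock(&d->lock);
    return rc;
}
```

Because both the writer and the reader of the flag hold the same lock, the state cannot change between the check and the work that depends on it, which is the property the state_in_sysfs check lacked.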
      
      Fixes: 3e256b8f ("NFC: add nfc subsystem core")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85aecdef
    • Daniel Hellstrom's avatar
      can: grcan: use ofdev->dev when allocating DMA memory · da9eb43b
      Daniel Hellstrom authored
      commit 101da426 upstream.
      
      Use the device of the device tree node rather than the device of
      the struct net_device when allocating DMA buffers.
      
      The driver got away with it on sparc32 until commit 53b7670e
      ("sparc: factor the dma coherent mapping into helper") after which the
      driver oopses.
      
      Fixes: 6cec9b07 ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
      Link: https://lore.kernel.org/all/20220429084656.29788-2-andreas@gaisler.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
      Signed-off-by: Andreas Larsson <andreas@gaisler.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      da9eb43b
    • Duoming Zhou's avatar
      can: grcan: grcan_close(): fix deadlock · 8b451b7d
      Duoming Zhou authored
      commit 47f070a6 upstream.
      
      There are deadlocks caused by del_timer_sync(&priv->hang_timer) and
      del_timer_sync(&priv->rr_timer) in grcan_close(). One of the deadlocks
      is shown below:
      
         (Thread 1)              |      (Thread 2)
                                 | grcan_reset_timer()
      grcan_close()              |  mod_timer()
       spin_lock_irqsave() //(1) |  (wait a time)
       ...                       | grcan_initiate_running_reset()
       del_timer_sync()          |  spin_lock_irqsave() //(2)
       (wait timer to stop)      |  ...
      
      We hold priv->lock in position (1) of thread 1 and use
      del_timer_sync() to wait for the timer to stop, but the timer handler
      also needs priv->lock in position (2) of thread 2. As a result,
      grcan_close() will block forever.
      
      This patch moves del_timer_sync() out of the region protected by
      spin_lock_irqsave(), so that the timer handler can obtain the lock
      it needs.
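The lock ordering described above can be modelled in userspace. The following is only a sketch under stated assumptions: a pthread worker stands in for the timer handler, pthread_join() stands in for del_timer_sync(), and `struct toy_can` and the `toy_*` functions are hypothetical names, not part of the grcan driver.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: a worker thread stands in for the timer handler. */
struct toy_can {
    pthread_mutex_t lock;   /* plays the role of priv->lock */
    pthread_t timer;        /* stands in for hang_timer / rr_timer */
    bool closing;           /* protected by lock */
    int resets;             /* protected by lock; counts handler runs */
};

/* "Timer handler": it needs the lock, like the handler at position (2). */
void *toy_timer_handler(void *arg)
{
    struct toy_can *c = arg;

    pthread_mutex_lock(&c->lock);
    if (!c->closing)
        c->resets++;
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Initialise the state and arm the "timer". Returns 0 on success. */
int toy_open(struct toy_can *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->closing = false;
    c->resets = 0;
    return pthread_create(&c->timer, NULL, toy_timer_handler, c);
}

/* Close path: wait for the handler only after dropping the lock. */
int toy_close(struct toy_can *c)
{
    pthread_mutex_lock(&c->lock);
    c->closing = true;
    pthread_mutex_unlock(&c->lock);

    /*
     * Joining while still holding c->lock could block forever if the
     * handler is waiting for that same lock.
     */
    return pthread_join(c->timer, NULL);
}
```

If toy_close() called pthread_join() before unlocking, a handler blocked in pthread_mutex_lock() would never finish, which mirrors the grcan_close() deadlock.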
      
      Link: https://lore.kernel.org/all/20220425042400.66517-1-duoming@zju.edu.cn
      Fixes: 6cec9b07 ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
      Cc: stable@vger.kernel.org
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Andreas Larsson <andreas@gaisler.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b451b7d
    • Jan Höppner's avatar
      s390/dasd: Fix read inconsistency for ESE DASD devices · 8f424645
      Jan Höppner authored
      commit b9c10f68 upstream.
      
      Read requests that return with NRF error are partially completed in
      dasd_eckd_ese_read(). The function keeps track of the amount of
      processed bytes and the driver will eventually return this information
      back to the block layer for further processing via __dasd_cleanup_cqr()
      when the request is in the final stage of processing (from the driver's
      perspective).
      
      For this, blk_update_request() is used which requires the number of
      bytes to complete the request. As per documentation the nr_bytes
      parameter is described as follows:
         "number of bytes to complete for @req".
      
      This was mistakenly interpreted as "number of bytes _left_ for @req",
      leading to new requests with incorrect data length. The consequences
      are inconsistent and completely wrong read requests, as data from
      random memory areas is read back.
      
      Fix this by correctly specifying the amount of bytes that should be used
      to complete the request.
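The difference between "bytes to complete" and "bytes left" can be made concrete with a toy model. This sketch does not use the kernel's struct request or blk_update_request(); `struct toy_req` and `toy_update_request()` are hypothetical and only mimic the documented meaning of nr_bytes.

```c
#include <stdbool.h>

/* Hypothetical toy request; not the kernel's struct request. */
struct toy_req {
    unsigned int data_len;  /* bytes still outstanding */
};

/*
 * Toy counterpart of the documented contract: nr_bytes is the number of
 * bytes to complete. Returns true while the request still has bytes
 * pending, false once it is fully completed.
 */
bool toy_update_request(struct toy_req *rq, unsigned int nr_bytes)
{
    if (nr_bytes >= rq->data_len) {
        rq->data_len = 0;
        return false;
    }
    rq->data_len -= nr_bytes;
    return true;
}
```

For an 8192-byte request of which 3072 bytes were processed, passing 3072 leaves 5120 bytes pending. Passing the leftover 5120 instead would leave 3072 bytes pending, so the follow-up request would have the wrong length.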
      
      Fixes: 5e6bdd37 ("s390/dasd: fix data corruption for thin provisioned devices")
      Cc: stable@vger.kernel.org # 5.3+
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220505141733.1989450-5-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f424645
    • Jan Höppner's avatar
      s390/dasd: Fix read for ESE with blksize < 4k · 91193a2c
      Jan Höppner authored
      commit cd68c48e upstream.
      
      When reading unformatted tracks on ESE devices, the corresponding memory
      areas are simply set to zero for each segment. This is done incorrectly
      for blocksizes < 4096.
      
      There are two problems. First, the increment of dst is done using the
      counter of the loop (off), which is increased by blksize every
      iteration. This leads to a much bigger increment for dst than actually
      intended. Second, the increment of dst is done before the memory area
      is set to 0, skipping a significant number of bytes of memory.
      
      This leads to illegal overwriting of memory and ultimately to a kernel
      panic.
      
      This is not a problem with 4k blocksize because
      blk_queue_max_segment_size is set to PAGE_SIZE, always resulting in a
      single iteration for the inner segment loop (bv.bv_len == blksize). The
      incorrectly used 'off' value to increment dst is 0 and the correct
      memory area is used.
      
      In order to fix this for blksize < 4k, increment dst correctly using the
      blksize and only do it at the end of the loop.
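The corrected pointer handling can be sketched in plain C. This is an illustration rather than the driver code: `toy_zero_segment()` and its parameters `seg`, `seg_len` and `blksize` are hypothetical names, and a flat buffer stands in for one bio segment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero every block of one segment. The names seg, seg_len and blksize
 * are assumptions made for this sketch.
 */
void toy_zero_segment(unsigned char *seg, size_t seg_len, size_t blksize)
{
    unsigned char *dst = seg;
    size_t off;

    if (blksize == 0)
        return;  /* avoid an endless loop on a bogus block size */

    for (off = 0; off + blksize <= seg_len; off += blksize) {
        memset(dst, 0, blksize);  /* clear the current block first */
        dst += blksize;           /* then step by one block, not by off */
    }
}
```

With a 4096-byte segment and 512-byte blocks, advancing dst by off before the memset would visit offsets 0, 512, 1536, 3072 and then 5120, which skips blocks and runs past the end of the segment. Stepping by blksize after the memset visits each block exactly once.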
      
      Fixes: 5e2b17e7 ("s390/dasd: Add dynamic formatting support for ESE volumes")
      Cc: stable@vger.kernel.org # v5.3+
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220505141733.1989450-4-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      91193a2c
    • Stefan Haberland's avatar
      s390/dasd: prevent double format of tracks for ESE devices · 1aa75808
      Stefan Haberland authored
      commit 71f38716 upstream.
      
      For ESE devices we get an error for write operations on an unformatted
      track. Afterwards the track will be formatted and the IO operation
      restarted.
      When using alias devices a track might be accessed by multiple requests
      simultaneously and there is a race window that a track gets formatted
      twice resulting in data loss.
      
      Prevent this by remembering the number of formatted tracks when starting
      a request and comparing this number before actually formatting a track
      on the fly. If the number has changed there is a chance that the current
      track was formatted in the meantime. In that case do not format the
      track and restart the current IO to check.
      
      The number of formatted tracks does not match the overall number of
      formatted tracks on the device and it might wrap around but this is no
      problem. It is only needed to recognize that a track has been formatted at
      all in between.
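The snapshot-and-compare scheme, including why wrap-around is harmless, can be sketched as follows. The struct, the `format_count` field name and the `toy_*` helpers are assumptions for illustration and are not taken from the dasd driver.

```c
#include <stdbool.h>

/* Hypothetical per-device state; the field name is an assumption. */
struct toy_device {
    unsigned int format_count;  /* bumped on every on-the-fly format */
};

/* Taken when a request starts. */
unsigned int toy_snapshot(const struct toy_device *dev)
{
    return dev->format_count;
}

/*
 * Called after a track has been formatted. Unsigned overflow wraps
 * around, which is well defined in C.
 */
void toy_track_formatted(struct toy_device *dev)
{
    dev->format_count++;
}

/*
 * Format only if nothing was formatted since the snapshot. Otherwise
 * the caller restarts the IO to re-check the track.
 */
bool toy_may_format(const struct toy_device *dev, unsigned int snapshot)
{
    return dev->format_count == snapshot;
}
```

Only inequality with the snapshot matters, so the counter does not need to reflect the real number of formatted tracks, and a wrap from the maximum value to zero still registers as a change.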
      
      Fixes: 5e2b17e7 ("s390/dasd: Add dynamic formatting support for ESE volumes")
      Cc: stable@vger.kernel.org # 5.3+
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220505141733.1989450-3-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1aa75808
    • Stefan Haberland's avatar
      s390/dasd: fix data corruption for ESE devices · 061a424d
      Stefan Haberland authored
      commit 5b53a405 upstream.
      
      For ESE devices we get an error when accessing an unformatted track.
      The handling of this error will return zero data for read requests and
      format the track on demand before writing to it. To do this the code needs
      to distinguish between read and write requests. This is done with data from
      the blocklayer request. A pointer to the blocklayer request is stored in
      the CQR.
      
      If there is an error on the device an ERP request is built to do error
      recovery. While the ERP request is mostly a copy of the original CQR, the
      pointer to the blocklayer request is not copied, to avoid accidentally
      passing it back to the blocklayer without cleanup.
      
      As a result, once an ERP request has been built, ESE handling cannot
      determine the IO direction. Tracks then get formatted for read requests,
      which might in turn lead to data corruption.
      
      Fixes: 5e2b17e7 ("s390/dasd: Add dynamic formatting support for ESE volumes")
      Cc: stable@vger.kernel.org # 5.3+
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220505141733.1989450-2-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      061a424d