- Nov 26, 2022
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20221123084625.457073469@linuxfoundation.org Tested-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Ron Economos <re@w6rz.net> Tested-by:
Fenil Jain <fkjainco@gmail.com> Tested-by:
Ronald Warsow <rwarsow@gmx.de> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Bagas Sanjaya <bagasdotme@gmail.com>=20> Tested-by:
Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by:
Justin M. Forbes <jforbes@fedoraproject.org> Link: https://lore.kernel.org/r/20221125075804.064161337@linuxfoundation.org Tested-by:
Ronald Warsow <rwarsow@gmx.de> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Rudi Heitbaum <rudi@heitbaum.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hawkins Jiawei authored
commit 63095f4f upstream. Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). Because the ATTR_RECORDs are next to each other, kernel can get the next ATTR_RECORD from end address of current ATTR_RECORD, through current ATTR_RECORD length field. The problem is that during iteration, when kernel calculates the end address of current ATTR_RECORD, kernel may trigger an integer overflow bug in executing `a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))`. This may wrap, leading to a forever iteration on 32bit systems. This patch solves it by adding some checks on calculating end address of current ATTR_RECORD during iteration. Link: https://lkml.kernel.org/r/20220831160935.3409-4-yin31149@gmail.com Link: https://lore.kernel.org/all/20220827105842.GM2030@kadam/ Signed-off-by:
Hawkins Jiawei <yin31149@gmail.com> Suggested-by:
Dan Carpenter <dan.carpenter@oracle.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: chenxiaosong (A) <chenxiaosong2@huawei.com> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hawkins Jiawei authored
commit 36a4d82d upstream. Kernel iterates over ATTR_RECORDs in mft record in ntfs_attr_find(). To ensure access on these ATTR_RECORDs are within bounds, kernel will do some checking during iteration. The problem is that during checking whether ATTR_RECORD's name is within bounds, kernel will dereferences the ATTR_RECORD name_offset field, before checking this ATTR_RECORD strcture is within bounds. This problem may result out-of-bounds read in ntfs_attr_find(), reported by Syzkaller: ================================================================== BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607 [...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193 ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845 ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854 mount_bdev+0x34d/0x410 fs/super.c:1400 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> The buggy address belongs to the physical page: page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350 head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140 raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== This patch solves it by moving the ATTR_RECORD strcture's bounds checking earlier, then checking whether ATTR_RECORD's name is within bounds. What's more, this patch also add some comments to improve its maintainability. Link: https://lkml.kernel.org/r/20220831160935.3409-3-yin31149@gmail.com Link: https://lore.kernel.org/all/1636796c-c85e-7f47-e96f-e074fee3c7d3@huawei.com/ Link: https://groups.google.com/g/syzkaller-bugs/c/t_XdeKPGTR4/m/LECAuIGcBgAJ Signed-off-by:
chenxiaosong (A) <chenxiaosong2@huawei.com> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Hawkins Jiawei <yin31149@gmail.com> Reported-by:
<syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com> Tested-by:
<syzbot+5f8dcabe4a3b2c51c607@syzkaller.appspotmail.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hawkins Jiawei authored
commit d85a1bec upstream. Patch series "ntfs: fix bugs about Attribute", v2. This patchset fixes three bugs relative to Attribute in record: Patch 1 adds a sanity check to ensure that, attrs_offset field in first mft record loading from disk is within bounds. Patch 2 moves the ATTR_RECORD's bounds checking earlier, to avoid dereferencing ATTR_RECORD before checking this ATTR_RECORD is within bounds. Patch 3 adds an overflow checking to avoid possible forever loop in ntfs_attr_find(). Without patch 1 and patch 2, the kernel triggersa KASAN use-after-free detection as reported by Syzkaller. Although one of patch 1 or patch 2 can fix this, we still need both of them. Because patch 1 fixes the root cause, and patch 2 not only fixes the direct cause, but also fixes the potential out-of-bounds bug. This patch (of 3): Syzkaller reported use-after-free read as follows: ================================================================== BUG: KASAN: use-after-free in ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 Read of size 2 at addr ffff88807e352009 by task syz-executor153/3607 [...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 ntfs_attr_find+0xc02/0xce0 fs/ntfs/attrib.c:597 ntfs_attr_lookup+0x1056/0x2070 fs/ntfs/attrib.c:1193 ntfs_read_inode_mount+0x89a/0x2580 fs/ntfs/inode.c:1845 ntfs_fill_super+0x1799/0x9320 fs/ntfs/super.c:2854 mount_bdev+0x34d/0x410 fs/super.c:1400 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> The buggy address belongs to the physical page: page:ffffea0001f8d400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7e350 head:ffffea0001f8d400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000122 ffff888011842140 raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88807e351f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88807e351f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88807e352000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88807e352080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807e352100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Kernel will loads $MFT/$DATA's first mft record in ntfs_read_inode_mount(). Yet the problem is that after loading, kernel doesn't check whether attrs_offset field is a valid value. To be more specific, if attrs_offset field is larger than bytes_allocated field, then it may trigger the out-of-bounds read bug(reported as use-after-free bug) in ntfs_attr_find(), when kernel tries to access the corresponding mft record's attribute. This patch solves it by adding the sanity check between attrs_offset field and bytes_allocated field, after loading the first mft record. Link: https://lkml.kernel.org/r/20220831160935.3409-1-yin31149@gmail.com Link: https://lkml.kernel.org/r/20220831160935.3409-2-yin31149@gmail.com Signed-off-by:
Hawkins Jiawei <yin31149@gmail.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: ChenXiaoSong <chenxiaosong2@huawei.com> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Olsa authored
commit 05b24ff9 upstream. We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by:
Stanislav Fomichev <sdf@google.com> Reported-by:
<syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com> [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by:
Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dominique Martinet authored
commit 296ab4a8 upstream. Shamelessly copying the explanation from Tetsuo Handa's suggested patch[1] (slightly reworded): syzbot is reporting inconsistent lock state in p9_req_put()[2], for p9_tag_remove() from p9_req_put() from IRQ context is using spin_lock_irqsave() on "struct p9_client"->lock but trans_fd (not from IRQ context) is using spin_lock(). Since the locks actually protect different things in client.c and in trans_fd.c, just replace trans_fd.c's lock by a new one specific to the transport (client.c's protect the idr for fid/tag allocations, while trans_fd.c's protects its own req list and request status field that acts as the transport's state machine) Link: https://lore.kernel.org/r/20220904112928.1308799-1-asmadeus@codewreck.org Link: https://lkml.kernel.org/r/2470e028-9b05-2013-7198-1fdad071d999@I-love.SAKURA.ne.jp [1] Link: https://syzkaller.appspot.com/bug?extid=2f20b523930c32c160cc [2] Reported-by:
syzbot <syzbot+2f20b523930c32c160cc@syzkaller.appspotmail.com> Reported-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by:
Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by:
Dominique Martinet <asmadeus@codewreck.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Potapenko authored
commit 1468c6f4 upstream. Functions implementing the a_ops->write_end() interface accept the `void *fsdata` parameter that is supposed to be initialized by the corresponding a_ops->write_begin() (which accepts `void **fsdata`). However not all a_ops->write_begin() implementations initialize `fsdata` unconditionally, so it may get passed uninitialized to a_ops->write_end(), resulting in undefined behavior. Fix this by initializing fsdata with NULL before the call to write_begin(), rather than doing so in all possible a_ops implementations. This patch covers only the following cases found by running x86 KMSAN under syzkaller: - generic_perform_write() - cont_expand_zero() and generic_cont_expand_simple() - page_symlink() Other cases of passing uninitialized fsdata may persist in the codebase. Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com Signed-off-by:
Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mathieu Desnoyers authored
commit 448dca8c upstream. These commits use WARN_ON_ONCE() and kill the offending processes when deprecated and unknown flags are encountered: commit c17a6ff9 ("rseq: Kill process when unknown flags are encountered in ABI structures") commit 0190e419 ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags") The WARN_ON_ONCE() triggered by userspace input prevents use of Syzkaller to fuzz the rseq system call. Replace this WARN_ON_ONCE() by pr_warn_once() messages which contain actually useful information. Reported-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/20221102130635.7379-1-mathieu.desnoyers@efficios.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hawkins Jiawei authored
commit e3e6e1d1 upstream. Syzkaller reports buffer overflow false positive as follows: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 8) of single field "&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4) WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623 wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623 Modules linked in: CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted 6.0.0-rc6-next-20220921-syzkaller #0 [...] Call Trace: <TASK> ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022 wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955 wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline] wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline] wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049 sock_ioctl+0x285/0x640 net/socket.c:1220 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> Wireless events will be sent on the appropriate channels in wireless_send_event(). Different wireless events may have different payload structure and size, so kernel uses **len** and **cmd** field in struct __compat_iw_event as wireless event common LCP part, uses **pointer** as a label to mark the position of remaining different part. Yet the problem is that, **pointer** is a compat_caddr_t type, which may be smaller than the relative structure at the same position. So during wireless_send_event() tries to parse the wireless events payload, it may trigger the memcpy() run-time destination buffer bounds checking when the relative structure's data is copied to the position marked by **pointer**. This patch solves it by introducing flexible-array field **ptr_bytes**, to mark the position of the wireless events remaining part next to LCP part. What's more, this patch also adds **ptr_len** variable in wireless_send_event() to improve its maintainability. Reported-and-tested-by:
<syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com> Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/ Suggested-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Hawkins Jiawei <yin31149@gmail.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit 710d21fd upstream. In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(), switch from __nlmsg_put to nlmsg_put(), and explain the bounds check for dealing with the memcpy() across a composite flexible array struct. Avoids this future run-time warning: memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16) Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tetsuo Handa authored
commit ef575281 upstream. syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is failing to interrupt already started kernel_read() from p9_fd_read() from p9_read_work() and/or kernel_write() from p9_fd_write() from p9_write_work() requests. Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open() does not set O_NONBLOCK flag, but pipe blocks unless signal is pending, p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when the file descriptor refers to a pipe. In other words, pipe file descriptor needs to be handled as if socket file descriptor. We somehow need to interrupt kernel_read()/kernel_write() on pipes. A minimal change, which this patch is doing, is to set O_NONBLOCK flag from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing of regular files. But this approach changes O_NONBLOCK flag on userspace- supplied file descriptors (which might break userspace programs), and O_NONBLOCK flag could be changed by userspace. It would be possible to set O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still remains small race window for clearing O_NONBLOCK flag. If we don't want to manipulate O_NONBLOCK flag, we might be able to surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING) and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are processed by kernel threads which process global system_wq workqueue, signals could not be delivered from remote threads when p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be needed if we count on signals for making kernel_read()/kernel_write() non-blocking. Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1] Reported-by:
syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by:
syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Reviewed-by:
Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: add comment at Christian's suggestion] Signed-off-by:
Dominique Martinet <asmadeus@codewreck.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Gruenbacher authored
commit 204c0300 upstream. Switch from strlcpy to strscpy and make sure that @count is the size of the smaller of the source and destination buffers. This prevents reading beyond the end of the source buffer when the source string isn't null terminated. Found by a modified version of syzkaller. Suggested-by:
Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Price authored
commit 670f8ce5 upstream. Fuzzers like to scribble over sb_bsize_shift but in reality it's very unlikely that this field would be corrupted on its own. Nevertheless it should be checked to avoid the possibility of messy mount errors due to bad calculations. It's always a fixed value based on the block size so we can just check that it's the expected value. Tested with: mkfs.gfs2 -O -p lock_nolock /dev/vdb for i in 0 -1 64 65 32 33; do gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb mount /dev/vdb /mnt/test && umount /mnt/test done Before this patch we get a withdraw after [ 76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block [ 76.413681] bh = 19 (type: exp=5, found=4) [ 76.413681] function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492 and with UBSAN configured we also get complaints like [ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19 [ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int' After the patch, these complaints don't appear, mount fails immediately and we get an explanation in dmesg. Reported-by:
<syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com> Signed-off-by:
Andrew Price <anprice@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dominique Martinet authored
commit 52f1c45d upstream. syzbot reported a double-lock here and we no longer need this lock after requests have been moved off to local list: just drop the lock earlier. Link: https://lkml.kernel.org/r/20220904064028.1305220-1-asmadeus@codewreck.org Reported-by:
<syzbot+50f7e8d06c3768dd97f3@syzkaller.appspotmail.com> Signed-off-by:
Dominique Martinet <asmadeus@codewreck.org> Tested-by:
Schspa Shi <schspa@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eiichi Tsukata authored
commit 73536338 upstream. Should not call eventfd_ctx_put() in case of error. Fixes: 2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests") Reported-by:
<syzbot+6f0c896c5a9449a10ded@syzkaller.appspotmail.com> Signed-off-by:
Eiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20221028092631.117438-1-eiichi.tsukata@nutanix.com> [Introduce new goto target instead. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
commit ec7eede3 upstream. syzbot found that kcm_tx_work() could crash [1] in: /* Primarily for SOCK_SEQPACKET sockets */ if (likely(sk->sk_socket) && test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) { <<*>> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags); sk->sk_write_space(sk); } I think the reason is that another thread might concurrently run in kcm_release() and call sock_orphan(sk) while sk is not locked. kcm_tx_work() find sk->sk_socket being NULL. [1] BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline] BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742 Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53 CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: kkcmd kcm_tx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 kasan_report+0xbe/0x1f0 mm/kasan/report.c:495 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_write include/linux/instrumented.h:86 [inline] clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302 </TASK> Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module") Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Link: https://lore.kernel.org/r/20221012133412.519394-1-edumazet@google.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
commit 72e560cb upstream. Apparently, mptcp is able to call tcp_disconnect() on an already disconnected flow. This is generally fine, unless current congestion control is CDG, because it might trigger a double-free [1] Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect() more resilient. [1] BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline] BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567 CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: events mptcp_worker Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462 ____kasan...
-
Eric Dumazet authored
commit b64085b0 upstream. macvlan should enforce a minimal mtu of 68, even at link creation. This patch avoids the current behavior (which could lead to crashes in ipv6 stack if the link is brought up) $ ip link add macvlan1 link eno1 mtu 8 type macvlan # This should fail ! $ ip link sh dev macvlan1 5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff $ ip link set macvlan1 mtu 67 Error: mtu less than device minimum. $ ip link set macvlan1 mtu 68 $ ip link set macvlan1 mtu 8 Error: mtu less than device minimum. Fixes: 91572088 ("net: use core MTU range checking in core net infra") Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chen Jun authored
[ Upstream commit 81cd7e84 ] Avoid resetting the module-wide i8042_platform_device pointer in i8042_probe() or i8042_remove(), so that the device can be properly destroyed by i8042_exit() on module unload. Fixes: 9222ba68 ("Input: i8042 - add deferred probe support") Signed-off-by:
Chen Jun <chenjun102@huawei.com> Link: https://lore.kernel.org/r/20221109034148.23821-1-chenjun102@huawei.com Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Liu Shixin authored
[ Upstream commit 5b47348f ] The page table check trigger BUG_ON() unexpectedly when collapse hugepage: ------------[ cut here ]------------ kernel BUG at mm/page_table_check.c:82! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 6 PID: 68 Comm: khugepaged Not tainted 6.1.0-rc3+ #750 Hardware name: linux,dummy-virt (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : page_table_check_clear.isra.0+0x258/0x3f0 lr : page_table_check_clear.isra.0+0x240/0x3f0 [...] Call trace: page_table_check_clear.isra.0+0x258/0x3f0 __page_table_check_pmd_clear+0xbc/0x108 pmdp_collapse_flush+0xb0/0x160 collapse_huge_page+0xa08/0x1080 hpage_collapse_scan_pmd+0xf30/0x1590 khugepaged_scan_mm_slot.constprop.0+0x52c/0xac8 khugepaged+0x338/0x518 kthread+0x278/0x2f8 ret_from_fork+0x10/0x20 [...] Since pmd_user_accessible_page() doesn't check if a pmd is leaf, it decrease file_map_count for a non-leaf pmd comes from collapse_huge_page(). and so trigger BUG_ON() unexpectedly. Fix this problem by using pmd_leaf() insteal of pmd_present() in pmd_user_accessible_page(). Moreover, use pud_leaf() for pud_user_accessible_page() too. Fixes: 42b25471 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Reported-by:
Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by:
Liu Shixin <liushixin2@huawei.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Acked-by:
Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by:
Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221117075602.2904324-2-liushixin2@huawei.com Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Zheng Yejian authored
[ Upstream commit 067df9e0 ] Entries in list 'tr->err_log' will be reused after entry number exceed TRACING_LOG_ERRS_MAX. The cmd string of the to be reused entry will be freed first then allocated a new one. If the allocation failed, then the entry will still be in list 'tr->err_log' but its 'cmd' field is set to be NULL, later access of 'cmd' is risky. Currently above problem can cause the loss of 'cmd' information of first entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`, reproduce logs like: [ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^ [ 38.412517] trace_kprobe: error: Maxactive is not for kprobe Command: p4:myprobe2 do_sys_openat2 ^ Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com Fixes: 1581a884 ("tracing: Remove size restriction on tracing_log_err cmd strings") Signed-off-by:
Zheng Yejian <zhengyejian1@huawei.com> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Li Huafei authored
[ Upstream commit 5dd7caf0 ] In __unregister_kprobe_top(), if the currently unregistered probe has post_handler but other child probes of the aggrprobe do not have post_handler, the post_handler of the aggrprobe is cleared. If this is a ftrace-based probe, there is a problem. In later calls to disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in __disarm_kprobe_ftrace() and may even cause use-after-free: Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2) WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0 Modules linked in: testKprobe_007(-) CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18 [...] Call Trace: <TASK> __disable_kprobe+0xcd/0xe0 __unregister_kprobe_top+0x12/0x150 ? mutex_lock+0xe/0x30 unregister_kprobes.part.23+0x31/0xa0 unregister_kprobe+0x32/0x40 __x64_sys_delete_module+0x15e/0x260 ? do_user_addr_fault+0x2cd/0x6b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] For the kprobe-on-ftrace case, we keep the post_handler setting to identify this aggrprobe armed with kprobe_ipmodify_ops. This way we can disarm it correctly. Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/ Fixes: 0bc11ed5 ("kprobes: Allow kprobes coexist with livepatch") Reported-by:
Zhao Gongyi <zhaogongyi@huawei.com> Suggested-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Li Huafei <lihuafei1@huawei.com> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yuan Can authored
[ Upstream commit e208a1d7 ] If device_register() fails in sdebug_add_host_helper(), it will goto clean and sdbg_host will be freed, but sdbg_host->host_list will not be removed from sdebug_host_list, then list traversal may cause UAF. Fix it. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by:
Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221117084421.58918-1-yuancan@huawei.com Acked-by:
Douglas Gilbert <dgilbert@interlog.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yang Yingliang authored
[ Upstream commit bc68e428 ] If device_register() fails in tcm_loop_setup_hba_bus(), the name allocated by dev_set_name() need be freed. As comment of device_register() says, it should use put_device() to give up the reference in the error path. So fix this by calling put_device(), then the name can be freed in kobject_cleanup(). The 'tl_hba' will be freed in tcm_loop_release_adapter(), so it don't need goto error label in this case. Fixes: 3703b2c5 ("[SCSI] tcm_loop: Add multi-fabric Linux/SCSI LLD fabric module") Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221115015042.3652261-1-yangyingliang@huawei.com Reviewed-by:
Mike Christie <michael.chritie@oracle.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Hangbin Liu authored
[ Upstream commit 58e0be1e ] kernel test robot reported warnings when build bonding module with make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/: from ../drivers/net/bonding/bond_main.c:35: In function ‘fortify_memcpy_chk’, inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2, inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3: ../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 413 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function ‘fortify_memcpy_chk’, inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2, inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3: ../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 413 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is because we try to copy the whole ip/ip6 address to the flow_key, while we only point the to ip/ip6 saddr. Note that since these are UAPI headers, __struct_group() is used to avoid the compiler warnings. Reported-by:
kernel test robot <lkp@intel.com> Fixes: c3f83241 ("net: Add full IPv6 addresses to flow_keys") Signed-off-by:
Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Alexandru Tachici authored
[ Upstream commit 809ff97a ] An external PHY needs settling time after power up or reset. In the bind() function an mdio bus is registered. If at this point the external PHY is still initialising, no valid PHY ID will be read and on phy_find_first() the bind() function will fail. If an external PHY is present, wait the maximum time specified in 802.3 45.2.7.1.1. Fixes: 05b35e7e ("smsc95xx: add phylib support") Signed-off-by:
Alexandru Tachici <alexandru.tachici@analog.com> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221115114434.9991-2-alexandru.tachici@analog.com Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Aashish Sharma authored
[ Upstream commit bedf0683 ] Move the declaration of 'struct trace_array' out of #ifdef CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING is not set: >> include/linux/trace.h:63:45: warning: 'struct trace_array' declared inside parameter list will not be visible outside of this definition or declaration Link: https://lkml.kernel.org/r/20221107160556.2139463-1-shraash@google.com Fixes: 1a77dd1c ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Arun Easi <aeasi@marvell.com> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Aashish Sharma <shraash@google.com> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Steven Rostedt (Google) authored
[ Upstream commit 31029a8b ] The function ring_buffer_nr_dirty_pages() was created to find out how many pages are filled in the ring buffer. There's two running counters. One is incremented whenever a new page is touched (pages_touched) and the other is whenever a page is read (pages_read). The dirty count is the number touched minus the number read. This is used to determine if a blocked task should be woken up if the percentage of the ring buffer it is waiting for is hit. The problem is that it does not take into account dropped pages (when the new writes overwrite pages that were not read). And then the dirty pages will always be greater than the percentage. This makes the "buffer_percent" file inaccurate, as the number of dirty pages end up always being larger than the percentage, event when it's not and this causes user space to be woken up more than it wants to be. Add a new counter to keep track of lost pages, and include that in the accounting of dirty pages so that it is actually accurate. Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home Fixes: 2c2b0a78 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ravi Bangoria authored
[ Upstream commit baa014b9 ] amd_pmu_enable_all() does: if (!test_bit(idx, cpuc->active_mask)) continue; amd_pmu_enable_event(cpuc->events[idx]); A perf NMI of another event can come between these two steps. Perf NMI handler internally disables and enables _all_ events, including the one which nmi-intercepted amd_pmu_enable_all() was in process of enabling. If that unintentionally enabled event has very low sampling period and causes immediate successive NMI, causing the event to be throttled, cpuc->events[idx] and cpuc->active_mask gets cleared by x86_pmu_stop(). This will result in amd_pmu_enable_event() getting called with event=NULL when amd_pmu_enable_all() resumes after handling the NMIs. This causes a kernel crash: BUG: kernel NULL pointer dereference, address: 0000000000000198 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page [...] Call Trace: <TASK> amd_pmu_enable_all+0x68/0xb0 ctx_resched+0xd9/0x150 event_function+0xb8/0x130 ? hrtimer_start_range_ns+0x141/0x4a0 ? perf_duration_warn+0x30/0x30 remote_function+0x4d/0x60 __flush_smp_call_function_queue+0xc4/0x500 flush_smp_call_function_queue+0x11d/0x1b0 do_idle+0x18f/0x2d0 cpu_startup_entry+0x19/0x20 start_secondary+0x121/0x160 secondary_startup_64_no_verify+0xe5/0xeb </TASK> amd_pmu_disable_all()/amd_pmu_enable_all() calls inside perf NMI handler were recently added as part of BRS enablement but I'm not sure whether we really need them. We can just disable BRS in the beginning and enable it back while returning from NMI. This will solve the issue by not enabling those events whose active_masks are set but are not yet enabled in hw pmu. Fixes: ada54345 ("perf/x86/amd: Add AMD Fam19h Branch Sampling support") Reported-by:
Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by:
Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jason Gunthorpe authored
[ Upstream commit 9446162e ] This is a container item. A following patch will move the vfio_container functions to their own .c file. Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Stable-dep-of: 7fdba001 ("vfio: Fix container device registration life cycle") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jason Gunthorpe authored
[ Upstream commit 1408640d ] To vfio_container_ioctl_check_extension(). A following patch will turn this into a non-static function, make it clear it is related to the container. Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Stable-dep-of: 7fdba001 ("vfio: Fix container device registration life cycle") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Marco Elver authored
[ Upstream commit bb88f969 ] To catch missing SIGTRAP we employ a WARN in __perf_event_overflow(), which fires if pending_sigtrap was already set: returning to user space without consuming pending_sigtrap, and then having the event fire again would re-enter the kernel and trigger the WARN. This, however, seemed to miss the case where some events not associated with progress in the user space task can fire and the interrupt handler runs before the IRQ work meant to consume pending_sigtrap (and generate the SIGTRAP). syzbot gifted us this stack trace: | WARNING: CPU: 0 PID: 3607 at kernel/events/core.c:9313 __perf_event_overflow | Modules linked in: | CPU: 0 PID: 3607 Comm: syz-executor100 Not tainted 6.1.0-rc2-syzkaller-00073-g88619e77b33d #0 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 | RIP: 0010:__perf_event_overflow+0x498/0x540 kernel/events/core.c:9313 | <...> | Call Trace: | <TASK> | perf_swevent_hrtimer+0x34f/0x3c0 kernel/events/core.c:10729 | __run_hrtimer kernel/time/hrtimer.c:1685 [inline] | __hrtimer_run_queues+0x1c6/0xfb0 kernel/time/hrtimer.c:1749 | hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 | local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1096 [inline] | __sysvec_apic_timer_interrupt+0x17c/0x640 arch/x86/kernel/apic/apic.c:1113 | sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1107 | asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649 | <...> | </TASK> In this case, syzbot produced a program with event type PERF_TYPE_SOFTWARE and config PERF_COUNT_SW_CPU_CLOCK. The hrtimer manages to fire again before the IRQ work got a chance to run, all while never having returned to user space. Improve the WARN to check for real progress in user space: approximate this by storing a 32-bit hash of the current IP into pending_sigtrap, and if an event fires while pending_sigtrap still matches the previous IP, we assume no progress (false negatives are possible given we could return to user space and trigger again on the same IP). Fixes: ca6c2132 ("perf: Fix missing SIGTRAPs") Reported-by:
<syzbot+b8ded3e2e2c6adde4990@syzkaller.appspotmail.com> Signed-off-by:
Marco Elver <elver@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221031093513.3032814-1-elver@google.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Peter Ujfalusi authored
[ Upstream commit 3d59eaef ] Move the return value check before attempting to assign the core ID to the swidget since we are going to fail the sof_widget_ready() and free up swidget anyways. Fixes: 909dadf2 ("ASoC: SOF: topology: Make DAI widget parsing IPC agnostic") Signed-off-by:
Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by:
Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20221107090433.5146-1-peter.ujfalusi@linux.intel.com Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Keith Busch authored
commit 1e866afd upstream. The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771 Signed-off-by:
Keith Busch <kbusch@kernel.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Keith Busch authored
commit 23e085b2 upstream. The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users as all of these operations, which include resets and rescans, can be disruptive. Signed-off-by:
Keith Busch <kbusch@kernel.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Adrian Hunter authored
commit ce0d998b upstream. Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect Data When Configured With Single Range Output Larger Than 4KB" by disabling single range output whenever larger than 4KB. Fixes: 67063847 ("perf/x86/intel/pt: Opportunistically use single range output mode") Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221112151508.13768-1-adrian.hunter@intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sandipan Das authored
commit bdfe3459 upstream. When a CPU comes online, the per-CPU NB and LLC uncore contexts are freed but not the events array within the context structure. This causes a memory leak as identified by the kmemleak detector. [...] unreferenced object 0xffff8c5944b8e320 (size 32): comm "swapper/0", pid 1, jiffies 4294670387 (age 151.072s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000000759fb79>] amd_uncore_cpu_up_prepare+0xaf/0x230 [<00000000ddc9e126>] cpuhp_invoke_callback+0x2cf/0x470 [<0000000093e727d4>] cpuhp_issue_call+0x14d/0x170 [<0000000045464d54>] __cpuhp_setup_state_cpuslocked+0x11e/0x330 [<0000000069f67cbd>] __cpuhp_setup_state+0x6b/0x110 [<0000000015365e0f>] amd_uncore_init+0x260/0x321 [<00000000089152d2>] do_one_initcall+0x3f/0x1f0 [<000000002d0bd18d>] kernel_init_freeable+0x1ca/0x212 [<0000000030be8dde>] kernel_init+0x11/0x120 [<0000000059709e59>] ret_from_fork+0x22/0x30 unreferenced object 0xffff8c5944b8dd40 (size 64): comm "swapper/0", pid 1, jiffies 4294670387 (age 151.072s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000306efe8b>] amd_uncore_cpu_up_prepare+0x183/0x230 [<00000000ddc9e126>] cpuhp_invoke_callback+0x2cf/0x470 [<0000000093e727d4>] cpuhp_issue_call+0x14d/0x170 [<0000000045464d54>] __cpuhp_setup_state_cpuslocked+0x11e/0x330 [<0000000069f67cbd>] __cpuhp_setup_state+0x6b/0x110 [<0000000015365e0f>] amd_uncore_init+0x260/0x321 [<00000000089152d2>] do_one_initcall+0x3f/0x1f0 [<000000002d0bd18d>] kernel_init_freeable+0x1ca/0x212 [<0000000030be8dde>] kernel_init+0x11/0x120 [<0000000059709e59>] ret_from_fork+0x22/0x30 [...] Fix the problem by freeing the events array before freeing the uncore context. Fixes: 39621c58 ("perf/x86/amd/uncore: Use dynamic events array") Reported-by:
Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by:
Sandipan Das <sandipan.das@amd.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Tested-by:
Ravi Bangoria <ravi.bangoria@amd.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/4fa9e5ac6d6e41fa889101e7af7e6ba372cfea52.1662613255.git.sandipan.das@amd.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit 36b03879 upstream. Mike Galbraith reported the following against an old fork of preempt-rt but the same issue also applies to the current preempt-rt tree. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: systemd preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 Preemption disabled at: fpu_clone CPU: 6 PID: 1 Comm: systemd Tainted: G E (unreleased) Call Trace: <TASK> dump_stack_lvl ? fpu_clone __might_resched rt_spin_lock fpu_clone ? copy_thread ? copy_process ? shmem_alloc_inode ? kmem_cache_alloc ? kernel_clone ? __do_sys_clone ? do_syscall_64 ? __x64_sys_rt_sigprocmask ? syscall_exit_to_user_mode ? do_syscall_64 ? syscall_exit_to_user_mode ? do_syscall_64 ? syscall_exit_to_user_mode ? do_syscall_64 ? exc_page_fault ? entry_SYSCALL_64_after_hwframe </TASK> Mike says: The splat comes from fpu_inherit_perms() being called under fpregs_lock(), and us reaching the spin_lock_irq() therein due to fpu_state_size_dynamic() returning true despite static key __fpu_state_size_dynamic having never been enabled. Mike's assessment looks correct. fpregs_lock on a PREEMPT_RT kernel disables preemption so calling spin_lock_irq() in fpu_inherit_perms() is unsafe. This problem exists since commit 9e798e9a ("x86/fpu: Prepare fpu_clone() for dynamically enabled features"). Even though the original bug report should not have enabled the paths at all, the bug still exists. fpregs_lock is necessary when editing the FPU registers or a task's FP state but it is not necessary for fpu_inherit_perms(). The only write of any FP state in fpu_inherit_perms() is for the new child which is not running yet and cannot context switch or be borrowed by a kernel thread yet. Hence, fpregs_lock is not protecting anything in the new child until clone() completes and can be dropped earlier. The siglock still needs to be acquired by fpu_inherit_perms() as the read of the parent's permissions has to be serialised. [ bp: Cleanup splat. ] Fixes: 9e798e9a ("x86/fpu: Prepare fpu_clone() for dynamically enabled features") Reported-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Mel Gorman <mgorman@techsingularity.net> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20221110124400.zgymc2lnwqjukgfh@techsingularity.net Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borys Popławski authored
commit f0861f49 upstream. sgx_validate_offset_length() function verifies "offset" and "length" arguments provided by userspace, but was missing an overflow check on their addition. Add it. Fixes: c6d26d37 ("x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES") Signed-off-by:
Borys Popławski <borysp@invisiblethingslab.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jarkko Sakkinen <jarkko@kernel.org> Cc: stable@vger.kernel.org # v5.11+ Link: https://lore.kernel.org/r/0d91ac79-6d84-abed-5821-4dbe59fa1a38@invisiblethingslab.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Mason authored
commit d7dbd43f upstream. blkcg_css_online is supposed to pin the blkcg of the parent, but 397c9f46 refactored things and along the way, changed it to pin the css instead. This results in extra pins, and we end up leaking blkcgs and cgroups. Fixes: 397c9f46 ("blk-cgroup: move blkcg_{pin,unpin}_online out of line") Signed-off-by:
Chris Mason <clm@fb.com> Spotted-by:
Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> # v5.19+ Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20221114181930.2093706-1-clm@fb.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-