- Jan 04, 2017
-
-
Sebastian Andrzej Siewior authored
Debian started to build gcc with -fPIE by default, so the kernel build fails before it properly starts with: |kernel/bounds.c:1:0: error: code model kernel does not support PIC mode Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Mike Galbraith authored
Line up helper arrows to the right column. Cc: stable-rt@vger.kernel.org Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> [bigeasy: fixup function tracer header] Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Sebastian Andrzej Siewior authored
It has been pointed out by tglx that on UP a non-RT task could spin for its entire time slice because the lock owner is preempted. This won't happen on !RT. So we fall back to cpu_chill() if cond_resched() did not help. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Upstream commit 47be6184 ("fs/dcache.c: avoid soft-lockup in dput()") changed the condition _when_ cpu_relax() / cond_resched() was invoked. This change was adapted in -RT into mostly the same thing, except that if cond_resched() did nothing we had to do cpu_chill() to force the task off the CPU for a tiny little bit in case the task had RT priority and did not want to leave the CPU. This change resulted in a performance regression (in my testcase the build time on /dev/shm increased from 19min to 24min). The reason is that with this change cpu_chill() was invoked even if dput() made progress (dentry_kill() returned a different dentry), instead of only when we were retrying the same dentry over and over again. This patch brings back the old behavior: cond_resched() & chill only if we make no progress. A small improvement is to invoke cpu_chill() only if we are an RT task (and avoid the sleep otherwise); for non-RT tasks the scheduler will remove us from the CPU if we make no progress. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
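A minimal sketch of the retry logic described above (illustrative only, not the exact fs/dcache.c diff; cpu_chill() and rt_task() are the -rt primitives named in the text):

    /* dput() retry loop, simplified: only back off when no progress was made */
    while (dentry) {
            struct dentry *parent = dentry_kill(dentry);

            if (parent == dentry) {
                    /* no progress: we are retrying the very same dentry */
                    if (!cond_resched() && rt_task(current))
                            cpu_chill();    /* RT task: sleep off the CPU briefly */
                    continue;
            }
            dentry = parent;                /* made progress, keep walking up */
    }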
-
Sebastian Andrzej Siewior authored
It looks like the this_cpu_ptr() access in icmp_sk() is protected with local_bh_disable(). To avoid missing serialization on -RT I am adding a local lock here. No crash has been observed; this is just a precaution. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
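A sketch of what such a local lock looks like with the -rt locallock API of that era; the lock name and the exact placement inside icmp_xmit_lock() are assumptions based on the description:

    static DEFINE_LOCAL_IRQ_LOCK(icmp_sk_lock);    /* per-CPU -rt local lock */

    static struct sock *icmp_xmit_lock(struct net *net)
    {
            struct sock *sk;

            local_lock(icmp_sk_lock);       /* serialize the this_cpu_ptr() user on -RT */
            sk = icmp_sk(net);
            if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
                    /* the per-CPU socket is already busy on this CPU */
                    local_unlock(icmp_sk_lock);
                    return NULL;
            }
            return sk;
    }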
-
Sebastian Andrzej Siewior authored
Some time ago Sami Pietikäinen reported a crash on -RT in ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire (v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the patch. As it turns out, that was a mistake. I have reports that the same crash is possible with a similar backtrace. It seems that vanilla protects access to this_cpu_ptr() via local_bh_disable(). This does not work on -RT since we can have NET_RX and NET_TX running in parallel on the same CPU. This brings back the old locks. |Unable to handle kernel NULL pointer dereference at virtual address 00000010 |PC is at __ip_make_skb+0x198/0x3e8 |[<c04e39d8>] (__ip_make_skb) from [<c04e3ca8>] (ip_push_pending_frames+0x20/0x40) |[<c04e3ca8>] (ip_push_pending_frames) from [<c04e3ff0>] (ip_send_unicast_reply+0x210/0x22c) |[<c04e3ff0>] (ip_send_unicast_reply) from [<c04fbb54>] (tcp_v4_send_reset+0x190/0x1c0) |[<c04fbb54>] (tcp_v4_send_reset) from [<c04fcc1c>] (tcp_v4_do_rcv+0x22c/0x288) |[<c04fcc1c>] (tcp_v4_do_rcv) from [<c0474364>] (release_sock+0xb4/0x150) |[<c0474364>] (release_sock) from [<c04ed904>] (tcp_close+0x240/0x454) |[<c04ed904>] (tcp_close) from [<c0511408>] (inet_release+0x74/0x7c) |[<c0511408>] (inet_release) from [<c0470728>] (sock_release+0x30/0xb0) |[<c0470728>] (sock_release) from [<c0470abc>] (sock_close+0x1c/0x24) |[<c0470abc>] (sock_close) from [<c0115ec4>] (__fput+0xe8/0x20c) |[<c0115ec4>] (__fput) from [<c0116050>] (____fput+0x18/0x1c) |[<c0116050>] (____fput) from [<c0058138>] (task_work_run+0xa4/0xb8) |[<c0058138>] (task_work_run) from [<c0011478>] (do_work_pending+0xd0/0xe4) |[<c0011478>] (do_work_pending) from [<c000e740>] (work_pending+0xc/0x20) |Code: e3530001 8a000001 e3a00040 ea000011 (e5973010) Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Mike Galbraith authored
During master->rt merge, I stumbled across the buglet below. Fix get_cpu()/put_cpu_light() imbalance. Cc: stable-rt@vger.kernel.org Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
There should be no need to hold the base lock during the wakeup. There should be no boosting involved; the wakeup list has its own lock, so it should be safe to do this without the base lock. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
The base lock is dropped during the invocation of the timer. That means it is possible that we have one waiter while timer1 is running, and once that one has finished we get another waiter while timer2 is running. Since we wake up only one waiter it is possible that we miss the other one. This will probably heal itself over time because most of the time we complete timers without an active wakeup. To avoid the scenario where we don't wake up all waiters at once, wake_up_all() is used. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
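A sketch of the change (the waitqueue name follows the -rt timer code as I understand it; treat it as an assumption):

    /* after a timer run completes, wake every waiter that queued itself while
     * any of the just-executed callbacks was running, not just the first one */
    static void wakeup_timer_waiters(struct tvec_base *base)
    {
            wake_up_all(&base->wait_for_running_timer);     /* was wake_up() */
    }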
-
Corey Minyard authored
On some x86 systems an MCE interrupt would come in before the kernel was ready for it. Looking at the latest RT code, it has similar (but not quite the same) code, except it adds a bool that tells whether MCE handling is initialized. That was required because they had switched to using swork instead of a kernel thread. Here, just checking whether the thread pointer is NULL is good enough to see if MCE handling is initialized. Suggested-by:
Borislav Petkov <bp@alien8.de> Signed-off-by:
Corey Minyard <cminyard@mvista.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Trace events like raw_syscalls always show a preempt count of one. The reason is that on PREEMPT kernels rcu_read_lock_sched_notrace() increases the preemption counter, and the function recording the counter is called within the RCU section. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ Changed this to upstream version. See commit e947841c ] Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Luiz Capitulino authored
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run on all CPUs that have non-empty LRU pagevecs and then waiting for the scheduled work to complete. However, workqueue threads may never have the chance to run on a CPU that's running a SCHED_FIFO task. This causes lru_add_drain_all() to block forever. This commit solves this problem by changing lru_add_drain_all() to drain the LRU pagevecs of remote CPUs. This is done by grabbing swapvec_lock and calling lru_add_drain_cpu(). PS: This is based on an idea and initial implementation by Rik van Riel. Signed-off-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
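A sketch of the remote drain described above, assuming the -rt cross-CPU locallock helpers local_lock_on()/local_unlock_on(); the wrapper name is hypothetical:

    /* drain a remote CPU's LRU pagevecs directly instead of scheduling a work
     * item that may never run behind a SCHED_FIFO hog */
    static void drain_cpu_pagevecs_remote(int cpu)
    {
            local_lock_on(swapvec_lock, cpu);       /* take that CPU's pagevec lock */
            lru_add_drain_cpu(cpu);
            local_unlock_on(swapvec_lock, cpu);
    }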
-
Sebastian Andrzej Siewior authored
Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
If we get out of preempt_schedule_irq() then we check for NEED_RESCHED and call the former function again if it is set, because the preemption counter has to be zero at this point. However the counter for lazy-preempt might not be zero, therefore we have to check the counter before looking at the need_resched_lazy flag. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
On -RT we try to acquire sleeping locks which might lead to warnings from lockdep or a warn_on() from spin_try_lock() (which is a rtmutex on RT). In general we don't print from an IRQ-off region, so we should not try this via console_unblank() / bust_spinlocks() either. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Josh Cartwright authored
At first glance, the use of 'static inline' seems appropriate for INIT_HLIST_BL_HEAD(). However, when a 'static inline' function invocation is inlined by gcc, all callers share any static local data declared within that inline function. This presents a problem for how lockdep classes are set up. raw_spinlocks, for example, when CONFIG_DEBUG_SPINLOCK, # define raw_spin_lock_init(lock) \ do { \ static struct lock_class_key __key; \ \ __raw_spin_lock_init((lock), #lock, &__key); \ } while (0) When this macro is expanded into a 'static inline' caller, like INIT_HLIST_BL_HEAD(): static inline INIT_HLIST_BL_HEAD(struct hlist_bl_head *h) { h->first = NULL; raw_spin_lock_init(&h->lock); } ...the static local lock_class_key object is made a function static. For compilation units which invoke INIT_HLIST_BL_HEAD() more than once, all of the invocations share this same static local object. This can lead to some very confusing lockdep splats (example below). Solve this problem by forcing INIT_HLIST_BL_HEAD() to be a macro, which prevents the lockdep class object sharing. ============================================= [ INFO: possible recursive locking detected ] 4.4.4-rt11 #4 Not tainted --------------------------------------------- kswapd0/59 is trying to acquire lock: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan but task is already holding lock: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&h->lock#2); lock(&h->lock#2); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by kswapd0/59: #0: (shrinker_rwsem){+.+...}, at: rt_down_read_trylock #1: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan Reported-by:
Luis Claudio R. Goncalves <lclaudio@uudg.org> Tested-by:
Luis Claudio R. Goncalves <lclaudio@uudg.org> Signed-off-by:
Josh Cartwright <joshc@ni.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
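The macro form this describes looks roughly like the following sketch (the ->lock member exists in the -rt variant of struct hlist_bl_head):

    /* as a macro, every expansion of raw_spin_lock_init() gets its own static
     * lock_class_key at the call site, so lockdep sees distinct lock classes */
    #define INIT_HLIST_BL_HEAD(h)                           \
    do {                                                    \
            (h)->first = NULL;                              \
            raw_spin_lock_init(&(h)->lock);                 \
    } while (0)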
-
Sebastian Andrzej Siewior authored
The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away by a task with a higher priority, then the higher-priority task won't be able to submit packets to the NIC directly; instead they will be enqueued into the Qdisc. The NIC will remain idle until the task(s) with higher priority leave the CPU and the task with lower priority gets back and finishes the job. If we always take the busylock, we ensure that the RT task can boost the low-prio task and submit the packet. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
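A sketch of the idea as a fragment of __dev_xmit_skb(); the unconditional-contention form under -RT is an assumption based on the text, the rest is the mainline busylock pattern:

    #ifdef CONFIG_PREEMPT_RT_FULL
            /* always contend: a high-prio sender then PI-boosts the preempted
             * low-prio owner of the qdisc instead of queueing behind it */
            contended = true;
    #else
            contended = qdisc_is_running(q);
    #endif
            if (contended)
                    spin_lock(&q->busylock);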
-
Rik van Riel authored
The async pagefault wake code can run from the idle task in exception context, so everything here needs to be made non-preemptible. Conversion to a simple wait queue and raw spinlock does the trick. Signed-off-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Mike Galbraith authored
Drop 'success' arg from probe_wakeup_latency_hist_start(). Link: http://lkml.kernel.org/r/1457064246.3501.2.camel@gmail.com Fixes: cf1dd658 sched: Introduce the trace_sched_waking tracepoint Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
preempt_disable() invokes preempt_count_add() which saves the caller in current->preempt_disable_ip. It uses CALLER_ADDR1, which does not record its own caller but the parent of the caller. That means we get the correct caller for something like spin_lock() unless the architecture inlines those invocations, but it is always wrong for preempt_disable() or local_bh_disable(). This patch introduces get_parent_ip(), which tries CALLER_ADDR0, 1, 2 if the former is a locking function. This seems to record the preempt_disable() caller properly for preempt_disable() itself as well as for get_cpu_var() or local_bh_disable(). Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
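A sketch of the helper as described: walk one return address up at a time until we are outside the locking code (the exact prototype in the -rt patch may differ):

    /* return the first caller address that is not inside a locking function */
    static inline unsigned long get_parent_ip(void)
    {
            unsigned long addr = CALLER_ADDR0;

            if (!in_lock_functions(addr))
                    return addr;
            addr = CALLER_ADDR1;
            if (!in_lock_functions(addr))
                    return addr;
            return CALLER_ADDR2;
    }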
-
Clark Williams authored
RT has dropped support for rcu_bh; comment it out in rcutorture. Signed-off-by:
Clark Williams <williams@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Yang Shi authored
When running -rt kernel with both PREEMPT_OFF_HIST and LOCKDEP enabled, the below error is reported: [ INFO: suspicious RCU usage. ] 4.4.1-rt6 #1 Not tainted include/trace/events/hist.h:31 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! no locks held by swapper/0/0. stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.1-rt6-WR8.0.0.0_standard #1 Stack : 0000000000000006 0000000000000000 ffffffff81ca8c38 ffffffff81c8fc80 ffffffff811bdd68 ffffffff81cb0000 0000000000000000 ffffffff81cb0000 0000000000000000 0000000000000000 0000000000000004 0000000000000000 0000000000000004 ffffffff811bdf50 0000000000000000 ffffffff82b60000 0000000000000000 ffffffff812897ac ffffffff819f0000 000000000000000b ffffffff811be460 ffffffff81b7c588 ffffffff81c8fc80 0000000000000000 0000000000000000 ffffffff81ec7f88 ffffffff81d70000 ffffffff81b70000 ffffffff81c90000 ffffffff81c3fb00 ffffffff81c3fc28 ffffffff815e6f98 0000000000000000 ffffffff81c8fa87 ffffffff81b70958 ffffffff811bf2c4 0707fe32e8d60ca5 ffffffff81126d60 0000000000000000 0000000000000000 ... Call Trace: [<ffffffff81126d60>] show_stack+0xe8/0x108 [<ffffffff815e6f98>] dump_stack+0x88/0xb0 [<ffffffff8124b88c>] time_hardirqs_off+0x204/0x300 [<ffffffff811aa5dc>] trace_hardirqs_off_caller+0x24/0xe8 [<ffffffff811a4ec4>] cpu_startup_entry+0x39c/0x508 [<ffffffff81d7dc68>] start_kernel+0x584/0x5a0 Replace regular trace_preemptoff_hist to rcuidle version to avoid the error. Signed-off-by:
Yang Shi <yang.shi@windriver.com> Cc: bigeasy@linutronix.de Cc: rostedt@goodmis.org Cc: linux-rt-users@vger.kernel.org Link: http://lkml.kernel.org/r/1456262603-10075-1-git-send-email-yang.shi@windriver.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
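Every tracepoint also gets an auto-generated _rcuidle variant that brackets the probe with RCU-idle-safe enter/exit, so the fix amounts to calling that variant from the idle path; the argument names below are placeholders, not the verified hist.h signature:

    /* before: trace_preemptoff_hist(reason, starthist);  -- illegal from idle   */
    trace_preemptoff_hist_rcuidle(reason, starthist);      /* safe from EQS/idle */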
-
Mike Galbraith authored
homer: # nm kernel/sched/core.o|grep preemptible_lazy 00000000000000b5 t preemptible_lazy echo wakeup_rt > current_tracer ==> Welcome to infinity. Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by:
Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/1456067490.3771.2.camel@gmail.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
With completion using swait and so rawlocks we don't need this anymore. Further, bisect thinks this patch is responsible for: |BUG: unable to handle kernel NULL pointer dereference at (null) |IP: [<ffffffff81082123>] sched_cpu_active+0x53/0x70 |PGD 0 |Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC |Dumping ftrace buffer: | (ftrace buffer empty) |Modules linked in: |CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.1+ #330 |Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 |task: ffff88013ae64b00 ti: ffff88013ae74000 task.ti: ffff88013ae74000 |RIP: 0010:[<ffffffff81082123>] [<ffffffff81082123>] sched_cpu_active+0x53/0x70 |RSP: 0000:ffff88013ae77eb8 EFLAGS: 00010082 |RAX: 0000000000000001 RBX: ffffffff81c2cf20 RCX: 0000001050fb52fb |RDX: 0000001050fb52fb RSI: 000000105117ca1e RDI: 00000000001c7723 |RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 |R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff |R13: ffffffff81c2cee0 R14: 0000000000000000 R15: 0000000000000001 |FS: 0000000000000000(0000) GS:ffff88013b200000(0000) knlGS:0000000000000000 |CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b |CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000000006e0 |Stack: | ffffffff810c446d ffff88013ae77f00 ffffffff8107d8dd 000000000000000a | 0000000000000001 0000000000000000 0000000000000000 0000000000000000 | 0000000000000000 ffff88013ae77f10 ffffffff8107d90e ffff88013ae77f20 |Call Trace: | [<ffffffff810c446d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 | [<ffffffff8107d8dd>] ? notifier_call_chain+0x5d/0x80 | [<ffffffff8107d90e>] ? __raw_notifier_call_chain+0xe/0x10 | [<ffffffff810598a3>] ? cpu_notify+0x23/0x40 | [<ffffffff8105a7b8>] ? notify_cpu_starting+0x28/0x30 during hotplug. The rawlocks need to remain however. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
We unlock the lock while the interrupts are off. This isn't a problem now but will become one, because migrate_disable() + migrate_enable() are not symmetrical in regard to the status of interrupts. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
With interrupts off it makes no sense to do the long path since we can't leave the CPU anyway. Also we might end up in a recursion with lockdep. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Atleast on X86 we die a recursive death |CPU: 3 PID: 585 Comm: bash Not tainted 4.4.1-rt4+ #198 |Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 |task: ffff88007ab4cd00 ti: ffff88007ab94000 task.ti: ffff88007ab94000 |RIP: 0010:[<ffffffff81684870>] [<ffffffff81684870>] int3+0x0/0x10 |RSP: 0018:ffff88013c107fd8 EFLAGS: 00010082 |RAX: ffff88007ab4cd00 RBX: ffffffff8100ceab RCX: 0000000080202001 |RDX: 0000000000000000 RSI: ffffffff8100ceab RDI: ffffffff810c78b2 |RBP: ffff88007ab97c10 R08: ffffffffff57b000 R09: 0000000000000000 |R10: ffff88013bb64790 R11: ffff88007ab4cd68 R12: ffffffff8100ceab |R13: ffffffff810c78b2 R14: ffffffff810f8158 R15: ffffffff810f9120 |FS: 0000000000000000(0000) GS:ffff88013c100000(0063) knlGS:00000000f74e3940 |CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b |CR2: 0000000008cf6008 CR3: 000000013b169000 CR4: 00000000000006e0 |Call Trace: | <#DB> | [<ffffffff810f8158>] ? trace_preempt_off+0x18/0x170 | <<EOE>> | [<ffffffff81077745>] preempt_count_add+0xa5/0xc0 | [<ffffffff810c78b2>] on_each_cpu+0x22/0x90 | [<ffffffff8100ceab>] text_poke_bp+0x5b/0xc0 | [<ffffffff8100a29c>] arch_jump_label_transform+0x8c/0xf0 | [<ffffffff8111c77c>] __jump_label_update+0x6c/0x80 | [<ffffffff8111c83a>] jump_label_update+0xaa/0xc0 | [<ffffffff8111ca54>] static_key_slow_inc+0x94/0xa0 | [<ffffffff810e0d8d>] tracepoint_probe_register_prio+0x26d/0x2c0 | [<ffffffff810e0df3>] tracepoint_probe_register+0x13/0x20 | [<ffffffff810fca78>] trace_event_reg+0x98/0xd0 | [<ffffffff810fcc8b>] __ftrace_event_enable_disable+0x6b/0x180 | [<ffffffff810fd5b8>] event_enable_write+0x78/0xc0 | [<ffffffff8117a768>] __vfs_write+0x28/0xe0 | [<ffffffff8117b025>] vfs_write+0xa5/0x180 | [<ffffffff8117bb76>] SyS_write+0x46/0xa0 | [<ffffffff81002c91>] do_fast_syscall_32+0xa1/0x1d0 | [<ffffffff81684d57>] sysenter_flags_fixed+0xd/0x17 during echo 1 > /sys/kernel/debug/tracing/events/hist/preemptirqsoff_hist/enable Reported-By:
Christoph Mathys <eraserix@gmail.com> Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
If NET_RX uses up all of its budget it moves the following NAPI invocations into `ksoftirqd`. On -RT it does not do so; instead it raises the NET_RX softirq in its current context again. In order to get closer to mainline's behaviour this patch provides __raise_softirq_irqoff_ksoft(), which raises the softirq in ksoftirqd. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
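A minimal sketch of the new helper (names other than the helper itself are mainline softirq internals; the real -rt version may also have to pick the right thread once the timer softirqs get their own ktimersoftd, see the following entry):

    /* raise the softirq but let ksoftirqd process it instead of re-running it
     * in the current (NAPI) context, mirroring mainline's budget behaviour */
    void __raise_softirq_irqoff_ksoft(unsigned int nr)
    {
            __raise_softirq_irqoff(nr);     /* mark the softirq pending      */
            wakeup_softirqd();              /* defer to the softirq kthread  */
    }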
-
Sebastian Andrzej Siewior authored
The softirqd runs in -RT with SCHED_FIFO (prio 1) and deals mostly with timer wakeups which can not happen in hardirq context. The prio has been raised above the normal SCHED_OTHER so the timer wakeup does not happen too late. With enough networking load it is possible that the system never goes idle and schedules ksoftirqd and everything else with a higher priority. One of the tasks left behind is one of RCU's threads and so we see stalls and eventually run out of memory. This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd` thread into their own `ktimersoftd` thread. The former can now run SCHED_OTHER (same as mainline) and the latter at SCHED_FIFO due to the wakeups. From the networking point of view: the NAPI callback runs after the network interrupt thread completes. If its run time takes too long the NAPI code itself schedules the `ksoftirqd`. Here in the thread it can run at SCHED_OTHER priority and it won't defer RCU anymore. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Probably in the rebase onto v4.1 this check got moved into the less commonly used preempt_schedule_notrace(). This patch ensures that both functions use it. Reported-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
In the non-RT case the spin_lock_irq() here disables interrupts just as raw_spin_lock_irq() does, so in the unlock case the interrupts are enabled too early. Reported-by:
kernel test robot <ying.huang@linux.intel.com> Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Peter Zijlstra authored
Upstream commit fbd705a0 Mathieu reported that since 317f3941 ("sched: Move the second half of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of context of the waker. This is a problem when you want to analyse wakeup paths because it is now very hard to correlate the wakeup event to whoever issued the wakeup. OTOH trace_sched_wakeup() is issued at the point where we set p->state = TASK_RUNNING, which is right where we hand the task off to the scheduler, so this is an important point when looking at scheduling behaviour: up to here it has been the wakeup path, everything hereafter is due to scheduler policy. To bridge this gap, introduce a second tracepoint: trace_sched_waking. It is guaranteed to be called in the waker context. [ Ported to linux-4.1.y-rt kernel by Mathieu Desnoyers. Resolved conflict: try_to_wake_up_local() does not exist in -rt kernel. Removed its instrumentation hunk. ] Reported-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Julien Desfossez <jdesfossez@efficios.com> CC: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Francis Giraldeau <francis.giraldeau@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
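The new tracepoint sits at the beginning of try_to_wake_up(), still in the waker's context; a sketch based on the description (a fragment, not the full function):

    raw_spin_lock_irqsave(&p->pi_lock, flags);
    if (!(p->state & state))
            goto out;

    trace_sched_waking(p);          /* always fires in the waker's context */

    success = 1;
    cpu = task_cpu(p);
    /* the remainder may complete on the remote CPU, where trace_sched_wakeup()
     * is still emitted at the p->state = TASK_RUNNING transition */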
-
Thomas Gleixner authored
On architectures where arch_irq_work_has_interrupt() returns false, we end up running the irq safe work from the softirq context. That results in a potential deadlock in the scheduler irq work which expects that function to be called with interrupts disabled. Split the irq_work_tick() function into a hard and soft variant. Call the hard variant from the tick interrupt and add the soft variant to the timer softirq. Reported-and-tested-by:
Yanjiang Jin <yanjiang.jin@windriver.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
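A sketch of the hard/soft split (the per-CPU list names follow kernel/irq_work.c; the exact -rt config guards are assumptions):

    void irq_work_tick(void)                        /* hard tick interrupt */
    {
            struct llist_head *raised = this_cpu_ptr(&raised_list);

            if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
                    irq_work_run_list(raised);
    #ifndef CONFIG_PREEMPT_RT_FULL
            irq_work_run_list(this_cpu_ptr(&lazy_list));
    #endif
    }

    #ifdef CONFIG_PREEMPT_RT_FULL
    void irq_work_tick_soft(void)                   /* called from the timer softirq */
    {
            irq_work_run_list(this_cpu_ptr(&lazy_list));
    }
    #endif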
-
Grygorii Strashko authored
I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm if I'm trying to unplug cpu1: [ 57.737589] CPU1: shutdown [ 57.767537] BUG: spinlock bad magic on CPU#0, sh/137 [ 57.767546] lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 57.767552] CPU: 0 PID: 137 Comm: sh Not tainted 4.1.10-rt8-01700-g2c38702-dirty #55 [ 57.767555] Hardware name: Generic DRA74X (Flattened Device Tree) [ 57.767568] [<c001acd0>] (unwind_backtrace) from [<c001534c>] (show_stack+0x20/0x24) [ 57.767579] [<c001534c>] (show_stack) from [<c075560c>] (dump_stack+0x84/0xa0) [ 57.767593] [<c075560c>] (dump_stack) from [<c00aca48>] (spin_dump+0x84/0xac) [ 57.767603] [<c00aca48>] (spin_dump) from [<c00acaa4>] (spin_bug+0x34/0x38) [ 57.767614] [<c00acaa4>] (spin_bug) from [<c00acc10>] (do_raw_spin_lock+0x168/0x1c0) [ 57.767624] [<c00acc10>] (do_raw_spin_lock) from [<c075b4cc>] (_raw_spin_lock+0x4c/0x54) [ 57.767631] [<c075b4cc>] (_raw_spin_lock) from [<c07599fc>] (rt_spin_lock_slowlock+0x5c/0x374) [ 57.767638] [<c07599fc>] (rt_spin_lock_slowlock) from [<c075bcf4>] (rt_spin_lock+0x38/0x70) [ 57.767649] [<c075bcf4>] (rt_spin_lock) from [<c06333c0>] (skb_dequeue+0x28/0x7c) [ 57.767662] [<c06333c0>] (skb_dequeue) from [<c06476ec>] (dev_cpu_callback+0x1b8/0x240) [ 57.767673] [<c06476ec>] (dev_cpu_callback) from [<c007566c>] (notifier_call_chain+0x3c/0xb4) The reason is that skb_dequeue is taking skb->lock, but RT changed the core code to use a raw spinlock. The non-raw lock is not initialized on purpose to catch exactly this kind of problem. Fixes: 91df05da 'net: Use skbufhead with raw lock' Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Josh Cartwright authored
While the use of synchronize_rcu_expedited() might make synchronize_net() "faster", it does so at significant cost on RT systems, as expediting a grace period forcibly preempts any high-priority RT tasks (via the stop_machine() mechanism). Without this change, we can observe a latency spike up to 30us with cyclictest by rapidly unplugging/reestablishing an ethernet link. Suggested-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Josh Cartwright <joshc@ni.com> Cc: bigeasy@linutronix.de Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Acked-by:
David S. Miller <davem@davemloft.net> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20151027123153.GG8245@jcartwri.amer.corp.natinst.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
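The resulting behaviour can be sketched as follows (on RT fall back to a normal grace period; the exact condition is an assumption, the idea comes straight from the text):

    void synchronize_net(void)
    {
            might_sleep();
            /* an expedited grace period forcibly preempts high-priority RT
             * tasks via the stop_machine() mechanism, so avoid it on RT */
            if (rtnl_is_locked() && !IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
                    synchronize_rcu_expedited();
            else
                    synchronize_rcu();
    }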
-
Sebastian Andrzej Siewior authored
I see large latencies here during a stack dump on x86. The preempt_disable() and get_cpu() should forbid moving the task to another CPU during a stack dump and avoid two stack traces running in parallel on the same CPU. However a stack trace from a second CPU may still happen in parallel. Also nesting is allowed, so a stack trace happens in process context and we may have another one from IRQ context. With migrate_disable() we keep this code preemptible and allow a second backtrace on the same CPU by another task. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
bmouring@ni.com authored
In 8930ed80 (rtmutex: Cleanup deadlock detector debug logic), chainwalking control enums were introduced to limit the deadlock detection logic. One of the calls to task_blocks_on_rt_mutex was missed when converting to use the enums. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Brad Mouring <brad.mouring@ni.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Thomas Gleixner authored
Yimin debugged that in case of a PI wakeup in progress when rt_mutex_start_proxy_lock() calls task_blocks_on_rt_mutex() the latter returns -EAGAIN and in consequence the remove_waiter() call runs into a BUG_ON() because there is nothing to remove. Guard it with rt_mutex_has_waiters(). This is a quick fix which is easy to backport. The proper fix is to have a central check in remove_waiter() so we can call it unconditionally. Reported-and-debugged-by:
Yimin Deng <yimin11.deng@gmail.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
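The quick fix amounts to guarding the cleanup in the error path of rt_mutex_start_proxy_lock(), roughly like this sketch (surrounding code elided):

    if (unlikely(ret)) {
            /* -EAGAIN from a racing PI wakeup means nothing was queued, so only
             * remove the waiter if the lock actually has one */
            if (rt_mutex_has_waiters(lock))
                    remove_waiter(lock, waiter);
    }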