- Jan 04, 2017
-
-
Sebastian Andrzej Siewior authored
Debian started to build gcc with -fPIE by default, so the kernel build fails before it properly starts with: |kernel/bounds.c:1:0: error: code model kernel does not support PIC mode Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Mike Galbraith authored
Line up helper arrows to the right column. Cc: stable-rt@vger.kernel.org Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> [bigeasy: fixup function tracer header] Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Sebastian Andrzej Siewior authored
It has been pointed out by tglx that on UP a non-RT task could spin for its entire time slice because the lock owner is preempted. This won't happen on !RT. So we fall back to cpu_chill() if cond_resched() did not help. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Upstream commit 47be6184 ("fs/dcache.c: avoid soft-lockup in dput()") changed the condition _when_ cpu_relax() / cond_resched() was invoked. This change was adapted in -RT into mostly the same thing, except that if cond_resched() did nothing we had to do cpu_chill() to force the task off the CPU for a tiny little bit in case the task had RT priority and did not want to leave the CPU. This change resulted in a performance regression (in my testcase the build time on /dev/shm increased from 19min to 24min). The reason is that with this change cpu_chill() was invoked even if dput() made progress (dentry_kill() returned a different dentry), instead of only when we were retrying the same dentry over and over again. This patch brings back the old behavior: cond_resched() & chill only if we make no progress. A small improvement is to invoke cpu_chill() only if we are an RT task (and avoid the sleep otherwise); for non-RT tasks the scheduler will remove us from the CPU if we make no progress. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
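A minimal sketch of the retry logic described above (illustrative only, not the exact fs/dcache.c diff; cpu_chill() and rt_task() are the -rt primitives named in the text):

    /* dput() retry loop, simplified: only back off when no progress was made */
    while (dentry) {
            struct dentry *parent = dentry_kill(dentry);

            if (parent == dentry) {
                    /* no progress: we are retrying the very same dentry */
                    if (!cond_resched() && rt_task(current))
                            cpu_chill();    /* RT task: sleep off the CPU briefly */
                    continue;
            }
            dentry = parent;                /* made progress, keep walking up */
    }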
-
Sebastian Andrzej Siewior authored
It looks like the this_cpu_ptr() access in icmp_sk() is protected with local_bh_disable(). To avoid missing serialization on -RT I am adding a local lock here. No crash has been observed; this is just a precaution. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
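A sketch of what such a local lock looks like with the -rt locallock API of that era; the lock name and the exact placement inside icmp_xmit_lock() are assumptions based on the description:

    static DEFINE_LOCAL_IRQ_LOCK(icmp_sk_lock);    /* per-CPU -rt local lock */

    static struct sock *icmp_xmit_lock(struct net *net)
    {
            struct sock *sk;

            local_lock(icmp_sk_lock);       /* serialize the this_cpu_ptr() user on -RT */
            sk = icmp_sk(net);
            if (unlikely(!spin_trylock(&sk->sk_lock.slock))) {
                    /* the per-CPU socket is already busy on this CPU */
                    local_unlock(icmp_sk_lock);
                    return NULL;
            }
            return sk;
    }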
-
Sebastian Andrzej Siewior authored
Some time ago Sami Pietikäinen reported a crash on -RT in ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire (v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the patch. As it turns out, that was a mistake. I have reports that the same crash is possible with a similar backtrace. It seems that vanilla protects access to this_cpu_ptr() via local_bh_disable(). This does not work on -RT since we can have NET_RX and NET_TX running in parallel on the same CPU. This brings back the old locks. |Unable to handle kernel NULL pointer dereference at virtual address 00000010 |PC is at __ip_make_skb+0x198/0x3e8 |[<c04e39d8>] (__ip_make_skb) from [<c04e3ca8>] (ip_push_pending_frames+0x20/0x40) |[<c04e3ca8>] (ip_push_pending_frames) from [<c04e3ff0>] (ip_send_unicast_reply+0x210/0x22c) |[<c04e3ff0>] (ip_send_unicast_reply) from [<c04fbb54>] (tcp_v4_send_reset+0x190/0x1c0) |[<c04fbb54>] (tcp_v4_send_reset) from [<c04fcc1c>] (tcp_v4_do_rcv+0x22c/0x288) |[<c04fcc1c>] (tcp_v4_do_rcv) from [<c0474364>] (release_sock+0xb4/0x150) |[<c0474364>] (release_sock) from [<c04ed904>] (tcp_close+0x240/0x454) |[<c04ed904>] (tcp_close) from [<c0511408>] (inet_release+0x74/0x7c) |[<c0511408>] (inet_release) from [<c0470728>] (sock_release+0x30/0xb0) |[<c0470728>] (sock_release) from [<c0470abc>] (sock_close+0x1c/0x24) |[<c0470abc>] (sock_close) from [<c0115ec4>] (__fput+0xe8/0x20c) |[<c0115ec4>] (__fput) from [<c0116050>] (____fput+0x18/0x1c) |[<c0116050>] (____fput) from [<c0058138>] (task_work_run+0xa4/0xb8) |[<c0058138>] (task_work_run) from [<c0011478>] (do_work_pending+0xd0/0xe4) |[<c0011478>] (do_work_pending) from [<c000e740>] (work_pending+0xc/0x20) |Code: e3530001 8a000001 e3a00040 ea000011 (e5973010) Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Mike Galbraith authored
During master->rt merge, I stumbled across the buglet below. Fix get_cpu()/put_cpu_light() imbalance. Cc: stable-rt@vger.kernel.org Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
There should be no need to hold the base lock during the wakeup. There should be no boosting involved; the wakeup list has its own lock, so it should be safe to do this without the base lock. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
The base lock is dropped during the invocation of the timer. That means it is possible that we have one waiter while timer1 is running, and once that one has finished we get another waiter while timer2 is running. Since we wake up only one waiter it is possible that we miss the other one. This will probably heal itself over time because most of the time we complete timers without an active wakeup. To avoid the scenario where we don't wake up all waiters at once, wake_up_all() is used. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
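A sketch of the change (the waitqueue name follows the -rt timer code as I understand it; treat it as an assumption):

    /* after a timer run completes, wake every waiter that queued itself while
     * any of the just-executed callbacks was running, not just the first one */
    static void wakeup_timer_waiters(struct tvec_base *base)
    {
            wake_up_all(&base->wait_for_running_timer);     /* was wake_up() */
    }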
-
Corey Minyard authored
On some x86 systems an MCE interrupt would come in before the kernel was ready for it. Looking at the latest RT code, it has similar (but not quite the same) code, except it adds a bool that tells whether MCE handling is initialized. That was required because they had switched to using swork instead of a kernel thread. Here, just checking whether the thread pointer is NULL is good enough to see if MCE handling is initialized. Suggested-by:
Borislav Petkov <bp@alien8.de> Signed-off-by:
Corey Minyard <cminyard@mvista.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Trace events like raw_syscalls always show a preempt count of one. The reason is that on PREEMPT kernels rcu_read_lock_sched_notrace() increases the preemption counter, and the function recording the counter is called within the RCU section. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ Changed this to upstream version. See commit e947841c ] Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Luiz Capitulino authored
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run on all CPUs that have non-empty LRU pagevecs and then waiting for the scheduled work to complete. However, workqueue threads may never have the chance to run on a CPU that's running a SCHED_FIFO task. This causes lru_add_drain_all() to block forever. This commit solves this problem by changing lru_add_drain_all() to drain the LRU pagevecs of remote CPUs. This is done by grabbing swapvec_lock and calling lru_add_drain_cpu(). PS: This is based on an idea and initial implementation by Rik van Riel. Signed-off-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
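A sketch of the remote drain described above, assuming the -rt cross-CPU locallock helpers local_lock_on()/local_unlock_on(); the wrapper name is hypothetical:

    /* drain a remote CPU's LRU pagevecs directly instead of scheduling a work
     * item that may never run behind a SCHED_FIFO hog */
    static void drain_cpu_pagevecs_remote(int cpu)
    {
            local_lock_on(swapvec_lock, cpu);       /* take that CPU's pagevec lock */
            lru_add_drain_cpu(cpu);
            local_unlock_on(swapvec_lock, cpu);
    }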
-
Sebastian Andrzej Siewior authored
Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
If we get out of preempt_schedule_irq() then we check for NEED_RESCHED and call the former function again if it is set, because the preemption counter has to be zero at this point. However the counter for lazy-preempt might not be zero, therefore we have to check the counter before looking at the need_resched_lazy flag. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
On -RT we try to acquire sleeping locks which might lead to warnings from lockdep or a warn_on() from spin_try_lock() (which is a rtmutex on RT). In general we don't print from an IRQ-off region, so we should not try this via console_unblank() / bust_spinlocks() either. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Josh Cartwright authored
At first glance, the use of 'static inline' seems appropriate for INIT_HLIST_BL_HEAD(). However, when a 'static inline' function invocation is inlined by gcc, all callers share any static local data declared within that inline function. This presents a problem for how lockdep classes are set up. raw_spinlocks, for example, when CONFIG_DEBUG_SPINLOCK, # define raw_spin_lock_init(lock) \ do { \ static struct lock_class_key __key; \ \ __raw_spin_lock_init((lock), #lock, &__key); \ } while (0) When this macro is expanded into a 'static inline' caller, like INIT_HLIST_BL_HEAD(): static inline INIT_HLIST_BL_HEAD(struct hlist_bl_head *h) { h->first = NULL; raw_spin_lock_init(&h->lock); } ...the static local lock_class_key object is made a function static. For compilation units which invoke INIT_HLIST_BL_HEAD() more than once, all of the invocations share this same static local object. This can lead to some very confusing lockdep splats (example below). Solve this problem by forcing INIT_HLIST_BL_HEAD() to be a macro, which prevents the lockdep class object sharing. ============================================= [ INFO: possible recursive locking detected ] 4.4.4-rt11 #4 Not tainted --------------------------------------------- kswapd0/59 is trying to acquire lock: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan but task is already holding lock: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&h->lock#2); lock(&h->lock#2); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by kswapd0/59: #0: (shrinker_rwsem){+.+...}, at: rt_down_read_trylock #1: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan Reported-by:
Luis Claudio R. Goncalves <lclaudio@uudg.org> Tested-by:
Luis Claudio R. Goncalves <lclaudio@uudg.org> Signed-off-by:
Josh Cartwright <joshc@ni.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
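The macro form this describes looks roughly like the following sketch (the ->lock member exists in the -rt variant of struct hlist_bl_head):

    /* as a macro, every expansion of raw_spin_lock_init() gets its own static
     * lock_class_key at the call site, so lockdep sees distinct lock classes */
    #define INIT_HLIST_BL_HEAD(h)                           \
    do {                                                    \
            (h)->first = NULL;                              \
            raw_spin_lock_init(&(h)->lock);                 \
    } while (0)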
-
Sebastian Andrzej Siewior authored
The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away by a task with a higher priority, then the higher-priority task won't be able to submit packets to the NIC directly; instead they will be enqueued into the Qdisc. The NIC will remain idle until the task(s) with higher priority leave the CPU and the task with lower priority gets back and finishes the job. If we always take the busylock, we ensure that the RT task can boost the low-prio task and submit the packet. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
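A sketch of the idea as a fragment of __dev_xmit_skb(); the unconditional-contention form under -RT is an assumption based on the text, the rest is the mainline busylock pattern:

    #ifdef CONFIG_PREEMPT_RT_FULL
            /* always contend: a high-prio sender then PI-boosts the preempted
             * low-prio owner of the qdisc instead of queueing behind it */
            contended = true;
    #else
            contended = qdisc_is_running(q);
    #endif
            if (contended)
                    spin_lock(&q->busylock);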
-
Rik van Riel authored
The async pagefault wake code can run from the idle task in exception context, so everything here needs to be made non-preemptible. Conversion to a simple wait queue and raw spinlock does the trick. Signed-off-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Mike Galbraith authored
Drop 'success' arg from probe_wakeup_latency_hist_start(). Link: http://lkml.kernel.org/r/1457064246.3501.2.camel@gmail.com Fixes: cf1dd658 sched: Introduce the trace_sched_waking tracepoint Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
preempt_disable() invokes preempt_count_add() which saves the caller in current->preempt_disable_ip. It uses CALLER_ADDR1, which does not record its own caller but the parent of the caller. That means we get the correct caller for something like spin_lock() unless the architecture inlines those invocations, but it is always wrong for preempt_disable() or local_bh_disable(). This patch introduces get_parent_ip(), which tries CALLER_ADDR0, 1, 2 if the former is a locking function. This seems to record the preempt_disable() caller properly for preempt_disable() itself as well as for get_cpu_var() or local_bh_disable(). Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
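A sketch of the helper as described: walk one return address up at a time until we are outside the locking code (the exact prototype in the -rt patch may differ):

    /* return the first caller address that is not inside a locking function */
    static inline unsigned long get_parent_ip(void)
    {
            unsigned long addr = CALLER_ADDR0;

            if (!in_lock_functions(addr))
                    return addr;
            addr = CALLER_ADDR1;
            if (!in_lock_functions(addr))
                    return addr;
            return CALLER_ADDR2;
    }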
-
Clark Williams authored
RT has dropped support for rcu_bh; comment it out in rcutorture. Signed-off-by:
Clark Williams <williams@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Yang Shi authored
When running -rt kernel with both PREEMPT_OFF_HIST and LOCKDEP enabled, the below error is reported: [ INFO: suspicious RCU usage. ] 4.4.1-rt6 #1 Not tainted include/trace/events/hist.h:31 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! no locks held by swapper/0/0. stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.1-rt6-WR8.0.0.0_standard #1 Stack : 0000000000000006 0000000000000000 ffffffff81ca8c38 ffffffff81c8fc80 ffffffff811bdd68 ffffffff81cb0000 0000000000000000 ffffffff81cb0000 0000000000000000 0000000000000000 0000000000000004 0000000000000000 0000000000000004 ffffffff811bdf50 0000000000000000 ffffffff82b60000 0000000000000000 ffffffff812897ac ffffffff819f0000 000000000000000b ffffffff811be460 ffffffff81b7c588 ffffffff81c8fc80 0000000000000000 0000000000000000 ffffffff81ec7f88 ffffffff81d70000 ffffffff81b70000 ffffffff81c90000 ffffffff81c3fb00 ffffffff81c3fc28 ffffffff815e6f98 0000000000000000 ffffffff81c8fa87 ffffffff81b70958 ffffffff811bf2c4 0707fe32e8d60ca5 ffffffff81126d60 0000000000000000 0000000000000000 ... Call Trace: [<ffffffff81126d60>] show_stack+0xe8/0x108 [<ffffffff815e6f98>] dump_stack+0x88/0xb0 [<ffffffff8124b88c>] time_hardirqs_off+0x204/0x300 [<ffffffff811aa5dc>] trace_hardirqs_off_caller+0x24/0xe8 [<ffffffff811a4ec4>] cpu_startup_entry+0x39c/0x508 [<ffffffff81d7dc68>] start_kernel+0x584/0x5a0 Replace regular trace_preemptoff_hist to rcuidle version to avoid the error. Signed-off-by:
Yang Shi <yang.shi@windriver.com> Cc: bigeasy@linutronix.de Cc: rostedt@goodmis.org Cc: linux-rt-users@vger.kernel.org Link: http://lkml.kernel.org/r/1456262603-10075-1-git-send-email-yang.shi@windriver.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
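Every tracepoint also gets an auto-generated _rcuidle variant that brackets the probe with RCU-idle-safe enter/exit, so the fix amounts to calling that variant from the idle path; the argument names below are placeholders, not the verified hist.h signature:

    /* before: trace_preemptoff_hist(reason, starthist);  -- illegal from idle   */
    trace_preemptoff_hist_rcuidle(reason, starthist);      /* safe from EQS/idle */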
-
Mike Galbraith authored
homer: # nm kernel/sched/core.o|grep preemptible_lazy 00000000000000b5 t preemptible_lazy echo wakeup_rt > current_tracer ==> Welcome to infinity. Signed-off-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by:
Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/1456067490.3771.2.camel@gmail.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
With completion using swait and so rawlocks we don't need this anymore. Further, bisect thinks this patch is responsible for: |BUG: unable to handle kernel NULL pointer dereference at (null) |IP: [<ffffffff81082123>] sched_cpu_active+0x53/0x70 |PGD 0 |Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC |Dumping ftrace buffer: | (ftrace buffer empty) |Modules linked in: |CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.1+ #330 |Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 |task: ffff88013ae64b00 ti: ffff88013ae74000 task.ti: ffff88013ae74000 |RIP: 0010:[<ffffffff81082123>] [<ffffffff81082123>] sched_cpu_active+0x53/0x70 |RSP: 0000:ffff88013ae77eb8 EFLAGS: 00010082 |RAX: 0000000000000001 RBX: ffffffff81c2cf20 RCX: 0000001050fb52fb |RDX: 0000001050fb52fb RSI: 000000105117ca1e RDI: 00000000001c7723 |RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 |R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff |R13: ffffffff81c2cee0 R14: 0000000000000000 R15: 0000000000000001 |FS: 0000000000000000(0000) GS:ffff88013b200000(0000) knlGS:0000000000000000 |CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b |CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000000006e0 |Stack: | ffffffff810c446d ffff88013ae77f00 ffffffff8107d8dd 000000000000000a | 0000000000000001 0000000000000000 0000000000000000 0000000000000000 | 0000000000000000 ffff88013ae77f10 ffffffff8107d90e ffff88013ae77f20 |Call Trace: | [<ffffffff810c446d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 | [<ffffffff8107d8dd>] ? notifier_call_chain+0x5d/0x80 | [<ffffffff8107d90e>] ? __raw_notifier_call_chain+0xe/0x10 | [<ffffffff810598a3>] ? cpu_notify+0x23/0x40 | [<ffffffff8105a7b8>] ? notify_cpu_starting+0x28/0x30 during hotplug. The rawlocks need to remain however. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
We unlock the lock while the interrupts are off. This isn't a problem now but will become one, because migrate_disable() + migrate_enable() are not symmetrical in regard to the status of interrupts. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
With interrupts off it makes no sense to do the long path since we can't leave the CPU anyway. Also we might end up in a recursion with lockdep. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Atleast on X86 we die a recursive death |CPU: 3 PID: 585 Comm: bash Not tainted 4.4.1-rt4+ #198 |Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 |task: ffff88007ab4cd00 ti: ffff88007ab94000 task.ti: ffff88007ab94000 |RIP: 0010:[<ffffffff81684870>] [<ffffffff81684870>] int3+0x0/0x10 |RSP: 0018:ffff88013c107fd8 EFLAGS: 00010082 |RAX: ffff88007ab4cd00 RBX: ffffffff8100ceab RCX: 0000000080202001 |RDX: 0000000000000000 RSI: ffffffff8100ceab RDI: ffffffff810c78b2 |RBP: ffff88007ab97c10 R08: ffffffffff57b000 R09: 0000000000000000 |R10: ffff88013bb64790 R11: ffff88007ab4cd68 R12: ffffffff8100ceab |R13: ffffffff810c78b2 R14: ffffffff810f8158 R15: ffffffff810f9120 |FS: 0000000000000000(0000) GS:ffff88013c100000(0063) knlGS:00000000f74e3940 |CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b |CR2: 0000000008cf6008 CR3: 000000013b169000 CR4: 00000000000006e0 |Call Trace: | <#DB> | [<ffffffff810f8158>] ? trace_preempt_off+0x18/0x170 | <<EOE>> | [<ffffffff81077745>] preempt_count_add+0xa5/0xc0 | [<ffffffff810c78b2>] on_each_cpu+0x22/0x90 | [<ffffffff8100ceab>] text_poke_bp+0x5b/0xc0 | [<ffffffff8100a29c>] arch_jump_label_transform+0x8c/0xf0 | [<ffffffff8111c77c>] __jump_label_update+0x6c/0x80 | [<ffffffff8111c83a>] jump_label_update+0xaa/0xc0 | [<ffffffff8111ca54>] static_key_slow_inc+0x94/0xa0 | [<ffffffff810e0d8d>] tracepoint_probe_register_prio+0x26d/0x2c0 | [<ffffffff810e0df3>] tracepoint_probe_register+0x13/0x20 | [<ffffffff810fca78>] trace_event_reg+0x98/0xd0 | [<ffffffff810fcc8b>] __ftrace_event_enable_disable+0x6b/0x180 | [<ffffffff810fd5b8>] event_enable_write+0x78/0xc0 | [<ffffffff8117a768>] __vfs_write+0x28/0xe0 | [<ffffffff8117b025>] vfs_write+0xa5/0x180 | [<ffffffff8117bb76>] SyS_write+0x46/0xa0 | [<ffffffff81002c91>] do_fast_syscall_32+0xa1/0x1d0 | [<ffffffff81684d57>] sysenter_flags_fixed+0xd/0x17 during echo 1 > /sys/kernel/debug/tracing/events/hist/preemptirqsoff_hist/enable Reported-By:
Christoph Mathys <eraserix@gmail.com> Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
If NET_RX uses up all of its budget it moves the following NAPI invocations into `ksoftirqd`. On -RT it does not do so; instead it raises the NET_RX softirq in its current context again. In order to get closer to mainline's behaviour this patch provides __raise_softirq_irqoff_ksoft(), which raises the softirq in ksoftirqd. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
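A minimal sketch of the new helper (names other than the helper itself are mainline softirq internals; the real -rt version may also have to pick the right thread once the timer softirqs get their own ktimersoftd, see the following entry):

    /* raise the softirq but let ksoftirqd process it instead of re-running it
     * in the current (NAPI) context, mirroring mainline's budget behaviour */
    void __raise_softirq_irqoff_ksoft(unsigned int nr)
    {
            __raise_softirq_irqoff(nr);     /* mark the softirq pending      */
            wakeup_softirqd();              /* defer to the softirq kthread  */
    }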
-
Sebastian Andrzej Siewior authored
The softirqd runs in -RT with SCHED_FIFO (prio 1) and deals mostly with timer wakeups which can not happen in hardirq context. The prio has been raised above the normal SCHED_OTHER so the timer wakeup does not happen too late. With enough networking load it is possible that the system never goes idle and schedules ksoftirqd and everything else with a higher priority. One of the tasks left behind is one of RCU's threads and so we see stalls and eventually run out of memory. This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd` thread into their own `ktimersoftd` thread. The former can now run SCHED_OTHER (same as mainline) and the latter at SCHED_FIFO due to the wakeups. From the networking point of view: the NAPI callback runs after the network interrupt thread completes. If its run time takes too long the NAPI code itself schedules the `ksoftirqd`. Here in the thread it can run at SCHED_OTHER priority and it won't defer RCU anymore. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
Probably in the rebase onto v4.1 this check got moved into the less commonly used preempt_schedule_notrace(). This patch ensures that both functions use it. Reported-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
In the non-RT case the spin_lock_irq() here disables interrupts just as raw_spin_lock_irq() does, so in the unlock case the interrupts are enabled too early. Reported-by:
kernel test robot <ying.huang@linux.intel.com> Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Peter Zijlstra authored
Upstream commit fbd705a0 Mathieu reported that since 317f3941 ("sched: Move the second half of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of context of the waker. This is a problem when you want to analyse wakeup paths because it is now very hard to correlate the wakeup event to whoever issued the wakeup. OTOH trace_sched_wakeup() is issued at the point where we set p->state = TASK_RUNNING, which is right where we hand the task off to the scheduler, so this is an important point when looking at scheduling behaviour: up to here it has been the wakeup path, everything hereafter is due to scheduler policy. To bridge this gap, introduce a second tracepoint: trace_sched_waking. It is guaranteed to be called in the waker context. [ Ported to linux-4.1.y-rt kernel by Mathieu Desnoyers. Resolved conflict: try_to_wake_up_local() does not exist in -rt kernel. Removed its instrumentation hunk. ] Reported-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Julien Desfossez <jdesfossez@efficios.com> CC: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Francis Giraldeau <francis.giraldeau@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
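The new tracepoint sits at the beginning of try_to_wake_up(), still in the waker's context; a sketch based on the description (a fragment, not the full function):

    raw_spin_lock_irqsave(&p->pi_lock, flags);
    if (!(p->state & state))
            goto out;

    trace_sched_waking(p);          /* always fires in the waker's context */

    success = 1;
    cpu = task_cpu(p);
    /* the remainder may complete on the remote CPU, where trace_sched_wakeup()
     * is still emitted at the p->state = TASK_RUNNING transition */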
-
Thomas Gleixner authored
On architectures where arch_irq_work_has_interrupt() returns false, we end up running the irq safe work from the softirq context. That results in a potential deadlock in the scheduler irq work which expects that function to be called with interrupts disabled. Split the irq_work_tick() function into a hard and soft variant. Call the hard variant from the tick interrupt and add the soft variant to the timer softirq. Reported-and-tested-by:
Yanjiang Jin <yanjiang.jin@windriver.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
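A sketch of the hard/soft split (the per-CPU list names follow kernel/irq_work.c; the exact -rt config guards are assumptions):

    void irq_work_tick(void)                        /* hard tick interrupt */
    {
            struct llist_head *raised = this_cpu_ptr(&raised_list);

            if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
                    irq_work_run_list(raised);
    #ifndef CONFIG_PREEMPT_RT_FULL
            irq_work_run_list(this_cpu_ptr(&lazy_list));
    #endif
    }

    #ifdef CONFIG_PREEMPT_RT_FULL
    void irq_work_tick_soft(void)                   /* called from the timer softirq */
    {
            irq_work_run_list(this_cpu_ptr(&lazy_list));
    }
    #endif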
-
Grygorii Strashko authored
I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm if I'm trying to unplug cpu1: [ 57.737589] CPU1: shutdown [ 57.767537] BUG: spinlock bad magic on CPU#0, sh/137 [ 57.767546] lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 57.767552] CPU: 0 PID: 137 Comm: sh Not tainted 4.1.10-rt8-01700-g2c38702-dirty #55 [ 57.767555] Hardware name: Generic DRA74X (Flattened Device Tree) [ 57.767568] [<c001acd0>] (unwind_backtrace) from [<c001534c>] (show_stack+0x20/0x24) [ 57.767579] [<c001534c>] (show_stack) from [<c075560c>] (dump_stack+0x84/0xa0) [ 57.767593] [<c075560c>] (dump_stack) from [<c00aca48>] (spin_dump+0x84/0xac) [ 57.767603] [<c00aca48>] (spin_dump) from [<c00acaa4>] (spin_bug+0x34/0x38) [ 57.767614] [<c00acaa4>] (spin_bug) from [<c00acc10>] (do_raw_spin_lock+0x168/0x1c0) [ 57.767624] [<c00acc10>] (do_raw_spin_lock) from [<c075b4cc>] (_raw_spin_lock+0x4c/0x54) [ 57.767631] [<c075b4cc>] (_raw_spin_lock) from [<c07599fc>] (rt_spin_lock_slowlock+0x5c/0x374) [ 57.767638] [<c07599fc>] (rt_spin_lock_slowlock) from [<c075bcf4>] (rt_spin_lock+0x38/0x70) [ 57.767649] [<c075bcf4>] (rt_spin_lock) from [<c06333c0>] (skb_dequeue+0x28/0x7c) [ 57.767662] [<c06333c0>] (skb_dequeue) from [<c06476ec>] (dev_cpu_callback+0x1b8/0x240) [ 57.767673] [<c06476ec>] (dev_cpu_callback) from [<c007566c>] (notifier_call_chain+0x3c/0xb4) The reason is that skb_dequeue is taking skb->lock, but RT changed the core code to use a raw spinlock. The non-raw lock is not initialized on purpose to catch exactly this kind of problem. Fixes: 91df05da 'net: Use skbufhead with raw lock' Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Josh Cartwright authored
While the use of synchronize_rcu_expedited() might make synchronize_net() "faster", it does so at significant cost on RT systems, as expediting a grace period forcibly preempts any high-priority RT tasks (via the stop_machine() mechanism). Without this change, we can observe a latency spike up to 30us with cyclictest by rapidly unplugging/reestablishing an ethernet link. Suggested-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Josh Cartwright <joshc@ni.com> Cc: bigeasy@linutronix.de Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Acked-by:
David S. Miller <davem@davemloft.net> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20151027123153.GG8245@jcartwri.amer.corp.natinst.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
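The resulting behaviour can be sketched as follows (on RT fall back to a normal grace period; the exact condition is an assumption, the idea comes straight from the text):

    void synchronize_net(void)
    {
            might_sleep();
            /* an expedited grace period forcibly preempts high-priority RT
             * tasks via the stop_machine() mechanism, so avoid it on RT */
            if (rtnl_is_locked() && !IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
                    synchronize_rcu_expedited();
            else
                    synchronize_rcu();
    }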
-
Sebastian Andrzej Siewior authored
I see large latencies here during a stack dump on x86. The preempt_disable() and get_cpu() should forbid moving the task to another CPU during a stack dump and avoid two stack traces running in parallel on the same CPU. However a stack trace from a second CPU may still happen in parallel. Also nesting is allowed, so a stack trace happens in process context and we may have another one from IRQ context. With migrate_disable() we keep this code preemptible and allow a second backtrace on the same CPU by another task. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
bmouring@ni.com authored
In 8930ed80 (rtmutex: Cleanup deadlock detector debug logic), chainwalking control enums were introduced to limit the deadlock detection logic. One of the calls to task_blocks_on_rt_mutex was missed when converting to use the enums. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Brad Mouring <brad.mouring@ni.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Thomas Gleixner authored
Yimin debugged that in case of a PI wakeup in progress when rt_mutex_start_proxy_lock() calls task_blocks_on_rt_mutex() the latter returns -EAGAIN and in consequence the remove_waiter() call runs into a BUG_ON() because there is nothing to remove. Guard it with rt_mutex_has_waiters(). This is a quick fix which is easy to backport. The proper fix is to have a central check in remove_waiter() so we can call it unconditionally. Reported-and-debugged-by:
Yimin Deng <yimin11.deng@gmail.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
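The quick fix amounts to guarding the cleanup in the error path of rt_mutex_start_proxy_lock(), roughly like this sketch (surrounding code elided):

    if (unlikely(ret)) {
            /* -EAGAIN from a racing PI wakeup means nothing was queued, so only
             * remove the waiter if the lock actually has one */
            if (rt_mutex_has_waiters(lock))
                    remove_waiter(lock, waiter);
    }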