  1. Dec 31, 2014
• v4.1.13-rt15 · 6829c375
      Thomas Gleixner authored
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• latency_hist: Update sched_wakeup probe · cdc4829c
      Mathieu Desnoyers authored
      
      "sched: Introduce the 'trace_sched_waking' tracepoint" introduces a
      prototype change for the sched_wakeup probe: the "success" argument is
      removed. Update the latency_hist probe following this change.
      
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Julien Desfossez <jdesfossez@efficios.com>
      Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1445810765-18732-1-git-send-email-mathieu.desnoyers@efficios.com
      
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: Introduce the trace_sched_waking tracepoint · e617cccd
      Peter Zijlstra authored
      Upstream commit fbd705a0
      
Mathieu reported that since 317f3941 ("sched: Move the second half
of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of
context of the waker.
      
      This is a problem when you want to analyse wakeup paths because it is
      now very hard to correlate the wakeup event to whoever issued the
      wakeup.
      
OTOH trace_sched_wakeup() is issued at the point where we set
p->state = TASK_RUNNING, which is right where we hand the task off to
the scheduler, so this is an important point when looking at
scheduling behaviour: up to here it has been the wakeup path, and
everything hereafter is due to scheduler policy.
      
      To bridge this gap, introduce a second tracepoint: trace_sched_waking.
      It is guaranteed to be called in the waker context.
      
      [ Ported to linux-4.1.y-rt kernel by Mathieu Desnoyers. Resolved
        conflict: try_to_wake_up_local() does not exist in -rt kernel. Removed
        its instrumentation hunk. ]
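
As a compilable illustration (a minimal userspace sketch with stand-in
types and tracepoints, not the actual kernel code), the ordering
described above is roughly:

  #include <stdio.h>

  struct task_struct { int state; int pid; };
  #define TASK_RUNNING 0

  /* stand-ins for the real tracepoints */
  static void trace_sched_waking(struct task_struct *p)
  { printf("sched_waking: pid %d (always in the waker's context)\n", p->pid); }
  static void trace_sched_wakeup(struct task_struct *p)
  { printf("sched_wakeup: pid %d (may fire on the remote CPU)\n", p->pid); }

  /* grossly simplified shape of try_to_wake_up() */
  static int try_to_wake_up(struct task_struct *p)
  {
      trace_sched_waking(p);   /* new: guaranteed waker context */
      /* on SMP the rest of the wakeup may run on the remote CPU */
      p->state = TASK_RUNNING; /* hand-off point to the scheduler */
      trace_sched_wakeup(p);   /* old: possibly out of the waker's context */
      return 1;
  }

  int main(void)
  {
      struct task_struct t = { .state = 1, .pid = 42 };
      return !try_to_wake_up(&t);
  }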
      
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: Julien Desfossez <jdesfossez@efficios.com>
      CC: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net
      
      
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
• workqueue: Prevent deadlock/stall on RT · 0f4077dc
      Thomas Gleixner authored
      
Austin reported an XFS deadlock/stall on RT where scheduled work never
gets executed and tasks wait on each other forever.
      
The underlying problem is the RT modification of how workers which are
about to go to sleep are handled. In mainline a worker thread which
goes to sleep wakes an idle worker if there is more work to do. This
happens from the guts of the schedule() function. On RT this must
happen outside, and the accessed data structures are not protected
against scheduling due to the spinlock-to-rtmutex conversion. So the
naive solution was to move the code outside of the scheduler and
protect the data structures with the pool lock. That approach turned
out to be a little naive, as we cannot call into that code when the
thread blocks on a lock: it is not allowed to block on two locks in
parallel. So we don't call into the worker wakeup magic when the
worker is blocked on a lock, which causes the deadlock/stall observed
by Austin and Mike.
      
      Looking deeper into that worker code it turns out that the only
      relevant data structure which needs to be protected is the list of
      idle workers which can be woken up.
      
      So the solution is to protect the list manipulation operations with
      preempt_enable/disable pairs on RT and call unconditionally into the
      worker code even when the worker is blocked on a lock. The preemption
      protection is safe as there is nothing which can fiddle with the list
      outside of thread context.
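
A minimal sketch of that scheme (hypothetical names; the stubs stand
in for the real RT primitives so the fragment is self-contained):

  /* stubs so this sketch compiles standalone */
  #define preempt_disable() (void)0
  #define preempt_enable()  (void)0

  struct worker { struct worker *next; };
  static struct worker *idle_list;

  /* called unconditionally when a worker sleeps or blocks on a lock */
  static struct worker *pick_idle_worker(void)
  {
      struct worker *first;

      preempt_disable();  /* nothing fiddles with the list outside of
                             thread context, so this is sufficient */
      first = idle_list;
      if (first)
          idle_list = first->next;
      preempt_enable();
      return first;       /* caller wakes this idle worker, if any */
  }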
      
Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com>
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
• md: disable bcache · 4e4fd3f1
      Sebastian Andrzej Siewior authored
      
It uses anon semaphores:
      |drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
      |drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
      |  up_read_non_owner(&dc->writeback_lock);
      |  ^
      |drivers/md/bcache/request.c: In function ‘request_write’:
      |drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
      |  down_read_non_owner(&dc->writeback_lock);
      |  ^
      
      either we get rid of those or we have to introduce them…
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• rt,ntp: Move call to schedule_delayed_work() to helper thread · 34fb3c54
      Steven Rostedt authored
      
The ntp code for notify_cmos_timer() is called from hard interrupt
context. Under PREEMPT_RT_FULL, schedule_delayed_work() takes
spinlocks that have been converted to mutexes, so calling
schedule_delayed_work() from interrupt context is not safe.
      
Add a helper thread that does the call to schedule_delayed_work() and
wake up that thread instead of calling schedule_delayed_work()
directly. This is done only for CONFIG_PREEMPT_RT_FULL; otherwise the
code still calls schedule_delayed_work() directly in irq context.
      
Note: There are a few places in the kernel that do this. Perhaps the
RT code should have a dedicated thread that does the checks. Just
register a notifier on boot-up for your check and wake up the thread
when needed. This remains a todo.
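
In outline the pattern looks like this (a sketch following the
description above, not the verbatim patch; kernel context assumed):

  static struct task_struct *cmos_delay_thread;
  static bool do_cmos_delay;

  static int cmos_delay_thread_fn(void *data)
  {
      while (!kthread_should_stop()) {
          set_current_state(TASK_INTERRUPTIBLE);
          if (do_cmos_delay) {
              do_cmos_delay = false;
              /* now in thread context: taking sleeping locks is fine */
              schedule_delayed_work(&sync_cmos_work, 0);
          }
          schedule();
      }
      __set_current_state(TASK_RUNNING);
      return 0;
  }

  /* hard irq context: only flag and wake, take no sleeping locks */
  void notify_cmos_timer(void)
  {
      do_cmos_delay = true;
      wake_up_process(cmos_delay_thread);
  }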
      
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
• memcontrol: Prevent scheduling while atomic in cgroup code · 25c183c1
      Mike Galbraith authored
      
      mm, memcg: make refill_stock() use get_cpu_light()
      
      Nikita reported the following memcg scheduling while atomic bug:
      
      Call Trace:
      [e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable)
      [e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0
      [e22d5ae0] [c060b9ec] __schedule+0x530/0x550
      [e22d5bf0] [c060bacc] schedule+0x30/0xbc
      [e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c
      [e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4
      [e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98
      [e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac
      [e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70
      [e22d5d90] [c0117284] __do_fault+0x38c/0x510
      [e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858
      [e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc
      [e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80
      
      What happens:
      
         refill_stock()
            get_cpu_var()
            drain_stock()
               res_counter_uncharge()
                  res_counter_uncharge_until()
                     spin_lock() <== boom
      
      Fix it by replacing get/put_cpu_var() with get/put_cpu_light().
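
The shape of the fix (a condensed sketch of refill_stock(), kernel
context assumed; get_cpu_light() pins the task via migrate_disable()
but leaves preemption enabled):

  static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  {
      struct memcg_stock_pcp *stock;
      int cpu = get_cpu_light();       /* was: &get_cpu_var(memcg_stock) */

      stock = &per_cpu(memcg_stock, cpu);
      if (stock->cached != memcg) {
          drain_stock(stock);          /* may now legally take the
                                          sleeping res_counter lock */
          stock->cached = memcg;
      }
      stock->nr_pages += nr_pages;
      put_cpu_light();                 /* was: put_cpu_var(memcg_stock) */
  }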
      
      
Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• cgroups: use simple wait in css_release() · 2fe05774
      Sebastian Andrzej Siewior authored
      
      To avoid:
      |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
      |in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11
      |2 locks held by rcuc/11/92:
      | #0:  (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940
      | #1:  (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0
      |Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
      |CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1
      | ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000
      | 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000
      | ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80
      |Call Trace:
      | [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
      | [<ffffffff8109aa16>] __might_sleep+0x116/0x190
      | [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
      | [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0
      | [<ffffffff8110c881>] css_release+0x81/0x90
      | [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0
      | [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
      | [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940
      | [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0
      | [<ffffffff810948d8>] kthread+0xe8/0x100
      | [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• i915: bogus warning from i915 when running on PREEMPT_RT · d124820d
      Clark Williams authored
      
The i915 driver has a 'WARN_ON(!in_interrupt())' in the display
handler, which whines constantly on the RT kernel (since the interrupt
is actually handled in a threaded handler and not in actual interrupt
context).

Change the WARN_ON() to WARN_ON_NORT().
      
Tested-by: Joakim Hernberg <jhernberg@alchemy.lu>
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• drm/i915: drop trace_i915_gem_ring_dispatch on rt · 2469ee49
      Sebastian Andrzej Siewior authored
      
      This tracepoint is responsible for:
      
      |[<814cc358>] __schedule_bug+0x4d/0x59
      |[<814d24cc>] __schedule+0x88c/0x930
      |[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
      |[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
      |[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
      |[<814d27d9>] schedule+0x29/0x70
      |[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
      |[<814d3786>] rt_spin_lock+0x26/0x30
      |[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
      |[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
      |[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
      |[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
      |[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
      |[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
      |[<8101d063>] ? native_sched_clock+0x13/0x80
      |[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
      |[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
      |[<8101d0d9>] ? sched_clock+0x9/0x10
      |[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
      |[<810f1c18>] ? rb_commit+0x68/0xa0
      |[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
      |[<81197467>] do_vfs_ioctl+0x97/0x540
      |[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
      |[<811979a1>] sys_ioctl+0x91/0xb0
      |[<814db931>] tracesys+0xe1/0xe6
      
Chris Wilson does not want i915_trace_irq_get() moved out of the macro:
      
      |No. This enables the IRQ, as well as making a number of
      |very expensively serialised read, unconditionally.
      
      so it is gone now on RT.
      
      
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• gpu/i915: don't open code these things · 0af70f52
      Sebastian Andrzej Siewior authored
The open-coded part is gone in 1f83fee0 ("drm/i915: clear up wedged
transitions"); the owner check is still there.
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• cpufreq: drop K8's driver from being selected · 7d6248a3
      Sebastian Andrzej Siewior authored
      
      Ralf posted a picture of a backtrace from
      
      | powernowk8_target_fn() -> transition_frequency_fidvid() and then at the
      | end:
      | 932         policy = cpufreq_cpu_get(smp_processor_id());
      | 933         cpufreq_cpu_put(policy);
      
crashing the system on -RT. I assumed that policy was a NULL pointer,
but that was ruled out. Since Ralf can't do any more investigation on
this and I have no machine with this, I simply switch it off.
      
Reported-by: Ralf Mardorf <ralf.mardorf@alice-dsl.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• mmci: Remove bogus local_irq_save() · b2536a6b
      Thomas Gleixner authored
      
On !RT the interrupt handler runs with interrupts disabled. On RT it
runs in a thread, so there is no need to disable interrupts at all.
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• i2c/omap: drop the lock hard irq context · ef41bc9f
      Sebastian Andrzej Siewior authored
      
The lock is taken while reading two registers. On RT the lock is taken
both in the hard irq handler, where it might sleep, and in the
threaded irq handler. The threaded irq runs in oneshot mode, so the
hard irq does not run again until the thread completes; there is thus
no reason to grab the lock in the hard irq handler.
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• leds: trigger: disable CPU trigger on -RT · 762d9ae7
      Sebastian Andrzej Siewior authored
      
      as it triggers:
      |CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
      |[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
      |[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
      |[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
      |[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
      |[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
      |[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
      |[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
      |[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
      |[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
      |[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
      |[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
      
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• arch/arm64: Add lazy preempt support · 42bfc020
      Anders Roxell authored
      
      arm64 is missing support for PREEMPT_RT. The main feature which is
      lacking is support for lazy preemption. The arch-specific entry code,
      thread information structure definitions, and associated data tables
      have to be extended to provide this support. Then the Kconfig file has
      to be extended to indicate the support is available, and also to
      indicate that support for full RT preemption is now available.
      
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
• powerpc: Add support for lazy preemption · e680f1a0
      Thomas Gleixner authored
      
      Implement the powerpc pieces for lazy preempt.
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• arm: Add support for lazy preemption · 67071a36
      Thomas Gleixner authored
      
      Implement the arm pieces for lazy preempt.
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• x86: Support for lazy preemption · 1fc52c3e
      Thomas Gleixner authored
      
      Implement the x86 pieces for lazy preempt.
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: Add support for lazy preemption · 6a5fb31a
      Thomas Gleixner authored
      
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away preempt
the waking task while the waking task holds a lock on which the woken
task will block right after having preempted the waker. In mainline
this is prevented by the implicit preemption disabling of spin/rw_lock
held regions. On RT this is not possible due to the fully preemptible
nature of sleeping spinlocks.
      
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR task preemption, not about the purely fairness-driven
SCHED_OTHER preemption latencies.
      
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect, as for laziness reasons I coupled this to
migrate_disable/enable, so some other mechanisms get the same
treatment (e.g. get_cpu_light).
      
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts it. That also works better for cross-CPU
wakeups, as the other side can stay in the adaptive spinning loop.
      
      For RT class preemption there is no change. This simply sets
      NEED_RESCHED and forgoes the lazy preemption counter.
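
Condensed, the wakeup-side decision reads roughly like this (a sketch
with hypothetical helper names, not the actual patch):

  static void resched_curr_maybe_lazy(struct task_struct *curr,
                                      struct task_struct *wakee)
  {
      if (rt_task(wakee) || !preempt_lazy_count(curr)) {
          /* RT class wakeup, or no lock-held (migrate-disabled)
             section active in curr: preempt immediately, as before */
          set_tsk_thread_flag(curr, TIF_NEED_RESCHED);
          return;
      }
      /* SCHED_OTHER preempting SCHED_OTHER while curr is inside a
         lock-held region: defer until curr leaves the region */
      set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
  }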
      
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
      
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
      
       # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
      
      and reenabled via
      
       # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
      
The test results so far are very machine- and workload-dependent, but
there is a clear trend that it enhances non-RT workload performance.
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• rcu: make RCU_BOOST default on RT · ab6e2b79
      Sebastian Andrzej Siewior authored
      
Since RCU processing is no longer invoked from the softirq, people run
into OOM more often if the priority of the RCU thread is too low.
Making boosting the default on RT should help in those cases, and it
can be switched off if someone knows better.
      
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• rcu: Eliminate softirq processing from rcutree · bae14124
      Paul E. McKenney authored
      
Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq
work, for example, setting kthread priority.  This commit therefore
moves the RCU core work from softirq to a per-CPU/per-flavor
SCHED_OTHER kthread named rcuc.  The SCHED_OTHER approach avoids the
scalability problems that appeared with the earlier attempt to move
RCU core processing from softirq to kthreads.  That said, kernels
built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting
priority.
      
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• rcu: Disable RCU_FAST_NO_HZ on RT · df2819e9
      Thomas Gleixner authored
      
      This uses a timer_list timer from the irq disabled guts of the idle
      code. Disable it for now to prevent wreckage.
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• perf: Make swevent hrtimer run in irq instead of softirq · 5d613fc7
      Yong Zhang authored
      
      Otherwise we get a deadlock like below:
      
      [ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
      [ 1044.042752] INFO: lockdep is turned off.
      [ 1044.042754] Modules linked in:
      [ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G        W    3.4.0-rc2-rt3-23676-ga723175-dirty #29
      [ 1044.042759] Call Trace:
      [ 1044.042761]  <IRQ>  [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
      [ 1044.042770]  [<ffffffff8168978c>] __schedule+0x83c/0xa70
      [ 1044.042775]  [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
      [ 1044.042779]  [<ffffffff81689a5e>] schedule+0x2e/0xa0
      [ 1044.042782]  [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
      [ 1044.042786]  [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
      [ 1044.042790]  [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
      [ 1044.042794]  [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
      [ 1044.042798]  [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
      [ 1044.042802]  [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
      [ 1044.042805]  [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
      [ 1044.042809]  [<ffffffff8111c649>] group_sched_out+0x29/0x90
      [ 1044.042813]  [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
      [ 1044.042817]  [<ffffffff8111c343>] remote_function+0x63/0x70
      [ 1044.042821]  [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
      [ 1044.042826]  [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
      [ 1044.042831]  [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
      [ 1044.042833]  <EOI>  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
      [ 1044.042840]  [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
      [ 1044.042844]  [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
      [ 1044.042848]  [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
      [ 1044.042853]  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
      [ 1044.042857]  [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
      [ 1044.042862]  [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
      [ 1044.042865]  [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
      [ 1044.042869]  [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
      [ 1044.042873]  [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
      [ 1044.042877]  [<ffffffff8106b596>] kthread+0xb6/0xc0
      [ 1044.042881]  [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
      [ 1044.042886]  [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
      [ 1044.042889]  [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
      [ 1044.042894]  [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
      [ 1044.042897]  [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
      [ 1044.042900]  [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
      [ 1044.042902]  [<ffffffff8168d990>] ? gs_change+0xb/0xb
      
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
      
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
• lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals · e4504269
      Josh Cartwright authored
      
      "lockdep: Selftest: Only do hardirq context test for raw spinlock"
      disabled the execution of certain tests with PREEMPT_RT_FULL, but did
      not prevent the tests from still being defined.  This leads to warnings
      like:
      
        ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
        ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
        ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
        ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
        ./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
        ...
      
      Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL
      conditionals.
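
That is, the shape of the fix is simply (illustrative):

  #ifndef CONFIG_PREEMPT_RT_FULL
  /* hard irq context test cases such as irqsafe1_hard_rlock_12
     are only generated when they can actually be run */
  #endif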
      
      
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• lockdep: selftest: Only do hardirq context test for raw spinlock · 341039d6
      Yong Zhang authored
      
On -rt there is no softirq context any more and rwlocks are sleepable,
so disable the softirq context test and the rwlock+irq test.
      
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
      Cc: Yong Zhang <yong.zhang@windriver.com>
      Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
      
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• crypto: Convert crypto notifier chain to SRCU · 58b78188
      Peter Zijlstra authored
      
The crypto notifier deadlocks on RT, though this can be a real
deadlock on mainline as well due to fifo fair rwsems.
      
      The involved parties here are:
      
      [   82.172678] swapper/0       S 0000000000000001     0     1      0 0x00000000
      [   82.172682]  ffff88042f18fcf0 0000000000000046 ffff88042f18fc80 ffffffff81491238
      [   82.172685]  0000000000011cc0 0000000000011cc0 ffff88042f18c040 ffff88042f18ffd8
      [   82.172688]  0000000000011cc0 0000000000011cc0 ffff88042f18ffd8 0000000000011cc0
      [   82.172689] Call Trace:
      [   82.172697]  [<ffffffff81491238>] ? _raw_spin_unlock_irqrestore+0x6c/0x7a
      [   82.172701]  [<ffffffff8148fd3f>] schedule+0x64/0x66
      [   82.172704]  [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
      [   82.172708]  [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
      [   82.172713]  [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
      [   82.172716]  [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
      [   82.172719]  [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
      [   82.172722]  [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
      [   82.172726]  [<ffffffff811debfd>] crypto_wait_for_test+0x49/0x6b
      [   82.172728]  [<ffffffff811ded32>] crypto_register_alg+0x53/0x5a
      [   82.172730]  [<ffffffff811ded6c>] crypto_register_algs+0x33/0x72
      [   82.172734]  [<ffffffff81ad7686>] ? aes_init+0x12/0x12
      [   82.172737]  [<ffffffff81ad76ea>] aesni_init+0x64/0x66
      [   82.172741]  [<ffffffff81000318>] do_one_initcall+0x7f/0x13b
      [   82.172744]  [<ffffffff81ac4d34>] kernel_init+0x199/0x22c
      [   82.172747]  [<ffffffff81ac44ef>] ? loglevel+0x31/0x31
      [   82.172752]  [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
      [   82.172755]  [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
      [   82.172759]  [<ffffffff81ac4b9b>] ? start_kernel+0x3ca/0x3ca
      [   82.172761]  [<ffffffff814987c0>] ? gs_change+0x13/0x13
      
      [   82.174186] cryptomgr_test  S 0000000000000001     0    41      2 0x00000000
      [   82.174189]  ffff88042c971980 0000000000000046 ffffffff81d74830 0000000000000292
      [   82.174192]  0000000000011cc0 0000000000011cc0 ffff88042c96eb80 ffff88042c971fd8
      [   82.174195]  0000000000011cc0 0000000000011cc0 ffff88042c971fd8 0000000000011cc0
      [   82.174195] Call Trace:
      [   82.174198]  [<ffffffff8148fd3f>] schedule+0x64/0x66
      [   82.174201]  [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
      [   82.174204]  [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
      [   82.174206]  [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
      [   82.174209]  [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
      [   82.174212]  [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
      [   82.174215]  [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
      [   82.174218]  [<ffffffff811e4883>] cryptomgr_notify+0x280/0x385
      [   82.174221]  [<ffffffff814943de>] notifier_call_chain+0x6b/0x98
      [   82.174224]  [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
      [   82.174227]  [<ffffffff810677cd>] __blocking_notifier_call_chain+0x70/0x8d
      [   82.174230]  [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
      [   82.174234]  [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
      [   82.174236]  [<ffffffff811dd7a1>] crypto_alg_mod_lookup+0x3e/0x74
      [   82.174238]  [<ffffffff811dd949>] crypto_alloc_base+0x36/0x8f
      [   82.174241]  [<ffffffff811e9408>] cryptd_alloc_ablkcipher+0x6e/0xb5
      [   82.174243]  [<ffffffff811dd591>] ? kzalloc.clone.5+0xe/0x10
      [   82.174246]  [<ffffffff8103085d>] ablk_init_common+0x1d/0x38
      [   82.174249]  [<ffffffff8103852a>] ablk_ecb_init+0x15/0x17
      [   82.174251]  [<ffffffff811dd8c6>] __crypto_alloc_tfm+0xc7/0x114
      [   82.174254]  [<ffffffff811e0caa>] ? crypto_lookup_skcipher+0x1f/0xe4
      [   82.174256]  [<ffffffff811e0dcf>] crypto_alloc_ablkcipher+0x60/0xa5
      [   82.174258]  [<ffffffff811e5bde>] alg_test_skcipher+0x24/0x9b
      [   82.174261]  [<ffffffff8106d96d>] ? finish_task_switch+0x3f/0xfa
      [   82.174263]  [<ffffffff811e6b8e>] alg_test+0x16f/0x1d7
      [   82.174267]  [<ffffffff811e45ac>] ? cryptomgr_probe+0xac/0xac
      [   82.174269]  [<ffffffff811e45d8>] cryptomgr_test+0x2c/0x47
      [   82.174272]  [<ffffffff81061161>] kthread+0x7e/0x86
      [   82.174275]  [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
      [   82.174278]  [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
      [   82.174281]  [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
      [   82.174284]  [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
      [   82.174287]  [<ffffffff814987c0>] ? gs_change+0x13/0x13
      
      [   82.174329] cryptomgr_probe D 0000000000000002     0    47      2 0x00000000
      [   82.174332]  ffff88042c991b70 0000000000000046 ffff88042c991bb0 0000000000000006
      [   82.174335]  0000000000011cc0 0000000000011cc0 ffff88042c98ed00 ffff88042c991fd8
      [   82.174338]  0000000000011cc0 0000000000011cc0 ffff88042c991fd8 0000000000011cc0
      [   82.174338] Call Trace:
      [   82.174342]  [<ffffffff8148fd3f>] schedule+0x64/0x66
      [   82.174344]  [<ffffffff814901ad>] __rt_mutex_slowlock+0x85/0xbe
      [   82.174347]  [<ffffffff814902d2>] rt_mutex_slowlock+0xec/0x159
      [   82.174351]  [<ffffffff81089c4d>] rt_mutex_fastlock.clone.8+0x29/0x2f
      [   82.174353]  [<ffffffff81490372>] rt_mutex_lock+0x33/0x37
      [   82.174356]  [<ffffffff8108a0f2>] __rt_down_read+0x50/0x5a
      [   82.174358]  [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
      [   82.174360]  [<ffffffff8108a11c>] rt_down_read+0x10/0x12
      [   82.174363]  [<ffffffff810677b5>] __blocking_notifier_call_chain+0x58/0x8d
      [   82.174366]  [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
      [   82.174369]  [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
      [   82.174372]  [<ffffffff811debd6>] crypto_wait_for_test+0x22/0x6b
      [   82.174374]  [<ffffffff811decd3>] crypto_register_instance+0xb4/0xc0
      [   82.174377]  [<ffffffff811e9b76>] cryptd_create+0x378/0x3b6
      [   82.174379]  [<ffffffff811de512>] ? __crypto_lookup_template+0x5b/0x63
      [   82.174382]  [<ffffffff811e4545>] cryptomgr_probe+0x45/0xac
      [   82.174385]  [<ffffffff811e4500>] ? crypto_alloc_pcomp+0x1b/0x1b
      [   82.174388]  [<ffffffff81061161>] kthread+0x7e/0x86
      [   82.174391]  [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
      [   82.174394]  [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
      [   82.174398]  [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
      [   82.174401]  [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
      [   82.174403]  [<ffffffff814987c0>] ? gs_change+0x13/0x13
      
cryptomgr_test spawns the cryptomgr_probe thread from the notifier
call. The probe thread fires the same notifier as the test thread and
deadlocks on the rwsem on RT.

Now this is a potential deadlock in mainline as well, because we have
fifo fair rwsems. If another thread blocks with a down_write() on the
notifier chain before the probe thread issues the down_read(), it will
block the probe thread and the whole party is deadlocked.
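
The conversion itself is small (a simplified sketch using the kernel's
SRCU notifier API; the retry logic of the real crypto_probing_notify()
is omitted):

  /* was: static BLOCKING_NOTIFIER_HEAD(crypto_chain); */
  static SRCU_NOTIFIER_HEAD(crypto_chain);

  int crypto_probing_notify(unsigned long val, void *v)
  {
      /* srcu_notifier_call_chain() takes no rwsem on the read side,
         so a notifier callback that fires the chain again (as
         cryptomgr_probe does) can no longer deadlock */
      return srcu_notifier_call_chain(&crypto_chain, val, v);
  }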
      
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• net: Add a mutex around devnet_rename_seq · 3e5d20d9
      Sebastian Andrzej Siewior authored
      
On RT write_seqcount_begin() disables preemption, while
device_rename() allocates memory with GFP_KERNEL and later grabs the
sysfs_mutex. Serialize with a mutex and use the
non-preemption-disabling __write_seqcount_begin().
      
      To avoid writer starvation, let the reader grab the mutex and release
      it when it detects a writer in progress. This keeps the normal case
      (no reader on the fly) fast.
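
In outline the read side becomes (a sketch; devnet_rename_mutex is the
writer-side mutex described above, name assumed):

  retry:
      seq = raw_seqcount_begin(&devnet_rename_seq);
      /* ... copy the device name ... */
      if (read_seqcount_retry(&devnet_rename_seq, seq)) {
          mutex_lock(&devnet_rename_mutex);   /* blocks until the writer
                                                 is done; boosts it on RT */
          mutex_unlock(&devnet_rename_mutex);
          goto retry;
      }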
      
      [ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• net: netfilter: Serialize xt_write_recseq sections on RT · c49ee171
      Thomas Gleixner authored
      
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing xt_write_recseq sections. RT breaks
that and needs explicit serialization here.
      
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• net: Another local_irq_disable/kmalloc headache · 10d5de8d
      Thomas Gleixner authored
      
      Replace it by a local lock. Though that's pretty inefficient :(
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• net: Remove preemption disabling in netif_rx() · 80c9cad5
      Priyanka Jain authored
      
1) enqueue_to_backlog() (called from netif_rx()) should be bound to a
   particular CPU. This can be achieved by disabling migration; there
   is no need to disable preemption.

2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in the RT
   case. If preemption is disabled, enqueue_to_backlog() is called in
   atomic context, and if the backlog exceeds its count, kfree_skb()
   is called. But on RT kfree_skb() might get scheduled out, so it
   expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and
   migrate_disable() map to preempt_enable() and preempt_disable(), so
   there is no change in functionality in the non-RT case.

- Replace preempt_enable(), preempt_disable() with migrate_enable(),
  migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light()
  respectively (as sketched below)
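
Sketched against the non-RPS path of netif_rx() (condensed; qtail
handling as in net/core/dev.c):

  int netif_rx(struct sk_buff *skb)
  {
      unsigned int qtail;
      int ret, cpu;

      cpu = get_cpu_light();   /* was: get_cpu(); pins the task via
                                  migrate_disable(), but leaves
                                  preemption enabled */
      ret = enqueue_to_backlog(skb, cpu, &qtail);
      put_cpu_light();         /* was: put_cpu() */

      return ret;
  }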
      
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
      
      
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• scsi: qla2xxx: Use local_irq_save_nort() in qla2x00_poll · 336bc29e
      John Kacur authored
      
      RT triggers the following:
      
      [   11.307652]  [<ffffffff81077b27>] __might_sleep+0xe7/0x110
      [   11.307663]  [<ffffffff8150e524>] rt_spin_lock+0x24/0x60
      [   11.307670]  [<ffffffff8150da78>] ? rt_spin_lock_slowunlock+0x78/0x90
      [   11.307703]  [<ffffffffa0272d83>] qla24xx_intr_handler+0x63/0x2d0 [qla2xxx]
      [   11.307736]  [<ffffffffa0262307>] qla2x00_poll+0x67/0x90 [qla2xxx]
      
The function qla2x00_poll() does local_irq_save() before calling
qla24xx_intr_handler(), which takes a spinlock. Since spinlocks are
sleepable on RT, it is not allowed to take them with interrupts
disabled. Therefore we use local_irq_save_nort() instead, which saves
flags without disabling interrupts.
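
I.e., condensed (hypothetical arguments):

  unsigned long flags;

  local_irq_save_nort(flags);     /* on RT: saves flags, leaves irqs on */
  qla24xx_intr_handler(irq, dev); /* may take a sleeping spinlock on RT */
  local_irq_restore_nort(flags);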
      
      This fix needs to be applied to v3.0-rt, v3.2-rt and v3.4-rt
      
      Suggested-by: Thomas Gleixner
Signed-off-by: John Kacur <jkacur@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: David Sommerseth <davids@redhat.com>
      Link: http://lkml.kernel.org/r/1335523726-10024-1-git-send-email-jkacur@redhat.com
      
      
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• hotplug: Use set_cpus_allowed_ptr() in sync_unplug_thread() · 629c6153
      Mike Galbraith authored
      
      do_set_cpus_allowed() is not safe vs ->sched_class change.
      
      crash> bt
      PID: 11676  TASK: ffff88026f979da0  CPU: 22  COMMAND: "sync_unplug/22"
       #0 [ffff880274d25bc8] machine_kexec at ffffffff8103b41c
       #1 [ffff880274d25c18] crash_kexec at ffffffff810d881a
       #2 [ffff880274d25cd8] oops_end at ffffffff81525818
       #3 [ffff880274d25cf8] do_invalid_op at ffffffff81003096
       #4 [ffff880274d25d90] invalid_op at ffffffff8152d3de
          [exception RIP: set_cpus_allowed_rt+18]
          RIP: ffffffff8109e012  RSP: ffff880274d25e48  RFLAGS: 00010202
          RAX: ffffffff8109e000  RBX: ffff88026f979da0  RCX: ffff8802770cb6e8
          RDX: 0000000000000000  RSI: ffffffff81add700  RDI: ffff88026f979da0
          RBP: ffff880274d25e78   R8: ffffffff816112e0   R9: 0000000000000001
          R10: 0000000000000001  R11: 0000000000011940  R12: ffff88026f979da0
          R13: ffff8802770cb6d0  R14: ffff880274d25fd8  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #5 [ffff880274d25e60] do_set_cpus_allowed at ffffffff8108e65f
       #6 [ffff880274d25e80] sync_unplug_thread at ffffffff81058c08
       #7 [ffff880274d25ed8] kthread at ffffffff8107cad6
       #8 [ffff880274d25f50] ret_from_fork at ffffffff8152bbbc
      crash> task_struct ffff88026f979da0 | grep class
        sched_class = 0xffffffff816111e0 <fair_sched_class+64>,
      
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• cpu_down: move migrate_enable() back · c4486250
      Tiejun Chen authored
Commit 08c1ab68, "hotplug-use-migrate-disable.patch", intends to use
migrate_enable()/migrate_disable() to replace that combination of
preempt_enable() and preempt_disable(), but actually in the
!CONFIG_PREEMPT_RT_FULL case migrate_enable()/migrate_disable() are
still equal to preempt_enable()/preempt_disable(). So the following
cpu_hotplug_begin()/cpu_unplug_begin(cpu) would go through schedule()
and trigger schedule_debug() like this:
      
      _cpu_down()
      	|
      	+ migrate_disable() = preempt_disable()
      	|
      	+ cpu_hotplug_begin() or cpu_unplug_begin()
      		|
      		+ schedule()
      			|
      			+ __schedule()
      				|
      				+ preempt_disable();
      				|
      				+ __schedule_bug() is true!
      
So we should move migrate_enable() back to its original place.
      
      
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
• kernel/hotplug: restore original cpu mask on cpu/down · 71e28828
      Sebastian Andrzej Siewior authored
      
If a task which is allowed to run only on CPU X puts CPU Y down, then
afterwards it will be allowed to run on all CPUs except CPU Y once it
returns from the kernel. This patch ensures that we don't lose the
initial setting unless the CPU the task is running on is going down.
      
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• kernel/cpu: fix cpu down problem if kthread's cpu is going down · ed8385d1
      Sebastian Andrzej Siewior authored
      
If a kthread is pinned to CPUx and CPUx is going down then we get into
trouble:
- first the unplug thread is created
- it will set itself to hp->unplug. As a result, every task that is
  going to take a lock has to leave the CPU.
- the CPU_DOWN_PREPARE notifiers are started. The worker thread will
  start a new process for the "high priority worker".
  Now the kthread would like to take a lock, but since it can't leave
  the CPU it will never complete its task.
      
We could fire the unplug thread after the notifiers, but then the CPU
is no longer marked "online" and the unplug thread would run on CPU0,
which was fixed before :)

So instead the unplug thread is started and kept waiting until the
notifiers complete their work.
      
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• cpu hotplug: Document why PREEMPT_RT uses a spinlock · 2f5819df
      Steven Rostedt authored
      
      The patch:
      
          cpu: Make hotplug.lock a "sleeping" spinlock on RT
      
          Tasks can block on hotplug.lock in pin_current_cpu(), but their
          state might be != RUNNING. So the mutex wakeup will set the state
          unconditionally to RUNNING. That might cause spurious unexpected
          wakeups. We could provide a state preserving mutex_lock() function,
          but this is semantically backwards. So instead we convert the
          hotplug.lock() to a spinlock for RT, which has the state preserving
          semantics already.
      
Fixed a bug where the hotplug lock on PREEMPT_RT could be taken after
a task set its state to TASK_UNINTERRUPTIBLE and before it called
schedule(). If the hotplug lock used a mutex and there was contention,
the current task's state would be set back to TASK_RUNNING and the
schedule() call would not sleep. This caused unexpected results.
      
Although the patch had a description of the change, the code had no
comments about it. This causes confusion for those who review the
code, and as PREEMPT_RT is held in a quilt queue and not in git, it's
not as easy to see why a change was made. Even if it were in git, the
code should still have a comment for something as subtle as this.
      
Document the rationale for using a spinlock on PREEMPT_RT in the
hotplug lock code.
      
Reported-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
• cpu/rt: Rework cpu down for PREEMPT_RT · 24f2ae6b
      Steven Rostedt authored
      
      Bringing a CPU down is a pain with the PREEMPT_RT kernel because
      tasks can be preempted in many more places than in non-RT. In
      order to handle per_cpu variables, tasks may be pinned to a CPU
      for a while, and even sleep. But these tasks need to be off the CPU
      if that CPU is going down.
      
      Several synchronization methods have been tried, but when stressed
      they failed. This is a new approach.
      
A sync_tsk thread is still created, and tasks may still block on a
lock when the CPU is going down, but how that works is a bit
different. When cpu_down() starts, it will create the sync_tsk and
wait on it to be informed that the current tasks pinned to the CPU are
no longer pinned. But new tasks that are about to be pinned will still
be allowed to do so at this time.
      
      Then the notifiers are called. Several notifiers will bring down tasks
      that will enter these locations. Some of these tasks will take locks
      of other tasks that are on the CPU. If we don't let those other tasks
      continue, but make them block until CPU down is done, the tasks that
      the notifiers are waiting on will never complete as they are waiting
      for the locks held by the tasks that are blocked.
      
      Thus we still let the task pin the CPU until the notifiers are done.
      After the notifiers run, we then make new tasks entering the pinned
      CPU sections grab a mutex and wait. This mutex is now a per CPU mutex
      in the hotplug_pcp descriptor.
      
To help things along, a new function called migrate_me() is added to
the scheduler code. This function will try to migrate the current task
off the CPU that is going down, if possible. When the sync_tsk is
created, all tasks will then try to migrate off the CPU going down.
There are several cases where this won't work, but it helps in most
cases.
      
      After the notifiers are called and if a task can't migrate off but enters
      the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex
      until the CPU down is complete. Then the scheduler will force the migration
      anyway.
      
Also, I found that THREAD_BOUND threads need to be accounted for in
the pinned CPU as well, and migrate_disable() no longer treats them
specially. This helps fix issues with ksoftirqd and workqueues that
unbind on CPU down.
      
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• cpu: Make hotplug.lock a "sleeping" spinlock on RT · 2fd8c147
      Steven Rostedt authored
      
      Tasks can block on hotplug.lock in pin_current_cpu(), but their state
      might be != RUNNING. So the mutex wakeup will set the state
      unconditionally to RUNNING. That might cause spurious unexpected
      wakeups. We could provide a state preserving mutex_lock() function,
      but this is semantically backwards. So instead we convert the
      hotplug.lock() to a spinlock for RT, which has the state preserving
      semantics already.
      
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Carsten Emde <C.Emde@osadl.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Clark Williams <clark.williams@gmail.com>
      
      Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com
      
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• seqlock: Prevent rt starvation · 874d1245
      Thomas Gleixner authored
      
      If a low prio writer gets preempted while holding the seqlock write
      locked, a high prio reader spins forever on RT.
      
      To prevent this let the reader grab the spinlock, so it blocks and
      eventually boosts the writer. This way the writer can proceed and
      endless spinning is prevented.
      
For seqcount writers we disable preemption over the update code path.
Thanks to Al Viro for disentangling some VFS code to make that
possible.
      
      Nicholas Mc Guire:
      - spin_lock+unlock => spin_unlock_wait
      - __write_seqcount_begin => __raw_write_seqcount_begin
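
Putting those pieces together, the RT read side looks roughly like
this (a sketch, not the verbatim patch):

  static inline unsigned read_seqbegin(seqlock_t *sl)
  {
      unsigned ret;

  repeat:
      ret = ACCESS_ONCE(sl->seqcount.sequence);
      if (unlikely(ret & 1)) {
          /* A writer is in progress. Wait on the lock so a preempted,
             lock-holding writer gets PI-boosted and can finish. */
          spin_unlock_wait(&sl->lock);
          goto repeat;
      }
      return ret;
  }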
      
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>