  Jan 06, 2016
    • sched/core: Reset task's lockless wake-queues on fork() · 093e5840
      Sebastian Andrzej Siewior authored
      
      In the following commit:
      
        76751049 ("sched: Implement lockless wake-queues")
      
      we gained lockless wake-queues.
      
      The -RT kernel managed to lock itself up with those: there could be
      multiple attempts to enqueue task X for a wakeup _even_ if task X is
      already running.
      
      The reason is that task X could be runnable but not yet on a CPU. If
      the task performing the wakeup does not leave the CPU, it can perform
      multiple wakeups.
      
      With the proper timing, task X could be running while still enqueued
      for a wakeup. If this happens while X is performing a fork(), then its
      child will have a !NULL `wake_q` member copied.
      
      This is not a problem as long as the child task does not participate in
      lockless wakeups :)
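
      Hence the fix: detach the child from any wake queue at fork() time. A
      minimal sketch of the shape of the change, assuming the reset sits in
      the fork path next to the other inherited scheduler state (the exact
      location is an assumption here):

      	/* fork path (sketch): the child must not inherit the parent's
      	 * wake_q linkage; a non-NULL ->wake_q.next would make the
      	 * child look already-queued to wake_q_add(). */
      	tsk->wake_q.next = NULL;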
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 76751049 ("sched: Implement lockless wake-queues")
      Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.de

      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Fix unserialized r-m-w scribbling stuff · be958bdc
      Peter Zijlstra authored
      
      Some of the sched bitfields (notably sched_reset_on_fork) can be set
      on tasks other than current; this can cause the read-modify-write of
      the word containing them to race with other updates.
      
      Since all the sched bits are serialized by scheduler locks, pull them
      into a separate word.
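
      A sketch of the idea, assuming the sched-related task_struct bitfield
      names of this era; the zero-width bitfield forces the compiler to
      start a new allocation unit, so the lock-serialized bits no longer
      share a read-modify-write word with the unserialized ones:

      	/* include/linux/sched.h (sketch) */
      	/* bits serialized by scheduler locks, safe to write remotely: */
      	unsigned sched_reset_on_fork:1;
      	unsigned sched_contributes_to_load:1;
      	unsigned sched_migrated:1;
      	unsigned :0;	/* force alignment to the next word boundary */
      	/* unserialized bits, only ever written by current: */
      	unsigned in_execve:1;
      	unsigned in_iowait:1;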
      
      Reported-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: hannes@cmpxchg.org
      Cc: mhocko@kernel.org
      Cc: vdavydov@parallels.com
      Link: http://lkml.kernel.org/r/20151125150207.GM11639@twins.programming.kicks-ass.net

      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Check tgid in is_global_init() · 570f5241
      Sergey Senozhatsky authored
      
      Our global init task can have sub-threads, so a ->pid check is not
      reliable enough for is_global_init(); we need to check the tgid
      instead. This was spotted by Oleg, and a fix was proposed by Richard
      a long time ago (see the link below).
      
      Oleg wrote:
      
        : Because is_global_init() is only true for the main thread of /sbin/init.
        :
        : Just look at oom_unkillable_task(). It tries to not kill init. But, say,
        : select_bad_process() can happily find a sub-thread of is_global_init()
        : and still kill it.
      
      I recently hit the problem in question; re-sending the patch (to the
      best of my knowledge it has never been submitted) with an updated
      function comment. Credit goes to Oleg and Richard.
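
      A sketch of the resulting check, assuming the helper keeps its
      existing shape and simply compares the thread-group id rather than
      the per-thread pid, so that any thread of init matches:

      	/*
      	 * is_global_init - check if a task structure is init. Since
      	 * init can have sub-threads, test the thread group id rather
      	 * than the per-thread pid.
      	 */
      	static inline int is_global_init(struct task_struct *tsk)
      	{
      		return task_tgid_nr(tsk) == 1;
      	}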
      
      Suggested-by: Richard Guy Briggs <rgb@redhat.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://www.redhat.com/archives/linux-audit/2013-December/msg00086.html

      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix multiplication overflow on 32-bit systems · 9e0e83a1
      Andrey Ryabinin authored
      
      Make 'r' a 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
      on 32-bit systems:
      
      	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
      	signed integer overflow:
      	87950 * 47742 cannot be represented in type 'int'
      
      The most likely effect of this bug is bad load average numbers,
      resulting in weird scheduling. It is also likely to persist for quite
      a while - until the system goes idle for long enough that all the
      load-avg numbers get reset.
      
      [ This is the CFS load average metric, not the procfs output, which
        is separate. ]
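
      The fix itself is a one-type change. A sketch, under the assumption
      that the overflowing site multiplies the removed load by LOAD_AVG_MAX
      (47742), as the UBSAN report above indicates:

      	/* kernel/sched/fair.c (sketch): on 32-bit, 'long' is 32 bits,
      	 * so r * LOAD_AVG_MAX can exceed INT_MAX. Widening 'r' makes
      	 * the multiplication happen in 64 bits. */
      	s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
      	sa->load_avg = max_t(long, sa->load_avg - r, 0);
      	sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);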
      
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com

      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Fix race in swevent hash · 12ca6ad2
      Peter Zijlstra authored
      
      There's a race on CPU unplug where we free the swevent hash array
      while it can still have events on it. This results in a
      use-after-free, which is BAD.
      
      Simply do not free the hash array on unplug. This leaves the thing
      around and no use-after-free takes place.
      
      When the last swevent dies, we do a for_each_possible_cpu() iteration
      anyway to clean these up, at which time we'll free it, so no leakage
      will occur.
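
      A sketch of the resulting lifetime rule, assuming the file's existing
      swevent_hlist_put()/swevent_hlist_put_cpu() helpers keep roughly this
      shape: the hotplug path stops freeing the hash, and the last-user
      teardown sweeps all possible CPUs instead:

      	/* kernel/events/core.c (sketch): when the last software event
      	 * dies, sweep every possible CPU; this also reaps the hash of
      	 * any CPU unplugged in the meantime, so the hotplug-offline
      	 * path no longer frees anything and cannot race with users. */
      	static void swevent_hlist_put(void)
      	{
      		int cpu;

      		for_each_possible_cpu(cpu)
      			swevent_hlist_put_cpu(cpu);
      	}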
      
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Fix race in perf_event_exec() · c1274499
      Peter Zijlstra authored
      
      I managed to tickle this warning:
      
        [ 2338.884942] ------------[ cut here ]------------
        [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
        [ 2338.900504] Modules linked in:
        [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
        [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
        [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
        [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
        [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
        [ 2338.947987] Call Trace:
        [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
        [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
        [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
        [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
        [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
        [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
        [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
        [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
        [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
        [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
        [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
        [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
        [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
        [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
      
      This is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
      what we expected it to be.
      
      This is because context switches can swap the task_struct::perf_event_ctxp[]
      pointer around. Therefore you have to either disable preemption when looking
      at current, or hold ctx->lock.
      
      Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
      before disabling interrupts, so a preemption in the right place can
      swap the contexts around and leave us operating on the wrong one.
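
      A sketch of the fix's shape, assuming the function keeps its overall
      structure and merely moves the perf_event_ctxp[] load inside the
      IRQ-disabled region, where the context can no longer be swapped out
      from under us:

      	/* kernel/events/core.c (sketch) */
      	static void perf_event_enable_on_exec(int ctxn)
      	{
      		struct perf_event_context *ctx;
      		unsigned long flags;

      		local_irq_save(flags);
      		/* Load the ctx pointer only once IRQs (and with them
      		 * preemption) are off, so it cannot change under us: */
      		ctx = current->perf_event_ctxp[ctxn];
      		if (!ctx || !ctx->nr_events)
      			goto out;

      		/* ... enable the events, as before ... */
      	out:
      		local_irq_restore(flags);
      	}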
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net

      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • dmaengine: Revert "dmaengine: mic_x100: add missing spin_unlock" · 16605e8d
      Ashutosh Dixit authored
      
      This reverts commit e958e079 ("dmaengine: mic_x100: add missing
      spin_unlock").
      
      The above patch is incorrect. There is nothing wrong with the original
      code. The spin_lock is acquired in the "prep" functions and released
      in "submit".
      
      Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
    • net: sched: fix missing free per cpu on qstats · 73c20a8b
      John Fastabend authored
      
      When a qdisc is using per-CPU stats (currently just the ingress
      qdisc), only the bstats are being freed on teardown. Also free the
      qstats.
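
      A sketch of the shape of the fix, assuming it sits in the qdisc
      teardown path right next to the existing per-CPU bstats free:

      	/* qdisc teardown (sketch): free both per-CPU stat blocks;
      	 * free_percpu() is a no-op for qdiscs that don't use them. */
      	free_percpu(qdisc->cpu_bstats);
      	free_percpu(qdisc->cpu_qstats);	/* previously leaked */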
      
      Fixes: b0ab6f92 ("net: sched: enable per cpu qstats")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: net: bpf: fix zero right shift · f941461c
      Rabin Vincent authored
      
      The LSR instruction cannot be used to perform a zero right shift,
      since a 0 as the immediate value (imm5) in the LSR instruction
      encoding means that a shift of 32 is performed. See DecodeIMMShift()
      in the ARM ARM.
      
      Make the JIT skip generation of the LSR if a zero-shift is requested.
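
      A sketch of the JIT guard, assuming the arm BPF JIT's emit helpers;
      the key point is that a 0-bit right shift is a no-op, so nothing
      needs to be emitted at all:

      	/* arch/arm/net/bpf_jit_32.c (sketch): A >>= k. An imm5 of 0
      	 * in the LSR encoding means "shift by 32", so only emit the
      	 * instruction for a non-zero shift amount. */
      	if (k)
      		emit(ARM_LSR_I(r_A, r_A, k), ctx);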
      
      This was found using american fuzzy lop.
      
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 6pack: fix free memory scribbles · 60aa3b08
      One Thousand Gnomes authored
      
      Commit acf673a3 fixed a user-triggerable scribble on freed memory,
      but in doing so replaced it with a different one that allows the
      user to control the data and scribble even more.
      
      sixpack_close() is called by the tty layer in tty context. The tty
      context is protected by sp_get() and sp_put(). However, network
      layer activity via sp_xmit() is not protected this way. We must
      therefore stop the queue; otherwise the user gets to dump a buffer
      largely of their choice into freed kernel pages.
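
      A sketch of the fix's shape in sixpack_close(), assuming the
      standard netdev queue API:

      	/* drivers/net/hamradio/6pack.c (sketch): in sixpack_close(),
      	 * stop the transmit queue before tearing the device down, so
      	 * sp_xmit() can no longer push user-controlled data into
      	 * about-to-be-freed buffers. */
      	netif_stop_queue(sp->dev);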
      
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>