  1. Jul 21, 2022
  2. Aug 19, 2021
    • sched/debug: Fix 'sched_debug_lock' undeclared error · 42412d30
      Hanjun Guo authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I46CWD?from=project-issue
      CVE: NA
      
      -----------------------------------------------
      
      Build error:
      /builds/linux/kernel/sched/debug.c: In function 'print_cpu_tidy':
      /builds/linux/kernel/sched/debug.c:812:21: error: 'sched_debug_lock'
      undeclared (first use in this function); did you mean
      'sched_debug_show'?
        812 |  spin_lock_irqsave(&sched_debug_lock, flags);
            |                     ^~~~~~~~~~~~~~~~
      /builds/linux/include/linux/spinlock.h:241:34: note: in definition of
      macro 'raw_spin_lock_irqsave'
        241 |   flags = _raw_spin_lock_irqsave(lock); \
            |                                  ^~~~
      /builds/linux/kernel/sched/debug.c:812:2: note: in expansion of macro
      'spin_lock_irqsave'
        812 |  spin_lock_irqsave(&sched_debug_lock, flags);
            |  ^~~~~~~~~~~~~~~~~
      /builds/linux/kernel/sched/debug.c:812:21: note: each undeclared
      identifier is reported only once for each func...
  3. Jun 30, 2021
    • sched/debug: Fix cgroup_path[] serialization · b964746b
      Waiman Long authored
      
      stable inclusion
      from linux-4.19.191
      commit 302d674cfacd2a89ba16beb973c757cb977e32a0
      
      --------------------------------
      
      [ Upstream commit ad789f84c9a145f8a18744c0387cec22ec51651e ]
      
      The handling of a sysrq key can be activated by echoing the key to
      /proc/sysrq-trigger or via the magic key sequence typed into a terminal
      that is connected to the system in some way (serial, USB or other means).
      In the former case, the handling is done in a user context. In the
      latter case, it is likely to be in an interrupt context.
      
      Currently in print_cpu() of kernel/sched/debug.c, sched_debug_lock is
      taken with interrupts disabled for the whole duration of the calls to
      print_*_stats() and print_rq(), which could last quite some time if
      the information dump happens on the serial console.
      
      If the system has many CPUs and sched_debug_lock is somehow busy
      (e.g. parallel sysrq-t), the system may hit a hard lockup panic
      depending on the actual serial console implementation of the
      system.
      
      The purpose of sched_debug_lock is to serialize the use of the global
      cgroup_path[] buffer in print_cpu(). The rest of the printk() calls
      don't need serialization from sched_debug_lock.
      
      Calling printk() with interrupts disabled can still be problematic if
      multiple instances are running. Allocating a stack buffer of PATH_MAX
      bytes is not feasible because of the limited size of the kernel stack.
      
      The solution implemented in this patch is to allow only one caller at a
      time to use the full-size group_path[], while other simultaneous callers
      have to use shorter stack buffers with the possibility of path name
      truncation. A "..." suffix is printed if truncation may have
      happened.  The cgroup path name is provided for informational purposes
      only, so occasional path name truncation should not be a big problem.
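      
      A hedged sketch of that scheme, modeled on the description above; the
      trylock and the 128-byte fallback size are assumptions rather than
      quoted from the patch, and it reuses the existing task_group_path()
      and SEQ_printf() helpers from kernel/sched/debug.c:
      
        static DEFINE_SPINLOCK(sched_debug_lock);
        static char group_path[PATH_MAX];
      
        /*
         * Only one caller at a time gets the full-size group_path[];
         * concurrent callers fall back to a small stack buffer and
         * append "..." since the path may have been truncated.
         */
        #define SEQ_printf_task_group_path(m, tg, fmt...)                    \
        {                                                                    \
                if (spin_trylock(&sched_debug_lock)) {                       \
                        task_group_path(tg, group_path, sizeof(group_path)); \
                        SEQ_printf(m, fmt, group_path);                      \
                        spin_unlock(&sched_debug_lock);                      \
                } else {                                                     \
                        char buf[128];                                       \
                        char *bufend = buf + sizeof(buf) - 3;                \
                                                                             \
                        task_group_path(tg, buf, bufend - buf);              \
                        strcpy(bufend - 1, "...");                           \
                        SEQ_printf(m, fmt, buf);                             \
                }                                                            \
        }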
      
      Fixes: efe25c2c ("sched: Reinstate group names in /proc/sched_debug")
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210415195426.6677-1-longman@redhat.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  4. Apr 14, 2021
  5. Jan 12, 2020
  6. Dec 27, 2019
  7. Sep 10, 2018
    • sched/debug: Fix potential deadlock when writing to sched_features · e73e8197
      Jiada Wang authored
      
      The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18 Not tainted
        ------------------------------------------------------
        sh/3358 is trying to acquire lock:
        000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30
        but task is already holding lock:
        00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3 (&sb->s_type->i_mutex_key#3){+.+.}:
               lock_acquire+0xb8/0x148
               down_write+0xac/0x140
               start_creating+0x5c/0x168
               debugfs_create_dir+0x18/0x220
               opp_debug_register+0x8c/0x120
               _add_opp_dev+0x104/0x1f8
               dev_pm_opp_get_opp_table+0x174/0x340
               _of_add_opp_table_v2+0x110/0x760
               dev_pm_opp_of_add_table+0x5c/0x240
               dev_pm_opp_of_cpumask_add_table+0x5c/0x100
               cpufreq_init+0x160/0x430
               cpufreq_online+0x1cc/0xe30
               cpufreq_add_dev+0x78/0x198
               subsys_interface_register+0x168/0x270
               cpufreq_register_driver+0x1c8/0x278
               dt_cpufreq_probe+0xdc/0x1b8
               platform_drv_probe+0xb4/0x168
               driver_probe_device+0x318/0x4b0
               __device_attach_driver+0xfc/0x1f0
               bus_for_each_drv+0xf8/0x180
               __device_attach+0x164/0x200
               device_initial_probe+0x10/0x18
               bus_probe_device+0x110/0x178
               device_add+0x6d8/0x908
               platform_device_add+0x138/0x3d8
               platform_device_register_full+0x1cc/0x1f8
               cpufreq_dt_platdev_init+0x174/0x1bc
               do_one_initcall+0xb8/0x310
               kernel_init_freeable+0x4b8/0x56c
               kernel_init+0x10/0x138
               ret_from_fork+0x10/0x18
        -> #2 (opp_table_lock){+.+.}:
               lock_acquire+0xb8/0x148
               __mutex_lock+0x104/0xf50
               mutex_lock_nested+0x1c/0x28
               _of_add_opp_table_v2+0xb4/0x760
               dev_pm_opp_of_add_table+0x5c/0x240
               dev_pm_opp_of_cpumask_add_table+0x5c/0x100
               cpufreq_init+0x160/0x430
               cpufreq_online+0x1cc/0xe30
               cpufreq_add_dev+0x78/0x198
               subsys_interface_register+0x168/0x270
               cpufreq_register_driver+0x1c8/0x278
               dt_cpufreq_probe+0xdc/0x1b8
               platform_drv_probe+0xb4/0x168
               driver_probe_device+0x318/0x4b0
               __device_attach_driver+0xfc/0x1f0
               bus_for_each_drv+0xf8/0x180
               __device_attach+0x164/0x200
               device_initial_probe+0x10/0x18
               bus_probe_device+0x110/0x178
               device_add+0x6d8/0x908
               platform_device_add+0x138/0x3d8
               platform_device_register_full+0x1cc/0x1f8
               cpufreq_dt_platdev_init+0x174/0x1bc
               do_one_initcall+0xb8/0x310
               kernel_init_freeable+0x4b8/0x56c
               kernel_init+0x10/0x138
               ret_from_fork+0x10/0x18
        -> #1 (subsys mutex#6){+.+.}:
               lock_acquire+0xb8/0x148
               __mutex_lock+0x104/0xf50
               mutex_lock_nested+0x1c/0x28
               subsys_interface_register+0xd8/0x270
               cpufreq_register_driver+0x1c8/0x278
               dt_cpufreq_probe+0xdc/0x1b8
               platform_drv_probe+0xb4/0x168
               driver_probe_device+0x318/0x4b0
               __device_attach_driver+0xfc/0x1f0
               bus_for_each_drv+0xf8/0x180
               __device_attach+0x164/0x200
               device_initial_probe+0x10/0x18
               bus_probe_device+0x110/0x178
               device_add+0x6d8/0x908
               platform_device_add+0x138/0x3d8
               platform_device_register_full+0x1cc/0x1f8
               cpufreq_dt_platdev_init+0x174/0x1bc
               do_one_initcall+0xb8/0x310
               kernel_init_freeable+0x4b8/0x56c
               kernel_init+0x10/0x138
               ret_from_fork+0x10/0x18
        -> #0 (cpu_hotplug_lock.rw_sem){++++}:
               __lock_acquire+0x203c/0x21d0
               lock_acquire+0xb8/0x148
               cpus_read_lock+0x58/0x1c8
               static_key_enable+0x14/0x30
               sched_feat_write+0x314/0x428
               full_proxy_write+0xa0/0x138
               __vfs_write+0xd8/0x388
               vfs_write+0xdc/0x318
               ksys_write+0xb4/0x138
               sys_write+0xc/0x18
               __sys_trace_return+0x0/0x4
        other info that might help us debug this:
        Chain exists of:
          cpu_hotplug_lock.rw_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3
         Possible unsafe locking scenario:
               CPU0                    CPU1
               ----                    ----
          lock(&sb->s_type->i_mutex_key#3);
                                       lock(opp_table_lock);
                                       lock(&sb->s_type->i_mutex_key#3);
          lock(cpu_hotplug_lock.rw_sem);
         *** DEADLOCK ***
        2 locks held by sh/3358:
         #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318
         #1: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
        stack backtrace:
        CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d74ac3-dirty #18
        Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
        Call trace:
         dump_backtrace+0x0/0x288
         show_stack+0x14/0x20
         dump_stack+0x13c/0x1ac
         print_circular_bug.isra.10+0x270/0x438
         check_prev_add.constprop.16+0x4dc/0xb98
         __lock_acquire+0x203c/0x21d0
         lock_acquire+0xb8/0x148
         cpus_read_lock+0x58/0x1c8
         static_key_enable+0x14/0x30
         sched_feat_write+0x314/0x428
         full_proxy_write+0xa0/0x138
         __vfs_write+0xd8/0x388
         vfs_write+0xdc/0x318
         ksys_write+0xb4/0x138
         sys_write+0xc/0x18
         __sys_trace_return+0x0/0x4
      
      This is because when loading the cpufreq_dt module we first acquire the
      cpu_hotplug_lock.rw_sem lock, and then, in cpufreq_init(), we take the
      &sb->s_type->i_mutex_key lock.
      
      But when writing to /sys/kernel/debug/sched_features, the
      cpu_hotplug_lock.rw_sem lock depends on the &sb->s_type->i_mutex_key lock.
      
      To fix this bug, reverse the lock acquisition order when writing to
      sched_features; this way cpu_hotplug_lock.rw_sem no longer depends on
      &sb->s_type->i_mutex_key.
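      
      A hedged sketch of the reordered write path (parsing and error
      handling elided): cpus_read_lock() is taken before the inode lock,
      and the static-key calls inside sched_feat_set() would then use the
      _cpuslocked variants:
      
        static ssize_t
        sched_feat_write(struct file *filp, const char __user *ubuf,
                         size_t cnt, loff_t *ppos)
        {
                struct inode *inode = file_inode(filp);
                char cmp[64];
                int ret;
      
                /* ... copy ubuf into cmp and strip the trailing newline ... */
      
                /* Take cpu_hotplug_lock before i_mutex instead of under it,
                 * so this path no longer closes the reported cycle. */
                cpus_read_lock();
                inode_lock(inode);
                ret = sched_feat_set(cmp);
                inode_unlock(inode);
                cpus_read_unlock();
      
                return ret < 0 ? ret : cnt;
        }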
      
      Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
      Cc: George G. Davis <george_davis@mentor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. Jul 25, 2018
  9. Jul 20, 2018
  10. Jun 21, 2018
  11. May 16, 2018
  12. Mar 20, 2018
    • sched/debug: Adjust newlines for better alignment · e9ca2670
      Joe Lawrence authored
      
      Scheduler debug stats include newlines that display out of alignment
      when prefixed by timestamps.  For example, the dmesg utility:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        [   83.124251]
        runnable tasks:
         S           task   PID         tree-key  switches  prio     wait-time
        sum-exec        sum-sleep
        -----------------------------------------------------------------------------------------------------------
      
      At the same time, some syslog utilities (like rsyslog by default) don't
      like the additional newline control characters, saving lines like this
      to /var/log/messages:
      
        Mar 16 16:02:29 localhost kernel: #012runnable tasks:#012 S           task   PID         tree-key ...
                                          ^^^^               ^^^^
      Clean these up by moving the newline characters to their own SEQ_printf
      invocation (see the sketch after the examples below).  This leaves
      /proc/sched_debug unchanged, but brings the entire output into
      alignment when prefixed:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        [   62.410368] runnable tasks:
        [   62.410368]  S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
        [   62.410369] -----------------------------------------------------------------------------------------------------------
        [   62.410369]  I  kworker/u12:0     5      1932.215593       332   120         0.000000         3.621252         0.000000 0 0 /
      
      and no escaped control characters from rsyslog in /var/log/messages:
      
        Mar 16 16:15:06 localhost kernel: runnable tasks:
        Mar 16 16:15:06 localhost kernel: S           task   PID         tree-key  ...
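      
      The shape of the change (illustrative, not a verbatim hunk from the
      patch):
      
        /* Before: the leading newline travels with the header text, so the
         * timestamp prefix lands on an empty-looking line: */
        SEQ_printf(m, "\nrunnable tasks:\n");
      
        /* After: the newline gets its own SEQ_printf() invocation, and the
         * header lines up with its own timestamp: */
        SEQ_printf(m, "\n");
        SEQ_printf(m, "runnable tasks:\n");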
      
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/debug: Fix per-task line continuation for console output · a8c024cd
      Joe Lawrence authored
      
      When the SEQ_printf() macro prints to the console, it runs a plain
      printk() without KERN_CONT "continued" line printing.  The result is
      oddly wrapped task info, for example:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        runnable tasks:
        ...
        [   29.608611]  I
        [   29.608613]       rcu_sched     8      3252.013846      4087   120
        [   29.608614]         0.000000        29.090111         0.000000
        [   29.608615]  0 0
        [   29.608616]  /
      
      Modify SEQ_printf to use pr_cont() for expected one-line results:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        runnable tasks:
        ...
        [  106.716329]  S        cpuhp/5    37      2006.315026        14   120         0.000000         0.496893         0.000000 0 0 /
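      
      The resulting macro in kernel/sched/debug.c looks roughly like this,
      with the console branch switched from printk() to pr_cont():
      
        #define SEQ_printf(m, x...)                     \
        do {                                            \
                if (m)                                  \
                        seq_printf(m, x);               \
                else                                    \
                        pr_cont(x);                     \
        } while (0)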
      
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Add util_est on top of PELT · 7f65ea42
      Patrick Bellasi authored
      
      The util_avg signal computed by PELT is too variable for some use-cases.
      For example, a big task waking up after a long sleep period will have its
      utilization almost completely decayed. This introduces some latency before
      schedutil will be able to pick the best frequency to run a task.
      
      The same issue can affect task placement. Indeed, since the task
      utilization is already decayed at wakeup, when the task is enqueued on a
      CPU, that CPU can temporarily be represented as almost empty even while
      it is running a big task. This leads to a race condition where other
      tasks can be placed on a CPU which has just started to run a big task
      that slept for a relatively long period.
      
      Moreover, the PELT utilization of a task can be updated every [ms],
      thus making it a continuously changing value for certain longer-running
      tasks. This means that the instantaneous PELT utilization of a RUNNING
      task is not really meaningful to properly support scheduler decisions.
      
      For all these reasons, a more stable signal can do a better job of
      representing the expected/estimated utilization of a task/cfs_rq.
      Such a signal can be easily created on top of PELT by still using it as
      an estimator which produces values to be aggregated on meaningful
      events.
      
      This patch adds a simple implementation of util_est, a new signal built on
      top of PELT's util_avg where:
      
          util_est(task) = max(task::util_avg, f(task::util_avg@dequeue))
      
      This allows us to remember how big a task was reported to be by PELT in
      its previous activations via f(task::util_avg@dequeue), which is the new
      _task_util_est(struct task_struct*) function added by this patch.
      
      If a task changes its behavior and runs longer in a new activation,
      after a certain time its util_est will just track the original PELT
      signal (i.e. task::util_avg).
      
      The estimated utilization of a cfs_rq is defined only for root ones.
      That's because the only sensible consumers of this signal are the
      scheduler and schedutil when looking for the overall CPU utilization
      due to FAIR tasks.
      
      For this reason, the estimated utilization of a root cfs_rq is simply
      defined as:
      
          util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued)
      
      where:
      
          cfs_rq::util_est::enqueued = sum(_task_util_est(task))
                                       for each RUNNABLE task on that root cfs_rq
      
      It's worth noting that the estimated utilization is tracked only for
      objects of interest, specifically:
      
       - Tasks: to better support tasks placement decisions
       - root cfs_rqs: to better support both tasks placement decisions as
                       well as frequencies selection
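      
      A hedged sketch of the per-task max() described above; field and
      helper names are illustrative rather than quoted from the patch:
      
        /* f(task::util_avg@dequeue): the estimate remembered at dequeue
         * time (field name illustrative). */
        static inline unsigned long _task_util_est(struct task_struct *p)
        {
                return READ_ONCE(p->se.avg.util_est.enqueued);
        }
      
        /* util_est(task) = max(task::util_avg, f(task::util_avg@dequeue)) */
        static inline unsigned long task_util_est(struct task_struct *p)
        {
                return max(READ_ONCE(p->se.avg.util_avg), _task_util_est(p));
        }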
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. Mar 04, 2018
    • sched/headers: Simplify and clean up header usage in the scheduler · 325ea10c
      Ingo Molnar authored
      
      Do the following cleanups and simplifications:
      
       - sched/sched.h already includes <asm/paravirt.h>, so no need to
         include it in sched/core.c again.
      
       - order the <linux/sched/*.h> headers alphabetically
      
       - add all <linux/sched/*.h> headers to kernel/sched/sched.h
      
       - remove all unnecessary includes from the .c files that
         are already included in kernel/sched/sched.h.
      
      Finally, make all scheduler .c files use a single common header:
      
        #include "sched.h"
      
      ... which now contains a union of the relied upon headers.
      
      This makes the various .c files easier to read and easier to handle.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. Mar 03, 2018
    • sched: Clean up and harmonize the coding style of the scheduler code base · 97fb7a0a
      Ingo Molnar authored
      A good number of small style inconsistencies have accumulated
      in the scheduler core, so do a pass over them to harmonize
      all these details:
      
       - fix spelling in comments,
      
       - use curly braces for multi-line statements,
      
       - remove unnecessary parentheses from integer literals,
      
       - capitalize consistently,
      
       - remove stray newlines,
      
       - add comments where necessary,
      
       - remove invalid/unnecessary comments,
      
       - align structure definitions and other data types vertically,
      
       - add missing newlines for increased readability,
      
       - fix vertical tabulation where it's misaligned,
      
       - harmonize preprocessor conditional block labeling
         and vertical alignment,
      
       - remove line-breaks where they uglify the code,
      
       - add newline after local variable definitions.
      
      No change in functionality:
      
        md5:
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm
      
      Cc: Linus Torvalds <torvalds@li...
  15. Sep 30, 2017
    • sched/fair: Propagate an effective runnable_load_avg · 1ea6c46a
      Peter Zijlstra authored
      
      The load balancer uses runnable_load_avg as load indicator. For
      !cgroup this is:
      
        runnable_load_avg = \Sum se->avg.load_avg ; where se->on_rq
      
      That is, a direct sum over all runnable tasks on that runqueue. This is
      in contrast to load_avg, which is a sum over all tasks on the runqueue
      and also includes a blocked component.
      
      However, in the cgroup case, this comes apart since the group entities
      are always runnable, even if most of their constituent entities are
      blocked.
      
      Therefore introduce a runnable_weight which for task entities is the
      same as the regular weight, but for group entities is a fraction of
      the entity weight and represents the runnable part of the group
      runqueue.
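      
      As a hedged sketch of that fraction (names and rounding are
      illustrative, not the patch's exact math):
      
        /*
         * runnable_weight: full weight for a task entity; for a group
         * entity, the runnable fraction of the group runqueue scales the
         * weight down.
         */
        static unsigned long runnable_weight(unsigned long weight,
                                             unsigned long grq_runnable_load_avg,
                                             unsigned long grq_load_avg)
        {
                if (!grq_load_avg)      /* empty group: nothing to scale by */
                        return weight;
      
                return weight * grq_runnable_load_avg / grq_load_avg;
        }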
      
      Then propagate this load through the PELT hierarchy to arrive at an
      effective runnable load average -- which we should not confuse with
      the canonical runnable load average.
      
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Rewrite PELT migration propagation · 0e2d2aaa
      Peter Zijlstra authored
      
      When an entity migrates in (or out) of a runqueue, we need to add (or
      remove) its contribution from the entire PELT hierarchy, because even
      non-runnable entities are included in the load average sums.
      
      In order to do this we have some propagation logic that updates the
      PELT tree; however, the way it 'propagates' the runnable (or load)
      change is (more or less):
      
                           tg->weight * grq->avg.load_avg
        ge->avg.load_avg = ------------------------------
                                     tg->load_avg
      
      But that is the expression for ge->weight, and per the definition of
      load_avg:
      
        ge->avg.load_avg := ge->weight * ge->avg.runnable_avg
      
      Equating the propagated value with ge->weight therefore forces
      ge->avg.runnable_avg to 1, which destroys the runnable_avg we wanted
      to propagate.
      
      Instead directly propagate runnable_sum.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Rewrite cfs_rq->removed_*avg · 2a2f5d4e
      Peter Zijlstra authored
      
      Since on wakeup migration we don't hold the rq->lock for the old CPU,
      we cannot update its state. Instead we add the removed 'load' to an
      atomic variable and have the next update on that CPU collect and
      process it.
      
      Currently we have two atomic variables, which already have the issue
      that they can be read out of sync. Also, two atomic ops on a single
      cacheline are already more expensive than an uncontended lock.
      
      Since we want to add more, convert the thing over to an explicit
      cacheline with a lock in it.
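      
      A hedged sketch of the resulting layout (member names approximate):
      
        struct cfs_rq {
                /* ... */
                struct {
                        raw_spinlock_t  lock ____cacheline_aligned;
                        int             nr;       /* pending removals */
                        unsigned long   load_avg; /* accumulated removed load */
                        unsigned long   util_avg; /* accumulated removed util */
                } removed;
                /* ... */
        };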
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. Sep 29, 2017
  17. Sep 12, 2017
  18. Sep 09, 2017
  19. Aug 25, 2017
  20. Aug 10, 2017
  21. Jun 30, 2017
  22. Mar 02, 2017
  23. Jan 14, 2017
  24. Sep 05, 2016
  25. Aug 10, 2016
    • cgroup: make cgroup_path() and friends behave in the style of strlcpy() · 4c737b41
      Tejun Heo authored
      
      cgroup_path() and friends used to format the path from the end, and
      thus the resulting path usually didn't start at the start of the
      passed-in buffer.  Also, when the buffer was too small, the partial
      result was truncated from the head rather than the tail, and there was
      no way to tell how long the full path would be.  These behaviors made
      the functions less robust and more awkward to use.
      
      With recent updates to kernfs_path(), cgroup_path() and friends can be
      made to behave in strlcpy() style.
      
      * cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path() now
        always return the length of the full path.  If buffer is too small,
        it contains nul terminated truncated output.
      
      * All users updated accordingly.
      
      v2: cgroup_path() usage in kernel/sched/debug.c converted.
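      
      A hedged caller-side sketch of the new contract (variable names
      illustrative):
      
        char buf[64];
        int len;
      
        /* Returns the length of the full path even when it doesn't fit;
         * the output is always nul-terminated, truncated if necessary. */
        len = cgroup_path(cgrp, buf, sizeof(buf));
        if (len >= (int)sizeof(buf))
                pr_warn("cgroup path truncated, need %d bytes\n", len + 1);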
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
  26. Jun 08, 2016
  27. May 05, 2016
  28. Feb 29, 2016
    • sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug · ef477183
      Steven Rostedt (Red Hat) authored
      
      Playing with SCHED_DEADLINE and cpusets, I found that I was unable to
      create new SCHED_DEADLINE tasks: they failed with EBUSY as if the
      bandwidth was already used up. I then realized there was no way to see
      what bandwidth is used by the runqueues to debug the issue.
      
      Adding dl_bw->bw and dl_bw->total_bw to the output of the deadline
      info in /proc/sched_debug allows us to see what bandwidth has been
      reserved and where a problem may exist.
      
      For example, before the issue we see the ratio of the bandwidth:
      
       # cat /proc/sys/kernel/sched_rt_runtime_us
       950000
       # cat /proc/sys/kernel/sched_rt_period_us
       1000000
      
        # grep dl /proc/sched_debug
        dl_rq[0]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[1]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[2]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[3]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[4]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[5]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[6]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[7]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
      
      Note: the bandwidth ratio is kept in 20-bit fixed point, so
      (950000 << 20) / 1000000 == 996147.
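      
      A minimal standalone check of that arithmetic (userspace sketch; the
      20-bit shift mirrors the kernel's to_ratio() fixed-point conversion):
      
        #include <stdio.h>
      
        /* Bandwidth as a 20-bit fixed-point fraction; shifting before
         * dividing avoids losing the fraction to integer division. */
        static unsigned long long to_ratio(unsigned long long period,
                                           unsigned long long runtime)
        {
                return (runtime << 20) / period;
        }
      
        int main(void)
        {
                /* (950000 << 20) / 1000000 == 996147, as in .dl_bw->bw above */
                printf("%llu\n", to_ratio(1000000, 950000));
                return 0;
        }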
      
      After I played with cpusets and hit the issue, the result is now:
      
        # grep dl /proc/sched_debug
        dl_rq[0]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[1]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[2]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[3]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[4]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[5]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[6]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[7]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
      
      This shows that there is definitely a problem, as we should never have
      a negative total bandwidth.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160222212825.756849091@goodmis.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>