Dec 11, 2020

x86/apic/vector: Fix ordering in vector assignment · 190113b4
Author: Thomas Gleixner
      
      Prarit reported that depending on the affinity setting the
      
       ' irq $N: Affinity broken due to vector space exhaustion.'
      
      message is showing up in dmesg, but the vector space on the CPUs in the
      affinity mask is definitely not exhausted.
      
      Shung-Hsi provided traces and analysis which pinpoints the problem:
      
      The ordering of trying to assign an interrupt vector in
      assign_irq_vector_any_locked() is simply wrong if the interrupt data has a
      valid node assigned. It does:
      
       1) Try the intersection of affinity mask and node mask
       2) Try the node mask
       3) Try the full affinity mask
       4) Try the full online mask
      
      Obviously #2 and #3 are in the wrong order as the requested affinity
      mask has to take precedence.
      
      In the observed cases #1 failed because the affinity mask did not contain
      CPUs from node 0. That made it allocate a vector from node 0, thereby
      breaking affinity and emitting the misleading message.
      
Reverse the order of #2 and #3 so that the full affinity mask without the
node intersection is tried before affinity is actually broken.
      
If no node is assigned, only the full affinity mask is tried and, if that
fails, the full online mask.
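
For illustration, a simplified sketch of the corrected fallback chain,
modelled on assign_irq_vector_any_locked() in
arch/x86/kernel/apic/vector.c (helper details elided; a sketch, not the
verbatim diff):

  static int assign_irq_vector_any_locked(struct irq_data *irqd)
  {
          const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
          int node = irq_data_get_node(irqd);

          if (node != NUMA_NO_NODE) {
                  /* 1) Try the intersection of affinity mask and node mask */
                  cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk);
                  if (!assign_vector_locked(irqd, vector_searchmask))
                          return 0;
          }

          /* 2) Try the full affinity mask, before affinity is broken */
          cpumask_and(vector_searchmask, affmsk, cpu_online_mask);
          if (!assign_vector_locked(irqd, vector_searchmask))
                  return 0;

          if (node != NUMA_NO_NODE) {
                  /* 3) Try the node mask - only now is affinity broken */
                  if (!assign_vector_locked(irqd, cpumask_of_node(node)))
                          return 0;
          }

          /* 4) Last resort: the full online mask */
          return assign_vector_locked(irqd, cpu_online_mask);
  }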
      
      Fixes: d6ffc6ac ("x86/vector: Respect affinity mask in irq descriptor")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled · 06c5fe9b
Author: Xiaochen Shen

      The MBA software controller (mba_sc) is a feedback loop which
      periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It piggybacks on the MBM counter overflow
handler to do its updates at a 1s interval in mbm_update() and
update_mba_bw().
      
      The purpose of mbm_update() is to periodically read the MBM counters to
      make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for the
local bandwidth update when mba_sc is not enabled, but calls
mbm_bw_count() instead when mba_sc is enabled. __mon_event_count() is
then no longer called for the local bandwidth update from the MBM
counter overflow handler, but it is still called when the MBM local
bandwidth counter file 'mbm_local_bytes' is read, via the call path
below:
      
        rdtgroup_mondata_show()
          mon_event_read()
            mon_event_count()
              __mon_event_count()
      
In __mon_event_count(), m->chunks is updated by the delta chunks
calculated from the previous MSR value (m->prev_msr) and the current MSR
value. When mba_sc is enabled, m->chunks is also updated in mbm_update(),
by mistake, with the delta chunks calculated from m->prev_bw_msr instead
of m->prev_msr. Yet m->chunks is not even used by update_mba_bw() in the
mba_sc feedback loop.
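
Sketched side by side (not verbatim; mbm_overflow_count() is the
wrap-safe delta helper in the resctrl monitoring code), the two update
paths before the fix looked roughly like:

  /* __mon_event_count(): read path for 'mbm_local_bytes' */
  chunks = mbm_overflow_count(m->prev_msr, tval, width);
  m->chunks += chunks;
  m->prev_msr = tval;

  /* mbm_bw_count(): overflow handler path when mba_sc is enabled */
  chunks = mbm_overflow_count(m->prev_bw_msr, tval, width);
  m->chunks += chunks;        /* spurious: delta against prev_bw_msr */
  m->prev_bw_msr = tval;

The same m->chunks thus accumulates deltas taken against two different
baselines, double-counting traffic seen since the last read.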
      
When the MBM local bandwidth counter file is read, m->chunks has thus
been changed unexpectedly by mbm_bw_count(). As a result, an incorrect
local bandwidth counter, calculated from the incorrect m->chunks, is
shown to the user.
      
Fix this by removing the incorrect m->chunks update from mbm_bw_count()
in the MBM counter overflow handler, and by always calling
__mon_event_count() in mbm_update() to make sure that the hardware local
bandwidth counter doesn't wrap around.
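
A sketch of the resulting flow in mbm_update() (guards simplified; a
sketch, not the verbatim diff):

  if (is_mbm_local_enabled()) {
          rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
          /*
           * Always update m->chunks so the hardware counter cannot
           * wrap around unnoticed between user reads.
           */
          __mon_event_count(rmid, &rr);

          /*
           * Feed the mba_sc loop on top; mbm_bw_count() no longer
           * touches m->chunks, only its own bandwidth state.
           */
          if (is_mba_sc(NULL))
                  mbm_bw_count(rmid, &rr);
  }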
      
      Test steps:
        # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat && make
  ./tools/membw/membw -c 0 -b 10000 --read
      
        # Enable MBA software controller
        mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
      
        # Create control group c1
        mkdir /sys/fs/resctrl/c1
      
        # Set MB throttle to 6 GB/s
        echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata
      
        # Write PID of the workload to tasks file
        echo `pidof membw` > /sys/fs/resctrl/c1/tasks
      
  # Read local bytes counters twice with 1s interval; the calculated
  # local bandwidth is not as expected (should be around 6 GB/s):
        local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
        sleep 1
        local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
        echo "local b/w (bytes/s):" `expr $local_2 - $local_1`
      
      Before fix:
        local b/w (bytes/s): 11076796416
      
      After fix:
        local b/w (bytes/s): 5465014272
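
In other words, the before-fix reading works out to about 11.1 GB/s,
above even the workload's own 10 GB/s rate, because the spurious
m->chunks updates inflate the counter; the after-fix reading of about
5.5 GB/s sits under the 6 GB/s throttle as expected.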
      
Fixes: ba0f26d8 ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com