Skip to content
Snippets Groups Projects
  1. Dec 27, 2019
  2. Sep 08, 2018
  3. Aug 15, 2018
  4. Aug 05, 2018
    • Nicolai Stange's avatar
      x86: Don't include linux/irq.h from asm/hardirq.h · 447ae316
      Nicolai Stange authored
      
      The next patch in this series will have to make the definition of
      irq_cpustat_t available to entering_irq().
      
      Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
      dependencies like
      
        asm/smp.h
          asm/apic.h
            asm/hardirq.h
              linux/irq.h
                linux/topology.h
                  linux/smp.h
                    asm/smp.h
      
      or
      
        linux/gfp.h
          linux/mmzone.h
            asm/mmzone.h
              asm/mmzone_64.h
                asm/smp.h
                  asm/apic.h
                    asm/hardirq.h
                      linux/irq.h
                        linux/irqdesc.h
                          linux/kobject.h
                            linux/sysfs.h
                              linux/kernfs.h
                                linux/idr.h
                                  linux/gfp.h
      
      and others.
      
      This causes compilation errors because of the header guards becoming
      effective in the second inclusion: symbols/macros that had been defined
      before wouldn't be available to intermediate headers in the #include chain
      anymore.
      
      A possible workaround would be to move the definition of irq_cpustat_t
      into its own header and include that from both, asm/hardirq.h and
      asm/apic.h.
      
      However, this wouldn't solve the real problem, namely asm/harirq.h
      unnecessarily pulling in all the linux/irq.h cruft: nothing in
      asm/hardirq.h itself requires it. Also, note that there are some other
      archs, like e.g. arm64, which don't have that #include in their
      asm/hardirq.h.
      
      Remove the linux/irq.h #include from x86' asm/hardirq.h.
      
      Fix resulting compilation errors by adding appropriate #includes to *.c
      files as needed.
      
      Note that some of these *.c files could be cleaned up a bit wrt. to their
      set of #includes, but that should better be done from separate patches, if
      at all.
      
      Signed-off-by: default avatarNicolai Stange <nstange@suse.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      447ae316
  5. Jul 31, 2018
  6. Jul 24, 2018
  7. Jul 02, 2018
    • Thomas Gleixner's avatar
      Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3
      Thomas Gleixner authored
      Dave Hansen reported, that it's outright dangerous to keep SMT siblings
      disabled completely so they are stuck in the BIOS and wait for SIPI.
      
      The reason is that Machine Check Exceptions are broadcasted to siblings and
      the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
      logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
      reboots the machine. The MCE chapter in the SDM contains the following
      blurb:
      
          Because the logical processors within a physical package are tightly
          coupled with respect to shared hardware resources, both logical
          processors are notified of machine check errors that occur within a
          given physical processor. If machine-check exceptions are enabled when
          a fatal error is reported, all the logical processors within a physical
          package are dispatched to the machine-check exception handler. If
          machine-check exceptions are disabled, the logical processors enter the
          shutdown st...
      506a66f3
  8. Jun 21, 2018
    • mike.travis@hpe.com's avatar
      x86/platform/UV: Add kernel parameter to set memory block size · d7609f42
      mike.travis@hpe.com authored
      
      Add a kernel parameter that allows setting UV memory block size.  This
      is to provide an adjustment for new forms of PMEM and other DIMM memory
      that might require alignment restrictions other than scanning the global
      address table for the required minimum alignment.  The value set will be
      further adjusted by both the GAM range table scan as well as restrictions
      imposed by set_memory_block_size_order().
      
      Signed-off-by: default avatarMike Travis <mike.travis@hpe.com>
      Reviewed-by: default avatarAndrew Banman <andrew.banman@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Cc: jgross@suse.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/lkml/20180524201711.854849120@stormcage.americas.sgi.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d7609f42
    • mike.travis@hpe.com's avatar
      x86/platform/UV: Use new set memory block size function · bbbd2b51
      mike.travis@hpe.com authored
      
      Add a call to the new function to "adjust" the current fixed UV memory
      block size of 2GB so it can be changed to a different physical boundary.
      This accommodates changes in the Intel BIOS, and therefore UV BIOS,
      which now can align boundaries different than the previous UV standard
      of 2GB.  It also flags any UV Global Address boundaries from BIOS that
      cause a change in the mem block size (boundary).
      
      The current boundary of 2GB has been used on UV since the first system
      release in 2009 with Linux 2.6 and has worked fine.  But the new NVDIMM
      persistent memory modules (PMEM), along with the Intel BIOS changes to
      support these modules caused the memory block size boundary to be set
      to a lower limit.  Intel only guarantees that this minimum boundary at
      64MB though the current Linux limit is 128MB.
      
      Note that the default remains 2GB if no changes occur.
      
      Signed-off-by: default avatarMike Travis <mike.travis@hpe.com>
      Reviewed-by: default avatarAndrew Banman <andrew.banman@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Cc: jgross@suse.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/lkml/20180524201711.732785782@stormcage.americas.sgi.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bbbd2b51
    • Thomas Gleixner's avatar
      x86/apic: Ignore secondary threads if nosmt=force · 2207def7
      Thomas Gleixner authored
      
      nosmt on the kernel command line merely prevents the onlining of the
      secondary SMT siblings.
      
      nosmt=force makes the APIC detection code ignore the secondary SMT siblings
      completely, so they even do not show up as possible CPUs. That reduces the
      amount of memory allocations for per cpu variables and saves other
      resources from being allocated too large.
      
      This is not fully equivalent to disabling SMT in the BIOS because the low
      level SMT enabling in the BIOS can result in partitioning of resources
      between the siblings, which is not undone by just ignoring them. Some CPUs
      can use the full resources when their sibling is not onlined, but this is
      depending on the CPU family and model and it's not well documented whether
      this applies to all partitioned resources. That means depending on the
      workload disabling SMT in the BIOS might result in better performance.
      
      Linus analysis of the Intel manual:
      
        The intel optimization manual is not very clear on what the partitioning
        rules are.
      
        I find:
      
          "In general, the buffers for staging instructions between major pipe
           stages  are partitioned. These buffers include µop queues after the
           execution trace cache, the queues after the register rename stage, the
           reorder buffer which stages instructions for retirement, and the load
           and store buffers.
      
           In the case of load and store buffers, partitioning also provided an
           easier implementation to maintain memory ordering for each logical
           processor and detect memory ordering violations"
      
        but some of that partitioning may be relaxed if the HT thread is "not
        active":
      
          "In Intel microarchitecture code name Sandy Bridge, the micro-op queue
           is statically partitioned to provide 28 entries for each logical
           processor,  irrespective of software executing in single thread or
           multiple threads. If one logical processor is not active in Intel
           microarchitecture code name Ivy Bridge, then a single thread executing
           on that processor  core can use the 56 entries in the micro-op queue"
      
        but I do not know what "not active" means, and how dynamic it is. Some of
        that partitioning may be entirely static and depend on the early BIOS
        disabling of HT, and even if we park the cores, the resources will just be
        wasted.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      2207def7
    • Thomas Gleixner's avatar
      x86/smp: Provide topology_is_primary_thread() · 6a4d2657
      Thomas Gleixner authored
      
      If the CPU is supporting SMT then the primary thread can be found by
      checking the lower APIC ID bits for zero. smp_num_siblings is used to build
      the mask for the APIC ID bits which need to be taken into account.
      
      This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
      than the initial APIC ID in CPUID. But according to AMD the lower bits have
      to be consistent. Intel gave a tentative confirmation as well.
      
      Preparatory patch to support disabling SMT at boot/runtime.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      6a4d2657
  9. Jun 06, 2018
    • Thomas Gleixner's avatar
      x86/apic/vector: Print APIC control bits in debugfs · a07771ac
      Thomas Gleixner authored
      
      Extend the debugability of the vector management by adding the state bits
      to the debugfs output.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.908136099@linutronix.de
      a07771ac
    • Thomas Gleixner's avatar
      x86/ioapic: Use apic_ack_irq() · 2b04e46d
      Thomas Gleixner authored
      
      To address the EBUSY fail of interrupt affinity settings in case that the
      previous setting has not been cleaned up yet, use the new apic_ack_irq()
      function instead of directly invoking ack_APIC_irq().
      
      Preparatory change for the real fix
      
      Fixes: dccfe314 ("x86/vector: Simplify vector move cleanup")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de
      2b04e46d
    • Thomas Gleixner's avatar
      x86/apic: Provide apic_ack_irq() · c0255770
      Thomas Gleixner authored
      
      apic_ack_edge() is explicitely for handling interrupt affinity cleanup when
      interrupt remapping is not available or disable.
      
      Remapped interrupts and also some of the platform specific special
      interrupts, e.g. UV, invoke ack_APIC_irq() directly.
      
      To address the issue of failing an affinity update with -EBUSY the delayed
      affinity mechanism can be reused, but ack_APIC_irq() does not handle
      that. Adding this to ack_APIC_irq() is not possible, because that function
      is also used for exceptions and directly handled interrupts like IPIs.
      
      Create a new function, which just contains the conditional invocation of
      irq_move_irq() and the final ack_APIC_irq().
      
      Reuse the new function in apic_ack_edge().
      
      Preparatory change for the real fix.
      
      Fixes: dccfe314 ("x86/vector: Simplify vector move cleanup")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
      c0255770
    • Thomas Gleixner's avatar
      x86/apic/vector: Prevent hlist corruption and leaks · 80ae7b1a
      Thomas Gleixner authored
      
      Several people observed the WARN_ON() in irq_matrix_free() which triggers
      when the caller tries to free an vector which is not in the allocation
      range. Song provided the trace information which allowed to decode the root
      cause.
      
      The rework of the vector allocation mechanism failed to preserve a sanity
      check, which prevents setting a new target vector/CPU when the previous
      affinity change has not fully completed.
      
      As a result a half finished affinity change can be overwritten, which can
      cause the leak of a irq descriptor pointer on the previous target CPU and
      double enqueue of the hlist head into the cleanup lists of two or more
      CPUs. After one CPU cleaned up its vector the next CPU will invoke the
      cleanup handler with vector 0, which triggers the out of range warning in
      the matrix allocator.
      
      Prevent this by checking the apic_data of the interrupt whether the
      move_in_progress flag is false and the hlist node is not hashed. Return
      -EBUSY if not.
      
      This prevents the damage and restores the behaviour before the vector
      allocation rework, but due to other changes in that area it also widens the
      chance that user space can observe -EBUSY. In theory this should be fine,
      but actually not all user space tools handle -EBUSY correctly. Addressing
      that is not part of this fix, but will be addressed in follow up patches.
      
      Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
      Reported-by: default avatarDmitry Safonov <0x7f454c46@gmail.com>
      Reported-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reported-by: default avatarSong Liu <liu.song.a23@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de
      80ae7b1a
  10. May 19, 2018
  11. May 18, 2018
  12. Apr 10, 2018
    • Li RongQing's avatar
      x86/apic: Fix signedness bug in APIC ID validity checks · a774635d
      Li RongQing authored
      
      The APIC ID as parsed from ACPI MADT is validity checked with the
      apic->apic_id_valid() callback, which depends on the selected APIC type.
      
      For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF
      are detected as valid. This happens because the 'apicid' argument of the
      apic_id_valid() callback is type 'int'. So the resulting comparison
      
         apicid < 0xFF
      
      evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed
      to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC
      mode are considered valid and accounted as possible CPUs.
      
      Change the apicid argument type of the apic_id_valid() callback to u32 so
      the evaluation is unsigned and returns the correct result.
      
      [ tglx: Massaged changelog ]
      
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: jgross@suse.com
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Peter Zijlstra <pete...
      a774635d
  13. Mar 01, 2018
  14. Feb 23, 2018
  15. Feb 20, 2018
  16. Feb 17, 2018
  17. Feb 16, 2018
  18. Feb 15, 2018
  19. Feb 14, 2018
  20. Feb 13, 2018
  21. Jan 17, 2018
  22. Jan 16, 2018