Skip to content
Snippets Groups Projects
  1. Sep 26, 2017
    • Thomas Gleixner's avatar
      x86/apic: Get rid of multi CPU affinity · fdba46ff
      Thomas Gleixner authored
      
      Setting the interrupt affinity of a single interrupt to multiple CPUs has a
      dubious value.
      
       1) This only works on machines where the APIC uses logical destination
          mode. If the APIC uses physical destination mode then it is already
          restricted to a single CPU
      
       2) Experiments have shown, that the benefit of multi CPU affinity is close
          to zero and in some test even worse than setting the affinity to a
          single CPU.
      
          The reason for this is that the delivery targets the APIC with the
          lowest ID first and only if that APIC is busy (servicing an interrupt,
          i.e. ISR is not empty) it hands it over to the next APIC. In the
          conducted tests the vast majority of interrupts ends up on the APIC
          with the lowest ID anyway, so there is no natural spreading of the
          interrupts possible.
      
      Supporting multi CPU affinities adds a lot of complexity to the code, which
      can turn the allocation search into a worst case of
      
          nr_vectors * nr_online_cpus * nr_bits_in_target_mask
      
      As a first step disable it by restricting the vector search to a single
      CPU.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarYu Chen <yu.c.chen@intel.com>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213154.228824430@linutronix.de
      fdba46ff
    • Thomas Gleixner's avatar
      x86/vector: Rename used_vectors to system_vectors · 7854f822
      Thomas Gleixner authored
      
      used_vectors is a nisnomer as it only has the system vectors which are
      excluded from the regular vector allocation marked. It's not what the name
      suggests storage for the actually used vectors.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarYu Chen <yu.c.chen@intel.com>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213154.150209009@linutronix.de
      7854f822
    • Thomas Gleixner's avatar
      x86/apic: Get rid of apic->target_cpus · c1d1ee9a
      Thomas Gleixner authored
      
      The target_cpus() callback of the apic struct is not really useful. Some
      APICs return cpu_online_mask and others cpus_all_mask. The latter is bogus
      as it does not take holes in the cpus_possible_mask into account.
      
      Replace it with cpus_online_mask which makes the most sense and remove the
      callback.
      
      The usage sites will be removed in a later step anyway, so get rid of it
      now to have incremental changes.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarYu Chen <yu.c.chen@intel.com>
      Acked-by: default avatarJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213154.070850916@linutronix.de
      c1d1ee9a
  2. Aug 29, 2017
  3. Jun 23, 2017
  4. Jun 22, 2017
  5. Jan 05, 2017
    • Daniel Bristot de Oliveira's avatar
      x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers · c4158ff5
      Daniel Bristot de Oliveira authored
      This patch adds the __irq_entry annotation to the default x86
      platform IRQ handlers. ftrace's function_graph tracer uses the
      __irq_entry annotation to notify the entry and return of IRQ
      handlers.
      
      For example, before the patch:
        354549.667252 |   3)  d..1              |  default_idle_call() {
        354549.667252 |   3)  d..1              |    arch_cpu_idle() {
        354549.667253 |   3)  d..1              |      default_idle() {
        354549.696886 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
        354549.696886 |   3)  d..1              |          irq_enter() {
        354549.696886 |   3)  d..1              |            rcu_irq_enter() {
      
      After the patch:
        366416.254476 |   3)  d..1              |    arch_cpu_idle() {
        366416.254476 |   3)  d..1              |      default_idle() {
        366416.261566 |   3)  d..1  ==========> |
        366416.261566 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
        366416.261566...
      c4158ff5
  6. Oct 04, 2016
    • Mika Westerberg's avatar
      x86/irq: Prevent force migration of irqs which are not in the vector domain · db91aa79
      Mika Westerberg authored
      
      When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
      affinities related to the CPU in question. The same thing is also done when
      the system is suspended to S-states like S3 (mem).
      
      For each IRQ we try to complete any on-going move regardless whether the
      IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
      its chip_data, assume it is of type struct apic_chip_data and manipulate it
      by clearing old_domain mask etc. For irq_chips that are not part of the
      x86_vector_domain, like those created by various GPIO drivers, will find
      their chip_data being changed unexpectly.
      
      Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
      corrupted after resume:
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  hi
      
        # rtcwake -s10 -mmem
        <10 seconds passes>
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  ?
      
      Note '?' in the output. It means the struct gpio_chip ->get function is
      NULL whereas before suspend it was there.
      
      Fix this by first checking that the IRQ belongs to x86_vector_domain before
      we try to use the chip_data as struct apic_chip_data.
      
      Reported-and-tested-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: stable@vger.kernel.org # 4.4+
      Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      db91aa79
  7. Aug 04, 2016
    • Masahiro Yamada's avatar
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada authored
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
      
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Ac...
      97f2645f
  8. Apr 28, 2016
    • Keith Busch's avatar
      x86/apic: Handle zero vector gracefully in clear_vector_irq() · 1bdb8970
      Keith Busch authored
      
      If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
      the already allocated vectors. This subsequently calls clear_vector_irq().
      
      The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
      clear_vector_irq().
      
      We cannot suppress the call to x86_vector_free_irqs() for the failed
      interrupt, because the other data related to this irq must be cleaned up as
      well. So calling clear_vector_irq() with vector == 0 is legitimate.
      
      Remove the BUG_ON and return if vector is zero,
      
      [ tglx: Massaged changelog ]
      
      Fixes: b5dc8e6c "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      1bdb8970
  9. Apr 13, 2016
  10. Mar 18, 2016
    • Thomas Gleixner's avatar
      x86/irq: Cure live lock in fixup_irqs() · 551adc60
      Thomas Gleixner authored
      
      Harry reported, that he's able to trigger a system freeze with cpu hot
      unplug. The freeze turned out to be a live lock caused by recent changes in
      irq_force_complete_move().
      
      When fixup_irqs() and from there irq_force_complete_move() is called on the
      dying cpu, then all other cpus are in stop machine an wait for the dying cpu
      to complete the teardown. If there is a move of an interrupt pending then
      irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
      mask and waits for them to clear the mask. That's obviously impossible as
      those cpus are firmly stuck in stop machine with interrupts disabled.
      
      I should have known that, but I completely overlooked it being concentrated on
      the locking issues around the vectors. And the existance of the call to
      __irq_complete_move() in the code, which actually sends the cleanup IPI made
      it reasonable to wait for that cleanup to complete. That call was bogus even
      before the recent changes as it was just a pointless distraction.
      
      We have to look at two cases:
      
      1) The move_in_progress flag of the interrupt is set
      
         This means the ioapic has been updated with the new vector, but it has not
         fired yet. In theory there is a race:
      
         set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
         			      i.e. it's raised on the old vector. 
      
         So if the target cpu cannot handle that interrupt before the old vector is
         cleaned up, we get a spurious interrupt and in the worst case the ioapic
         irq line becomes stale, but my experiments so far have only resulted in
         spurious interrupts.
      
         But in case of cpu hotplug this should be a non issue because if the
         affinity update happens right before all cpus rendevouz in stop machine,
         there is no way that the interrupt can be blocked on the target cpu because
         all cpus loops first with interrupts enabled in stop machine, so the old
         vector is not yet cleaned up when the interrupt fires.
      
         So the only way to run into this issue is if the delivery of the interrupt
         on the apic/system bus would be delayed beyond the point where the target
         cpu disables interrupts in stop machine. I doubt that it can happen, but at
         least there is a theroretical chance. Virtualization might be able to
         expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
         hardware.
      
         I've spent quite some time over the weekend to enforce that situation,
         though I was not able to trigger the delayed case.
      
      2) The move_in_progress flag is not set and the old_domain cpu mask is not
         empty.
      
         That means, that an interrupt was delivered after the change and the
         cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
         responded to it yet.
      
      In both cases we can assume that the next interrupt will arrive on the new
      vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
      mask.
      
      Fixes: 98229aa3 "x86/irq: Plug vector cleanup race"
      Reported-by: default avatarHarry Junior <harryjr@outlook.fr>
      Tested-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Joe Lawrence <joe.lawrence@stratus.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      551adc60
  11. Jan 15, 2016
  12. Dec 20, 2015
  13. Nov 07, 2015
    • Vitaly Kuznetsov's avatar
      x86/irq: Probe for PIC presence before allocating descs for legacy IRQs · 8c058b0b
      Vitaly Kuznetsov authored
      
      Commit d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
      interfaces") brought a regression for Hyper-V Gen2 instances. These
      instances don't have i8259 legacy PIC but they use legacy IRQs for serial
      port, rtc, and acpi. With this commit included we end up with these IRQs
      not initialized. Earlier, there was a special workaround for legacy IRQs
      in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
      nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
      irq_domain_alloc_descs() returns -EEXIST.
      
      The essence of the issue seems to be that early_irq_init() calls
      arch_probe_nr_irqs() to figure out the number of legacy IRQs before
      we probe for i8259 and gets 16. Later when init_8259A() is called we switch
      to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
      have 16 descs allocated.
      
      Solve the issue by separating i8259 probe from init and calling it in
      arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.
      
      Fixes: d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      8c058b0b
  14. Sep 16, 2015
  15. Aug 19, 2015
  16. Aug 06, 2015
  17. Jul 14, 2015
  18. Jul 07, 2015
    • Thomas Gleixner's avatar
      x86/irq: Plug irq vector hotplug race · 5a3f75e3
      Thomas Gleixner authored
      
      Jin debugged a nasty cpu hotplug race which results in leaking a irq
      vector on the newly hotplugged cpu.
      
      cpu N				cpu M
      native_cpu_up                   device_shutdown
        do_boot_cpu			  free_msi_irqs
        start_secondary                   arch_teardown_msi_irqs
          smp_callin                        default_teardown_msi_irqs
             setup_vector_irq                  arch_teardown_msi_irq
              __setup_vector_irq		   native_teardown_msi_irq
                lock(vector_lock)		     destroy_irq 
                install vectors
                unlock(vector_lock)
      					       lock(vector_lock)
      --->                                  	       __clear_irq_vector
                                          	       unlock(vector_lock)
          lock(vector_lock)
          set_cpu_online
          unlock(vector_lock)
      
      This leaves the irq vector(s) which are torn down on CPU M stale in
      the vector array of CPU N, because CPU M does not see CPU N online
      yet. There is a similar issue with concurrent newly setup interrupts.
      
      The alloc/free protection of irq descriptors does not prevent the
      above race, because it merily prevents interrupt descriptors from
      going away or changing concurrently.
      
      Prevent this by moving the call to setup_vector_irq() into the
      vector_lock held region which protects set_cpu_online():
      
      cpu N				cpu M
      native_cpu_up                   device_shutdown
        do_boot_cpu			  free_msi_irqs
        start_secondary                   arch_teardown_msi_irqs
          smp_callin                        default_teardown_msi_irqs
             lock(vector_lock)                arch_teardown_msi_irq
             setup_vector_irq()
              __setup_vector_irq		   native_teardown_msi_irq
                install vectors		     destroy_irq 
             set_cpu_online
             unlock(vector_lock)
      					       lock(vector_lock)
                                        	       __clear_irq_vector
                                          	       unlock(vector_lock)
      
      So cpu M either sees the cpu N online before clearing the vector or
      cpu N installs the vectors after cpu M has cleared it.
      
      Reported-by: default avatarxiao jin <jin.xiao@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
      5a3f75e3