Skip to content
Snippets Groups Projects
  1. Apr 24, 2015
  2. Nov 12, 2014
  3. Dec 13, 2013
  4. Nov 07, 2013
    • Konrad Rzeszutek Wilk's avatar
      PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() · 0e4ccb15
      Konrad Rzeszutek Wilk authored
      Certain platforms do not allow writes in the MSI-X BARs to setup or tear
      down vector values.  To combat against the generic code trying to write to
      that and either silently being ignored or crashing due to the pagetables
      being marked R/O this patch introduces a platform override.
      
      Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
      and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
      and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
      code.
      
      For Xen, which does not allow the guest to write to MSI-X tables - as the
      hypervisor is solely responsible for setting the vector values - we
      implement two nops.
      
      This fixes a Xen guest crash when passing a PCI device with MSI-X to the
      guest.  See the bugzilla for more details.
      
      [bhelgaas: add bugzilla info]
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581
      
      
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      0e4ccb15
  5. May 29, 2013
    • David Vrabel's avatar
      x86: Increase precision of x86_platform.get/set_wallclock() · 3565184e
      David Vrabel authored
      
      All the virtualized platforms (KVM, lguest and Xen) have persistent
      wallclocks that have more than one second of precision.
      
      read_persistent_wallclock() and update_persistent_wallclock() allow
      for nanosecond precision but their implementation on x86 with
      x86_platform.get/set_wallclock() only allows for one second precision.
      This means guests may see a wallclock time that is off by up to 1
      second.
      
      Make set_wallclock() and get_wallclock() take a struct timespec
      parameter (which allows for nanosecond precision) so KVM and Xen
      guests may start with a more accurate wallclock time and a Xen dom0
      can maintain a more accurate wallclock for guests.
      
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      3565184e
  6. Jan 28, 2013
  7. Nov 18, 2012
  8. Sep 12, 2012
  9. Jun 06, 2012
  10. May 25, 2012
  11. May 02, 2012
  12. Apr 17, 2012
  13. Mar 20, 2012
    • Marcelo Tosatti's avatar
      x86: kvmclock: abstract save/restore sched_clock_state · b74f05d6
      Marcelo Tosatti authored
      
      Upon resume from hibernation, CPU 0's hvclock area contains the old
      values for system_time and tsc_timestamp. It is necessary for the
      hypervisor to update these values with uptodate ones before the CPU uses
      them.
      
      Abstract TSC's save/restore sched_clock_state functions and use
      restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
      
      Also move restore_sched_clock_state before __restore_processor_state,
      since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
      Thanks to Igor Mammedov for tracking it down.
      
      Fixes suspend-to-disk with kvmclock.
      
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      b74f05d6
  14. Mar 05, 2012
    • Igor Mammedov's avatar
      x86: Introduce x86_cpuinit.early_percpu_clock_init hook · df156f90
      Igor Mammedov authored
      When kvm guest uses kvmclock, it may hang on vcpu hot-plug.
      This is caused by an overflow in pvclock_get_nsec_offset,
      
          u64 delta = tsc - shadow->tsc_timestamp;
      
      which in turn is caused by an undefined values from percpu
      hv_clock that hasn't been initialized yet.
      Uninitialized clock on being booted cpu is accessed from
         start_secondary
          -> smp_callin
            ->  smp_store_cpu_info
              -> identify_secondary_cpu
                -> mtrr_ap_init
                  -> mtrr_restore
                    -> stop_machine_from_inactive_cpu
                      -> queue_stop_cpus_work
                        ...
                          -> sched_clock
                            -> kvm_clock_read
      which is well before x86_cpuinit.setup_percpu_clockev call in
      start_secondary, where percpu clock is initialized.
      
      This patch introduces a hook that allows to setup/initialize
      per_cpu clock early and avoid overflow due to reading
        - undefined values
        - old values if cpu was offlined and then onlined again
      
      Another possible early user of this clock source is ftrace that
      accesses it to get timestamps for ring buffer entries. So if
      mtrr_ap_init is moved from identify_secondary_cpu to past
      x86_cpuinit.setup_percpu_clockev in start_secondary, ftrace
      may cause the same overflow/hang on cpu hot-plug anyway.
      
      More complete description of the problem:
        https://lkml.org/lkml/2012/2/2/101
      
      
      
      Credits to Marcelo Tosatti <mtosatti@redhat.com> for hook idea.
      
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      df156f90
  15. Jan 07, 2012
  16. Dec 06, 2011
  17. Nov 10, 2011
  18. May 13, 2011
    • Stefano Stabellini's avatar
      x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
      Stefano Stabellini authored
      
      Introduce a new x86_init hook called pagetable_reserve that at the end
      of init_memory_mapping is used to reserve a range of memory addresses for
      the kernel pagetable pages we used and free the other ones.
      
      On native it just calls memblock_x86_reserve_range while on xen it also
      takes care of setting the spare memory previously allocated
      for kernel pagetable pages from RO to RW, so that it can be used for
      other purposes.
      
      A detailed explanation of the reason why this hook is needed follows.
      
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      at some point init_memory_mapping is going to reach the pagetable pages
      area and map those pages too (mapping them as normal memory that falls
      in the range of addresses passed to init_memory_mapping as argument).
      Some of those pages are already pagetable pages (they are in the range
      pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
      everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, xen will find that the guest has already
      a RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason we need a hook to reserve the kernel pagetable pages we
      used and free the other ones so that they can be reused for other
      purposes.
      On native it just means calling memblock_x86_reserve_range, on Xen it
      also means marking RW the pagetable pages that we allocated before but
      that haven't been used before.
      
      Another way to fix this is without using the hook is by adding a 'if
      (xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
      counterpart, but that is just nasty.
      
      Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      279b706b
  19. Feb 15, 2011
  20. Oct 18, 2010
  21. Jul 08, 2010
  22. Feb 27, 2010
    • Russ Anderson's avatar
      x86: Enable NMI on all cpus on UV · 78c06176
      Russ Anderson authored
      
      Enable NMI on all cpus in UV system and add an NMI handler
      to dump_stack on each cpu.
      
      By default on x86 all the cpus except the boot cpu have NMI
      masked off.  This patch enables NMI on all cpus in UV system
      and adds an NMI handler to dump_stack on each cpu.  This
      way if a system hangs we can NMI the machine and get a
      backtrace from all the cpus.
      
      Version 2: Use x86_platform driver mechanism for nmi init, per
                 Ingo's suggestion.
      
      Version 3: Clean up Ingo's nits.
      
      Signed-off-by: default avatarRuss Anderson <rja@sgi.com>
      LKML-Reference: <20100226164912.GA24439@sgi.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      78c06176
  23. Feb 26, 2010
  24. Feb 20, 2010
  25. Nov 24, 2009