  1. Aug 05, 2018
    • x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d · 45b575c0
      Nicolai Stange authored
      
      Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
      VMENTRY.
      
      L1D flushes are costly and two modes of operations are provided to users:
      "always" and the more selective "conditional" mode.
      
      If operating in the latter, the cache would get flushed only if a host side
      code path considered unconfined had been traversed. "Unconfined" in this
      context means that it might have pulled in sensitive data like user data
      or kernel crypto keys.
      
      The need for L1D flushes is tracked by means of the per-vcpu flag
      l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
      vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct
      an L1D flush based on its value and reset that flag again.
      
      Currently, interrupts delivered "normally" while in root operation between
      VMEXIT and VMENTER are not taken into account. Part of the reason is that
      these don't leave any traces and thus, the vmx code is unable to tell if
      any such has happened.
      
      As proposed by Paolo Bonzini, prepare for tracking all interrupts by
      introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
      strong analogy to the per-vcpu ->l1tf_flush_l1d.
      
      A later patch will make interrupt handlers set it.
      
      For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86's
      per-cpu irq_cpustat_t as suggested by Peter Zijlstra.
      
      Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
      kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
      trivial or non-existent, as appropriate, for !CONFIG_KVM_INTEL.
      
      Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
      l1tf_flush_l1d.
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      45b575c0
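The conditional-mode handling described in this commit can be sketched as a small userspace model. This is not the kernel code: struct vcpu_model, struct cpu_model, l1d_flush_model() and flush_count are hypothetical stand-ins for the per-vcpu flag, the per-cpu flag, vmx_l1d_flush() and the actual cache flush. The sketch only assumes what the message states: in conditional mode a flush happens if either flag is set, and both flags are reset afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-vcpu state (->l1tf_flush_l1d). */
struct vcpu_model { bool l1tf_flush_l1d; };

/* Hypothetical stand-in for the per-cpu state (kvm_cpu_l1tf_flush_l1d). */
struct cpu_model { bool kvm_cpu_l1tf_flush_l1d; };

/* Counts simulated L1D flushes; a real flush would evict the cache here. */
int flush_count;

/*
 * Model of the flush decision. In "always" mode (cond_mode == false) the
 * flags are ignored. In conditional mode both flags are read, both are
 * cleared, and the flush is skipped only if neither was set.
 * Returns true if a flush was performed.
 */
bool l1d_flush_model(struct vcpu_model *vcpu, struct cpu_model *cpu,
                     bool cond_mode)
{
    if (cond_mode) {
        bool flush = vcpu->l1tf_flush_l1d || cpu->kvm_cpu_l1tf_flush_l1d;

        vcpu->l1tf_flush_l1d = false;
        cpu->kvm_cpu_l1tf_flush_l1d = false;

        if (!flush)
            return false;
    }

    flush_count++;
    return true;
}
```

In this model, an interrupt handler setting only the per-cpu flag is enough to trigger one flush before the next simulated VMENTER, after which both flags are clear again.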
    • x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 · 9aee5f8a
      Nicolai Stange authored
      
      An upcoming patch will extend KVM's L1TF mitigation in conditional mode
      to also cover interrupts after VMEXITs. For tracking those, stores to a
      new per-cpu flag from interrupt handlers will become necessary.
      
      In order to improve cache locality, this new flag will be added to x86's
      irq_cpustat_t.
      
      Make some space available there by shrinking the ->softirq_pending bitfield
      from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
      i.e. 10.
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      9aee5f8a
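The space argument in this commit can be checked with a minimal sketch. The struct layouts below are hypothetical and much smaller than the real irq_cpustat_t; NR_SOFTIRQS_MODEL simply takes the value 10 quoted in the message, and the uint8_t field is an assumed placeholder for the new flag.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the commit message; named *_MODEL to mark it as a stand-in. */
#define NR_SOFTIRQS_MODEL 10

/* Hypothetical "before" layout: 32 bits reserved for pending softirqs. */
struct irq_stat_before {
    uint32_t softirq_pending;
};

/* Hypothetical "after" layout: 16 bits pending, leaving room for a flag. */
struct irq_stat_after {
    uint16_t softirq_pending;
    uint8_t  l1tf_flush_l1d_flag; /* assumed placeholder for the new flag */
};

/* Every softirq number must still have its own bit in the narrower field. */
static_assert(NR_SOFTIRQS_MODEL <= 16, "softirq bits must fit in 16 bits");

/* The narrower field plus the flag must not grow the structure. */
static_assert(sizeof(struct irq_stat_after) <= sizeof(struct irq_stat_before),
              "shrinking the field must make room for the flag");

/* Mask with one bit set per softirq; it fits in 16 bits. */
uint16_t all_softirqs_mask(void)
{
    return (uint16_t)((1u << NR_SOFTIRQS_MODEL) - 1u);
}
```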
    • x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() · 5b6ccc6c
      Nicolai Stange authored
      
      Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
      vmx_l1d_flush() if so.
      
      This test is unnecessary for the "always flush L1D" mode.
      
      Move the check to vmx_l1d_flush()'s conditional mode code path.
      
      Notes:
      - vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
        extra function call.
        
      - This inverts the (static) branch prediction, but there hadn't been any
        explicit likely()/unlikely() annotations before and so it stays as is.
      
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5b6ccc6c
    • x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond' · 427362a1
      Nicolai Stange authored
      
      The vmx_l1d_flush_always static key is only ever evaluated if
      vmx_l1d_should_flush is enabled. In that case however, there are only two
      L1d flushing modes possible: "always" and "conditional".
      
      The "conditional" mode's implementation tends to require more sophisticated
      logic than the "always" mode.
      
      Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
      with a 'vmx_l1d_flush_cond' one.
      
      There is no change in functionality.
      
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      427362a1
    • x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush() · 379fd0c7
      Nicolai Stange authored
      
      vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
      point in setting l1tf_flush_l1d to true from there again.
      
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      379fd0c7
  2. Jul 19, 2018
    • x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content · 288d152c
      Nicolai Stange authored
      
      The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order
      to evict the L1d cache.
      
      However, these pages are never cleared and, in theory, their data could be
      leaked.
      
      More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages
      to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break
      the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses.
      
      Fix this by initializing the individual vmx_l1d_flush_pages with a
      different pattern each.
      
      Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to
      "flush_pages" to reflect this change.
      
      Fixes: a47dd5f0 ("x86/KVM/VMX: Add L1D flush algorithm")
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      288d152c
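The fix described in this commit, giving each flush page its own fill pattern so that no two pages have identical content, can be sketched in userspace as follows. The names and constants are assumptions for illustration: PAGE_SIZE_MODEL, L1D_CACHE_ORDER_MODEL, init_flush_pages() and flush_pages_all_distinct() are hypothetical, and the byte pattern i + 1 is just one possible choice of "a different pattern each".

```c
#include <stddef.h>
#include <string.h>

/* Assumed values for illustration only. */
#define PAGE_SIZE_MODEL       4096u
#define L1D_CACHE_ORDER_MODEL 4
#define NR_FLUSH_PAGES        (1u << L1D_CACHE_ORDER_MODEL)

/*
 * Fill every page of the flush buffer with a distinct, non-zero byte
 * pattern. Pages with differing content cannot be merged by a
 * content-based deduplicator such as KSM.
 */
void init_flush_pages(unsigned char *buf)
{
    for (unsigned i = 0; i < NR_FLUSH_PAGES; i++)
        memset(buf + (size_t)i * PAGE_SIZE_MODEL, (int)(i + 1),
               PAGE_SIZE_MODEL);
}

/* Returns 1 if no two pages in the buffer have identical content. */
int flush_pages_all_distinct(const unsigned char *buf)
{
    for (unsigned i = 0; i < NR_FLUSH_PAGES; i++)
        for (unsigned j = i + 1; j < NR_FLUSH_PAGES; j++)
            if (memcmp(buf + (size_t)i * PAGE_SIZE_MODEL,
                       buf + (size_t)j * PAGE_SIZE_MODEL,
                       PAGE_SIZE_MODEL) == 0)
                return 0;
    return 1;
}
```

A buffer left zeroed fails the distinctness check, which corresponds to the situation the commit message warns about: identical pages are candidates for merging into fewer physical pages.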
  3. Jul 13, 2018
  4. Jul 05, 2018
  5. Jul 02, 2018
    • Revert "x86/apic: Ignore secondary threads if nosmt=force" · 506a66f3
      Thomas Gleixner authored
      Dave Hansen reported that it's outright dangerous to keep SMT siblings
      disabled completely so they are stuck in the BIOS and wait for SIPI.
      
      The reason is that Machine Check Exceptions are broadcast to siblings and
      the soft disabled sibling has CR4.MCE = 0. If an MCE is delivered to a
      logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
      reboots the machine. The MCE chapter in the SDM contains the following
      blurb:
      
          Because the logical processors within a physical package are tightly
          coupled with respect to shared hardware resources, both logical
          processors are notified of machine check errors that occur within a
          given physical processor. If machine-check exceptions are enabled when
          a fatal error is reported, all the logical processors within a physical
          package are dispatched to the machine-check exception handler. If
          machine-check exceptions are disabled, the logical processors enter the
          shutdown st...
      506a66f3
  6. Jun 30, 2018
  7. Jun 27, 2018
    • x86/speculation/l1tf: Protect PAE swap entries against L1TF · 0d0f6249
      Vlastimil Babka authored
      
      The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
      offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
      offset. The lower word is zeroed, thus systems with less than 4GB memory are
      safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
      to L1TF; with even more memory, also the swap offset influences the address.
      This might be a problem with 32bit PAE guests running on large 64bit hosts.
      
      By continuing to keep the whole swap entry in either high or low 32bit word of
      PTE we would limit the swap size too much. Thus this patch uses the whole PAE
      PTE with the same layout as the 64bit version does. The macros just become a
      bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      0d0f6249
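The idea of "flipping the offset bits" mentioned in this commit can be illustrated with a simplified 64-bit swap entry encoding. The layout below is an assumption made for this sketch and is not taken from the commit: 5 type bits at the top, the offset stored inverted starting at bit 9, and the low bits left zero so the entry is never marked present. All names (swp_encode(), swp_type_of(), swp_offset_of(), TYPE_BITS, OFFSET_FIRST_BIT) are hypothetical.

```c
#include <stdint.h>

/* Assumed layout for illustration only. */
#define TYPE_BITS        5
#define OFFSET_FIRST_BIT 9
#define OFFSET_SHIFT     (OFFSET_FIRST_BIT + TYPE_BITS)

/*
 * Encode a swap entry. The offset is inverted before it is stored, so a
 * small offset yields set high bits in the entry, i.e. what looks like a
 * very high physical address rather than a low, populated one.
 */
uint64_t swp_encode(unsigned type, uint64_t offset)
{
    uint64_t inverted = (~offset) << OFFSET_SHIFT >> TYPE_BITS;

    return inverted | ((uint64_t)type << (64 - TYPE_BITS));
}

/* Extract the swap type from the top TYPE_BITS bits. */
unsigned swp_type_of(uint64_t entry)
{
    return (unsigned)(entry >> (64 - TYPE_BITS));
}

/* Undo the inversion: shift out the type, then shift the offset back down. */
uint64_t swp_offset_of(uint64_t entry)
{
    return (~entry) << TYPE_BITS >> OFFSET_SHIFT;
}
```

With this encoding, a round trip preserves both type and offset, while the stored bits for a small offset are mostly ones in the upper part of the entry.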
  8. Jun 22, 2018
  9. Jun 21, 2018