Skip to content
Snippets Groups Projects
  1. Dec 29, 2017
    • Thomas Gleixner's avatar
      x86/apic: Switch all APICs to Fixed delivery mode · a31e58e1
      Thomas Gleixner authored
      
      Some of the APIC incarnations are operating in lowest priority delivery
      mode. This worked as long as the vector management code allocated the same
      vector on all possible CPUs for each interrupt.
      
      Lowest priority delivery mode does not necessarily respect the affinity
      setting and may redirect to some other online CPU. This was documented
      somewhere in the old code and the conversion to single target delivery
      missed to update the delivery mode of the affected APIC drivers which
      results in spurious interrupts on some of the affected CPU/Chipset
      combinations.
      
      Switch the APIC drivers over to Fixed delivery mode and remove all
      leftovers of lowest priority delivery mode.
      
      Switching to Fixed delivery mode is not a problem on these CPUs because the
      kernel already uses Fixed delivery mode for IPIs. The reason for this is
      that th SDM explicitely forbids lowest prio mode for IPIs. The reason is
      obvious: If the irq routing does not honor destination targets in lowest
      prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be
      a fatal problem in many cases.
      
      As a consequence of this change, the apic::irq_delivery_mode field is now
      pointless, but this needs to be cleaned up in a separate patch.
      
      Fixes: fdba46ff ("x86/apic: Get rid of multi CPU affinity")
      Reported-by: default avatar <vcaputo@pengaru.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatar <vcaputo@pengaru.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
      a31e58e1
  2. Dec 28, 2017
  3. Dec 19, 2017
    • Josh Poimboeuf's avatar
      x86/stacktrace: Make zombie stack traces reliable · 6454b3bd
      Josh Poimboeuf authored
      
      Commit:
      
        1959a601 ("x86/dumpstack: Pin the target stack when dumping it")
      
      changed the behavior of stack traces for zombies.  Before that commit,
      /proc/<pid>/stack reported the last execution path of the zombie before
      it died:
      
        [<ffffffff8105b877>] do_exit+0x6f7/0xa80
        [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
        [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
        [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
        [<00007fd128f9c4f9>] 0x7fd128f9c4f9
        [<ffffffffffffffff>] 0xffffffffffffffff
      
      After the commit, it just reports an empty stack trace.
      
      The new behavior is actually probably more correct.  If the stack
      refcount has gone down to zero, then the task has already gone through
      do_exit() and isn't going to run anymore.  The stack could be freed at
      any time and is basically gone, so reporting an empty stack makes sense.
      
      However, save_stack_trace_tsk_reliable() treats such a missing stack
      condition as an error.  That can cause livepatch transition stalls if
      there are any unreaped zombies.  Instead, just treat it as a reliable,
      empty stack.
      
      Reported-and-tested-by: default avatarMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Fixes: af085d90 ("stacktrace/x86: add function for detecting reliable stack traces")
      Link: http://lkml.kernel.org/r/e4b09e630e99d0c1080528f0821fc9d9dbaeea82.1513631620.git.jpoimboe@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6454b3bd
  4. Dec 18, 2017
    • Tom Lendacky's avatar
      x86/mm: Unbreak modules that use the DMA API · 9d5f38ba
      Tom Lendacky authored
      
      Commit d8aa7eea ("x86/mm: Add Secure Encrypted Virtualization (SEV)
      support") changed sme_active() from an inline function that referenced
      sme_me_mask to a non-inlined function in order to make the sev_enabled
      variable a static variable.  This function was marked EXPORT_SYMBOL_GPL
      because at the time the patch was submitted, sme_me_mask was marked
      EXPORT_SYMBOL_GPL.
      
      Commit 87df2617 ("x86/mm: Unbreak modules that rely on external
      PAGE_KERNEL availability") changed sme_me_mask variable from
      EXPORT_SYMBOL_GPL to EXPORT_SYMBOL, allowing external modules the ability
      to build with CONFIG_AMD_MEM_ENCRYPT=y.  Now, however, with sev_active()
      no longer an inline function and marked as EXPORT_SYMBOL_GPL, external
      modules that use the DMA API are once again broken in 4.15. Since the DMA
      API is meant to be used by external modules, this needs to be changed.
      
      Change the sme_active() and sev_active() functions from EXPORT_SYMBOL_GPL
      to EXPORT_SYMBOL.
      
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Link: https://lkml.kernel.org/r/20171215162011.14125.7113.stgit@tlendack-t1.amdoffice.net
      9d5f38ba
  5. Dec 16, 2017
    • Matthew Wilcox's avatar
      x86/build: Make isoimage work on Debian · 5f0e3fe6
      Matthew Wilcox authored
      
      Debian does not ship a 'mkisofs' symlink to genisoimage.  All modern
      distros ship genisoimage, so just use that directly.  That requires
      renaming the 'genisoimage' function.  Also neaten up the 'for' loop
      while I'm in here.
      
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5f0e3fe6
  6. Dec 15, 2017
  7. Dec 12, 2017
  8. Dec 11, 2017
    • Karol Herbst's avatar
      x86/mm/kmmio: Fix mmiotrace for page unaligned addresses · 6d60ce38
      Karol Herbst authored
      
      If something calls ioremap() with an address not aligned to PAGE_SIZE, the
      returned address might be not aligned as well. This led to a probe
      registered on exactly the returned address, but the entire page was armed
      for mmiotracing.
      
      On calling iounmap() the address passed to unregister_kmmio_probe() was
      PAGE_SIZE aligned by the caller leading to a complete freeze of the
      machine.
      
      We should always page align addresses while (un)registerung mappings,
      because the mmiotracer works on top of pages, not mappings. We still keep
      track of the probes based on their real addresses and lengths though,
      because the mmiotrace still needs to know what are mapped memory regions.
      
      Also move the call to mmiotrace_iounmap() prior page aligning the address,
      so that all probes are unregistered properly, otherwise the kernel ends up
      failing memory allocations randomly after disabling the mmiotracer.
      
      Tested-by: default avatarLyude <lyude@redhat.com>
      Signed-off-by: default avatarKarol Herbst <kherbst@redhat.com>
      Acked-by: default avatarPekka Paalanen <ppaalanen@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: nouveau@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/20171127075139.4928-1-kherbst@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6d60ce38
  9. Dec 07, 2017
  10. Dec 06, 2017
  11. Nov 28, 2017
  12. Nov 25, 2017
    • Nadav Amit's avatar
      x86/tlb: Disable interrupts when changing CR4 · 9d0b6232
      Nadav Amit authored
      
      CR4 modifications are implemented as RMW operations which update a shadow
      variable and write the result to CR4. The RMW operation is protected by
      preemption disable, but there is no enforcement or debugging mechanism.
      
      CR4 modifications happen also in interrupt context via
      __native_flush_tlb_global(). This implementation does not affect a
      interrupted thread context CR4 operation, because the CR4 toggle restores
      the original content and does not modify the shadow variable.
      
      So the current situation seems to be safe, but a recent patch tried to add
      an actual RMW operation in interrupt context, which will cause subtle
      corruptions.
      
      To prevent that and make the CR4 handling future proof:
      
       - Add a lockdep assertion to __cr4_set() which will catch interrupt
         enabled invocations
      
       - Disable interrupts in the cr4 manipulator inlines
      
       - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
         from __switch_to_xtra() where interrupts are already disabled and
         performance matters.
      
      All other call sites are not performance critical, so the extra overhead of
      an additional local_irq_save/restore() pair is not a problem. If new call
      sites care about performance then the necessary _irqsoff() variants can be
      added.
      
      [ tglx: Condensed the patch by moving the irq protection inside the
        	manipulator functions. Updated changelog ]
      
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Luck <tony.luck@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: nadav.amit@gmail.com
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
      9d0b6232
    • Nadav Amit's avatar
      x86/tlb: Refactor CR4 setting and shadow write · 0c3292ca
      Nadav Amit authored
      
      Refactor the write to CR4 and its shadow value. This is done in
      preparation for the addition of an assertion to check that IRQs are
      disabled during CR4 update.
      
      No functional change.
      
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
      0c3292ca
  13. Nov 24, 2017
  14. Nov 23, 2017
    • Andy Lutomirski's avatar
      x86/entry/64: Add missing irqflags tracing to native_load_gs_index() · ca37e57b
      Andy Lutomirski authored
      
      Running this code with IRQs enabled (where dummy_lock is a spinlock):
      
      static void check_load_gs_index(void)
      {
      	/* This will fail. */
      	load_gs_index(0xffff);
      
      	spin_lock(&dummy_lock);
      	spin_unlock(&dummy_lock);
      }
      
      Will generate a lockdep warning.  The issue is that the actual write
      to %gs would cause an exception with IRQs disabled, and the exception
      handler would, as an inadvertent side effect, update irqflag tracing
      to reflect the IRQs-off status.  native_load_gs_index() would then
      turn IRQs back on and return with irqflag tracing still thinking that
      IRQs were off.  The dummy lock-and-unlock causes lockdep to notice the
      error and warn.
      
      Fix it by adding the missing tracing.
      
      Apparently nothing did this in a context where it mattered.  I haven't
      tried to find a code path that would actually exhibit the warning if
      appropriately nasty user code were running.
      
      I suspect that the security impact of this bug is very, very low --
      production systems don't run with lockdep enabled, and the warning is
      mostly harmless anyway.
      
      Found during a quick audit of the entry code to try to track down an
      unrelated bug that Ingo found in some still-in-development code.
      
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ca37e57b
  15. Nov 22, 2017
    • Andrey Ryabinin's avatar
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin authored
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
      
      
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f68d62a5
    • Andy Lutomirski's avatar
      x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing · 548c3050
      Andy Lutomirski authored
      
      When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
      before it.  This means that users of entry_SYSCALL_64_after_hwframe()
      were responsible for invoking TRACE_IRQS_OFF, and the one and only
      user (Xen, added in the same commit) got it wrong.
      
      I think this would manifest as a warning if a Xen PV guest with
      CONFIG_DEBUG_LOCKDEP=y were used with context tracking.  (The
      context tracking bit is to cause lockdep to get invoked before we
      turn IRQs back on.)  I haven't tested that for real yet because I
      can't get a kernel configured like that to boot at all on Xen PV.
      
      Move TRACE_IRQS_OFF below the label.
      
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
      Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      548c3050
  16. Nov 21, 2017