Skip to content
Snippets Groups Projects
  1. Mar 12, 2017
    • Daniel Borkmann's avatar
      x86/tlb: Fix tlb flushing when lguest clears PGE · 2c4ea6e2
      Daniel Borkmann authored
      
      Fengguang reported random corruptions from various locations on x86-32
      after commits d2852a22 ("arch: add ARCH_HAS_SET_MEMORY config") and
      9d876e79 ("bpf: fix unlocking of jited image when module ronx not set")
      that uses the former. While x86-32 doesn't have a JIT like x86_64, the
      bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
      ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
      support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
      enabled.
      
      After investigating the crashes further, it turned out that using
      set_memory_ro() and set_memory_rw() didn't have the desired effect, for
      example, setting the pages as read-only on x86-32 would still let
      probe_kernel_write() succeed without error. This behavior would manifest
      itself in situations where the vmalloc'ed buffer was accessed prior to
      set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
      wasn't, the page attribute changes seemed to have taken effect, leading to
      the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
      that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
      "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
      a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
      though.
      
      There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
      on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
      (depends on X86_FEATURE_PGE), and cr3 based flush.  For "-cpu host" case in
      my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
      kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
      worked fine, and further investigating the cr4 one turned out that
      X86_CR4_PGE bit was not set in cr4 register, meaning the
      __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
      value instead of clearing X86_CR4_PGE in the first write to trigger the
      flush.
      
      It turned out that X86_CR4_PGE was cleared from cr4 during init from
      lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
      cleared from there due to concerns of using PGE in guest kernel that can
      lead to hard to trace bugs (see bff672e6 ("lguest: documentation V:
      Host") in init()). The CPU feature bits are cleared in dynamic
      boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
      static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
      flushing to use, meaning they still used the old setting of the host
      kernel.
      
      Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
      to static_cpu_has() checks is too late at this point as sections have been
      patched already, so for now, it seems reasonable to switch back to
      boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95
      ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
      cr3 as originally intended, properly makes the new page attributes visible
      and thus fixes the crashes seen by Fengguang.
      
      Fixes: c109bf95 ("x86/cpufeature: Remove cpu_has_pge")
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: bp@suse.de
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: lkp@01.org
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
      Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      2c4ea6e2
  2. Mar 11, 2017
    • Thomas Gleixner's avatar
      kexec, x86/purgatory: Unbreak it and clean it up · 40c50c1f
      Thomas Gleixner authored
      
      The purgatory code defines global variables which are referenced via a
      symbol lookup in the kexec code (core and arch).
      
      A recent commit addressing sparse warnings made these static and thereby
      broke kexec_file.
      
      Why did this happen? Simply because the whole machinery is undocumented and
      lacks any form of forward declarations. The variable names are unspecific
      and lack a prefix, so adding forward declarations creates shadow variables
      in the core code. Aside of that the code relies on magic constants and
      duplicate struct definitions with no way to ensure that these things stay
      in sync. The section placement of the purgatory variables happened by
      chance and not by design.
      
      Unbreak kexec and cleanup the mess:
      
       - Add proper forward declarations and document the usage
       - Use common struct definition
       - Use the proper common defines instead of magic constants
       - Add a purgatory_ prefix to have a proper name space
       - Use ARRAY_SIZE() instead of a homebrewn reimplementation
       - Add proper sections to the purgatory variables [ From Mike ]
      
      Fixes: 72042a8c ("x86/purgatory: Make functions and variables static")
      Reported-by: default avatarMike Galbraith <&lt;efault@gmx.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Nicholas Mc Guire <der.herr@hofr.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Tobin C. Harding" <me@tobin.cc>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      40c50c1f
  3. Mar 10, 2017
  4. Mar 09, 2017
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea6200e8
      Linus Torvalds authored
      Pull sched.h split-up fixes for MIPS from Ingo Molnar:
       "These are the fixes for MIPS build failures due to the sched.h
        split-up, from Arnd Bergmann"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MIPS: Add missing include files
      ea6200e8
    • Tony Luck's avatar
      mm, page_alloc: Add missing check for memory holes · b4fb8f66
      Tony Luck authored
      
      Commit 13ad59df ("mm, page_alloc: avoid page_to_pfn() when merging
      buddies") moved the check for memory holes out of page_is_buddy() and
      had the callers do the check.
      
      But this wasn't done correctly in one place which caused ia64 to crash
      very early in boot.
      
      Update to fix that and make ia64 boot again.
      
      [ v2: Vlastimil pointed out we don't need to call page_to_pfn()
            since we already have the result of that in "buddy_pfn" ]
      
      Fixes: 13ad59df ("avoid page_to_pfn() when merging buddies")
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4fb8f66
    • Linus Torvalds's avatar
      Merge tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · 8557b8e4
      Linus Torvalds authored
      Pull ktest fixes from Steven Rostedt:
       "Greg Kroah-Hartman reported to me that the ktest of v4.11-rc1 locked
        up in an infinite loop while doing the make mrproper.
      
        Looking into the cause I noticed that a recent update to the function
        run_command (used for running all shell commands, including "make
        mrproper") changed the internal loop to use the function
        wait_for_input.
      
        The wait_for_input function uses select to look at two file
        descriptors. One is the file descriptor of the command it is running,
        the other is STDIN. The STDIN check was not checking the return status
        of the sysread call, and was also just writing a lot of data into
        syswrite without regard to the size of the data read.
      
        Changing the code to check the return status of sysread, and also to
        still process the passed in descriptor data without looping back to
        the select fixed Greg's problem.
      
        While looking at this code I also realized that the loop did not honor
        the timeout if STDIN always had input (or for some reason return
        error). this could prevent wait_for_input to timeout on the file
        descriptor it is suppose to be waiting for. That is fixed too"
      
      * tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Make sure wait_for_input does honor the timeout
        ktest: Fix while loop in wait_for_input
      8557b8e4
    • Linus Torvalds's avatar
      overlayfs: remove now unnecessary header file include · 04bb94b1
      Linus Torvalds authored
      
      This removes the extra include header file that was added in commit
      e58bc927 "Pull overlayfs updates from Miklos Szeredi" now that it
      is no longer needed.
      
      There are probably other such includes that got added during the
      scheduler header splitup series, but this is the one that annoyed me
      personally and I know about.
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      04bb94b1
    • Linus Torvalds's avatar
      sched/headers: fix up header file dependency on <linux/sched/signal.h> · bd0f9b35
      Linus Torvalds authored
      
      The scheduler header file split and cleanups ended up exposing a few
      nasty header file dependencies, and in particular it showed how we in
      <linux/wait.h> ended up depending on "signal_pending()", which now comes
      from <linux/sched/signal.h>.
      
      That's a very subtle and annoying dependency, which already caused a
      semantic merge conflict (see commit e58bc927 "Pull overlayfs updates
      from Miklos Szeredi", which added that fixup in the merge commit).
      
      It turns out that we can avoid this dependency _and_ improve code
      generation by moving the guts of the fairly nasty helper #define
      __wait_event_interruptible_locked() to out-of-line code.  The code that
      includes the signal_pending() check is all in the slow-path where we
      actually go to sleep waiting for the event anyway, so using a helper
      function is the right thing to do.
      
      Using a helper function is also what we already did for the non-locked
      versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
      set of helper functions.
      
      We might want to try to unify all these macro games, we have a _lot_ of
      subtly different wait-event loops.  But this is the minimal patch to fix
      the annoying header dependency.
      
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bd0f9b35
  5. Mar 08, 2017
  6. Mar 07, 2017