Skip to content
Snippets Groups Projects
  1. Apr 24, 2015
    • Jiang Liu's avatar
      x86/irq: Save destination CPU ID in irq_cfg · 5f0052f9
      Jiang Liu authored
      
      Cache destination CPU APIC ID into struct irq_cfg when assigning vector
      for interrupt. Upper layer just needs to read the cached APIC ID instead
      of calling apic->cpu_mask_to_apicid_and(), it helps to hide APIC driver
      details from IOAPIC/HPET/MSI drivers..
      
      Signed-off-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Cohen <david.a.cohen@linux.intel.com>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Link: http://lkml.kernel.org/r/1428905519-23704-2-git-send-email-jiang.liu@linux.intel.com
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      5f0052f9
  2. Apr 18, 2015
  3. Apr 17, 2015
  4. Apr 16, 2015
    • Oleg Nesterov's avatar
      x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal() · fd0f86b6
      Oleg Nesterov authored
      
      When the TIF_SINGLESTEP tracee dequeues a signal,
      handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
      leaves TIF_SINGLESTEP set.
      
      If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
      sets X86_EFLAGS_TF but not TIF_FORCED_TF.  This means that the
      subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
      tracee gets the wrong SIGTRAP.
      
      Test-case (needs -O2 to avoid prologue insns in signal handler):
      
      	#include <unistd.h>
      	#include <stdio.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <sys/user.h>
      	#include <assert.h>
      	#include <stddef.h>
      
      	void handler(int n)
      	{
      		asm("nop");
      	}
      
      	int child(void)
      	{
      		assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      		signal(SIGALRM, handler);
      		kill(getpid(), SIGALRM);
      		return 0x23;
      	}
      
      	void *getip(int pid)
      	{
      		return (void*)ptrace(PTRACE_PEEKUSER, pid,
      					offsetof(struct user, regs.rip), 0);
      	}
      
      	int main(void)
      	{
      		int pid, status;
      
      		pid = fork();
      		if (!pid)
      			return child();
      
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
      
      		assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
      		assert((getip(pid) - (void*)handler) == 0);
      
      		assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
      		assert((getip(pid) - (void*)handler) == 1);
      
      		assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
      
      		return 0;
      	}
      
      The last assert() fails because PTRACE_CONT wrongly triggers
      another single-step and X86_EFLAGS_TF can't be cleared by
      debugger until the tracee does sys_rt_sigreturn().
      
      Change handle_signal() to do user_disable_single_step() if
      stepping, we do not need to preserve TIF_SINGLESTEP because we
      are going to do ptrace_notify(), and it is simply wrong to leak
      this bit.
      
      While at it, change the comment to explain why we also need to
      clear TF unconditionally after setup_rt_frame().
      
      Note: in the longer term we should probably change
      setup_sigcontext() to use get_flags() and then just remove this
      user_disable_single_step().  And, the state of TIF_FORCED_TF can
      be wrong after restore_sigcontext() which can set/clear TF, this
      needs another fix.
      
      This fix fixes the 'single_step_syscall_32' testcase in
      the x86 testsuite:
      
      Before:
      
      	~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
      	[RUN]   Set TF and check nop
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check int80
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check a fast syscall
      	[WARN]  Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
      	Trace/breakpoint trap (core dumped)
      
      After:
      
      	~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
      	[RUN]   Set TF and check nop
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check int80
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check a fast syscall
      	[OK]    Survived with TF set and 39 traps
      	[RUN]   Fast syscall with TF cleared
      	[OK]    Nothing unexpected happened
      
      Reported-by: default avatarEvan Teran <eteran@alum.rit.edu>
      Reported-by: default avatarPedro Alves <palves@redhat.com>
      Tested-by: default avatarAndres Freund <andres@anarazel.de>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [ Added x86 self-test info. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fd0f86b6
    • Joe Perches's avatar
      x86: mtrr: if: remove use of seq_printf return value · 3ac62bc0
      Joe Perches authored
      
      The seq_printf return value, because it's frequently misused,
      will eventually be converted to void.
      
      See: commit 1f33c41c ("seq_file: Rename seq_overflow() to
           seq_has_overflowed() and make public")
      
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3ac62bc0
  5. Apr 15, 2015
  6. Apr 13, 2015
  7. Apr 12, 2015
  8. Apr 11, 2015
  9. Apr 10, 2015
  10. Apr 09, 2015
  11. Apr 08, 2015
    • Denys Vlasenko's avatar
      x86/asm/entry/64: Add forgotten CFI annotation · 8b3607b5
      Denys Vlasenko authored
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428424967-14460-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8b3607b5
    • Denys Vlasenko's avatar
      x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout · 3304c9c3
      Denys Vlasenko authored
      
      Interrupt entry points are handled with the following code,
      each 32-byte code block contains seven entry points:
      
      		...
      		[push][jump 22] // 4 bytes
      		[push][jump 18] // 4 bytes
      		[push][jump 14] // 4 bytes
      		[push][jump 10] // 4 bytes
      		[push][jump  6] // 4 bytes
      		[push][jump  2] // 4 bytes
      		[push][jump common_interrupt][padding] // 8 bytes
      
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump common_interrupt][padding]
      
      		[padding_2]
      	common_interrupt:
      
      And there is a table which holds pointers to every entry point,
      IOW: to every push.
      
      In cold cache, two jumps are still costlier than one, even
      though we get the benefit of them residing in the same
      cacheline.
      
      This change replaces short jumps with near ones to
      'common_interrupt', and pads every push+jump pair to 8 bytes. This
      way, each interrupt takes only one jump.
      
      This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
      dispatch table with ".align 8" - we do not need anything
      stronger than that.
      
      The table of entry addresses (the interrupt[] array) is no
      longer necessary, the address of entries can be easily
      calculated as (irq_entries_start + i*8).
      
         text	   data	    bss	    dec	    hex	filename
        12546	      0	      0	  12546	   3102	entry_64.o.before
        11626	      0	      0	  11626	   2d6a	entry_64.o
      
      The size decrease is because 1656 bytes of .init.rodata are
      gone. That's initdata, though. The resident size does go up a
      bit.
      
      Run-tested (32 and 64 bits).
      
      Acked-and-Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3304c9c3
    • Denys Vlasenko's avatar
      x86/asm/entry/64: Move opportunistic sysret code to syscall code path · fffbb5dc
      Denys Vlasenko authored
      
      This change does two things:
      
      Copy-pastes "retint_swapgs:" code into syscall handling code,
      the copy is under "syscall_return:" label. The code is unchanged
      apart from some label renames.
      
      Removes "opportunistic sysret" code from "retint_swapgs:" code
      block, since now it won't be reached by syscall return. This in
      fact removes most of the code in question.
      
         text	   data	    bss	    dec	    hex	filename
        12530	      0	      0	  12530	   30f2	entry_64.o.before
        12562	      0	      0	  12562	   3112	entry_64.o
      
      Run-tested.
      
      Acked-and-Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fffbb5dc
  12. Apr 06, 2015
    • Borislav Petkov's avatar
      x86/alternatives: Guard NOPs optimization · 69df353f
      Borislav Petkov authored
      
      Take a look at the first instruction byte before optimizing the NOP -
      there might be something else there already, like the ALTERNATIVE_2()
      in rdtsc_barrier() which NOPs out on AMD even though we just
      patched in an MFENCE.
      
      This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC,
      AMD CPUs set it, we patch in the MFENCE and right afterwards it sees
      X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly
      optimize the NOP.
      
      Checking whether at least the first byte is 0x90 prevents that.
      
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1428181662-18020-1-git-send-email-bp@alien8.de
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      69df353f
    • Denys Vlasenko's avatar
      x86/asm/entry: Clear EXTRA_REGS for all executable formats · fc3e958a
      Denys Vlasenko authored
      
      On failure, sys_execve() does not clobber EXTRA_REGS, so we can
      just return to userpsace without saving/restoring them.
      
      On success, ELF_PLAT_INIT() in sys_execve() clears all these
      registers.
      
      On other executable formats:
      
        - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone
          else except sh) doesn't define it.
      
        - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most
          others) doesn't define it.
      
        - There are no such hooks in binfmt_aout.c et al. We inherit
          EXTRA_REGS from the prior executable.
      
      This inconsistency was not intended.
      
      This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve,
      removes register clearing in ELF_PLAT_INIT(),
      and instead simply clears them on success in stub_execve.
      
      Run-tested.
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fc3e958a
    • Brian Gerst's avatar
      x86/signal: Remove pax argument from restore_sigcontext · 6a3713f0
      Brian Gerst authored
      The 'pax' argument is unnecesary.  Instead, store the RAX value
      directly in regs.
      
      This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
      was changed to return an error code instead of EAX directly:
      
        https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174
      
      
      
      In 2007 sigaltstack syscall support was added, where the return
      value of restore_sigcontext() was changed to carry the memory-copying
      failure code.
      
      But instead of putting 'ax' into regs->ax directly, it was carried
      in via a pointer and then returned, where the generic syscall return
      code copied it to regs->ax.
      
      So there was never any deeper reason for this suboptimal pattern, it
      was simply never noticed after being introduced.
      
      Signed-off-by: default avatarBrian Gerst <brgerst@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6a3713f0
  13. Apr 04, 2015
    • Borislav Petkov's avatar
      x86/alternatives: Fix ALTERNATIVE_2 padding generation properly · dbe4058a
      Borislav Petkov authored
      
      Quentin caught a corner case with the generation of instruction
      padding in the ALTERNATIVE_2 macro: if len(orig_insn) <
      len(alt1) < len(alt2), then not enough padding gets added and
      that is not good(tm) as we could overwrite the beginning of the
      next instruction.
      
      Luckily, at the time of this writing, we don't have
      ALTERNATIVE_2() invocations which have that problem and even if
      we did, a simple fix would be to prepend the instructions with
      enough prefixes so that that corner case doesn't happen.
      
      However, best it would be if we fixed it properly. See below for
      a simple, abstracted example of what we're doing.
      
      So what we ended up doing is, we compute the
      
      	max(len(alt1), len(alt2)) - len(orig_insn)
      
      and feed that value to the .skip gas directive. The max() cannot
      have conditionals due to gas limitations, thus the fancy integer
      math.
      
      With this patch, all ALTERNATIVE_2 sites get padded correctly;
      generating obscure test cases pass too:
      
        #define alt_max_short(a, b)    ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
      
        #define gen_skip(orig, alt1, alt2, marker)	\
        	.skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \
        		(alt_max_short(alt1, alt2) - (orig)),marker
      
        	.pushsection .text, "ax"
        .globl main
        main:
        	gen_skip(1, 2, 4, 0x09)
        	gen_skip(4, 1, 2, 0x10)
        	...
        	.popsection
      
      Thanks to Quentin for catching it and double-checking the fix!
      
      Reported-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dbe4058a
  14. Apr 03, 2015