- Jan 15, 2016
-
-
Thomas Gleixner authored
Use an explicit goto for the cases where we have success in the search/update and return -ENOSPC if the search loop ends due to no space. Preparatory patch for fixes. No functional change. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Borislav Petkov <bp@alien8.de> Tested-by:
Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
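The resulting control flow is roughly the following pattern (a minimal userspace sketch, not the kernel's actual __assign_irq_vector(); the names and the free-slot array are illustrative):

    #include <errno.h>
    #include <stdio.h>

    /* Search loop: jump out explicitly on success, fall through to
     * -ENOSPC once every candidate has been tried. */
    static int assign_vector(const int *slot_free, int nslots)
    {
        for (int i = 0; i < nslots; i++) {
            if (slot_free[i])
                goto success;   /* found space: take the explicit exit */
        }
        return -ENOSPC;         /* loop ended due to no space */

    success:
        return 0;
    }

    int main(void)
    {
        int slots[] = { 0, 0, 1 };

        printf("%d\n", assign_vector(slots, 3));    /* 0 */
        printf("%d\n", assign_vector(slots, 2));    /* -ENOSPC (-28) */
        return 0;
    }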
-
Jiang Liu authored
Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a temporary buffer, which is in the way of using apic_chip_data.old_domain for synchronizing the vector cleanup with the vector assignment code. Use a proper temporary cpumask for this. [ tglx: Renamed the mask to searched_cpumask for clarity ] Signed-off-by:
Jiang Liu <jiang.liu@linux.intel.com> Tested-by:
Borislav Petkov <bp@alien8.de> Tested-by:
Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
In fixup_irqs() we unconditionally dereference the irq chip of an irq descriptor. The descriptor might still be valid, but already cleaned up, i.e. the chip removed. Add a check for this condition. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.236423282@linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
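A userspace analog of the added guard (a sketch under the assumption that the loop walks descriptors whose backing chip may already be gone; the types are stand-ins, not the kernel's):

    #include <stddef.h>
    #include <stdio.h>

    struct chip { const char *name; };
    struct desc { struct chip *chip; };    /* stays valid after cleanup */

    static void fixup_one(struct desc *d)
    {
        struct chip *chip = d->chip;

        /* descriptor still valid, but already cleaned up: skip it
         * instead of dereferencing a removed chip */
        if (!chip) {
            fprintf(stderr, "skipping cleaned-up descriptor\n");
            return;
        }
        printf("moving irq on chip %s\n", chip->name);
    }

    int main(void)
    {
        struct chip c = { "IO-APIC" };
        struct desc alive = { &c }, dying = { NULL };

        fixup_one(&alive);
        fixup_one(&dying);
        return 0;
    }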
-
Jiang Liu authored
There's a race condition between

    x86_vector_free_irqs()
    {
            free_apic_chip_data(irq_data->chip_data);
            xxxxx  // irq_data->chip_data has been freed, but the
                   // pointer hasn't been reset yet
            irq_domain_reset_irq_data(irq_data);
    }

and

    smp_irq_move_cleanup_interrupt()
    {
            raw_spin_lock(&vector_lock);
            data = apic_chip_data(irq_desc_get_irq_data(desc));
            access data->xxxx  // may access freed memory
            raw_spin_unlock(&desc->lock);
    }

which may cause smp_irq_move_cleanup_interrupt() to access freed memory. Fix it by calling irq_domain_reset_irq_data(), which clears the pointer, with the vector lock held. [ tglx: Free memory outside of lock held region. ] Signed-off-by:
Jiang Liu <jiang.liu@linux.intel.com> Tested-by:
Borislav Petkov <bp@alien8.de> Tested-by:
Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
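The fix boils down to the following discipline, shown here as a compilable userspace analog (a pthread mutex standing in for vector_lock; not the kernel code itself):

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t vector_lock = PTHREAD_MUTEX_INITIALIZER;

    struct apic_chip_data { int vector; };
    struct irq_data { struct apic_chip_data *chip_data; };

    static void free_vector_data(struct irq_data *d)
    {
        struct apic_chip_data *tmp;

        /* detach the pointer under the lock: a concurrent reader
         * that holds the lock sees either the live object or NULL,
         * never a dangling pointer */
        pthread_mutex_lock(&vector_lock);
        tmp = d->chip_data;
        d->chip_data = NULL;
        pthread_mutex_unlock(&vector_lock);

        free(tmp);    /* free outside of the lock-held region */
    }

    int main(void)
    {
        struct irq_data d = { malloc(sizeof(struct apic_chip_data)) };

        free_vector_data(&d);
        return 0;
    }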
-
Thomas Gleixner authored
setup_ioapic_dest() calls irqchip->irq_set_affinity() completely unprotected. That's wrong in several aspects:

  - it opens a race window where irq_set_affinity() can be interrupted and the irq chip left in an inconsistent state.

  - it triggers a lockdep splat when we fix the vector race for 4.3+, because the vector lock is taken with interrupts enabled.

The proper calling convention is irq descriptor lock held and interrupts disabled. Reported-and-tested-by:
Borislav Petkov <bp@alien8.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: Joe Lawrence <joe.lawrence@stratus.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Jan 14, 2016
-
-
Dan Carpenter authored
Originally we calculated ht_nodeid as "ht_nodeid = apicid - boot_cpu_id;", so presumably it could be negative. But after commit: 01aaea1a ('x86: introduce initial apicid') we use c->initial_apicid, which is an unsigned short and thus always >= 0. Testing for impossible conditions causes a static checker warning, so let's remove the check. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Link: http://lkml.kernel.org/r/20160113123940.GE19993@mwanda Signed-off-by:
Ingo Molnar <mingo@kernel.org>
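The dead comparison in miniature (an unsigned short promotes to an int holding only 0..65535, so the test can never be true; compilers flag it the same way the static checker does):

    #include <stdio.h>

    static unsigned short initial_apicid = 3;    /* always >= 0 */

    int main(void)
    {
        /* e.g. GCC -Wtype-limits: "comparison is always false due
         * to limited range of data type" */
        if (initial_apicid < 0)
            puts("unreachable");

        printf("ht_nodeid = %u\n", (unsigned int)initial_apicid);
        return 0;
    }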
-
- Jan 13, 2016
-
-
Andy Lutomirski authored
If the clock becomes unstable while we're reading it, we need to bail. We can do this by simply moving the check into the seqcount loop. Reported-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Andy Lutomirski <luto@kernel.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/755dcedb17269e1d7ce12a9a713dea303835137e.1451949191.git.luto@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
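The structure of the fix, as a userspace seqlock-style reader (C11 atomics; an illustrative analog of the pvclock read path, not the kernel source):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_uint seq;                      /* even = no writer active */
    static atomic_bool stable = true;
    static unsigned long long clock_val = 1234;

    static bool read_clock(unsigned long long *out)
    {
        for (;;) {
            unsigned int s = atomic_load(&seq);

            if (s & 1)
                continue;                        /* writer active: retry */
            if (!atomic_load(&stable))
                return false;                    /* check INSIDE the loop */
            *out = clock_val;
            if (atomic_load(&seq) == s)
                return true;                     /* consistent snapshot */
        }
    }

    int main(void)
    {
        unsigned long long v;

        if (read_clock(&v))
            printf("clock = %llu\n", v);
        else
            puts("clock became unstable: bailed");
        return 0;
    }

Sampling the stability flag inside the retry loop is the whole point: a flag read before the loop could go stale between the check and the snapshot.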
-
Andy Lutomirski authored
My previous comments were still a bit confusing and there was a typo. Fix it up. Reported-by:
Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 71b3c126 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization") Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Jan 12, 2016
-
-
Mario Kleiner authored
Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by:
Mario Kleiner <mario.kleiner.de@gmail.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Rusty Russell authored
Pavel noted that lguest maps the switcher code executable and read-write. This is a bad idea for any kernel text, but particularly for text mapped at a fixed address. Create two vmas, one for the text (PAGE_KERNEL_RX) and another for the stacks (PAGE_KERNEL). Use VM_NO_GUARD to map them adjacent (as expected by the rest of the code). Reported-by:
Pavel Machek <pavel@ucw.cz> Tested-by:
Pavel Machek <pavel@ucw.cz> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Borislav Petkov authored
... from the final ELF image's symbol table as they're not really needed there.

Before:

    $ readelf -a vmlinux | grep verify_cpu
        43: ffffffff810001a9     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu
        45: ffffffff8100028f     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_no_longmode
        46: ffffffff810001de     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_noamd
        47: ffffffff8100022b     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_check
        48: ffffffff8100021c     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_clear_xd
        49: ffffffff81000263     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_sse_test
        50: ffffffff81000296     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu_sse_ok

After:

    $ readelf -a vmlinux | grep verify_cpu
        43: ffffffff810001a9     0 NOTYPE  LOCAL  DEFAULT    1 verify_cpu

No functionality change. Signed-off-by:
Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1451860733-21163-1-git-send-email-bp@alien8.de Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
yu-cheng yu authored
When "eagerfpu=off" is given as a command-line input, the kernel should disable AVX support. The Task Switched bit used for lazy context switching does not support AVX. If AVX is enabled without eagerfpu context switching, one task's AVX state could become corrupted or leak to other tasks. This is a bug and has bad security implications. This only affects systems that have AVX/AVX2/AVX512 and this issue will be found only when one actually uses AVX/AVX2/AVX512 _AND_ does eagerfpu=off. Reference: Intel Software Developer's Manual Vol. 3A Sec. 2.5 Control Registers: TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task. Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87 FPU and SSE State When the TS flag is set, the processor monitors the instruction stream for x87 FPU, MMX, SSE instructions. When the processor detects one of these instructions, it raises a device-not-available exeception (#NM) prior to executing the instruction. Signed-off-by:
Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
yu-cheng yu authored
This issue is a fallout from the command-line parsing move. When "eagerfpu=off" is given as a command-line input, the kernel should disable MPX support. The decision for turning off MPX was made in fpu__init_system_ctx_switch(), which is after the selection of the XSAVE format. This patch fixes it by getting that decision done earlier in fpu__init_system_xstate(). Signed-off-by:
Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
yu-cheng yu authored
When "noxsave" is given as a command-line input, the kernel should disable XGETBV1. This issue currently does not cause any actual problems. XGETBV1 is only useful if we have something using the 'init optimization' (i.e. xsaveopt, xsaves). We already clear both of those in fpu__xstate_clear_all_cpu_caps(). But this is good for completeness. Signed-off-by:
Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by:
Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
yu-cheng yu authored
The function fpu__init_system() is executed before parse_early_param(), which leads to an incorrect FPU configuration. This patch fixes the issue by parsing boot_command_line at the beginning of fpu__init_system(). With all four patches in this series, each parameter disables features as follows:

    eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx
    no387:        fpu
    nofxsr:       fxsr, fxsropt, xmm
    noxsave:      xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512, mpx, xgetbv1
    noxsaveopt:   xsaveopt
    noxsaves:     xsaves

Signed-off-by:
Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
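A userspace sketch of the kind of scan this implies (the kernel has its own early-parameter helpers; cmdline_has() here is a hypothetical stand-in):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* match a whole space-delimited option inside a command line */
    static bool cmdline_has(const char *cmdline, const char *opt)
    {
        size_t len = strlen(opt);
        const char *p = cmdline;

        while ((p = strstr(p, opt)) != NULL) {
            bool start_ok = (p == cmdline || p[-1] == ' ');
            bool end_ok = (p[len] == '\0' || p[len] == ' ');

            if (start_ok && end_ok)
                return true;
            p += len;
        }
        return false;
    }

    int main(void)
    {
        const char *boot_command_line = "root=/dev/sda1 eagerfpu=off quiet";

        if (cmdline_has(boot_command_line, "eagerfpu=off"))
            puts("disable eagerfpu (and AVX/MPX along with it)");
        if (!cmdline_has(boot_command_line, "noxsave"))
            puts("keeping XSAVE enabled");
        return 0;
    }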
-
Kefeng Wang (王克锋) authored
Use the PAGE_ALIGNED macro from <linux/mm.h> to simplify the code. Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <guohanjun@huawei.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
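For reference, the macro and the open-coded test it replaces express the same check (definitions reproduced here so the sketch compiles standalone; they mirror the kernel's):

    #include <stdio.h>

    #define PAGE_SHIFT          12
    #define PAGE_SIZE           (1UL << PAGE_SHIFT)
    #define IS_ALIGNED(x, a)    (((x) & ((a) - 1)) == 0)
    #define PAGE_ALIGNED(addr)  IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)

    int main(void)
    {
        unsigned long addr = 0x200000;

        /* open-coded variant this kind of patch removes */
        printf("%d\n", (addr & (PAGE_SIZE - 1)) == 0);
        /* equivalent, self-describing variant */
        printf("%d\n", PAGE_ALIGNED(addr));
        return 0;
    }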
-
Dave Jones authored
In CONFIG_DEBUG_PAGEALLOC=y builds, we disable 2M pages. Unfortunately, when we split up mappings during boot, split_page_count() doesn't take this into account and starts decrementing an empty direct_pages_count[] level. This results in /proc/meminfo showing crazy things like:

    DirectMap2M:    18446744073709543424 kB

Signed-off-by:
Dave Jones <davej@codemonkey.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Jan 11, 2016
-
-
H.J. Lu authored
When decompressing the kernel image during x86 bootup, allocating memory for the ELF program headers may run out of heap space, which leads to a system halt. This patch doubles BOOT_HEAP_SIZE to 64KB. Tested with a 32-bit kernel which failed to boot without this patch. Signed-off-by:
H.J. Lu <hjl.tools@gmail.com> Acked-by:
H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
When switch_mm() activates a new PGD, it also sets a bit that tells other CPUs that the PGD is in use so that TLB flush IPIs will be sent. In order for that to work correctly, the bit needs to be visible prior to loading the PGD and therefore starting to fill the local TLB. Document all the barriers that make this work correctly and add a couple that were missing. Signed-off-by:
Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
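The ordering requirement in miniature, as a userspace analog with C11 atomics (loosely modeled on the commit's description; the kernel relies on x86-specific guarantees such as the serializing CR3 write rather than an explicit fence everywhere):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_bool cpu_in_mm;    /* stands in for the mm cpumask bit */
    static atomic_int loaded_pgd;    /* stands in for the CR3 load */

    static void switch_mm_analog(void)
    {
        /* 1) announce "this CPU uses the PGD" ... */
        atomic_store_explicit(&cpu_in_mm, true, memory_order_relaxed);
        /* 2) ... and make that visible before starting to fill the
         * TLB, so remote flushers know to send us an IPI */
        atomic_thread_fence(memory_order_seq_cst);
        /* 3) only now load the new PGD */
        atomic_store_explicit(&loaded_pgd, 1, memory_order_relaxed);
    }

    int main(void)
    {
        switch_mm_analog();
        printf("in_mm=%d pgd=%d\n",
               (int)atomic_load(&cpu_in_mm), atomic_load(&loaded_pgd));
        return 0;
    }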
-
- Jan 09, 2016
-
-
Chris Wilson authored
Whilst inspecting the asm for clflush_cache_range() and some perf profiles that required extensive flushing of single cachelines (from part of the intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading boot_cpu_data.x86_clflush_size on every iteration of the loop. We can manually hoist that read which perf regarded as taking ~25% of the function time for a single cacheline flush. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by:
Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by:
"H. Peter Anvin" <hpa@zytor.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Borislav Petkov <bp@suse.de> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com> Link: http://lkml.kernel.org/r/1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
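The transformation itself is plain loop-invariant hoisting; a runnable miniature (the clflush is replaced by a counter so this builds anywhere, and the struct is a stand-in for the kernel's boot_cpu_data):

    #include <stdio.h>

    struct cpuinfo { unsigned int x86_clflush_size; };
    static struct cpuinfo boot_cpu_data = { 64 };

    static unsigned long flush_range(char *vaddr, unsigned int size)
    {
        /* hoisted: read the invariant once; without this, the
         * compiler may reload the global every iteration because it
         * cannot prove the loop body leaves it unchanged */
        const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
        unsigned long lines = 0;

        for (char *p = vaddr; p < vaddr + size; p += clflush_size)
            lines++;    /* clflushopt(p) in the real function */

        return lines;
    }

    int main(void)
    {
        char buf[4096];

        printf("%lu cache lines\n", flush_range(buf, sizeof(buf)));
        return 0;
    }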
-
- Jan 07, 2016
-
-
Paolo Bonzini authored
While setting the KVM PIT counters in 'kvm_pit_load_count', if 'hpet_legacy_start' is set, the function disables the timer on channel[0], instead of the respective index 'channel'. This is because channels 1-3 are not linked to the HPET. Fix the caller to only activate the special HPET processing for channel 0. Reported-by:
P J P <pjp@fedoraproject.org> Fixes: 0185604c Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
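The shape of the fix, reduced to its decision (an illustrative sketch, not the KVM source; the printfs stand in for the real timer manipulation):

    #include <stdbool.h>
    #include <stdio.h>

    static void pit_load_count(int channel, int val, bool hpet_legacy_start)
    {
        /* HPET legacy mode only ever takes over the PIT's channel 0,
         * so the special handling must be gated on channel == 0 */
        if (channel == 0 && hpet_legacy_start)
            printf("channel 0: HPET legacy handling (val=%d)\n", val);
        else
            printf("channel %d: normal reload (val=%d)\n", channel, val);
    }

    int main(void)
    {
        pit_load_count(0, 65535, true);
        pit_load_count(2, 1193, true);    /* must NOT take the HPET path */
        return 0;
    }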
-
- Jan 06, 2016
-
-
Vince Weaver authored
This is a long-standing bug with the l1-dcache-stores generic event on AMD machines. My perf_event testsuite has been complaining about this for years and I'm finally getting around to trying to get it fixed. The data_cache_refills:system event does not make sense for l1-dcache-stores. Maybe this was a typo and it was meant to be l1-dcache-store-misses? In any case, the values returned are nowhere near correct for l1-dcache-stores, and in fact the umask values for the event have completely changed with fam15h, so it makes even less sense than ever. So just remove it. Signed-off-by:
Vince Weaver <vincent.weaver@maine.edu> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1512091134350.24311@vincent-weaver-1.umelst.maine.edu Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Harish Chegondi authored
Knights Landing uncore performance monitoring (perfmon) is derived from Haswell-EP uncore perfmon, with several differences. One notable difference is in PCI device IDs: Knights Landing uses a common PCI device ID for multiple instances of an uncore PMU device type, whereas in Haswell-EP each instance of a PMU device type has a unique device ID. Knights Landing uncore components that have performance monitoring units are UBOX, CHA, EDC, MC, M2PCIe, IRP and PCU. Perfmon registers in EDC, MC, IRP, and M2PCIe reside in the PCIe configuration space. Perfmon registers in UBOX, CHA and PCU are accessed via the MSR interface. For more details, please refer to the public document: https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf Signed-off-by:
Harish Chegondi <harish.chegondi@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/8ac513981264c3eb10343a3f523f19cc5a2d12fe.1449470704.git.harish.chegondi@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Harish Chegondi authored
Call uncore_pci_box_ctl() function to get the PMON box control MSR offset instead of hard coding the offset. This would allow us to use this snbep_uncore_pci_init_box() function for other PCI PMON devices whose box control MSR offset is different from SNBEP_PCI_PMON_BOX_CTL. Signed-off-by:
Harish Chegondi <harish.chegondi@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/872e8ef16cfc38e5ff3b45fac1094e6f1722e4ad.1449470704.git.harish.chegondi@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Harish Chegondi authored
The Knights Landing core is based on the Silvermont core, with several differences. Like Silvermont, Knights Landing has 8 pairs of LBR MSRs. However, the LBR MSR addresses match those of the Xeon cores' first 8 pairs of LBR MSRs. Unlike Silvermont, Knights Landing supports hyperthreading. The Knights Landing offcore response events config register mask is different from that of Silvermont. This patch was developed based on a patch from Andi Kleen. For more details, please refer to the public document: https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf Signed-off-by:
Harish Chegondi <harish.chegondi@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/d14593c7311f78c93c9cf6b006be843777c5ad5c.1449517401.git.harish.chegondi@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Kan Liang authored
The uncore subsystem for Broadwell-EP is similar to Haswell-EP. There are some differences in PCI device IDs, box numbers and constraints. This patch extends the Broadwell-DE code to support Broadwell-EP. Signed-off-by:
Kan Liang <kan.liang@intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1449176411-9499-1-git-send-email-kan.liang@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Huang Rui authored
Actually, rapl_sysfs_show is a duplicate of perf_event_sysfs_show. We prefer to use the unified interface. Signed-off-by:
Huang Rui <ray.huang@amd.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dasaratharaman Chandramouli<dasaratharaman.chandramouli@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1449223661-2437-1-git-send-email-ray.huang@amd.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Stephane Eranian authored
This patch updates the PEBS support for Intel Atom to provide an alias for the cycles:pp event used by perf record/top by default nowadays. On Atom, only INST_RETIRED:ANY supports PEBS, so we use this event instead with a large cmask to count cycles. Given that Core2 has the same issue, we use the intel_pebs_aliases_core2() function for Atom as well. Signed-off-by:
Stephane Eranian <eranian@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1449172990-30183-3-git-send-email-eranian@google.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Stephane Eranian authored
This patch fixes broken PEBS support on Intel Atom and Core2 due to wrong pointer arithmetic in intel_pmu_drain_pebs_core(). get_next_pebs_record_by_bit() was called on the fmt0 PEBS format, which does not use the pebs_record_nhm layout. Signed-off-by:
Stephane Eranian <eranian@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Fixes: 21509084 ("perf/x86/intel: Handle multiple records in the PEBS buffer") Link: http://lkml.kernel.org/r/1449182000-31524-3-git-send-email-eranian@google.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Stephane Eranian authored
This patch fixes LBR kernel crashes on Intel Atom. The kernel was assuming that if the CPU supports the 64-bit LBR format, then it has an LBR_SELECT MSR. Atom uses the 64-bit LBR format but does not have LBR_SELECT. That was causing NULL pointer dereferences in a couple of places. Signed-off-by:
Stephane Eranian <eranian@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Fixes: 96f3eda6 ("perf/x86/intel: Fix static checker warning in lbr enable") Link: http://lkml.kernel.org/r/1449182000-31524-2-git-send-email-eranian@google.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Stephane Eranian authored
This patch fixes a bug in the filter_events() function: if some mappings did not exist, e.g., STALLED_CYCLES_FRONTEND, then any event after it in the attrs array would disappear from the published list of events in /sys/devices/cpu/events. This could be verified easily on any system post SNB (which do not publish STALLED_CYCLES_FRONTEND):

    $ ./perf stat -e cycles,ref-cycles true

     Performance counter stats for 'true':

             1,217,348      cycles
       <not supported>      ref-cycles

The problem is that filter_events() assumes the argument (attrs) is organized in increasing, continuous event indexes related to the event_map(). But if we remove the non-supported events by shifting positions in the array, then the lookup via x86_pmu.event_map() needs to compensate for it, otherwise we are looking up the wrong index. This patch corrects the problem by compensating for the deleted events, and with that ref-cycles reappears (here shown on Haswell):

    $ perf stat -e ref-cycles,cycles true

     Performance counter stats for 'true':

             4,525,910      ref-cycles
             1,064,920      cycles

           0.002943888 seconds time elapsed

Signed-off-by:
Stephane Eranian <eranian@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jolsa@kernel.org Cc: kan.liang@intel.com Fixes: 8300daa2 ("perf/x86: Filter out undefined events from sysfs events attribute") Link: http://lkml.kernel.org/r/1449516805-6637-1-git-send-email-eranian@google.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
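The compensation in miniature (a runnable toy; the names merely mimic the kernel's, and event_map[] returning 0 marks an unsupported mapping, exactly the STALLED_CYCLES_FRONTEND situation):

    #include <stdio.h>

    #define NEVENTS 4

    static const int event_map[NEVENTS] = { 10, 0 /* unsupported */, 30, 40 };
    static const char *attrs[NEVENTS + 1] =
        { "cycles", "stalled-cycles-frontend", "ref-cycles", "instructions" };

    int main(void)
    {
        int j = 0;

        for (int i = 0; i < NEVENTS; i++) {
            /* keep two indexes: i walks the original positions (and
             * thus the event map), j the compacted output. The bug
             * was, in effect, using the compacted position for the
             * map lookup after an entry had been dropped. */
            if (event_map[i])
                attrs[j++] = attrs[i];
        }
        attrs[j] = NULL;

        for (int i = 0; attrs[i]; i++)
            printf("%s\n", attrs[i]);    /* stalled-* gone, rest kept */
        return 0;
    }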
-
Andi Kleen authored
Add a new 'three-p' precise level that uses INST_RETIRED.PREC_DIST as its base. The basic mechanism of abusing the inverse cmask to get all cycles works the same as before. PREC_DIST is available on Sandy Bridge or later. It had some problems on Sandy Bridge, so we only use it on IvyBridge and later. I tested it on Broadwell and Skylake. PREC_DIST has special support for avoiding shadow effects, which can give better results compared to UOPS_RETIRED. The drawback is that PREC_DIST can only schedule on counter 1, but that is ok for cycle sampling, as there is normally no need to do multiple cycle sampling runs in parallel. It is still possible to run perf top in parallel, as that doesn't use precise mode. Also, of course, multiplexing can still allow parallel operation. :pp stays with the previous event.

Example: Sample a loop with 10 sqrt instructions with the old cycles:pp:

     0.14 │10:   sqrtps %xmm1,%xmm0    <--------------
     9.13 │      sqrtps %xmm1,%xmm0
    11.58 │      sqrtps %xmm1,%xmm0
    11.51 │      sqrtps %xmm1,%xmm0
     6.27 │      sqrtps %xmm1,%xmm0
    10.38 │      sqrtps %xmm1,%xmm0
    12.20 │      sqrtps %xmm1,%xmm0
    12.74 │      sqrtps %xmm1,%xmm0
     5.40 │      sqrtps %xmm1,%xmm0
    10.14 │      sqrtps %xmm1,%xmm0
    10.51 │    ↑ jmp    10

We expect all 10 sqrts to get roughly the same number of samples. But you can see that the instruction directly after the JMP is systematically underestimated in the result, due to sampling shadow effects.

With the new PREC_DIST based sampling this problem is gone and all instructions show up roughly evenly:

     9.51 │10:   sqrtps %xmm1,%xmm0
    11.74 │      sqrtps %xmm1,%xmm0
    11.84 │      sqrtps %xmm1,%xmm0
     6.05 │      sqrtps %xmm1,%xmm0
    10.46 │      sqrtps %xmm1,%xmm0
    12.25 │      sqrtps %xmm1,%xmm0
    12.18 │      sqrtps %xmm1,%xmm0
     5.26 │      sqrtps %xmm1,%xmm0
    10.13 │      sqrtps %xmm1,%xmm0
    10.43 │      sqrtps %xmm1,%xmm0
     0.16 │    ↑ jmp    10

Even with PREC_DIST there is still sampling skid and the result is not completely even, but systematic shadow effects are significantly reduced. The improvements are mainly expected to make a difference in high IPC code. With low IPC it should be similar. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Andi Kleen authored
I added UOPS_RETIRED.ALL by mistake to the Skylake PEBS event list for cycles:pp. But the event is not documented for Skylake and has some issues. The recommended replacement for cycles:pp is to use INST_RETIRED.ANY+pebs as a base, similar to what CPUs before Sandy Bridge did. This new event is called INST_RETIRED.TOTAL_CYCLES_PS. The event is not really new, but has already been used by perf before Sandy Bridge for the original cycles:p. Note the SDM doesn't document that event either, but it's being documented in the latest version of the event list on:

    https://download.01.org/perfmon/SKL

This patch does:

  - Remove UOPS_RETIRED.ALL from the Skylake PEBS event list.

  - Add INST_RETIRED.ANY to the Skylake PEBS event list, and a table entry to allow cmask=16,inv=1 for cycles:pp.

  - We don't need an extra entry for the base INST_RETIRED event, because it is already covered by the catch-all PEBS table entry.

  - Switch Skylake to use the Core2 PEBS alias (which is INST_RETIRED.TOTAL_CYCLES_PS).

Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1448929689-13771-1-git-send-email-andi@firstfloor.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Andi Kleen authored
Normally we drop PEBS events with a zero status field. But when there is only a single PEBS event active, we can assume the PEBS record is for that event. The PEBS buffer is always flushed when PEBS events are disabled, so there is no risk of mishandling stale PEBS records this way. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1449177740-5422-2-git-send-email-andi@firstfloor.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
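A sketch of that fallback logic (illustrative only, not the driver code; the single-bit test is the standard x & (x - 1) trick):

    #include <stdio.h>

    static int handle_pebs_record(unsigned long long status,
                                  unsigned long long active_mask)
    {
        if (!status) {
            /* exactly one active event? Then the record can only
             * belong to it: recover instead of dropping. */
            if (active_mask && (active_mask & (active_mask - 1)) == 0)
                status = active_mask;
            else
                return -1;    /* ambiguous: drop the record */
        }
        printf("deliver record to event mask 0x%llx\n", status);
        return 0;
    }

    int main(void)
    {
        handle_pebs_record(0, 0x4);    /* one active event: recovered */
        handle_pebs_record(0, 0x5);    /* two active events: dropped  */
        return 0;
    }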
-
Andi Kleen authored
The recent commit: 75f80859 ("perf/x86/intel/pebs: Robustify PEBS buffer drain") causes lots of warnings on different CPUs before Skylake when running PEBS-intensive workloads. They can have a zero status field in the PEBS record when PEBS is racing with clearing of GLOBAL_STATUS. This can also cause hangs (it seems there are still problems with printk in NMI). Disable the warning, but still ignore the record. Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1449177740-5422-1-git-send-email-andi@firstfloor.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Jiri Olsa authored
The CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) macro checks whether MEMBER is the last member of TYPE by evaluating:

    offsetof(TYPE::MEMBER) + sizeof(TYPE::MEMBER) == sizeof(TYPE)

This condition breaks on structs that are padded for alignment. This patch ensures the TYPE alignment is taken into account. The bug was revealed after adding cacheline alignment to struct sched_entity, which broke the task_struct::thread check:

    CHECK_MEMBER_AT_END_OF(struct task_struct, thread);

Signed-off-by:
Jiri Olsa <jolsa@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave@sr71.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1450707930-3445-1-git-send-email-jolsa@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
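A self-contained rendition of the corrected check: round the member's end up to the struct's own alignment before comparing against sizeof, so trailing padding no longer causes a false failure (macro names follow the commit; the struct is a made-up example):

    #include <stddef.h>
    #include <stdio.h>

    #define ALIGN_UP(x, a)    (((x) + (a) - 1) & ~((a) - 1))
    #define MEMBER_END(TYPE, MEMBER) \
        (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
    #define CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) \
        _Static_assert(ALIGN_UP(MEMBER_END(TYPE, MEMBER), \
                                _Alignof(TYPE)) == sizeof(TYPE), \
                       #MEMBER " is not the last member of " #TYPE)

    struct padded {
        long long a;    /* forces 8-byte alignment */
        char tail;      /* last member; 7 bytes of padding follow */
    };

    /* without the ALIGN_UP, end-of-member (9) != sizeof (16) and the
     * naive check would fail even though 'tail' is genuinely last */
    CHECK_MEMBER_AT_END_OF(struct padded, tail);

    int main(void)
    {
        printf("sizeof=%zu member_end=%zu\n",
               sizeof(struct padded), MEMBER_END(struct padded, tail));
        return 0;
    }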
-
Andy Lutomirski authored
    arch/x86/built-in.o: In function `arch_setup_additional_pages':
    (.text+0x587): undefined reference to `pvclock_pvti_cpu0_va'

KVM_GUEST selects PARAVIRT_CLOCK, so we can make pvclock_pvti_cpu0_va depend on KVM_GUEST. Signed-off-by:
Andy Lutomirski <luto@kernel.org> Tested-by:
Borislav Petkov <bp@alien8.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/444d38a9bcba832685740ea1401b569861d09a72.1451446564.git.luto@kernel.org Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Jan 05, 2016
-
-
Toshi Kani authored
Using mremap() to shrink the map size of a VM_PFNMAP range causes the following error message, and leaves the pfn range allocated:

    x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff]

This is because rbt_memtype_erase(), called from free_memtype() with a spinlock held, only supports freeing a whole memtype node in memtype_rbroot. Therefore, this patch changes rbt_memtype_erase() to support a request that shrinks the size of a memtype node for mremap(). memtype_rb_exact_match() is renamed to memtype_rb_match(), and is enhanced to support EXACT_MATCH and END_MATCH in @match_type. Since the memtype_rbroot tree allows overlapping ranges, rbt_memtype_erase() checks with EXACT_MATCH first, i.e. frees a whole node for the munmap case. If no such entry is found, it then checks with END_MATCH, i.e. shrinks the size of a node from the end for the mremap case. In the mremap case, rbt_memtype_erase() proceeds in two steps: 1) remove the node, and then 2) insert the updated node. This allows proper update of the augmented values (subtree_max_end) in the tree. Signed-off-by:
Toshi Kani <toshi.kani@hpe.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: stsp@list.ru Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Toshi Kani authored
mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following WARN_ON_ONCE() message in untrack_pfn():

    WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
    Call Trace:
     [<ffffffff817729ea>] dump_stack+0x45/0x57
     [<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
     [<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
     [<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
     [<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
     [<ffffffff811d3725>] unmap_vmas+0x55/0xb0
     [<ffffffff811d916c>] unmap_region+0xac/0x120
     [<ffffffff811db86a>] do_munmap+0x28a/0x460
     [<ffffffff811dec33>] move_vma+0x1b3/0x2e0
     [<ffffffff811df113>] SyS_mremap+0x3b3/0x510
     [<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71

MREMAP_FIXED moves a pfnmap from the old vma to a new vma. untrack_pfn() is called with the old vma after its pfnmap page table has been removed, which causes follow_phys() to fail. The new vma has a new pfnmap to the same pfn & cache type with VM_PAT set. Therefore, we only need to clear VM_PAT from the old vma in this case. Add untrack_pfn_moved(), which clears VM_PAT from a given old vma. move_vma() is changed to call this function with the old vma when VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn() is a no-op since VM_PAT is cleared. Reported-by:
Stas Sergeev <stsp@list.ru> Signed-off-by:
Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Dec 31, 2015
-
-
Daniel J Blueman authored
The MMCFG PCI accessors weren't being set up correctly for NumaConnect2 due to over-early assignment; this created the potential for the wrong PCI domain to be accessed. Fix this by using the correct arch-specific PCI init function. Signed-off-by:
Daniel J Blueman <daniel@numascale.com> Acked-by:
Steffen Persvold <sp@numascale.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1451498807-15920-1-git-send-email-daniel@numascale.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-