  1. Jul 19, 2021
    • Thomas Gleixner
      x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing · a9c92fd4
      Thomas Gleixner authored
      
      stable inclusion
      from linux-4.19.194
      commit 7e25cb1b22f81239ae3332e14a1d0cff7014bccd
      
      --------------------------------
      
      commit 7d65f9e80646c595e8c853640a9d0768a33e204c upstream.
      
      PIC interrupts do not support affinity setting and they can end up on
      any online CPU. Therefore, it's required to mark the associated vectors
      as system-wide reserved. Otherwise, the corresponding irq descriptors
      are copied to the secondary CPUs but the vectors are not marked as
      assigned or reserved. This works correctly for the IO/APIC case.
      
      When the IO/APIC is disabled via config or the kernel command line, or is
      not enumerated, all legacy interrupts are routed through the PIC, but
      nothing marks them as system-wide reserved vectors.
      
      As a consequence, a subsequent allocation on a secondary CPU can result in
      allocating one of these vectors, which triggers the BUG() in
      apic_update_vector() because the interrupt descriptor slot is not empty.
      
      Imran tried to work around that by marking those interrupts as allocated
      when a CPU comes online. But that's wrong when the IO/APIC is available
      and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC
      mode: marking them as allocated then fails because they are already
      marked as system vectors.
      
      Stay consistent: update the legacy vectors after attempting IO/APIC
      initialization and mark them as system vectors if no IO/APIC is
      available.
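      The failure and the fix can be illustrated with a small user-space toy
      model. Every identifier here (toy_*) is hypothetical, and the legacy
      vector layout 0x30 + irq is an assumption meant to mirror
      ISA_IRQ_VECTOR(). Without a system-wide reservation, the first
      allocation on a freshly onlined CPU lands on a vector whose descriptor
      slot already holds a copied PIC descriptor (where the kernel would
      BUG() in apic_update_vector()); with it, allocation skips the legacy range.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: names and constants are hypothetical. */
#define TOY_NR_VECTORS         256
#define TOY_NR_LEGACY_IRQS     16
/* Assumed to mirror ISA_IRQ_VECTOR(): legacy IRQ n uses vector 0x30 + n. */
#define TOY_LEGACY_VECTOR(irq) (0x30 + (irq))

struct toy_cpu {
	bool desc_installed[TOY_NR_VECTORS]; /* per-CPU vector -> descriptor slot in use */
	bool allocated[TOY_NR_VECTORS];      /* what the allocator believes is taken */
};

/* Vectors reserved on every CPU ("system vectors"). */
static bool toy_system_reserved[TOY_NR_VECTORS];

/* Bringing a CPU online copies the legacy (PIC) descriptors to it,
 * but does not mark their vectors as allocated. */
static void toy_cpu_online(struct toy_cpu *cpu)
{
	memset(cpu, 0, sizeof(*cpu));
	for (int irq = 0; irq < TOY_NR_LEGACY_IRQS; irq++)
		cpu->desc_installed[TOY_LEGACY_VECTOR(irq)] = true;
}

/* The fix in spirit: after trying IO/APIC init, reserve the legacy
 * vectors system-wide when no IO/APIC is available. */
static void toy_update_legacy_vectors(bool ioapic_available)
{
	if (ioapic_available)
		return;
	for (int irq = 0; irq < TOY_NR_LEGACY_IRQS; irq++)
		toy_system_reserved[TOY_LEGACY_VECTOR(irq)] = true;
}

/* Hand out the lowest free vector starting at the legacy range.
 * Returns -1 if the descriptor slot is already occupied (the kernel's
 * BUG() case) and -2 if no vector is free. */
static int toy_alloc_vector(struct toy_cpu *cpu)
{
	for (int v = TOY_LEGACY_VECTOR(0); v < TOY_NR_VECTORS; v++) {
		if (toy_system_reserved[v] || cpu->allocated[v])
			continue;
		cpu->allocated[v] = true;
		if (cpu->desc_installed[v])
			return -1;
		cpu->desc_installed[v] = true;
		return v;
	}
	return -2;
}
```

      After toy_update_legacy_vectors(false), onlining a CPU and allocating
      yields 0x40, the first vector above the 16 legacy ones. Passing true
      stands in for the IO/APIC-present path, which this toy does not
      otherwise simulate.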
      
      Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
      Reported-by: Imran Khan <imran.f.khan@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  2. Jul 01, 2021
  3. Jun 30, 2021
  4. Jun 11, 2021
  5. Jun 02, 2021
    • Michael Zhivich
      x86/tsc: Respect tsc command line parameter for clocksource_tsc_early · a83b988c
      Michael Zhivich authored
      mainline inclusion
      from mainline-v5.4-rc4
      commit 63ec58b44fcc05efd1542045abd7faf056ac27d9
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I3T8ZP?from=project-issue
      
      CVE: NA
      
      --------------------------------
      
      The introduction of clocksource_tsc_early broke the functionality of
      "tsc=reliable" and "tsc=nowatchdog" command line parameters, since
      clocksource_tsc_early is unconditionally registered with
      CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list.
      
      This can cause the TSC to be declared unstable during boot:
      
        clocksource: timekeeping watchdog on CPU0: Marking clocksource
                     'tsc-early' as unstable because the skew is too large:
        clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d
                     mask: ffffffff
        clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6
                     mask: ffffffffffffffff
        tsc: Marking TSC unstable due to clocksource watchdog
      
      The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so
      the watchdog differs from TSC by 0.84 seconds.
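      The skew arithmetic can be checked with a short sketch. The wrap-safe
      masked subtraction is how clocksource deltas are computed; the 62.5 ms
      limit (NSEC_PER_SEC >> 4) is our assumption about WATCHDOG_THRESHOLD in
      kernel/time/clocksource.c of this era. Converting raw cycle deltas to
      nanoseconds needs the clocksource mult/shift values, which the log does
      not show, so the sketch takes the logged cs_nsec and wd_nsec directly.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe delta within the counter width given by mask. */
static uint64_t cs_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/* Assumed value of WATCHDOG_THRESHOLD: NSEC_PER_SEC >> 4 = 62.5 ms. */
#define TOY_WATCHDOG_THRESHOLD_NS (1000000000LL >> 4)

/* Absolute difference between the two elapsed times. */
static int64_t skew_ns(int64_t cs_nsec, int64_t wd_nsec)
{
	return cs_nsec > wd_nsec ? cs_nsec - wd_nsec : wd_nsec - cs_nsec;
}

/* Non-zero when the watchdog would mark the clocksource unstable. */
static int is_unstable(int64_t cs_nsec, int64_t wd_nsec)
{
	return skew_ns(cs_nsec, wd_nsec) > TOY_WATCHDOG_THRESHOLD_NS;
}
```

      Applied to the log, the masked deltas are 0xc50181ca TSC cycles and
      0x17b (379) ticks of refined-jiffies, and the skew of 845,209,634 ns is
      far above the assumed threshold.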
      
      This happens when HPET is not available, so jiffies are used as the TSC
      watchdog instead, and the jiffies are not updated because timer interrupts
      are lost in periodic mode. That can happen, e.g., with expensive debug
      mechanisms enabled or under massive overload in virtualized environments.
      
      Before the introduction of the early TSC clocksource the command line
      parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around
      this issue.
      
      Restore the behaviour by disabling the watchdog if requested on the kernel
      command line.
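      A hedged sketch of the restored behaviour, following the changelog's
      description rather than the exact backported diff (the bracketed note
      below says this backport drops 'no_tsc_watchdog', so its real condition
      may be narrower). The flag values are assumed to mirror
      include/linux/clocksource.h; toy_tsc_setup() and toy_tsc_early_init()
      are hypothetical stand-ins for the tsc= parser and the early TSC init.

```c
#include <assert.h>
#include <string.h>

/* Assumed flag values, mirroring include/linux/clocksource.h. */
#define TOY_CLOCK_SOURCE_IS_CONTINUOUS 0x01
#define TOY_CLOCK_SOURCE_MUST_VERIFY   0x02

struct toy_clocksource {
	const char *name;
	unsigned long flags;
};

/* Registered with MUST_VERIFY, so by default it goes on the watchdog list. */
static struct toy_clocksource toy_tsc_early = {
	.name  = "tsc-early",
	.flags = TOY_CLOCK_SOURCE_IS_CONTINUOUS | TOY_CLOCK_SOURCE_MUST_VERIFY,
};

static int toy_tsc_reliable;
static int toy_tsc_nowatchdog;

/* Hypothetical "tsc=" handler; takes a single option string. */
static void toy_tsc_setup(const char *str)
{
	if (strcmp(str, "reliable") == 0)
		toy_tsc_reliable = 1;
	else if (strcmp(str, "nowatchdog") == 0)
		toy_tsc_nowatchdog = 1;
}

/* Before registering the early clocksource, drop MUST_VERIFY if the
 * command line asked for it, keeping tsc-early off the watchdog list. */
static void toy_tsc_early_init(void)
{
	if (toy_tsc_reliable || toy_tsc_nowatchdog)
		toy_tsc_early.flags &= ~TOY_CLOCK_SOURCE_MUST_VERIFY;
}
```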
      
      [ tglx: Clarify changelog ]
      
      Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource")
      Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
      
      [wangxiongfeng: remove 'no_tsc_watchdog' in the origin patch.]
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  6. Jun 01, 2021
  7. May 26, 2021
    • Thomas Gleixner
      x86/apic/vector: Force interrupt handler invocation to irq context · 3e134563
      Thomas Gleixner authored
      
      mainline inclusion
      from mainline-5.7
      commit 008f1d60
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      -------------------------------------------------
      
      Sathyanarayanan reported that the PCI-E AER error injection mechanism
      can result in a NULL pointer dereference in apic_ack_edge():
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
       RIP: 0010:apic_ack_edge+0x1e/0x40
       Call Trace:
         handle_edge_irq+0x7d/0x1e0
         generic_handle_irq+0x27/0x30
         aer_inject_write+0x53a/0x720
      
      It crashes in irq_complete_move(), which dereferences the pointer returned
      by get_irq_regs(); that pointer is NULL when this is called from
      non-interrupt context.
      
      Of course the pointer could be checked, but that just papers over the real
      issue. Invoking the low level interrupt handling mechanism from random code
      can wreck the fragile interrupt affinity mechanism of x86, as interrupts
      can only be moved in interrupt context or with special care when a CPU goes
      offline and the move has to be enforced.
      
      In the best case this triggers the warning in the MSI affinity setter, but
      if the call happens on the correct CPU it just corrupts state and might
      prevent further interrupt delivery for the affected device.
      
      Mark the APIC interrupts as unsuitable for being invoked in random contexts.
      
      This prevents the AER injection from proliferating the wreckage, but that's
      less broken than the current state of affairs and more correct than just
      papering over the problem by sprinkling random checks all over the place
      and silently corrupting state.
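      Both halves of the problem fit in a small toy model, with all names
      hypothetical: the ack path needs the register pointer that only
      interrupt entry sets, and the fix flags the interrupt so that invoking
      it from a random context is refused instead of crashing. Our
      understanding is that mainline implements the refusal as a check in
      generic_handle_irq() that warns and returns -EPERM; treat that detail
      as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EPERM 1

struct toy_pt_regs { unsigned long ip; };

struct toy_irq_data {
	bool enforce_irqctx; /* set for APIC-managed interrupts by the fix */
	int moves_completed;
};

/* Toy per-CPU state: only the interrupt entry path sets these. */
static bool toy_in_irq;
static struct toy_pt_regs *toy_irq_regs;

static struct toy_pt_regs *toy_get_irq_regs(void)
{
	return toy_irq_regs; /* NULL outside interrupt context */
}

/* Stand-in for the ack path that calls irq_complete_move(). */
static void toy_ack_edge(struct toy_irq_data *d)
{
	struct toy_pt_regs *regs = toy_get_irq_regs();

	/* The real code dereferenced this unconditionally: the reported crash. */
	assert(regs != NULL);
	d->moves_completed++;
}

/* Refuse enforced interrupts outside irq context instead of crashing. */
static int toy_generic_handle_irq(struct toy_irq_data *d)
{
	if (!toy_in_irq && d->enforce_irqctx)
		return -TOY_EPERM;
	toy_ack_edge(d);
	return 0;
}

/* Simulate a genuine interrupt arriving on this CPU. */
static int toy_interrupt_entry(struct toy_irq_data *d)
{
	struct toy_pt_regs regs = { 0 };
	int ret;

	toy_in_irq = true;
	toy_irq_regs = &regs;
	ret = toy_generic_handle_irq(d);
	toy_irq_regs = NULL;
	toy_in_irq = false;
	return ret;
}
```

      A direct call from "random code" (here, the test's own context) gets
      -TOY_EPERM and leaves the state untouched, while the same interrupt
      delivered through toy_interrupt_entry() is handled normally.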
      
      Reported-by: <sathyanarayanan.kuppuswamy@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20200306130623.684591280@linutronix.de
      
      Signed-off-by: Liao Chang <liaochang1@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Fenghua Yu
      x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions · 3094b42a
      Fenghua Yu authored
      mainline inclusion
      from mainline-v5.3-rc1
      commit b302e4b1
      category: feature
      bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=44
      
      CVE: NA
      
      -----------------------------------------------
      
      AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
      format (BF16) for deep learning optimization.
      
      BF16 is a truncated form of the 32-bit single-precision floating-point
      format (FP32): it keeps the sign bit, the 8-bit exponent and the top 7
      mantissa bits. It has several advantages over the 16-bit half-precision
      floating-point format (FP16). BF16 keeps FP32 accumulation after
      multiplication without loss of precision, offers more than enough
      range for deep learning training tasks, and doesn't need to handle
      hardware exceptions.
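      To make the format concrete, here is a software sketch of FP32 to BF16
      conversion and back; it illustrates the bit layout, not the AVX512
      instructions themselves. Round-to-nearest-even is our chosen rounding
      mode, and NaNs are handled separately so that rounding cannot turn them
      into infinities or flip their sign.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BF16 is the upper half of an IEEE-754 binary32: 1 sign bit, 8 exponent
 * bits (same range as FP32) and 7 mantissa bits. */
static uint16_t fp32_to_bf16(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits));
	if ((bits & 0x7fffffffu) > 0x7f800000u)  /* NaN: keep it a quiet NaN */
		return (uint16_t)((bits >> 16) | 0x0040u);
	/* Round to nearest, ties to even, on the 16 bits being dropped. */
	bits += 0x7fffu + ((bits >> 16) & 1u);
	return (uint16_t)(bits >> 16);
}

static float bf16_to_fp32(uint16_t h)
{
	uint32_t bits = (uint32_t)h << 16;  /* dropped mantissa bits become zero */
	float f;

	memcpy(&f, &bits, sizeof(f));
	return f;
}
```

      For example, 1.0f maps to 0x3f80, and 1 + 2^-8 (exactly halfway between
      two BF16 values) rounds down to the even encoding 0x3f80.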
      
      AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
      AVX512_BF16.
      
      CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
      word 12 as a pure features word to hold the feature bits including
      AVX512_BF16.
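      The enumeration can be sketched in user space, assuming GCC's or
      Clang's <cpuid.h> and its __get_cpuid_count() helper. The sketch checks
      that subleaf 1 of leaf 7 is valid before reading it, in the spirit of
      the maintainer note further down, and shows the kernel's word*32+bit
      feature numbering, which as we understand it gives
      X86_FEATURE_AVX512_BF16 the value 12*32+5. The toy_* names are
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style feature numbering: word * 32 + bit. */
#define TOY_FEATURE(word, bit)  ((word) * 32 + (bit))
#define TOY_FEATURE_AVX512_BF16 TOY_FEATURE(12, 5)

/* Pure helper so the bit test can be checked without real hardware. */
static int toy_bf16_from_leaf7_sub1_eax(uint32_t eax)
{
	return (eax >> 5) & 1u;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Returns 1 if CPUID.(EAX=7,ECX=1):EAX[5] is set, otherwise 0. */
static int toy_cpu_has_avx512_bf16(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 7 must exist; subleaf 0's EAX reports the highest valid subleaf. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
		return 0;
	__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx);
	return toy_bf16_from_leaf7_sub1_eax(eax);
}
#else
static int toy_cpu_has_avx512_bf16(void) { return 0; } /* non-x86 build */
#endif
```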
      
      Detailed information about the CPUID bit and the AVX512 BFLOAT16
      instructions can be found in the latest Intel Architecture Instruction
      Set Extensions and Future Features Programming Reference.
      
       [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]
      
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: Robert Hoo <robert.hu@linux.intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.com
      
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  8. May 22, 2021
  9. May 11, 2021
    • Rafael J. Wysocki
      ACPI: x86: Call acpi_boot_table_init() after acpi_table_upgrade() · 9c16025d
      Rafael J. Wysocki authored
      
      stable inclusion
      from linux-4.19.190
      commit 9a5ba778b50d6c6c50597febf3a30387a80ac05d
      
      --------------------------------
      
      commit 6998a8800d73116187aad542391ce3b2dd0f9e30 upstream.
      
      Commit 1a1c130ab757 ("ACPI: tables: x86: Reserve memory occupied by
      ACPI tables") attempted to address an issue with reserving the memory
      occupied by ACPI tables, but it broke the initrd-based table override
      mechanism relied on by multiple users.
      
      To restore the initrd-based ACPI table override functionality, move
      the acpi_boot_table_init() invocation in setup_arch() on x86 after
      the acpi_table_upgrade() one.
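      A deliberately simplified model of why the call order matters: if the
      tables are located before the initrd override is applied, the override
      is never picked up. The toy_* functions are hypothetical stand-ins for
      acpi_table_upgrade() and acpi_boot_table_init(); the real override path
      is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Firmware ships revision 1 of a table; an initrd override supplies revision 2. */
static int toy_installed_rev;  /* table the ACPI core would hand out */
static int toy_located_rev;    /* what boot table init located (and reserved) */

static void toy_firmware_tables(void)      { toy_installed_rev = 1; }
static void toy_acpi_table_upgrade(void)   { toy_installed_rev = 2; }
static void toy_acpi_boot_table_init(void) { toy_located_rev = toy_installed_rev; }

/* Returns the table revision the rest of boot ends up using. */
static int toy_setup_arch(bool init_after_upgrade)
{
	toy_firmware_tables();
	if (init_after_upgrade) {
		toy_acpi_table_upgrade();
		toy_acpi_boot_table_init();   /* order restored by this commit */
	} else {
		toy_acpi_boot_table_init();   /* order that broke the override */
		toy_acpi_table_upgrade();
	}
	return toy_located_rev;
}
```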
      
      Fixes: 1a1c130ab757 ("ACPI: tables: x86: Reserve memory occupied by ACPI tables")
      Reported-by: Hans de Goede <hdegoede@redhat.com>
      Tested-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: George Kennedy <george.kennedy@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-...
    • Rafael J. Wysocki
      ACPI: tables: x86: Reserve memory occupied by ACPI tables · f60914de
      Rafael J. Wysocki authored
      stable inclusion
      from linux-4.19.190
      commit 7d329dd0a1b3302e10f0f25ef08538c7598d118f
      
      --------------------------------
      
      commit 1a1c130ab7575498eed5bcf7220037ae09cd1f8a upstream.
      
      The following problem has been reported by George Kennedy:
      
       Since commit 7fef431b ("mm/page_alloc: place pages to tail
       in __free_pages_core()") the following use after free occurs
       intermittently when ACPI tables are accessed.
      
       BUG: KASAN: use-after-free in ibft_init+0x134/0xc49
       Read of size 4 at addr ffff8880be453004 by task swapper/0/1
       CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0d #1
       Call Trace:
        dump_stack+0xf6/0x158
        print_address_description.constprop.9+0x41/0x60
        kasan_report.cold.14+0x7b/0xd4
        __asan_report_load_n_noabort+0xf/0x20
        ibft_init+0x134/0xc49
        do_one_initcall+0xc4/0x3e0
        kernel_init_freeable+0x5af/0x66b
        kernel_init+0x16/0x1d0
        ret_from_fork+0x22/0x30
      
       ACPI tables mapped via kmap() do not have their mapped pages
       reserved and the pages can be "stolen" by the buddy allocator.
      
      Apparently, on the affected system, the ACPI table in question is
      not located in "reserved" memory, like ACPI NVS or ACPI Data, that
      will not be used by the buddy allocator, so the memory occupied by
      that table has to be explicitly reserved to prevent the buddy
      allocator from using it.
      
      In order to address this problem, rearrange the initialization of the
      ACPI tables on x86 to locate the initial tables earlier and reserve
      the memory occupied by them.
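      The idea behind the fix can be shown with a toy sketch: once the range
      occupied by a table is recorded as reserved, pages overlapping it are no
      longer handed to the page allocator. toy_reserve() loosely plays the
      role of an early memblock reservation; all names and the 4 KiB page
      size are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    4096u
#define TOY_MAX_RESERVED 8

struct toy_range { uint64_t base, size; };

static struct toy_range toy_reserved[TOY_MAX_RESERVED];
static int toy_nr_reserved;

/* Remember a physical range the page allocator must never hand out. */
static void toy_reserve(uint64_t base, uint64_t size)
{
	assert(toy_nr_reserved < TOY_MAX_RESERVED);
	toy_reserved[toy_nr_reserved++] = (struct toy_range){ base, size };
}

/* A page is off limits if it overlaps any reserved range (half-open). */
static bool toy_page_is_reserved(uint64_t pfn)
{
	uint64_t start = pfn * TOY_PAGE_SIZE, end = start + TOY_PAGE_SIZE;

	for (int i = 0; i < toy_nr_reserved; i++) {
		const struct toy_range *r = &toy_reserved[i];

		if (start < r->base + r->size && r->base < end)
			return true;
	}
	return false;
}

/* Pages that would be released to the buddy-allocator stand-in. */
static uint64_t toy_count_free_pages(uint64_t nr_pages)
{
	uint64_t n = 0;

	for (uint64_t pfn = 0; pfn < nr_pages; pfn++)
		n += !toy_page_is_reserved(pfn);
	return n;
}
```

      A table at 0x1800 of size 0x1000 straddles pages 1 and 2, so after
      reserving it only pages 0 and 3 of the first four remain available.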
      
      The other architectures using ACPI should not be affected by this
      change.
      
      Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.kennedy@oracle.com/
      
      Reported-by: George Kennedy <george.kennedy@oracle.com>
      Tested-by: George Kennedy <george.kennedy@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Mike Galbraith
      x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access · 1cd844dd
      Mike Galbraith authored
      stable inclusion
      from linux-4.19.189
      commit f60194921e30e733ba94b4b3b2681d1cdc4ded55
      
      --------------------------------
      
      commit 5849cdf8c120e3979c57d34be55b92d90a77a47e upstream.
      
      The commit in Fixes: added support for kexec-ing a kernel on panic using a
      new system call. As part of it, it prepares a memory map for the new
      kernel.
      
      However, while doing so, it wrongly accesses memory it has not
      allocated: it accesses the first element of the cmem->ranges[] array in
      memmap_exclude_ranges() but it has not allocated the memory for it in
      crash_setup_memmap_entries(). As KASAN reports:
      
        BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
        Write of size 8 at addr ffffc90000426008 by task kexec/1187
      
        (gdb) list *crash_setup_memmap_entries+0x17e
        0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
        317                                      unsigned long long mend)
        318     {
        319           ...
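      This bug class is easy to reproduce in user space: struct crash_mem
      ends in a flexible array member, so allocating only the size of the
      struct leaves no room for ranges[0]. The sketch models the struct
      (field names are assumptions) and sizes the allocation for n trailing
      elements; as we recall, the upstream fix does the equivalent with
      struct_size(cmem, ranges, 1) in crash_setup_memmap_entries(), but the
      actual diff should be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Shapes modelled on struct crash_mem / struct crash_mem_range (assumed). */
struct toy_crash_mem_range { unsigned long long start, end; };

struct toy_crash_mem {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct toy_crash_mem_range ranges[]; /* flexible array: no storage by default */
};

/* Buggy pattern: allocating sizeof(*cmem) leaves zero room for ranges[].
 * Fixed pattern: size the allocation for n trailing elements. */
static struct toy_crash_mem *toy_alloc_cmem(unsigned int n)
{
	struct toy_crash_mem *cmem =
		calloc(1, sizeof(*cmem) + (size_t)n * sizeof(cmem->ranges[0]));

	if (cmem)
		cmem->max_nr_ranges = n;
	return cmem;
}

/* Record one range, refusing to write past the allocation. */
static int toy_add_range(struct toy_crash_mem *cmem,
			 unsigned long long start, unsigned long long end)
{
	if (cmem->nr_ranges >= cmem->max_nr_ranges)
		return -1;
	cmem->ranges[cmem->nr_ranges].start = start;
	cmem->ranges[cmem->nr_ranges].end = end;
	cmem->nr_ranges++;
	return 0;
}
```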
  10. Apr 07, 2021
  11. Oct 29, 2020