  1. May 10, 2020
  2. May 07, 2020
  3. May 05, 2020
    • iocost: protect iocg->abs_vdebt with iocg->waitq.lock · 0b80f986
      Tejun Heo authored
      
      abs_vdebt is an atomic64_t which tracks how much over budget a given cgroup
      is and controls the activation of use_delay mechanism. Once a cgroup goes
      over budget from forced IOs, it has to pay it back with its future budget.
      The progress guarantee on debt paying comes from the iocg being active -
      active iocgs are processed by the periodic timer, which ensures that as time
      passes the debts dissipate and the iocg returns to normal operation.
      
      However, both iocg activation and vdebt handling are asynchronous and a
      sequence like the following may happen.
      
      1. The iocg is in the process of being deactivated by the periodic timer.
      
      2. A bio enters ioc_rqos_throttle() and calls iocg_activate(), which returns
         without doing anything because it still sees the iocg as active.
      
      3. The iocg is deactivated.
      
      4. The bio from #2 is over budget but needs to be forced. It increases
         abs_vdebt and goes over the threshold and enables use_delay.
      
      5. IO control is enabled for the iocg's subtree and now IOs are attributed
         to the descendant cgroups and the iocg itself no longer issues IOs.
      
      This leaves the iocg with stuck abs_vdebt - it has debt but is inactive and
      has no further IOs which can activate it. This can end up unduly punishing
      all the descendant cgroups.
      
      The usual throttling path has the same issue - the iocg must be active while
      throttled to ensure that a future event will wake it up - and solves the
      problem by synchronizing the throttling path with a spinlock. abs_vdebt
      handling is another form of overage handling and shares a lot of
      characteristics including the fact that it isn't in the hottest path.
      
      This patch fixes the above and other possible races by strictly
      synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
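      
      The idea can be illustrated with a small userspace model. This is only a
      sketch: struct iocg_model, DEBT_THRESHOLD and the function names are
      hypothetical, a pthread mutex stands in for iocg->waitq.lock, and the
      forced-IO path simply re-activates under the lock, which may differ from
      what the kernel code does. The point is that activation state, abs_vdebt
      and use_delay only change inside one critical section, so debt can never
      be left on an inactive group.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define DEBT_THRESHOLD 100	/* arbitrary value chosen for this sketch */

/* Hypothetical userspace stand-in for an iocg; not the kernel structure. */
struct iocg_model {
	pthread_mutex_t lock;	/* plays the role of iocg->waitq.lock */
	bool active;		/* visited by the periodic timer? */
	uint64_t abs_vdebt;	/* how far over budget the group is */
	bool use_delay;		/* set while the debt exceeds the threshold */
};

static void iocg_model_init(struct iocg_model *g)
{
	pthread_mutex_init(&g->lock, NULL);
	g->active = false;
	g->abs_vdebt = 0;
	g->use_delay = false;
}

/* Forced IO: activation and debt accounting share one critical section. */
static void iocg_model_incur_debt(struct iocg_model *g, uint64_t cost)
{
	pthread_mutex_lock(&g->lock);
	g->active = true;	/* re-check and fix up activation under the lock */
	g->abs_vdebt += cost;
	g->use_delay = g->abs_vdebt > DEBT_THRESHOLD;
	assert(g->abs_vdebt == 0 || g->active);	/* debt implies active */
	pthread_mutex_unlock(&g->lock);
}

/*
 * Periodic timer: pay back debt with this period's budget and deactivate
 * only once no debt is left. Returns true if the group was deactivated.
 */
static bool iocg_model_timer_tick(struct iocg_model *g, uint64_t budget)
{
	bool deactivated = false;

	pthread_mutex_lock(&g->lock);
	g->abs_vdebt = g->abs_vdebt > budget ? g->abs_vdebt - budget : 0;
	g->use_delay = g->abs_vdebt > DEBT_THRESHOLD;
	if (g->active && g->abs_vdebt == 0) {
		g->active = false;
		deactivated = true;
	}
	assert(g->abs_vdebt == 0 || g->active);	/* debt implies active */
	pthread_mutex_unlock(&g->lock);
	return deactivated;
}
```

      In this model a group that incurs 150 units of debt stays active through
      a tick with a budget of 100 and is only deactivated by the following
      tick, once the debt has reached zero.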
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Vlad Dmitriev <vvd@fb.com>
      Cc: stable@vger.kernel.org # v5.4+
      Fixes: e1518f63 ("blk-iocost: Don't let merges push vtime into the future")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. May 01, 2020
  5. Apr 30, 2020
  6. Apr 27, 2020
    • nvme: prevent double free in nvme_alloc_ns() error handling · 132be623
      Niklas Cassel authored
      
      When jumping to the out_put_disk label, we will call put_disk(), which will
      trigger a call to disk_release(), which calls blk_put_queue().
      
      Later in the cleanup code, we do blk_cleanup_queue(), which will also call
      blk_put_queue().
      
      Putting the queue twice is incorrect, and will generate a KASAN splat.
      
      Set the disk->queue pointer to NULL, before calling put_disk(), so that the
      first call to blk_put_queue() will not free the queue.
      
      The second call to blk_put_queue() uses another pointer to the same queue,
      so this call will still free the queue.
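      
      The reference counting problem can be modelled in a few lines of
      userspace C. This is a sketch under stated assumptions: queue_model,
      disk_model and error_path() are hypothetical names, the queue starts
      with a single reference, and a counter is bumped instead of calling
      free() so that the faulty path can be observed without undefined
      behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of this is kernel code. */
struct queue_model {
	int refs;	/* outstanding references */
	int free_count;	/* how often the "final put" ran; must end at 1 */
};

struct disk_model {
	struct queue_model *queue;	/* the disk's pointer to its queue */
};

/* Drop one reference; a real implementation would free at zero. */
static void queue_put(struct queue_model *q)
{
	assert(q != NULL);
	if (--q->refs <= 0)
		q->free_count++;	/* a second hit here is the double free */
}

/* Releasing the disk puts the queue it still points to, if any. */
static void disk_put(struct disk_model *d)
{
	if (d->queue)
		queue_put(d->queue);
}

/* Error path; returns how many times the queue was "freed". */
static int error_path(int clear_pointer_first)
{
	struct queue_model q = { .refs = 1, .free_count = 0 };
	struct disk_model d = { .queue = &q };

	if (clear_pointer_first)
		d.queue = NULL;	/* the fix: the disk no longer owns a put */
	disk_put(&d);		/* models put_disk() at the out_put_disk label */
	queue_put(&q);		/* later cleanup through the other pointer */
	return q.free_count;
}
```

      In this model error_path(0) reports two frees and error_path(1) reports
      exactly one, matching the behavior described above.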
      
      Fixes: 85136c01 ("lightnvm: simplify geometry enumeration")
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  7. Apr 23, 2020
    • null_blk: Cleanup zoned device initialization · d205bde7
      Damien Le Moal authored
      
      Move all zoned mode related code from null_blk_main.c to
      null_blk_zoned.c, avoiding an ugly #ifdef in the process.
      Rename null_zone_init() into null_init_zoned_dev(), null_zone_exit()
      into null_free_zoned_dev() and add the new function
      null_register_zoned_dev() to finalize the zoned dev setup before
      add_disk().
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix zoned command handling · 9dd44c7e
      Damien Le Moal authored
      
      For write operations issued to a null_blk device with zoned mode
      enabled, the state and write pointer position of the zone targeted by
      the command should be checked before badblocks and memory backing
      are handled, as the write may first fail due to, for instance, a
      sector position not aligned with the zone write pointer. This order of
      error checking more accurately reflects the behavior of physical
      zoned devices.
      
      Furthermore, the write pointer position of the target zone should be
      incremented if and only if no errors are reported by badblocks and
      memory backing handling.
      
      To fix this, introduce the small helper function null_process_cmd(),
      which executes null_handle_badblocks() and null_handle_memory_backed(),
      and use this function in null_zone_write() to correctly handle write
      requests to zoned null devices depending on the type and state of the
      write target zone. Also call this function in null_handle_zoned() to
      process read requests to zoned null devices.
      
      null_process_cmd() is called directly from null_handle_cmd() for
      regular null devices, resulting in no functional change for this type
      of device. To have symmetric names, the function null_handle_zoned()
      is renamed to null_process_zoned_cmd().
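      
      The ordering described above can be sketched in standalone C. The names
      below (zone_model, zone_write(), the wr_status values and the process_*
      callbacks) are hypothetical and only model a sequential zone; they are
      not the null_blk data structures. The callback stands in for the
      combined badblocks and memory backing handling.

```c
#include <assert.h>
#include <stdbool.h>

enum wr_status { WR_OK, WR_ZONE_ERROR, WR_BACKEND_ERROR };

/* Hypothetical model of one sequential-write zone. */
struct zone_model {
	unsigned long start;	/* first sector of the zone */
	unsigned long len;	/* zone size in sectors */
	unsigned long wp;	/* current write pointer */
};

/* Stand-in for badblocks + memory backing handling; false means error. */
typedef bool (*process_fn)(unsigned long sector, unsigned long nr_sectors);

static int process_calls;	/* counts how often the backend was reached */

static bool process_ok(unsigned long sector, unsigned long nr_sectors)
{
	(void)sector;
	(void)nr_sectors;
	process_calls++;
	return true;
}

static bool process_fail(unsigned long sector, unsigned long nr_sectors)
{
	(void)sector;
	(void)nr_sectors;
	process_calls++;
	return false;
}

static enum wr_status zone_write(struct zone_model *z, unsigned long sector,
				 unsigned long nr_sectors, process_fn process)
{
	assert(z->wp >= z->start && z->wp <= z->start + z->len);

	/* 1. Zone checks come first: alignment with wp and remaining room. */
	if (sector != z->wp || nr_sectors > z->start + z->len - z->wp)
		return WR_ZONE_ERROR;

	/* 2. Only then run the data handling, which may itself fail. */
	if (!process(sector, nr_sectors))
		return WR_BACKEND_ERROR;

	/* 3. Advance the write pointer only when nothing failed. */
	z->wp += nr_sectors;
	return WR_OK;
}
```

      With a zone whose write pointer is at sector 0, a write to sector 4 is
      rejected without reaching the backend, a write that fails in the
      backend leaves the write pointer untouched, and only a fully successful
      write moves it forward.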
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. Apr 21, 2020
  9. Apr 20, 2020
    • bdev: Reduce time holding bd_mutex in sync in blkdev_close() · b849dd84
      Douglas Anderson authored
      
      While trying to "dd" to the block device for a USB stick, I
      encountered a hung task warning (blocked for > 120 seconds).  I
      managed to come up with an easy way to reproduce this on my system
      (where /dev/sdb is the block device for my USB stick) with:
      
        while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done
      
      With my reproduction here are the relevant bits from the hung task
      detector:
      
       INFO: task udevd:294 blocked for more than 122 seconds.
       ...
       udevd           D    0   294      1 0x00400008
       Call trace:
        ...
        mutex_lock_nested+0x40/0x50
        __blkdev_get+0x7c/0x3d4
        blkdev_get+0x118/0x138
        blkdev_open+0x94/0xa8
        do_dentry_open+0x268/0x3a0
        vfs_open+0x34/0x40
        path_openat+0x39c/0xdf4
        do_filp_open+0x90/0x10c
        do_sys_open+0x150/0x3c8
        ...
      
       ...
       Showing all locks held in the system:
       ...
       1 lock held by dd/2798:
        #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
       ...
       dd              D    0  2798   2764 0x00400208
       Call trace:
        ...
        schedule+0x8c/0xbc
        io_schedule+0x1c/0x40
        wait_on_page_bit_common+0x238/0x338
        __lock_page+0x5c/0x68
        write_cache_pages+0x194/0x500
        generic_writepages+0x64/0xa4
        blkdev_writepages+0x24/0x30
        do_writepages+0x48/0xa8
        __filemap_fdatawrite_range+0xac/0xd8
        filemap_write_and_wait+0x30/0x84
        __blkdev_put+0x88/0x204
        blkdev_put+0xc4/0xe4
        blkdev_close+0x28/0x38
        __fput+0xe0/0x238
        ____fput+0x1c/0x28
        task_work_run+0xb0/0xe4
        do_notify_resume+0xfc0/0x14bc
        work_pending+0x8/0x14
      
      The problem appears related to the fact that my USB disk is terribly
      slow and that I have a lot of RAM in my system to cache things.
      Specifically my writes seem to be happening at ~15 MB/s and I've got
      ~4 GB of RAM in my system that can be used for buffering.  To write 4
      GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.
      
      The 267 second number is a problem because in __blkdev_put() we call
      sync_blockdev() while holding the bd_mutex.  Any other callers who
      want the bd_mutex will be blocked for the whole time.
      
      The problem is made worse because I believe blkdev_put() specifically
      tells other tasks (namely udev) to go try to access the device at right
      around the same time we're going to hold the mutex for a long time.
      
      Putting some traces around this (after disabling the hung task detector),
      I could confirm:
       dd:    437.608600: __blkdev_put() right before sync_blockdev() for sdb
       udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
       dd:    661.468451: __blkdev_put() right after sync_blockdev() for sdb
       udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb
      
      A simple fix for this is to realize that sync_blockdev() works fine if
      you're not holding the mutex.  Also, it's not the end of the world if
      you sync a little early (though it can have performance impacts).
      Thus we can make a guess that we're going to need to do the sync and
      then do it without holding the mutex.  We still do one last sync with
      the mutex but it should be much, much faster.
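      
      A minimal userspace sketch of this "sync early, then sync again under
      the lock" pattern follows. Everything in it is an assumption made for
      illustration: bdev_model, its counters and the function names are
      hypothetical, a pthread mutex plays the role of bd_mutex, and the
      "last opener" guess is an unlocked read of the opener count.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of a block device; not the kernel structure. */
struct bdev_model {
	pthread_mutex_t mutex;	/* plays the role of bd_mutex */
	int openers;		/* number of current openers */
	long dirty_pages;	/* data still waiting for writeback */
	long flushed_unlocked;	/* pages written back without the mutex */
	long flushed_locked;	/* pages written back while holding it */
};

static void bdev_model_init(struct bdev_model *b, int openers, long dirty)
{
	pthread_mutex_init(&b->mutex, NULL);
	b->openers = openers;
	b->dirty_pages = dirty;
	b->flushed_unlocked = 0;
	b->flushed_locked = 0;
}

/* Stand-in for the sync: write back everything and record where it ran. */
static void bdev_model_flush(struct bdev_model *b, long *counter)
{
	*counter += b->dirty_pages;
	b->dirty_pages = 0;
}

static void bdev_model_put(struct bdev_model *b)
{
	/*
	 * Unlocked guess that we are the last opener. Being wrong only costs
	 * an early or a missed pre-sync; the locked sync below still runs.
	 */
	if (b->openers == 1)
		bdev_model_flush(b, &b->flushed_unlocked);

	pthread_mutex_lock(&b->mutex);
	assert(b->openers > 0);
	if (--b->openers == 0)
		bdev_model_flush(b, &b->flushed_locked); /* little left to do */
	pthread_mutex_unlock(&b->mutex);
}
```

      In this model the last opener writes back all of its dirty pages before
      taking the mutex, so other tasks waiting for the mutex are blocked only
      for the short final sync.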
      
      With this, my hung task warnings for my test case are gone.
      
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Guenter Roeck <groeck@chromium.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. Apr 18, 2020
  11. Apr 17, 2020
    • kbuild: check libyaml installation for 'make dt_binding_check' · 0903060f
      Masahiro Yamada authored
      
      If you run 'make dtbs_check' without installing the libyaml package,
      the error message "dtc needs libyaml ..." is shown.
      
      This should be checked also for 'make dt_binding_check' because dtc
      needs to validate *.example.dts extracted from *.yaml files.
      
      It has been missing since commit 4f0e3a57 ("kbuild: Add support for DT
      binding schema checks"), but this fix-up is applicable only after commit
      e10c4321 ("kbuild: allow to run dt_binding_check and dtbs_check
      in a single command").
      
      I gave the Fixes tag to the latter in case somebody is interested in
      back-porting this.
      
      Fixes: e10c4321 ("kbuild: allow to run dt_binding_check and dtbs_check in a single command")
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • blk-wbt: Drop needless newlines from tracepoint format strings · 3f22037d
      Tommi Rantala authored
      
      Drop needless newlines from tracepoint format strings, they only add
      empty lines to perf tracing output.
      
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-wbt: Use tracepoint_string() for wbt_step tracepoint string literals · 3a89c25d
      Tommi Rantala authored
      
      Use tracepoint_string() for string literals that are used in the
      wbt_step tracepoint, so that userspace tools can display the string
      content.
      
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • s390/dasd: remove IOSCHED_DEADLINE from DASD Kconfig · 3dceecfa
      Stefan Haberland authored
      
      CONFIG_IOSCHED_DEADLINE was removed with
      commit f382fb0b ("block: remove legacy IO schedulers")
      
      and setting of the scheduler was removed with
      commit a5fd8ddc ("s390/dasd: remove setting of scheduler from driver").
      
      So get rid of the select.
      
      Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • of: unittest: kmemleak in duplicate property update · 29acfb65
      Frank Rowand authored
      
      kmemleak reports several memory leaks from devicetree unittest.
      This is the fix for problem 5 of 5.
      
      When overlay 'overlay_bad_add_dup_prop' is applied, the apply code
      properly detects that a memory leak will occur if the overlay is removed
      since the duplicate property is located in a base devicetree node and
      reports via printk():
      
        OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail
        OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail
      
      The overlay is removed when the apply code detects multiple changesets
      modifying the same property.  This is reported via printk():
      
        OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/rpm_avail
      
      As a result of this error, the overlay is removed resulting in the
      expected memory leak.
      
      Add another device node leve...
    • of: overlay: kmemleak in dup_and_fixup_symbol_prop() · 478ff649
      Frank Rowand authored
      
      kmemleak reports several memory leaks from devicetree unittest.
      This is the fix for problem 4 of 5.
      
      target_path was not freed in the non-error path.
      
      Fixes: e0a58f3e ("of: overlay: remove a dependency on device node full_name")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • of: unittest: kmemleak in of_unittest_overlay_high_level() · 145fc138
      Frank Rowand authored
      
      kmemleak reports several memory leaks from devicetree unittest.
      This is the fix for problem 3 of 5.
      
      of_unittest_overlay_high_level() failed to kfree the newly created
      property when the property named 'name' is skipped.
      
      Fixes: 39a751a4 ("of: change overlay apply input data from unflattened to FDT")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • of: unittest: kmemleak in of_unittest_platform_populate() · 216830d2
      Frank Rowand authored
      
      kmemleak reports several memory leaks from devicetree unittest.
      This is the fix for problem 2 of 5.
      
      of_unittest_platform_populate() left an elevated reference count for
      grandchild nodes (which are platform devices).  Fix the platform
      device reference counts so that the memory will be freed.
      
      Fixes: fb2caa50 ("of/selftest: add testcase for nodes with same name and address")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • of: unittest: kmemleak on changeset destroy · b3fb36ed
      Frank Rowand authored
      
      kmemleak reports several memory leaks from devicetree unittest.
      This is the fix for problem 1 of 5.
      
      of_unittest_changeset() reaches deeply into the dynamic devicetree
      functions.  Several nodes were left with an elevated reference
      count and thus were not properly cleaned up.  Fix the reference
      counts so that the memory will be freed.
      
      Fixes: 201c910b ("of: Transactional DT support.")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • MAINTAINERS: dt: fix pointers for ARM Integrator, Versatile and RealView · 21a431e6
      Mauro Carvalho Chehab authored
      
      A plain text binding file was converted into 4 YAML ones.
      The old file was removed, causing this new warning:
      
      	Warning: MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/arm/arm-boards
      
      Address it by replacing the old reference with the new ones.
      
      Fixes: 4b900070 ("dt-bindings: arm: Add Versatile YAML schema")
      Fixes: 2d483550 ("dt-bindings: arm: Drop the non-YAML bindings")
      Fixes: 7db625b9 ("dt-bindings: arm: Add RealView YAML schema")
      Fixes: 4fb00d90 ("dt-bindings: arm: Add Versatile Express and Juno YAML schema")
      Fixes: 33fbfb3e ("dt-bindings: arm: Add Integrator YAML schema")
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • MAINTAINERS: dt: update display/allwinner file entry · f4d859b7
      Mauro Carvalho Chehab authored
      
      Changeset f5a98bfe ("dt-bindings: display: Convert Allwinner display pipeline to schemas")
      split Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt
      into several files, but MAINTAINERS still references the old location.
      
      Update it to point to the new location.
      
      Fixes: f5a98bfe ("dt-bindings: display: Convert Allwinner display pipeline to schemas")
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • dt-bindings: iio: dac: AD5570R fix bindings errors · 2cf3818f
      Alexandru Tachici authored
      
      Replaced num property with reg property, fixed errors
      reported by dt-binding-check.
      
      Fixes: ea52c212 ("dt-bindings: iio: dac: Add docs for AD5770R DAC")
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      [robh: Fix required property list, fix Fixes tag]
      Signed-off-by: Rob Herring <robh@kernel.org>
    • btrfs: fix setting last_trans for reloc roots · aec7db3b
      Josef Bacik authored
      
      I made a mistake with my previous fix: I assumed that we didn't need to
      mess with the reloc roots once we were out of the part of relocation where
      we are actually moving the extents.
      
      The subtle thing that I missed is that btrfs_init_reloc_root() also
      updates the last_trans for the reloc root when we do
      btrfs_record_root_in_trans() for the corresponding fs_root.  I've added a
      comment to make sure future me doesn't make this mistake again.
      
      This showed up as a WARN_ON() in btrfs_copy_root() because our
      last_trans did not equal the current transid.  This could happen if we
      snapshotted a fs root with a reloc root after we set
      rc->create_reloc_tree = 0, but before we actually merge the reloc root.
      
      Worth mentioning that the regression produced the following warning
      when running snapshot creation and balance in parallel:
      
        BTRFS info (device sdc): relocating block group 30408704 flags metadata|dup
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 12823 at fs/btrfs/ctree.c:191 btrfs_copy_root+0x26f/0x430 [btrfs]
        CPU: 0 PID: 12823 Comm: btrfs Tainted: G        W 5.6.0-rc7-btrfs-next-58 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_copy_root+0x26f/0x430 [btrfs]
        RSP: 0018:ffffb96e044279b8 EFLAGS: 00010202
        RAX: 0000000000000009 RBX: ffff9da70bf61000 RCX: ffffb96e04427a48
        RDX: ffff9da733a770c8 RSI: ffff9da70bf61000 RDI: ffff9da694163818
        RBP: ffff9da733a770c8 R08: fffffffffffffff8 R09: 0000000000000002
        R10: ffffb96e044279a0 R11: 0000000000000000 R12: ffff9da694163818
        R13: fffffffffffffff8 R14: ffff9da6d2512000 R15: ffff9da714cdac00
        FS:  00007fdeacf328c0(0000) GS:ffff9da735e00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000055a2a5b8a118 CR3: 00000001eed78002 CR4: 00000000003606f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? create_reloc_root+0x49/0x2b0 [btrfs]
         ? kmem_cache_alloc_trace+0xe5/0x200
         create_reloc_root+0x8b/0x2b0 [btrfs]
         btrfs_reloc_post_snapshot+0x96/0x5b0 [btrfs]
         create_pending_snapshot+0x610/0x1010 [btrfs]
         create_pending_snapshots+0xa8/0xd0 [btrfs]
         btrfs_commit_transaction+0x4c7/0xc50 [btrfs]
         ? btrfs_mksubvol+0x3cd/0x560 [btrfs]
         btrfs_mksubvol+0x455/0x560 [btrfs]
         __btrfs_ioctl_snap_create+0x15f/0x190 [btrfs]
         btrfs_ioctl_snap_create_v2+0xa4/0xf0 [btrfs]
         ? mem_cgroup_commit_charge+0x6e/0x540
         btrfs_ioctl+0x12d8/0x3760 [btrfs]
         ? do_raw_spin_unlock+0x49/0xc0
         ? _raw_spin_unlock+0x29/0x40
         ? __handle_mm_fault+0x11b3/0x14b0
         ? ksys_ioctl+0x92/0xb0
         ksys_ioctl+0x92/0xb0
         ? trace_hardirqs_off_thunk+0x1a/0x1c
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x5c/0x280
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7fdeabd3bdd7
      
      Fixes: 2abc726a ("btrfs: do not init a reloc root if we aren't relocating")
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>