kernel_optimize_test

Josef Bacik b917f9b946 btrfs: do not take the uuid_mutex in btrfs_rm_device

[ Upstream commit 8ef9dc0f14ba6124c62547a4fdc59b163d8b864e ]

We got the following lockdep splat while running fstests (specifically
btrfs/003 and btrfs/020 in a row) with the new rc. This was uncovered
by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
converted loop to using workqueues, which comes with lockdep
annotations that don't exist with kworkers. The lockdep splat is as
follows:

  WARNING: possible circular locking dependency detected
  5.14.0-rc2-custom+ #34 Not tainted
  ------------------------------------------------------
  losetup/156417 is trying to acquire lock:
  ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600

  but task is already holding lock:
  ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
         __mutex_lock+0xba/0x7c0
         lo_open+0x28/0x60 [loop]
         blkdev_get_whole+0x28/0xf0
         blkdev_get_by_dev.part.0+0x168/0x3c0
         blkdev_open+0xd2/0xe0
         do_dentry_open+0x163/0x3a0
         path_openat+0x74d/0xa40
         do_filp_open+0x9c/0x140
         do_sys_openat2+0xb1/0x170
         __x64_sys_openat+0x54/0x90
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #4 (&disk->open_mutex){+.+.}-{3:3}:
         __mutex_lock+0xba/0x7c0
         blkdev_get_by_dev.part.0+0xd1/0x3c0
         blkdev_get_by_path+0xc0/0xd0
         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
         btrfs_control_ioctl+0xac/0x170 [btrfs]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #3 (uuid_mutex){+.+.}-{3:3}:
         __mutex_lock+0xba/0x7c0
         btrfs_rm_device+0x48/0x6a0 [btrfs]
         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #2 (sb_writers#11){.+.+}-{0:0}:
         lo_write_bvec+0x112/0x290 [loop]
         loop_process_work+0x25f/0xcb0 [loop]
         process_one_work+0x28f/0x5d0
         worker_thread+0x55/0x3c0
         kthread+0x140/0x170
         ret_from_fork+0x22/0x30

  -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
         process_one_work+0x266/0x5d0
         worker_thread+0x55/0x3c0
         kthread+0x140/0x170
         ret_from_fork+0x22/0x30

  -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
         __lock_acquire+0x1130/0x1dc0
         lock_acquire+0xf5/0x320
         flush_workqueue+0xae/0x600
         drain_workqueue+0xa0/0x110
         destroy_workqueue+0x36/0x250
         __loop_clr_fd+0x9a/0x650 [loop]
         lo_ioctl+0x29d/0x780 [loop]
         block_ioctl+0x3f/0x50
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  other info that might help us debug this:
  Chain exists of:
    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(&lo->lo_mutex);
                                 lock(&disk->open_mutex);
                                 lock(&lo->lo_mutex);
    lock((wq_completion)loop0);

   *** DEADLOCK ***
  1 lock held by losetup/156417:
   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  stack backtrace:
  CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x57/0x72
   check_noncircular+0x10a/0x120
   __lock_acquire+0x1130/0x1dc0
   lock_acquire+0xf5/0x320
   ? flush_workqueue+0x84/0x600
   flush_workqueue+0xae/0x600
   ? flush_workqueue+0x84/0x600
   drain_workqueue+0xa0/0x110
   destroy_workqueue+0x36/0x250
   __loop_clr_fd+0x9a/0x650 [loop]
   lo_ioctl+0x29d/0x780 [loop]
   ? __lock_acquire+0x3a0/0x1dc0
   ? update_dl_rq_load_avg+0x152/0x360
   ? lock_is_held_type+0xa5/0x120
   ? find_held_lock.constprop.0+0x2b/0x80
   block_ioctl+0x3f/0x50
   __x64_sys_ioctl+0x83/0xb0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   RIP: 0033:0x7f645884de6b

Usually the uuid_mutex exists to protect the fs_devices that map
together all of the devices that match a specific uuid. In rm_device
we're messing with the uuid of a device, so it makes sense to protect
that here.

However in doing that it pulls in a whole host of lockdep dependencies,
as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
we end up with the dependency chain under the uuid_mutex being added
under the normal sb write dependency chain, which causes problems with
loop devices.

We don't need the uuid mutex here however. If we call
btrfs_scan_one_device() before we scratch the super block we will find
the fs_devices and not find the device itself and return EBUSY because
the fs_devices is open. If we call it after the scratch happens it will
not appear to be a valid btrfs file system.

We do not need to worry about other fs_devices modifying operations here
because we're protected by the exclusive operations locking.

So drop the uuid_mutex here in order to fix the lockdep splat.

A more detailed explanation from the discussion:

We are worried about rm and scan racing with each other, before this
change we'll zero the device out under the UUID mutex so when scan does
run it'll make sure that it can go through the whole device scan thing
without rm messing with us.

We aren't worried if the scratch happens first, because the result is we
don't think this is a btrfs device and we bail out.

The only case we are concerned with is we scratch _after_ scan is able
to read the superblock and gets a seemingly valid super block, so lets
consider this case.

Scan will call device_list_add() with the device we're removing. We'll
call find_fsid_with_metadata_uuid() and get our fs_devices for this
UUID. At this point we lock the fs_devices->device_list_mutex. This is
what protects us in this case, but we have two cases here.

1. We aren't to the device removal part of the RM. We found our device,
   and device name matches our path, we go down and we set total_devices
   to our super number of devices, which doesn't affect anything because
   we haven't done the remove yet.

2. We are past the device removal part, which is protected by the
   device_list_mutex. Scan doesn't find the device, it goes down and
   does the

   if (fs_devices->opened)
           return -EBUSY;

   check and we bail out.

Nothing about this situation is ideal, but the lockdep splat is real,
and the fix is safe, tho admittedly a bit scary looking.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copy more from the discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

2021-11-18 14:04:00 +01:00

tests	btrfs: fix missing delalloc new bit for new delalloc ranges	2020-11-13 22:15:59 +01:00
acl.c
async-thread.c	Btrfs: fix crash during unmount due to race with delayed inode workers	2020-03-23 17:01:51 +01:00
async-thread.h	Btrfs: fix crash during unmount due to race with delayed inode workers	2020-03-23 17:01:51 +01:00
backref.c	btrfs: do not warn if we can't find the reloc root when looking up backref	2021-03-04 11:38:29 +01:00
backref.h	btrfs: add asserts for deleting backref cache nodes	2021-03-04 11:38:29 +01:00
block-group.c	btrfs: fix race between writes to swap files and scrub	2021-03-09 11:11:11 +01:00
block-group.h	btrfs: fix race between writes to swap files and scrub	2021-03-09 11:11:11 +01:00
block-rsv.c	btrfs: print the block rsv type when we fail our reservation	2020-11-05 13:02:05 +01:00
block-rsv.h	btrfs: Remove __ prefix from btrfs_block_rsv_release	2020-03-23 17:01:55 +01:00
btrfs_inode.h	btrfs: fix race between marking inode needs to be logged and log syncing	2021-09-03 10:09:28 +02:00
check-integrity.c	btrfs: check-integrity: remove unnecessary failure messages during memory allocation	2020-07-27 12:55:21 +02:00
check-integrity.h	btrfs: remove btrfsic_submit_bh()	2020-03-23 17:01:39 +01:00
compression.c	btrfs: mark compressed range uptodate only if all bio succeed	2021-08-04 12:46:39 +02:00
compression.h	btrfs: compression: move declarations to header	2020-10-07 12:06:55 +02:00
ctree.c	btrfs: fix race when picking most recent mod log operation for an old root	2021-05-11 14:47:33 +02:00
ctree.h	btrfs: fix race between writes to swap files and scrub	2021-03-09 11:11:11 +01:00
delalloc-space.c	btrfs: add btrfs_reserve_data_bytes and use it	2020-10-07 12:06:52 +02:00
delalloc-space.h	btrfs: make btrfs_delalloc_reserve_space take btrfs_inode	2020-07-27 12:55:36 +02:00
delayed-inode.c	btrfs: abort transaction if we fail to update the delayed inode	2021-07-14 16:55:55 +02:00
delayed-inode.h	btrfs: delayed-inode: Replace zero-length array with flexible-array member	2020-03-23 17:01:53 +01:00
delayed-ref.c	btrfs: account for new extents being deleted in total_bytes_pinned	2021-03-04 11:38:30 +01:00
delayed-ref.h	btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself	2021-03-04 11:38:30 +01:00
dev-replace.c	btrfs: fix deadlock when cloning inline extent and low on free metadata space	2021-01-17 14:16:54 +01:00
dev-replace.h
dir-item.c
discard.c	btrfs: merge critical sections of discard lock in workfn	2021-01-19 18:27:24 +01:00
discard.h	btrfs: discard: Use the correct style for SPDX License Identifier	2020-04-20 17:43:42 +02:00
disk-io.c	btrfs: call btrfs_check_rw_degradable only if there is a missing device	2021-11-18 14:03:44 +01:00
disk-io.h	btrfs: add a helper to read the tree_root commit root for backref lookup	2020-10-26 15:04:57 +01:00
export.c	btrfs: simplify iget helpers	2020-05-25 11:25:37 +02:00
export.h	btrfs: export helpers for subvolume name/id resolution	2020-03-23 17:01:42 +01:00
extent_io.c	btrfs: return whole extents in fiemap	2021-06-03 09:00:43 +02:00
extent_io.h	btrfs: remove struct extent_io_ops	2020-10-07 12:13:25 +02:00
extent_map.c	Btrfs: fix race between using extent maps and merging them	2020-02-12 17:16:46 +01:00
extent_map.h	btrfs: remove extent_map::bdev	2019-11-18 23:43:44 +01:00
extent-io-tree.h	btrfs: remove struct extent_io_ops	2020-10-07 12:13:25 +02:00
extent-tree.c	btrfs: unlock newly allocated extent buffer after error	2021-10-20 11:44:59 +02:00
file-item.c	btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling	2021-10-09 14:40:56 +02:00
file.c	btrfs: fix abort logic in btrfs_replace_file_extents	2021-10-20 11:44:59 +02:00
free-space-cache.c	btrfs: fix race between extent freeing/allocation when using bitmaps	2021-03-09 11:11:11 +01:00
free-space-cache.h	btrfs: let btrfs_return_cluster_to_free_space() return void	2020-07-27 12:55:21 +02:00
free-space-tree.c	btrfs: fix possible free space tree corruption with online conversion	2021-02-03 23:28:40 +01:00
free-space-tree.h	btrfs: rename btrfs_block_group_cache	2019-11-18 17:51:51 +01:00
inode-item.c
inode-map.c	btrfs: make btrfs_delalloc_reserve_space take btrfs_inode	2020-07-27 12:55:36 +02:00
inode-map.h
inode.c	btrfs: wake up async_delalloc_pages waiters after submit	2021-09-18 13:40:06 +02:00
ioctl.c	btrfs: fix metadata extent leak after failure to create subvolume	2021-05-11 14:47:15 +02:00
Kconfig	btrfs: disable build on platforms having page size 256K	2021-07-14 16:55:56 +02:00
locking.c	btrfs: add nesting tags to the locking helpers	2020-10-07 12:12:16 +02:00
locking.h	btrfs: introduce BTRFS_NESTING_NEW_ROOT for adding new roots	2020-10-07 12:12:17 +02:00
lzo.c
Makefile	Btrfs: move all reflink implementation code into its own file	2020-03-23 17:01:54 +01:00
misc.h	btrfs: rename tree_entry to rb_simple_node and export it	2020-05-25 11:25:19 +02:00
ordered-data.c	btrfs: remove inode argument from btrfs_start_ordered_extent	2020-10-07 12:13:22 +02:00
ordered-data.h	btrfs: remove inode argument from btrfs_start_ordered_extent	2020-10-07 12:13:22 +02:00
orphan.c
print-tree.c	btrfs: print the actual offset in btrfs_root_name	2021-01-27 11:55:06 +01:00
print-tree.h	btrfs: print the actual offset in btrfs_root_name	2021-01-27 11:55:06 +01:00
props.c	btrfs: simplify iget helpers	2020-05-25 11:25:37 +02:00
props.h
qgroup.c	btrfs: fix sleep while in non-sleep context during qgroup removal	2021-03-30 14:31:53 +02:00
qgroup.h	btrfs: export and rename qgroup_reserve_meta	2021-03-11 14:17:22 +01:00
raid56.c	treewide: Change list_sort to use const pointers	2021-09-30 10:11:04 +02:00
raid56.h
rcu-string.h	btrfs: rcu-string: Replace zero-length array with flexible-array member	2020-03-23 17:01:53 +01:00
reada.c	btrfs: fix readahead hang and use-after-free after removing a device	2020-10-26 15:03:59 +01:00
ref-verify.c	btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod	2020-11-05 13:03:39 +01:00
ref-verify.h
reflink.c	btrfs: reflink: initialize return value to 0 in btrfs_extent_same()	2021-11-18 14:04:00 +01:00
reflink.h	Btrfs: move all reflink implementation code into its own file	2020-03-23 17:01:54 +01:00
relocation.c	btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s	2021-05-11 14:47:22 +02:00
root-tree.c	btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations	2020-10-07 12:12:13 +02:00
scrub.c	btrfs: fix race between writes to swap files and scrub	2021-03-09 11:11:11 +01:00
send.c	btrfs: send: fix invalid path for unlink operations after parent orphanization	2021-07-14 16:55:40 +02:00
send.h	btrfs: send: avoid copying file data	2020-10-07 12:13:17 +02:00
space-info.c	btrfs: prevent __btrfs_dump_space_info() to underflow its free space	2021-09-30 10:11:00 +02:00
space-info.h	btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself	2021-03-04 11:38:30 +01:00
struct-funcs.c	btrfs: use unaligned helpers for stack and header set/get helpers	2020-10-07 12:13:23 +02:00
super.c	btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan	2021-01-19 18:27:24 +01:00
sysfs.c	btrfs: sysfs: fix format string for some discard stats	2021-07-14 16:55:55 +02:00
sysfs.h	btrfs: split and refactor btrfs_sysfs_remove_devices_dir	2020-10-07 12:12:21 +02:00
transaction.c	btrfs: clear defrag status of a root if starting transaction fails	2021-07-14 16:55:40 +02:00
transaction.h	btrfs: fix race between marking inode needs to be logged and log syncing	2021-09-03 10:09:28 +02:00
tree-checker.c	btrfs: tree-checker: do not error out if extent ref hash doesn't match	2021-06-10 13:39:12 +02:00
tree-checker.h
tree-defrag.c	btrfs: remove unused btrfs_root::defrag_trans_start	2020-07-27 12:55:28 +02:00
tree-log.c	btrfs: fix lost error handling when replaying directory deletes	2021-11-18 14:03:44 +01:00
tree-log.h	btrfs: make fast fsyncs wait only for writeback	2020-10-07 12:06:56 +02:00
ulist.c
ulist.h
uuid-tree.c	btrfs: simplify root lookup by id	2020-05-25 11:25:36 +02:00
volumes.c	btrfs: do not take the uuid_mutex in btrfs_rm_device	2021-11-18 14:04:00 +01:00
volumes.h	btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch	2021-02-03 23:28:40 +01:00
xattr.c	btrfs: fix warning when creating a directory with smack enabled	2021-03-09 11:11:12 +01:00
xattr.h
zlib.c	btrfs: use larger zlib buffer for s390 hardware compression	2020-01-31 10:30:40 -08:00
zstd.c