kernel_optimize_test/fs/btrfs
Josef Bacik 38e3eebff6 btrfs: honor path->skip_locking in backref code
Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref.  This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking, nobody will be
modifying the commit_root when we're searching so it's completely safe
to do.

This happens since fb235dc06f ("btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans"), kernel may lockup with quota
enabled.

There is one backref trace triggered by snapshot dropping along with
write operation in the source subvolume.  The example can be reliably
reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

When dropping snapshots with qgroup enabled, we will trigger backref
walk.

However such backref walk at that timing is pretty dangerous, as if one
of the parent nodes get WRITE locked by other thread, we could cause a
dead lock.

For example:

           FS 260     FS 261 (Dropped)
            node A        node B
           /      \      /      \
       node C      node D      node E
      /   \         /  \        /     \
  leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

      Thread A (cleaner)             |       Thread B (other writer)
-----------------------------------------------------------------------
write_lock(B)                        |
write_lock(D)                        |
^^^ called by walk_down_tree()       |
                                     |       write_lock(A)
                                     |       write_lock(D) << Stall
read_lock(H) << for backref walk     |
read_lock(D) << lock owner is        |
                the same thread A    |
                so read lock is OK   |
read_lock(A) << Stall                |

So thread A hold write lock D, and needs read lock A to unlock.
While thread B holds write lock A, while needs lock D to unlock.

This will cause a deadlock.

This is not only limited to snapshot dropping case.  As the backref
walk, even only happens on commit trees, is breaking the normal top-down
locking order, makes it deadlock prone.

Fixes: fb235dc06f ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: David Sterba <dsterba@suse.com>
Reported-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[ rebase to latest branch and fix lock assert bug in btrfs/007 ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:39 +01:00
..
tests btrfs: remove always true if branch in find_delalloc_range 2018-12-17 14:51:44 +01:00
acl.c Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl 2019-02-25 14:13:17 +01:00
async-thread.c btrfs: simplify workqueue name when allocating 2019-02-25 14:13:24 +01:00
async-thread.h
backref.c btrfs: honor path->skip_locking in backref code 2019-02-25 14:13:39 +01:00
backref.h
btrfs_inode.h Btrfs: fix fsync of files with multiple hard links in new directories 2018-12-17 14:51:43 +01:00
check-integrity.c btrfs: Fix typos in comments and strings 2018-12-17 14:51:50 +01:00
check-integrity.h
compression.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
compression.h btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
ctree.c btrfs: merge btrfs_set_lock_blocking_rw with it's caller 2019-02-25 14:13:28 +01:00
ctree.h btrfs: scrub: remove unused nocow worker pointer 2019-02-25 14:13:38 +01:00
dedupe.h
delayed-inode.c Btrfs: kill btrfs_clear_path_blocking 2018-10-15 17:23:38 +02:00
delayed-inode.h
delayed-ref.c btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
delayed-ref.h btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
dev-replace.c btrfs: merge btrfs_find_device and find_device 2019-02-25 14:13:24 +01:00
dev-replace.h btrfs: dev-replace: open code trivial locking helpers 2018-12-17 14:51:45 +01:00
dir-item.c
disk-io.c btrfs: scrub: convert scrub_workers_refcnt to refcount_t 2019-02-25 14:13:38 +01:00
disk-io.h btrfs: drop extra enum initialization where using defaults 2018-12-17 14:51:43 +01:00
export.c
export.h
extent_io.c btrfs: extent_io: Kill the forward declaration of flush_write_bio 2019-02-25 14:13:37 +01:00
extent_io.h btrfs: Remove EXTENT_FIRST_DELALLOC bit 2019-02-25 14:13:36 +01:00
extent_map.c btrfs: Remove impossible condition from mergable_maps 2019-02-25 14:13:21 +01:00
extent_map.h btrfs: Remove impossible condition from mergable_maps 2019-02-25 14:13:21 +01:00
extent-tree.c btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
file-item.c btrfs: replace btrfs_io_bio::end_io with a simple helper 2018-12-17 14:51:40 +01:00
file.c btrfs: Remove unused arguments from btrfs_get_extent_fiemap 2019-02-25 14:13:17 +01:00
free-space-cache.c Btrfs: fix deadlock on tree root leaf when finding free extent 2018-11-06 16:42:32 +01:00
free-space-cache.h
free-space-tree.c btrfs: use EXPORT_FOR_TESTS for conditionally exported functions 2018-12-17 14:51:37 +01:00
free-space-tree.h
inode-item.c
inode-map.c
inode-map.h
inode.c btrfs: reserve extra space during evict 2019-02-25 14:13:35 +01:00
ioctl.c btrfs: merge btrfs_find_device and find_device 2019-02-25 14:13:24 +01:00
Kconfig
locking.c btrfs: simplify waiting loop in btrfs_tree_lock 2019-02-25 14:13:28 +01:00
locking.h btrfs: merge btrfs_set_lock_blocking_rw with it's caller 2019-02-25 14:13:28 +01:00
lzo.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
Makefile
math.h
ordered-data.c Btrfs: remove no longer used stuff for tracking pending ordered extents 2018-12-17 14:51:25 +01:00
ordered-data.h btrfs: switch BTRFS_ORDERED_* to enums 2018-12-17 14:51:43 +01:00
orphan.c
print-tree.c
print-tree.h
props.c
props.h
qgroup.c btrfs: qgroup: Make qgroup async transaction commit more aggressive 2019-02-25 14:13:39 +01:00
qgroup.h btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
raid56.c btrfs: Fix typos in comments and strings 2018-12-17 14:51:50 +01:00
raid56.h
rcu-string.h
reada.c btrfs: dev-replace: open code trivial locking helpers 2018-12-17 14:51:45 +01:00
ref-verify.c btrfs: replace btrfs_set_lock_blocking_rw with appropriate helpers 2019-02-25 14:13:27 +01:00
ref-verify.h
relocation.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
root-tree.c
scrub.c btrfs: scrub: add assertions for worker pointers 2019-02-25 14:13:38 +01:00
send.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
send.h
struct-funcs.c
super.c btrfs: add zstd compression level support 2019-02-25 14:13:33 +01:00
sysfs.c btrfs: Add sysfs support for metadata_uuid feature 2018-12-17 14:51:37 +01:00
sysfs.h btrfs: drop extra enum initialization where using defaults 2018-12-17 14:51:43 +01:00
transaction.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
transaction.h btrfs: drop extra enum initialization where using defaults 2018-12-17 14:51:43 +01:00
tree-checker.c btrfs: Fix typos in comments and strings 2018-12-17 14:51:50 +01:00
tree-checker.h
tree-defrag.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
tree-log.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
tree-log.h Btrfs: remove no longer used io_err from btrfs_log_ctx 2018-12-17 14:51:31 +01:00
ulist.c
ulist.h
uuid-tree.c
volumes.c btrfs: fix comment its device list mutex not volume lock 2019-02-25 14:13:37 +01:00
volumes.h btrfs: introduce new ioctl to unregister a btrfs device 2019-02-25 14:13:30 +01:00
xattr.c Btrfs: use nofs context when initializing security xattrs to avoid deadlock 2018-12-17 14:51:49 +01:00
xattr.h
zlib.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
zstd.c btrfs: add zstd compression level support 2019-02-25 14:13:33 +01:00