kernel_optimize_test/drivers/md
Guoqing Jiang 41a9504112 md-cluster: release RESYNC lock after the last resync message
All the RESYNC messages are sent with resync lock held, the only
exception is resync_finish which releases resync_lockres before
send the last resync message, this should be changed as well.
Otherwise, we can see deadlock issue as follows:

clustermd2-gqjiang2:~ # cat /proc/mdstat
Personalities : [raid10] [raid1]
md0 : active raid1 sdg[0] sdf[1]
      134144 blocks super 1.2 [2/2] [UU]
      [===================>.]  resync = 99.6% (134144/134144) finish=0.0min speed=26K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
clustermd2-gqjiang2:~ # ps aux|grep md|grep D
root     20497  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_raid1]
clustermd2-gqjiang2:~ # cat /proc/20497/stack
[<ffffffffc05ff51e>] dlm_lock_sync+0x8e/0xc0 [md_cluster]
[<ffffffffc05ff7e8>] __sendmsg+0x98/0x130 [md_cluster]
[<ffffffffc05ff900>] sendmsg+0x20/0x30 [md_cluster]
[<ffffffffc05ffc35>] resync_info_update+0xb5/0xc0 [md_cluster]
[<ffffffffc0593e84>] md_reap_sync_thread+0x134/0x170 [md_mod]
[<ffffffffc059514c>] md_check_recovery+0x28c/0x510 [md_mod]
[<ffffffffc060c882>] raid1d+0x42/0x800 [raid1]
[<ffffffffc058ab61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9a0a5b3f>] kthread+0xff/0x140
[<ffffffff9a800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

clustermd-gqjiang1:~ # ps aux|grep md|grep D
root     20531  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_raid1]
root     20537  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_cluster_rec]
root     20676  0.0  0.0      0     0 ?        D    16:01   0:00 [md0_resync]
clustermd-gqjiang1:~ # cat /proc/mdstat
Personalities : [raid10] [raid1]
md0 : active raid1 sdf[1] sdg[0]
      134144 blocks super 1.2 [2/2] [UU]
      [===================>.]  resync = 97.3% (131072/134144) finish=8076.8min speed=0K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
clustermd-gqjiang1:~ # cat /proc/20531/stack
[<ffffffffc080974d>] metadata_update_start+0xcd/0xd0 [md_cluster]
[<ffffffffc079c897>] md_update_sb.part.61+0x97/0x820 [md_mod]
[<ffffffffc079f15b>] md_check_recovery+0x29b/0x510 [md_mod]
[<ffffffffc0816882>] raid1d+0x42/0x800 [raid1]
[<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9e0a5b3f>] kthread+0xff/0x140
[<ffffffff9e800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
clustermd-gqjiang1:~ # cat /proc/20537/stack
[<ffffffffc0813222>] freeze_array+0xf2/0x140 [raid1]
[<ffffffffc080a56e>] recv_daemon+0x41e/0x580 [md_cluster]
[<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9e0a5b3f>] kthread+0xff/0x140
[<ffffffff9e800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
clustermd-gqjiang1:~ # cat /proc/20676/stack
[<ffffffffc080951e>] dlm_lock_sync+0x8e/0xc0 [md_cluster]
[<ffffffffc080957f>] lock_token+0x2f/0xa0 [md_cluster]
[<ffffffffc0809622>] lock_comm+0x32/0x90 [md_cluster]
[<ffffffffc08098f5>] sendmsg+0x15/0x30 [md_cluster]
[<ffffffffc0809c0a>] resync_info_update+0x8a/0xc0 [md_cluster]
[<ffffffffc08130ba>] raid1_sync_request+0xa9a/0xb10 [raid1]
[<ffffffffc079b8ea>] md_do_sync+0xbaa/0xf90 [md_mod]
[<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9e0a5b3f>] kthread+0xff/0x140
[<ffffffff9e800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-08-31 17:38:10 -07:00
..
bcache bcache: release dc->writeback_lock properly in bch_writeback_thread() 2018-08-22 15:06:29 -06:00
persistent-data dm: Avoid namespace collision with bitmap API 2018-08-01 15:49:38 -07:00
dm-bio-prison-v1.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v1.h
dm-bio-prison-v2.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v2.h
dm-bio-record.h
dm-bufio.c
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: set dirty on all cache blocks after a crash 2018-08-09 12:14:32 -04:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-core.h dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-crypt.c dm crypt: don't decrease device limits 2018-08-13 15:28:41 -04:00
dm-delay.c dm delay: add flush as a third class of IO 2018-07-27 15:24:19 -04:00
dm-era-target.c
dm-exception-store.c
dm-exception-store.h
dm-flakey.c
dm-integrity.c dm integrity: recalculate checksums on creation 2018-07-27 15:24:27 -04:00
dm-io.c
dm-ioctl.c dm: report which conflicting type caused error during table_load() 2018-06-08 09:50:15 -04:00
dm-kcopyd.c dm kcopyd: avoid softlockup in run_complete_job 2018-08-08 09:16:24 -04:00
dm-linear.c
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c
dm-log.c
dm-mpath.c
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-08-18 16:48:07 -07:00
dm-raid1.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-region-hash.c - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
dm-round-robin.c
dm-rq.c
dm-rq.h
dm-service-time.c
dm-snap-persistent.c
dm-snap-transient.c
dm-snap.c dm snapshot: remove stale FIXME in snapshot_map() 2018-08-08 20:50:58 -04:00
dm-stats.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
dm-stats.h
dm-stripe.c
dm-switch.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
dm-sysfs.c
dm-table.c dm: prevent DAX mounts if not supported 2018-06-28 16:06:14 -04:00
dm-target.c
dm-thin-metadata.c dm thin metadata: remove needless work from __commit_transaction 2018-06-22 14:51:11 -04:00
dm-thin-metadata.h
dm-thin.c dm thin: stop no_space_timeout worker when switching to write-mode 2018-08-07 14:30:29 -04:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c
dm-verity-fec.c
dm-verity-fec.h
dm-verity-target.c treewide: kvzalloc() -> kvcalloc() 2018-06-12 16:19:22 -07:00
dm-verity.h
dm-writecache.c libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
dm-zero.c
dm-zoned-metadata.c
dm-zoned-reclaim.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-zoned-target.c dm zoned: avoid triggering reclaim from inside dmz_map() 2018-06-22 14:51:12 -04:00
dm-zoned.h
dm.c block: Add and use op_stat_group() for indexing disk_stat fields. 2018-07-18 08:44:20 -06:00
dm.h
Kconfig dm: add writecache target 2018-06-08 11:59:51 -04:00
Makefile dm: add writecache target 2018-06-08 11:59:51 -04:00
md-bitmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-08-18 16:48:07 -07:00
md-bitmap.h md: Avoid namespace collision with bitmap API 2018-08-01 15:49:39 -07:00
md-cluster.c md-cluster: release RESYNC lock after the last resync message 2018-08-31 17:38:10 -07:00
md-cluster.h
md-faulty.c
md-linear.c
md-linear.h
md-multipath.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
md-multipath.h
md.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-08-18 16:48:07 -07:00
md.h md-cluster: show array's status more accurate 2018-07-05 11:17:01 -07:00
raid1-10.c
raid1.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-08-18 16:48:07 -07:00
raid1.h
raid5-cache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-08-18 16:48:07 -07:00
raid5-log.h md/raid5-cache: disable reshape completely 2018-08-31 17:38:09 -07:00
raid5-ppl.c
raid5.c md/raid5-cache: disable reshape completely 2018-08-31 17:38:09 -07:00
raid5.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2018-06-09 12:01:36 -07:00
raid10.c RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0 2018-08-31 17:38:10 -07:00
raid10.h
raid0.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
raid0.h