tmp_suning_uos_patched/fs/xfs
Dave Chinner 2728d95c6c xfs: Fix CIL throttle hang when CIL space used going backwards
commit 19f4e7cc819771812a7f527d7897c2deffbf7a00 upstream.

A hang with tasks stuck on the CIL hard throttle was reported and
largely diagnosed by Donald Buczek, who discovered that it was a
result of the CIL context space usage decrementing in committed
transactions once the hard throttle limit had been hit and processes
were already blocked.  This resulted in the CIL push not waking up
those waiters because the CIL context was no longer over the hard
throttle limit.

The surprising aspect of this was the CIL space usage going
backwards regularly enough to trigger this situation. Assumptions
had been made in design that the relogging process would only
increase the size of the objects in the CIL, and so that space would
only increase.

This change and commit message fixes the issue and documents the
result of an audit of the triggers that can cause the CIL space to
go backwards, how large the backwards steps tend to be, the
frequency in which they occur, and what the impact on the CIL
accounting code is.

Even though the CIL ctx->space_used can go backwards, it will only
do so if the log item is already logged to the CIL and contains a
space reservation for it's entire logged state. This is tracked by
the shadow buffer state on the log item. If the item is not
previously logged in the CIL it has no shadow buffer nor log vector,
and hence the entire size of the logged item copied to the log
vector is accounted to the CIL space usage. i.e.  it will always go
up in this case.

If the item has a log vector (i.e. already in the CIL) and the size
decreases, then the existing log vector will be overwritten and the
space usage will go down. This is the only condition where the space
usage reduces, and it can only occur when an item is already tracked
in the CIL. Hence we are safe from CIL space usage underruns as a
result of log items decreasing in size when they are relogged.

Typically this reduction in CIL usage occurs from metadata blocks
being free, such as when a btree block merge occurs or a directory
enter/xattr entry is removed and the da-tree is reduced in size.
This generally results in a reduction in size of around a single
block in the CIL, but also tends to increase the number of log
vectors because the parent and sibling nodes in the tree needs to be
updated when a btree block is removed. If a multi-level merge
occurs, then we see reduction in size of 2+ blocks, but again the
log vector count goes up.

The other vector is inode fork size changes, which only log the
current size of the fork and ignore the previously logged size when
the fork is relogged. Hence if we are removing items from the inode
fork (dir/xattr removal in shortform, extent record removal in
extent form, etc) the relogged size of the inode for can decrease.

No other log items can decrease in size either because they are a
fixed size (e.g. dquots) or they cannot be relogged (e.g. relogging
an intent actually creates a new intent log item and doesn't relog
the old item at all.) Hence the only two vectors for CIL context
size reduction are relogging inode forks and marking buffers active
in the CIL as stale.

Long story short: the majority of the code does the right thing and
handles the reduction in log item size correctly, and only the CIL
hard throttle implementation is problematic and needs fixing. This
patch makes that fix, as well as adds comments in the log item code
that result in items shrinking in size when they are relogged as a
clear reminder that this can and does happen frequently.

The throttle fix is based upon the change Donald proposed, though it
goes further to ensure that once the throttle is activated, it
captures all tasks until the CIL push issues a wakeup, regardless of
whether the CIL space used has gone back under the throttle
threshold.

This ensures that we prevent tasks reducing the CIL slightly under
the throttle threshold and then making more changes that push it
well over the throttle limit. This is acheived by checking if the
throttle wait queue is already active as a condition of throttling.
Hence once we start throttling, we continue to apply the throttle
until the CIL context push wakes everything on the wait queue.

We can use waitqueue_active() for the waitqueue manipulations and
checks as they are all done under the ctx->xc_push_lock. Hence the
waitqueue has external serialisation and we can safely peek inside
the wait queue without holding the internal waitqueue locks.

Many thanks to Donald for his diagnostic and analysis work to
isolate the cause of this hang.

Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06 08:42:42 +02:00
..
libxfs xfs: fix an ABBA deadlock in xfs_rename 2022-06-06 08:42:42 +02:00
scrub treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
Kconfig xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n 2020-10-16 15:34:28 -07:00
kmem.c
kmem.h
Makefile
mrlock.h
xfs_acl.c
xfs_acl.h
xfs_aops.c xfs: fix missing CoW blocks writeback conversion retry 2020-11-04 08:52:47 -08:00
xfs_aops.h
xfs_attr_inactive.c
xfs_attr_list.c
xfs_bio_io.c
xfs_bmap_item.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_bmap_item.h
xfs_bmap_util.c xfs: fix fallocate functions when rtextsize is larger than 1 2020-10-21 09:05:19 -07:00
xfs_bmap_util.h
xfs_buf_item_recover.c xfs: fix finobt btree block recovery ordering 2020-09-30 07:28:52 -07:00
xfs_buf_item.c xfs: Fix CIL throttle hang when CIL space used going backwards 2022-06-06 08:42:42 +02:00
xfs_buf_item.h
xfs_buf.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_buf.h
xfs_dir2_readdir.c
xfs_discard.c
xfs_discard.h
xfs_dquot_item_recover.c
xfs_dquot_item.c
xfs_dquot_item.h
xfs_dquot.c xfs: fix some comments 2020-09-25 11:34:07 -07:00
xfs_dquot.h
xfs_error.c
xfs_error.h
xfs_export.c
xfs_export.h
xfs_extent_busy.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_extent_busy.h treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_extfree_item.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_extfree_item.h
xfs_file.c xfs: fix fallocate functions when rtextsize is larger than 1 2020-10-21 09:05:19 -07:00
xfs_filestream.c xfs: drop the obsolete comment on filestream locking 2020-09-25 11:34:08 -07:00
xfs_filestream.h
xfs_fsmap.c xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsmap.h xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsops.c
xfs_fsops.h
xfs_globals.c
xfs_health.c
xfs_icache.c
xfs_icache.h
xfs_icreate_item.c
xfs_icreate_item.h
xfs_inode_item_recover.c
xfs_inode_item.c xfs: Fix CIL throttle hang when CIL space used going backwards 2022-06-06 08:42:42 +02:00
xfs_inode_item.h
xfs_inode.c xfs: fix an ABBA deadlock in xfs_rename 2022-06-06 08:42:42 +02:00
xfs_inode.h
xfs_ioctl.c xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate 2022-01-11 15:25:01 +01:00
xfs_ioctl.h
xfs_ioctl32.c
xfs_ioctl32.h
xfs_iomap.c xfs: don't allow NOWAIT DIO across extent boundaries 2020-11-19 08:59:11 -08:00
xfs_iomap.h
xfs_iops.c xfs: Fix assert failure in xfs_setattr_size() 2021-03-07 12:34:05 +01:00
xfs_iops.h
xfs_itable.c
xfs_itable.h
xfs_iwalk.c xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks 2022-06-06 08:42:41 +02:00
xfs_iwalk.h
xfs_linux.h xfs: fix fallocate functions when rtextsize is larger than 1 2020-10-21 09:05:19 -07:00
xfs_log_cil.c xfs: Fix CIL throttle hang when CIL space used going backwards 2022-06-06 08:42:42 +02:00
xfs_log_priv.h
xfs_log_recover.c xfs: cancel intents immediately if process_intents fails 2020-10-21 16:28:46 -07:00
xfs_log.c xfs: expose the log push threshold 2020-10-07 08:40:29 -07:00
xfs_log.h xfs: expose the log push threshold 2020-10-07 08:40:29 -07:00
xfs_message.c
xfs_message.h treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
xfs_mount.c xfs: return corresponding errcode if xfs_initialize_perag() fail 2020-11-18 09:23:51 -08:00
xfs_mount.h
xfs_mru_cache.c
xfs_mru_cache.h
xfs_ondisk.h
xfs_pnfs.c xfs: fix a missing unlock on error in xfs_fs_map_blocks 2020-11-11 08:07:37 -08:00
xfs_pnfs.h
xfs_pwork.c
xfs_pwork.h
xfs_qm_bhv.c
xfs_qm_syscalls.c
xfs_qm.c xfs: do the ASSERT for the arguments O_{u,g,p}dqpp 2020-10-07 08:40:29 -07:00
xfs_qm.h
xfs_quota.h
xfs_quotaops.c
xfs_refcount_item.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_refcount_item.h
xfs_reflink.c xfs: only flush the unshared range in xfs_reflink_unshare 2020-11-04 17:41:56 -08:00
xfs_reflink.h
xfs_rmap_item.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
xfs_rmap_item.h
xfs_rtalloc.c xfs: annotate grabbing the realtime bitmap/summary locks in growfs 2020-10-13 08:41:31 -07:00
xfs_rtalloc.h
xfs_stats.c xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_stats.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_super.c xfs: show the proper user quota options 2022-06-06 08:42:41 +02:00
xfs_super.h
xfs_symlink.c
xfs_symlink.h
xfs_sysctl.c xfs: remove deprecated sysctl options 2020-09-25 11:34:08 -07:00
xfs_sysctl.h
xfs_sysfs.c
xfs_sysfs.h
xfs_trace.c
xfs_trace.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_trans_ail.c
xfs_trans_buf.c
xfs_trans_dquot.c xfs: fix the indent in xfs_trans_mod_dquot 2020-10-07 08:40:29 -07:00
xfs_trans_priv.h
xfs_trans.c xfs: do the assert for all the log done items in xfs_trans_cancel 2020-09-25 11:34:07 -07:00
xfs_trans.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_xattr.c
xfs.h