kernel_optimize_test/fs/btrfs
Alexandre Oliva a81cb9a2d9 clear chunk_alloc flag on retryable failure
I've experienced filesystem freezes with permanent spikes in the active
process count for quite a while, particularly on filesystems whose
available raw space has already been fully allocated to chunks.

While looking into this, I found a pretty obvious error in
do_chunk_alloc: it sets space_info->chunk_alloc, but if
btrfs_alloc_chunk returns an error other than ENOSPC, do_chunk_alloc
returns with that flag still set, which causes any other threads
waiting for space_info->chunk_alloc to become zero to spin indefinitely.

I haven't double-checked that this patch fixes the failure I've observed
fully (it's not exactly trivial to trigger), but it surely is a bug and
the fix is trivial, so...  Please put it in :-)

What I saw in that function also happens to explain why in some cases I
see filesystems allocate a huge number of chunks that remain unused
(leading to the scenario above, of not having more chunks to allocate).
It happens for data and metadata, but not necessarily both.  I'm
guessing some thread sets the force_alloc flag on the corresponding
space_info, and then several threads trying to get disk space end up
attempting to allocate a new chunk concurrently.  All of them will see
the force_alloc flag and bump their local copy of force up to the level
they see first, and they won't clear it even if another thread succeeds
in allocating a chunk, thus clearing the force flag.  Then each thread
that observed the force flag will, in its turn, force the allocation of
a new chunk.  And any threads that come in while it does that will see
the force flag still set and pick it up, and so on.  This sounds like a
problem to me, but...  what should the correct behavior be?  Clear
force_alloc once we copy it to a local force?  Reset force to the
incoming value on every loop?  Set the flag to our incoming force if we
have it at first, clear our local flag, and move it from the space_info
when we determine that we are the thread that's going to perform the
allocation?
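
To make the suspected sequence concrete, here is a small single-threaded
model of it.  It is only a sketch of the guess described above, not the
kernel code: force_model, sample_force, take_turn and both simulate_*
functions are hypothetical names, and real threads are replaced by a
fixed ordering in which every caller samples the flag before any of
them allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the relevant parts of space_info (hypothetical model). */
struct force_model {
	bool force_alloc;   /* shared flag, models space_info->force_alloc */
	int chunks;         /* number of chunks allocated so far */
};

/* A caller folds the shared flag into its local copy of force. */
static bool sample_force(const struct force_model *m, bool incoming_force)
{
	return incoming_force || m->force_alloc;
}

/* When its turn comes, a caller allocates if its local copy says so. */
static void take_turn(struct force_model *m, bool local_force)
{
	if (local_force) {
		m->chunks++;
		m->force_alloc = false;   /* a successful allocation clears the flag */
	}
}

/* All n callers sample first and never refresh their local copy. */
static int simulate_stale_force(size_t n)
{
	struct force_model m = { .force_alloc = true, .chunks = 0 };
	bool local[16];
	size_t i;

	assert(n <= 16);
	for (i = 0; i < n; i++)
		local[i] = sample_force(&m, false);
	for (i = 0; i < n; i++)
		take_turn(&m, local[i]);
	return m.chunks;
}

/* Each caller re-reads the shared flag right before its own turn. */
static int simulate_resample(size_t n)
{
	struct force_model m = { .force_alloc = true, .chunks = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		take_turn(&m, sample_force(&m, false));
	return m.chunks;
}
```

In this model, four callers holding a stale local copy allocate four
chunks, while re-reading the flag on each turn allocates one.  That
roughly corresponds to the "reset force on every loop" option; whether
it is the right behavior for the real code is exactly the open question.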

btrfs: clear chunk_alloc flag on retryable failure

From: Alexandre Oliva <oliva@gnu.org>

If btrfs_alloc_chunk fails with e.g. ENOMEM, we exit do_chunk_alloc
without clearing chunk_alloc in space_info.  As a result, any further
calls to do_chunk_alloc on that filesystem will start busy-waiting for
chunk_alloc to be cleared, but it never will be.  This patch adjusts
do_chunk_alloc so that it clears this flag in case of an error.
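
The control flow can be illustrated with a userspace model.  This is a
minimal sketch under stated assumptions, not the actual patch:
space_info_model, alloc_chunk_stub and do_chunk_alloc_model are
hypothetical names, locking is omitted, and the model returns -EBUSY
where the kernel would wait for the flag to clear.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for space_info; only the flag matters here. */
struct space_info_model {
	bool chunk_alloc;   /* true while one caller is allocating a chunk */
};

/* Hypothetical stub for btrfs_alloc_chunk: returns the injected result. */
static int alloc_chunk_stub(int injected_result)
{
	return injected_result;   /* 0 on success, negative errno on failure */
}

/*
 * Model of the intended behavior: the flag is cleared on every exit
 * path, so a failure such as -ENOMEM cannot leave later callers stuck.
 */
static int do_chunk_alloc_model(struct space_info_model *info,
				int injected_result)
{
	int ret;

	assert(info != NULL);

	if (info->chunk_alloc)
		return -EBUSY;   /* model only: the kernel would wait here */

	info->chunk_alloc = true;
	ret = alloc_chunk_stub(injected_result);
	info->chunk_alloc = false;   /* cleared on success and on error alike */

	return ret;
}
```

If the flag were cleared only on the success and ENOSPC paths, a call
that failed with -ENOMEM would leave chunk_alloc set, and every later
call in this model would return -EBUSY forever, which is the model's
analogue of the busy-wait described above.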

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26 11:00:51 -05:00
acl.c Btrfs: skip adding an acl attribute if we don't have to 2012-12-16 20:46:15 -05:00
async-thread.c Btrfs: call the ordered free operation without any locks held 2012-07-25 16:15:07 -04:00
async-thread.h btrfs: return void in functions without error conditions 2012-03-22 01:45:34 +01:00
backref.c Btrfs: fix backref walking race with tree deletions 2013-02-26 11:00:50 -05:00
backref.h Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h 2013-02-20 09:37:28 -05:00
btrfs_inode.h Btrfs: serialize unlocked dio reads with truncate 2013-02-20 12:59:47 -05:00
check-integrity.c btrfs: define BTRFS_MAGIC as a u64 value 2013-02-20 13:00:01 -05:00
check-integrity.h Btrfs: add optional integrity check code 2011-12-21 19:14:09 +01:00
compat.h
compression.c Btrfs: add rw argument to merge_bio_hook() 2013-02-01 11:49:47 -05:00
compression.h btrfs: return void in functions without error conditions 2012-03-22 01:45:34 +01:00
ctree.c btrfs: remove cache only arguments from defrag path 2013-02-20 12:59:36 -05:00
ctree.h Btrfs: fix remount vs autodefrag 2013-02-21 08:11:43 -05:00
delayed-inode.c btrfs: remove unused "item" in btrfs_insert_delayed_item() 2013-02-20 12:59:23 -05:00
delayed-inode.h Btrfs: fix lots of orphan inodes when the space is not enough 2013-02-20 09:36:39 -05:00
delayed-ref.c Btrfs: make delayed ref lock logic more readable 2013-02-20 09:36:41 -05:00
delayed-ref.h Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
dev-replace.c Btrfs: check the return value of btrfs_start_delalloc_inodes() 2013-02-20 09:37:21 -05:00
dev-replace.h Btrfs: add new sources for device replace code 2012-12-12 17:15:41 -05:00
dir-item.c Btrfs: fix hash overflow handling 2012-12-17 14:48:21 -05:00
disk-io.c Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
disk-io.h Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
export.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
export.h
extent_io.c Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
extent_io.h Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
extent_map.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-02-08 12:06:46 +11:00
extent_map.h Btrfs: do not allow logged extents to be merged or removed 2013-01-24 12:49:48 -05:00
extent-tree.c clear chunk_alloc flag on retryable failure 2013-02-26 11:00:51 -05:00
file-item.c Btrfs: extend the checksum item as much as possible 2013-02-20 12:59:37 -05:00
file.c Btrfs: fix remount vs autodefrag 2013-02-21 08:11:43 -05:00
free-space-cache.c Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
free-space-cache.h btrfs: remove all unused functions 2011-05-06 12:34:03 +02:00
hash.h btrfs: extended inode refs 2012-10-09 09:14:45 -04:00
inode-item.c btrfs: extended inode refs 2012-10-09 09:14:45 -04:00
inode-map.c Btrfs: improve the noflush reservation 2012-12-11 13:31:31 -05:00
inode-map.h Btrfs: Support reading/writing on disk free ino cache 2011-04-25 16:46:11 +08:00
inode.c Btrfs: make sure NODATACOW also gets NODATASUM set 2013-02-26 10:57:48 -05:00
ioctl.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 2013-02-20 14:05:45 -05:00
Kconfig Btrfs: select XOR_BLOCKS in Kconfig 2013-02-05 09:55:30 -05:00
locking.c Btrfs: save us a read_lock 2013-02-20 09:37:17 -05:00
locking.h btrfs: return void in functions without error conditions 2012-03-22 01:45:34 +01:00
lzo.c btrfs: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:21 +08:00
Makefile Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
math.h Btrfs: cleanup duplicated division functions 2012-12-11 13:31:30 -05:00
ordered-data.c Btrfs: place ordered operations on a per transaction list 2013-02-20 12:59:57 -05:00
ordered-data.h Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 2013-02-20 14:05:45 -05:00
orphan.c btrfs: replace many BUG_ONs with proper error handling 2012-03-22 11:52:54 +01:00
print-tree.c btrfs: add missing break in btrfs_print_leaf() 2013-02-20 12:59:20 -05:00
print-tree.h
qgroup.c Btrfs: fix missing check before disabling quota 2013-02-20 13:00:07 -05:00
raid56.c Btrfs: add a plugging callback to raid56 writes 2013-02-01 14:24:24 -05:00
raid56.h Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
rcu-string.h Btrfs: use rcu to protect device->name 2012-06-14 21:29:16 -04:00
reada.c Btrfs: introduce GET_READ_MIRRORS functionality for btrfs_map_block() 2012-12-12 17:15:43 -05:00
relocation.c Btrfs: use wrapper page_offset 2013-02-20 09:36:43 -05:00
root-tree.c Btrfs: rename root_times_lock to root_item_lock 2012-12-16 20:46:21 -05:00
scrub.c Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
send.c btrfs: add "no file data" flag to btrfs send ioctl 2013-02-20 12:59:39 -05:00
send.h btrfs: add "no file data" flag to btrfs send ioctl 2013-02-20 12:59:39 -05:00
struct-funcs.c Btrfs: rewrite BTRFS_SETGET_FUNCS 2012-07-23 16:28:06 -04:00
super.c Btrfs: fix remount vs autodefrag 2013-02-21 08:11:43 -05:00
sysfs.c btrfs: Remove unused sysfs code 2011-06-17 14:54:18 -04:00
transaction.c Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
transaction.h Btrfs: fix uncompleted transaction 2013-02-20 13:00:05 -05:00
tree-defrag.c btrfs: remove cache only arguments from defrag path 2013-02-20 12:59:36 -05:00
tree-log.c btrfs: remove cache only arguments from defrag path 2013-02-20 12:59:36 -05:00
tree-log.h btrfs: return void in functions without error conditions 2012-03-22 01:45:34 +01:00
ulist.c Btrfs: make aux field of ulist 64 bit 2012-10-01 15:18:53 -04:00
ulist.h Btrfs: make aux field of ulist 64 bit 2012-10-01 15:18:53 -04:00
version.h
volumes.c Btrfs: fix max chunk size on raid5/6 2013-02-20 17:08:18 -05:00
volumes.h Merge branch 'raid56-experimental' into for-linus-3.9 2013-02-20 14:06:05 -05:00
xattr.c Btrfs: only log the inode item if we can get away with it 2012-12-16 20:46:21 -05:00
xattr.h
zlib.c btrfs: fix message printing 2012-10-09 09:19:57 -04:00