kernel_optimize_test/fs
Josef Bacik c8eaeac7b7 btrfs: reserve delalloc metadata differently
With the per-inode block reserves we started refilling the reserve based
on the calculated size of the outstanding csum bytes and extents for the
inode, including the amount we were adding with the new operation.

However, generic/224 exposed a problem with this approach.  With 1000
files all writing at the same time we ended up with a bunch of bytes
being reserved but unusable.

When you write to a file we reserve space for the csum leaves for those
bytes, the number of extent items required to cover those bytes, and a
single transaction item for updating the inode at ordered extent finish
for that range of bytes.  This is held until the ordered extent finishes
and we release all of the reserved space.

If a second write comes in at this point we would add a single
reservation for the new outstanding extent and however many reservations
for the csum leaves.  At this point we find the delta of how much we
have reserved and how much outstanding size this is and attempt to
reserve this delta.  If the first write finishes it will not release any
space, because the space it had reserved for the initial write is still
needed for the second write.  However some space would have been used,
as we have added csums, extent items, and dirtied the inode.  Our
reserved space would be > 0 but less than the total needed reserved
space.

This is just for a single inode, now consider generic/224.  This has
1000 inodes writing in parallel to a very small file system, 1GiB.  In
my testing this usually means we get about a 120MiB metadata area to
work with, more than enough to allow the writes to continue, but not
enough if all of the inodes are stuck trying to reserve the slack space
while continuing to hold their leftovers from their initial writes.

Fix this by pre-reserved _only_ for the space we are currently trying to
add.  Then once that is successful modify our inodes csum count and
outstanding extents, and then add the newly reserved space to the inodes
block_rsv.  This allows us to actually pass generic/224 without running
out of metadata space.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:47:12 +02:00
..
9p
adfs
affs
afs afs: Fix in-progess ops to ignore server-level callback invalidation 2019-04-13 08:37:37 +01:00
autofs
befs
bfs
btrfs btrfs: reserve delalloc metadata differently 2019-05-02 13:47:12 +02:00
cachefiles
ceph ceph: fix ci->i_head_snapc leak 2019-04-23 21:37:54 +02:00
cifs cifs: fix page reference leak with readv/writev 2019-04-24 12:33:59 -05:00
coda
configfs
cramfs
crypto
debugfs debugfs: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
devpts
dlm
ecryptfs
efivarfs
efs
exportfs
ext2
ext4
f2fs
fat
freevxfs
fscache
fuse Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
gfs2
hfs
hfsplus
hostfs
hpfs
hugetlbfs hugetlbfs: fix memory leak for resv_map 2019-04-05 16:02:31 -10:00
isofs
jbd2
jffs2 jffs2: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
jfs
kernfs
lockd
minix
nfs NFSv4.1 fix incorrect return value in copy_file_range 2019-04-11 15:23:48 -04:00
nfs_common
nfsd nfsd: wake blocked file lock waiters before sending callback 2019-04-22 15:38:41 -04:00
nilfs2
nls
notify
ntfs
ocfs2
omfs
openpromfs
orangefs
overlayfs
proc fs/proc/proc_sysctl.c: Fix a NULL pointer dereference 2019-04-26 09:18:05 -07:00
pstore
qnx4
qnx6
quota
ramfs
reiserfs
romfs
squashfs
sysfs
sysv
tracefs
ubifs ubifs: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
udf
ufs
xfs
aio.c aio: use kmem_cache_free() instead of kfree() 2019-04-04 20:13:59 -04:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: fix the return errno for direct IO 2019-04-11 21:22:21 -06:00
buffer.c
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c
coredump.c
d_path.c
dax.c
dcache.c
dcookies.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c
fcntl.c
fhandle.c
file_table.c
file.c
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c
inode.c
internal.h
io_uring.c io_uring: remove 'state' argument from io_{read,write} path 2019-04-23 08:17:58 -06:00
ioctl.c
iomap.c
Kconfig
Kconfig.binfmt
libfs.c
locks.c
Makefile
mbcache.c
mount.h
mpage.c
namei.c
namespace.c
no-block.c
nsfs.c
open.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
pipe.c Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c There tracing fixes: 2019-04-26 11:09:55 -07:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-19 09:46:05 -07:00
utimes.c
xattr.c