kernel_optimize_test/fs/btrfs
Filipe Manana c1ea39a77c btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
commit a4527e1853f8ff6e0b7c2dadad6268bd38427a31 upstream.

When doing a direct IO read or write, we always return -ENOTBLK when we
find a compressed extent (or an inline extent) so that we fall back to
buffered IO. This however is not ideal in case we are in a NOWAIT context
(io_uring for example), because buffered IO can block and we currently
have no support for NOWAIT semantics for buffered IO, so if we need to
fall back to buffered IO we should first signal the caller that we may
need to block by returning -EAGAIN instead.

This behaviour can also result in short reads being returned to user
space. Although that is not incorrect and user space should be able to
deal with partial reads, it's somewhat surprising, and even some popular
applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't
deal with short reads properly (or at all).
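
For illustration, a minimal user-space sketch of a read loop that
tolerates partial reads. The helper name pread_full() is hypothetical
and not taken from QEMU, MariaDB or the kernel; it only shows the kind
of retry loop that "dealing with short reads" implies.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling pread() until len bytes were read,
 * EOF was hit, or a real error occurred. Returns the number of bytes
 * read (possibly less than len at EOF), or -1 with errno set.
 */
static ssize_t pread_full(int fd, void *buf, size_t len, off_t off)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = pread(fd, (char *)buf + done, len - done,
				  off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;		/* real error */
		}
		if (n == 0)
			break;			/* EOF */
		done += (size_t)n;		/* short read, continue */
	}
	return (ssize_t)done;
}
```

A caller that issues a single pread() and treats any result smaller
than the requested length as an error would fail in the short read case
described here, while a loop like the one above simply continues from
where the previous call stopped.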

The short read case happens when we try to read from a range that has a
non-compressed and non-inline extent followed by a compressed extent.
After having read the first extent, when we find the compressed extent we
return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap
treating the request as a short read, returning 0 (success) and waiting for
previously submitted bios to complete (this happens at
fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at
btrfs_file_read_iter(), we call filemap_read() to use buffered IO to
read the remaining data, and pass it the number of bytes we were able to
read with direct IO. Then at filemap_read() if we get a page fault error
when accessing the read buffer, we return a partial read instead of an
-EFAULT error, because the number of bytes previously read is greater
than zero.

So fix this by returning -EAGAIN for NOWAIT direct IO when we find a
compressed or an inline extent.
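
The decision described above can be sketched in plain C as follows.
This is a simplified user-space model, not the actual change to
fs/btrfs/inode.c: struct extent_sketch and dio_extent_check() are
hypothetical stand-ins for the kernel's extent map and for the check
done in btrfs_dio_iomap_begin(), and the nowait parameter stands in for
the NOWAIT context of the request.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the properties of an extent that matter here. */
struct extent_sketch {
	bool compressed;	/* extent data is compressed */
	bool inline_data;	/* extent data is stored inline */
};

/*
 * Returns 0 if direct IO can proceed on this extent. Otherwise returns
 * the error that makes the caller leave the direct IO path:
 *   -EAGAIN  in a NOWAIT context (the caller may need to block),
 *   -ENOTBLK otherwise (fall back to buffered IO).
 */
static int dio_extent_check(const struct extent_sketch *em, bool nowait)
{
	if (em->compressed || em->inline_data)
		return nowait ? -EAGAIN : -ENOTBLK;
	return 0;
}
```

In this model only the NOWAIT case changes: a blocking request still
gets -ENOTBLK and falls back to buffered IO as before.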

Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/
Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
Tested-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21 21:20:01 +02:00
tests btrfs: fix missing delalloc new bit for new delalloc ranges 2020-11-13 22:15:59 +01:00
acl.c
async-thread.c btrfs: fix memory ordering between normal and ordered work functions 2021-11-26 10:39:20 +01:00
async-thread.h
backref.c btrfs: remove BUG_ON(!eie) in find_parent_nodes 2022-01-27 10:54:19 +01:00
backref.h btrfs: add asserts for deleting backref cache nodes 2021-03-04 11:38:29 +01:00
block-group.c btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups() 2022-04-20 09:23:10 +02:00
block-group.h btrfs: fix race between writes to swap files and scrub 2021-03-09 11:11:11 +01:00
block-rsv.c btrfs: print the block rsv type when we fail our reservation 2020-11-05 13:02:05 +01:00
block-rsv.h
btrfs_inode.h btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:09:28 +02:00
check-integrity.c
check-integrity.h
compression.c btrfs: mark compressed range uptodate only if all bio succeed 2021-08-04 12:46:39 +02:00
compression.h
ctree.c btrfs: check the root node for uptodate before returning it 2022-01-27 10:54:27 +01:00
ctree.h btrfs: fix race between writes to swap files and scrub 2021-03-09 11:11:11 +01:00
delalloc-space.c
delalloc-space.h
delayed-inode.c btrfs: abort transaction if we fail to update the delayed inode 2021-07-14 16:55:55 +02:00
delayed-inode.h
delayed-ref.c btrfs: account for new extents being deleted in total_bytes_pinned 2021-03-04 11:38:30 +01:00
delayed-ref.h btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself 2021-03-04 11:38:30 +01:00
dev-replace.c btrfs: fix deadlock when cloning inline extent and low on free metadata space 2021-01-17 14:16:54 +01:00
dev-replace.h
dir-item.c
discard.c btrfs: merge critical sections of discard lock in workfn 2021-01-19 18:27:24 +01:00
discard.h
disk-io.c btrfs: add "0x" prefix for unsupported optional features 2022-06-09 10:20:49 +02:00
disk-io.h btrfs: add a helper to read the tree_root commit root for backref lookup 2020-10-26 15:04:57 +01:00
export.c
export.h
extent_io.c btrfs: clear extent buffer uptodate when we fail to write it 2021-12-14 11:32:38 +01:00
extent_io.h btrfs: fix qgroup reserve overflow the qgroup limit 2022-04-13 21:01:08 +02:00
extent_map.c
extent_map.h
extent-io-tree.h btrfs: remove struct extent_io_ops 2020-10-07 12:13:25 +02:00
extent-tree.c btrfs: unlock newly allocated extent buffer after error 2021-10-20 11:44:59 +02:00
file-item.c btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling 2021-10-09 14:40:56 +02:00
file.c btrfs: fix fallocate to use file_modified to update permissions consistently 2022-04-20 09:23:19 +02:00
free-space-cache.c btrfs: fix race between extent freeing/allocation when using bitmaps 2021-03-09 11:11:11 +01:00
free-space-cache.h
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-02-03 23:28:40 +01:00
free-space-tree.h
inode-item.c
inode-map.c
inode-map.h
inode.c btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents 2022-07-21 21:20:01 +02:00
ioctl.c fsnotify: invalidate dcache before IN_DELETE event 2022-02-01 17:25:48 +01:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-07-14 16:55:56 +02:00
locking.c
locking.h btrfs: introduce BTRFS_NESTING_NEW_ROOT for adding new roots 2020-10-07 12:12:17 +02:00
lzo.c
Makefile
misc.h
ordered-data.c btrfs: remove inode argument from btrfs_start_ordered_extent 2020-10-07 12:13:22 +02:00
ordered-data.h btrfs: remove inode argument from btrfs_start_ordered_extent 2020-10-07 12:13:22 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-27 11:55:06 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-27 11:55:06 +01:00
props.c
props.h
qgroup.c btrfs: qgroup: fix deadlock between rescan worker and remove qgroup 2022-03-08 19:09:38 +01:00
qgroup.h btrfs: export and rename qgroup_reserve_meta 2021-03-11 14:17:22 +01:00
raid56.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
raid56.h
rcu-string.h
reada.c btrfs: fix readahead hang and use-after-free after removing a device 2020-10-26 15:03:59 +01:00
ref-verify.c btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod 2020-11-05 13:03:39 +01:00
ref-verify.h
reflink.c btrfs: fix unexpected error path when reflinking an inline extent 2022-04-08 14:40:04 +02:00
reflink.h
relocation.c btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s 2021-05-11 14:47:22 +02:00
root-tree.c btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling 2021-12-14 11:32:38 +01:00
scrub.c btrfs: fix race between writes to swap files and scrub 2021-03-09 11:11:11 +01:00
send.c btrfs: send: in case of IO error log it 2022-02-23 12:00:58 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: prevent __btrfs_dump_space_info() to underflow its free space 2021-09-30 10:11:00 +02:00
space-info.h btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself 2021-03-04 11:38:30 +01:00
struct-funcs.c btrfs: use unaligned helpers for stack and header set/get helpers 2020-10-07 12:13:23 +02:00
super.c btrfs: add error messages to all unrecognized mount options 2022-06-29 08:59:45 +02:00
sysfs.c btrfs: sysfs: fix format string for some discard stats 2021-07-14 16:55:55 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: clear defrag status of a root if starting transaction fails 2021-07-14 16:55:40 +02:00
transaction.h btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:09:28 +02:00
tree-checker.c btrfs: tree-checker: check item_size for dev_item 2022-03-02 11:42:45 +01:00
tree-checker.h
tree-defrag.c
tree-log.c btrfs: always log symlinks in full mode 2022-05-12 12:25:43 +02:00
tree-log.h
ulist.c
ulist.h
uuid-tree.c
volumes.c btrfs: repair super block num_devices automatically 2022-06-09 10:20:49 +02:00
volumes.h btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch 2021-02-03 23:28:40 +01:00
xattr.c btrfs: fix warning when creating a directory with smack enabled 2021-03-09 11:11:12 +01:00
xattr.h
zlib.c
zstd.c