kernel_optimize_test/fs/btrfs
Filipe Manana c1ea39a77c btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
commit a4527e1853f8ff6e0b7c2dadad6268bd38427a31 upstream.

When doing a direct IO read or write, we always return -ENOTBLK when we
find a compressed extent (or an inline extent) so that we fallback to
buffered IO. This however is not ideal in case we are in a NOWAIT context
(io_uring for example), because buffered IO can block and we currently
have no support for NOWAIT semantics for buffered IO, so if we need to
fallback to buffered IO we should first signal the caller that we may
need to block by returning -EAGAIN instead.

This behaviour can also result in short reads being returned to user
space, which although it's not incorrect and user space should be able
to deal with partial reads, it's somewhat surprising and even some popular
applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't
deal with short reads properly (or at all).

The short read case happens when we try to read from a range that has a
non-compressed and non-inline extent followed by a compressed extent.
After having read the first extent, when we find the compressed extent we
return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to
treat the request as a short read, returning 0 (success) and waiting for
previously submitted bios to complete (this happens at
fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at
btrfs_file_read_iter(), we call filemap_read() to use buffered IO to
read the remaining data, and pass it the number of bytes we were able to
read with direct IO. Than at filemap_read() if we get a page fault error
when accessing the read buffer, we return a partial read instead of an
-EFAULT error, because the number of bytes previously read is greater
than zero.

So fix this by returning -EAGAIN for NOWAIT direct IO when we find a
compressed or an inline extent.

Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/
Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
Tested-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21 21:20:01 +02:00
..
tests
acl.c
async-thread.c btrfs: fix memory ordering between normal and ordered work functions 2021-11-26 10:39:20 +01:00
async-thread.h
backref.c btrfs: remove BUG_ON(!eie) in find_parent_nodes 2022-01-27 10:54:19 +01:00
backref.h
block-group.c btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups() 2022-04-20 09:23:10 +02:00
block-group.h
block-rsv.c
block-rsv.h
btrfs_inode.h btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:09:28 +02:00
check-integrity.c
check-integrity.h
compression.c btrfs: mark compressed range uptodate only if all bio succeed 2021-08-04 12:46:39 +02:00
compression.h
ctree.c btrfs: check the root node for uptodate before returning it 2022-01-27 10:54:27 +01:00
ctree.h
delalloc-space.c
delalloc-space.h
delayed-inode.c btrfs: abort transaction if we fail to update the delayed inode 2021-07-14 16:55:55 +02:00
delayed-inode.h
delayed-ref.c
delayed-ref.h
dev-replace.c
dev-replace.h
dir-item.c
discard.c
discard.h
disk-io.c btrfs: add "0x" prefix for unsupported optional features 2022-06-09 10:20:49 +02:00
disk-io.h
export.c
export.h
extent_io.c btrfs: clear extent buffer uptodate when we fail to write it 2021-12-14 11:32:38 +01:00
extent_io.h btrfs: fix qgroup reserve overflow the qgroup limit 2022-04-13 21:01:08 +02:00
extent_map.c
extent_map.h
extent-io-tree.h
extent-tree.c btrfs: unlock newly allocated extent buffer after error 2021-10-20 11:44:59 +02:00
file-item.c btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling 2021-10-09 14:40:56 +02:00
file.c btrfs: fix fallocate to use file_modified to update permissions consistently 2022-04-20 09:23:19 +02:00
free-space-cache.c
free-space-cache.h
free-space-tree.c
free-space-tree.h
inode-item.c
inode-map.c
inode-map.h
inode.c btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents 2022-07-21 21:20:01 +02:00
ioctl.c fsnotify: invalidate dcache before IN_DELETE event 2022-02-01 17:25:48 +01:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-07-14 16:55:56 +02:00
locking.c
locking.h
lzo.c
Makefile
misc.h
ordered-data.c
ordered-data.h
orphan.c
print-tree.c
print-tree.h
props.c
props.h
qgroup.c btrfs: qgroup: fix deadlock between rescan worker and remove qgroup 2022-03-08 19:09:38 +01:00
qgroup.h
raid56.c treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
raid56.h
rcu-string.h
reada.c
ref-verify.c
ref-verify.h
reflink.c btrfs: fix unexpected error path when reflinking an inline extent 2022-04-08 14:40:04 +02:00
reflink.h
relocation.c
root-tree.c btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling 2021-12-14 11:32:38 +01:00
scrub.c
send.c btrfs: send: in case of IO error log it 2022-02-23 12:00:58 +01:00
send.h
space-info.c btrfs: prevent __btrfs_dump_space_info() to underflow its free space 2021-09-30 10:11:00 +02:00
space-info.h
struct-funcs.c
super.c btrfs: add error messages to all unrecognized mount options 2022-06-29 08:59:45 +02:00
sysfs.c btrfs: sysfs: fix format string for some discard stats 2021-07-14 16:55:55 +02:00
sysfs.h
transaction.c btrfs: clear defrag status of a root if starting transaction fails 2021-07-14 16:55:40 +02:00
transaction.h btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:09:28 +02:00
tree-checker.c btrfs: tree-checker: check item_size for dev_item 2022-03-02 11:42:45 +01:00
tree-checker.h
tree-defrag.c
tree-log.c btrfs: always log symlinks in full mode 2022-05-12 12:25:43 +02:00
tree-log.h
ulist.c
ulist.h
uuid-tree.c
volumes.c btrfs: repair super block num_devices automatically 2022-06-09 10:20:49 +02:00
volumes.h
xattr.c
xattr.h
zlib.c
zstd.c