kernel_optimize_test/fs/ocfs2
Junxiao Bi da4f4916da ocfs2: issue zeroout to EOF blocks
commit 9449ad33be8480f538b11a593e2dda2fb33ca06d upstream.

For punch holes in EOF blocks, fallocate used buffer write to zero the
EOF blocks in last cluster.  But since ->writepage will ignore EOF
pages, those zeros will not be flushed.

This "looks" ok as commit 6bba4471f0cc ("ocfs2: fix data corruption by
fallocate") will zero the EOF blocks when extend the file size, but it
isn't.  The problem happened on those EOF pages, before writeback, those
pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
set, when writeback run by write_cache_pages(), DIRTY flag on the page
was cleared, but DIRTY flag on the buffer_head not.

When next write happened to those EOF pages, since buffer_head already
had DIRTY flag set, it would not mark page DIRTY again.  That made
writeback ignore them forever.  That will cause data corruption.  Even
directio write can't work because it will fail when trying to drop pages
caches before direct io, as it found the buffer_head for those pages
still had DIRTY flag set, then it will fall back to buffer io mode.

To make a summary of the issue, as writeback ingores EOF pages, once any
EOF page is generated, any write to it will only go to the page cache,
it will never be flushed to disk even file size extends and that page is
not EOF page any more.  The fix is to avoid zero EOF blocks with buffer
write.

The following code snippet from qemu-img could trigger the corruption.

  656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
  ...
  660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
  660   fallocate(11, 0, 2275868672, 327680) = 0
  658   pwrite64(11, "

Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:40 +02:00
..
cluster
dlm
dlmfs
acl.c
acl.h
alloc.c
alloc.h
aops.c
aops.h
blockcheck.c
blockcheck.h
buffer_head_io.c
buffer_head_io.h
dcache.c
dcache.h
dir.c
dir.h
dlmglue.c
dlmglue.h
export.c
export.h
extent_map.c
extent_map.h
file.c ocfs2: issue zeroout to EOF blocks 2021-08-04 12:46:40 +02:00
file.h
filecheck.c
filecheck.h
heartbeat.c
heartbeat.h
inode.c
inode.h
ioctl.c
ioctl.h
journal.c
journal.h
Kconfig
localalloc.c
localalloc.h
locks.c
locks.h
Makefile
mmap.c
mmap.h
move_extents.c
move_extents.h
namei.c
namei.h
ocfs1_fs_compat.h
ocfs2_fs.h
ocfs2_ioctl.h
ocfs2_lockid.h
ocfs2_lockingver.h
ocfs2_trace.h
ocfs2.h
quota_global.c
quota_local.c
quota.h
refcounttree.c
refcounttree.h
reservations.c
reservations.h
resize.c
resize.h
slot_map.c
slot_map.h
stack_o2cb.c
stack_user.c
stackglue.c
stackglue.h
suballoc.c
suballoc.h
super.c
super.h
symlink.c
symlink.h
sysfile.c
sysfile.h
uptodate.c
uptodate.h
xattr.c
xattr.h