kernel_optimize_test/fs
Zygo Blaxell e1699d2d7b btrfs: add missing memset while reading compressed inline extents
This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in
the wild:

        item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
                inode generation 50 transid 50 size 47424 nbytes 49141
                block group 0 mode 100644 links 1 uid 0 gid 0
                rdev 0 flags 0x0(none)
        item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
                inode ref index 3 namelen 10 name: DB_File.so
        item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
                inline extent data size 1341 ram 4085 compress(zlib)
        item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
                extent data disk byte 5367308288 nr 20480
                extent data offset 0 nr 45056 ram 45056
                extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo, which demonstrates another method
of creating the same inline extent and hole pattern:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

	# touch foo
	# chattr +c foo
	# xfs_io -f -c "pwrite -W 0 1000" foo
	# xfs_io -f -c "falloc 4 8188" foo
	# od -x foo
	# echo 3 >/proc/sys/vm/drop_caches
	# od -x foo

This produce the following on my box:

Correct output:  file contains 1000 data bytes followed
by zeros:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
	0001760 0000 0000 0000 0000 0000 0000 0000 0000
	*
	0020000

Actual output:  the data after the first 1000 bytes
will be different each run:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
	0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
	0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
	(...)

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-03-17 13:47:10 -07:00
..
9p Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
adfs
affs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
afs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
autofs4 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
befs befs: add NFS export support 2016-12-22 11:25:24 +00:00
bfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
btrfs btrfs: add missing memset while reading compressed inline extents 2017-03-17 13:47:10 -07:00
cachefiles
ceph ceph: fix bad endianness handling in parse_reply_info_extra 2017-01-18 17:58:45 +01:00
cifs cifs: initialize file_info_lock 2017-01-14 14:58:29 -06:00
coda vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
configfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
cramfs
crypto fscrypt: fix renaming and linking special files 2016-12-31 00:47:05 -05:00
debugfs
devpts
dlm ktime: Get rid of ktime_equal() 2016-12-25 17:21:23 +01:00
ecryptfs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
efivarfs
efs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exofs exofs: don't mess with simple_write_{begin,end} 2016-12-10 14:25:19 -05:00
exportfs
ext2 dax: fix build warnings with FS_DAX and !FS_IOMAP 2017-01-24 16:26:14 -08:00
ext4 dax: fix build warnings with FS_DAX and !FS_IOMAP 2017-01-24 16:26:14 -08:00
f2fs block: Rename blk_queue_zone_size and bdev_zone_size 2017-01-12 07:58:32 -07:00
fat
freevxfs
fscache fscache: Fix dead object requeue 2017-01-31 13:23:09 -05:00
fuse fuse: fix time_to_jiffies nsec sanity check 2017-01-13 17:20:47 +01:00
gfs2 ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
hfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hfsplus Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hostfs vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
hpfs
hugetlbfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
isofs Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
jbd2 Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
jffs2 vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
jfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
kernfs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
lockd
minix vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
ncpfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
nfs pNFS: Fix a reference leak in _pnfs_return_layout 2017-01-26 15:50:41 -05:00
nfs_common
nfsd nfsd: Revert "nfsd: special case truncates some more" 2017-02-09 20:49:20 -05:00
nilfs2 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
nls
notify Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit 2017-01-05 23:06:06 -08:00
ntfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ocfs2 ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-10 18:31:54 -08:00
omfs
openpromfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
orangefs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
overlayfs ovl: fix possible use after free on redirect dir lookup 2017-01-18 15:19:54 +01:00
proc mm: fix KPF_SWAPCACHE in /proc/kpageflags 2017-02-07 12:08:32 -08:00
pstore pstore: don't OOPS when there are no ftrace zones 2017-02-09 11:49:49 -08:00
qnx4
qnx6
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2016-12-19 08:23:53 -08:00
ramfs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
reiserfs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-01-24 16:26:14 -08:00
squashfs Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-17 19:16:12 -08:00
sysfs
sysv vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
tracefs
ubifs ubifs: Fix journal replay wrt. xattr nodes 2017-01-17 14:35:58 +01:00
udf
ufs Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xfs xfs: prevent quotacheck from overloading inode lru 2017-01-27 09:32:30 -08:00
aio.c aio: fix lock dep warning 2017-01-14 19:31:40 -05:00
anon_inodes.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
attr.c
bad_inode.c bad_inode: add missing i_op initializers 2016-12-09 11:57:43 +01:00
binfmt_aout.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
binfmt_elf_fdpic.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
binfmt_elf.c coredump: Ensure proper size of sparse core files 2017-01-14 19:32:40 -05:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: fix use after free in __blkdev_direct_IO 2017-01-24 07:55:53 -07:00
buffer.c clean_bdev_aliases: Prevent cleaning blocks that are not in block range 2017-01-02 09:35:14 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
compat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
coredump.c coredump: Ensure proper size of sparse core files 2017-01-14 19:32:40 -05:00
dax.c fs: break out of iomap_file_buffered_write on fatal signals 2017-02-03 14:13:19 -08:00
dcache.c mnt: Protect the mountpoint hashtable with mount_lock 2017-01-10 13:34:43 +13:00
dcookies.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
direct-io.c do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned 2017-01-10 13:29:54 -07:00
drop_caches.c
eventfd.c
eventpoll.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
exec.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fcntl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fhandle.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
file_table.c constify alloc_file() 2016-12-05 19:01:16 -05:00
file.c
filesystems.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
fs_pin.c
fs_struct.c
fs-writeback.c fs/fs-writeback.c: remove redundant if check 2016-12-12 18:55:08 -08:00
inode.c
internal.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-17 18:44:00 -08:00
ioctl.c vfs: call vfs_clone_file_range() under freeze protection 2016-12-16 11:02:54 +01:00
iomap.c fs: break out of iomap_file_buffered_write on fatal signals 2017-02-03 14:13:19 -08:00
Kconfig dax: fix build warnings with FS_DAX and !FS_IOMAP 2017-01-24 16:26:14 -08:00
Kconfig.binfmt
libfs.c libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount 2017-01-10 13:34:55 +13:00
locks.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
Makefile logfs: remove from tree 2016-12-14 23:48:11 -05:00
mbcache.c mbcache: document that "find" functions only return reusable entries 2016-12-03 15:55:01 -05:00
mount.h vfs: add path_is_mountpoint() helper 2016-12-03 20:51:35 -05:00
mpage.c
namei.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
namespace.c mnt: Protect the mountpoint hashtable with mount_lock 2017-01-10 13:34:43 +13:00
no-block.c
nsfs.c
open.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pipe.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
pnode.c reorganize do_make_slave() 2016-12-16 16:30:49 -05:00
pnode.h
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-10 01:29:48 -05:00
proc_namespace.c
read_write.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
readdir.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
select.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
seq_file.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
signalfd.c
splice.c splice: reinstate SIGPIPE/EPIPE handling 2016-12-21 10:59:34 -08:00
stack.c
stat.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
statfs.c vfs: misc struct path constification 2016-12-05 19:03:49 -05:00
super.c
sync.c
timerfd.c ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
userfaultfd.c userfaultfd: fix SIGBUS resulting from false rwsem wakeups 2017-01-24 16:26:14 -08:00
utimes.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xattr.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00