kernel_optimize_test

Zygo Blaxell e1699d2d7b btrfs: add missing memset while reading compressed inline extents	2017-03-17 13:47:10 -07:00

This is a story about 4 distinct (and very old) btrfs bugs.

Commit `c8b978188c` ("Btrfs: Add zlib compression support") added three data corruption bugs for inline extents (bugs #1-3).

Commit `93c82d5750` ("Btrfs: zero page past end of inline file items") fixed bug #1: uncompressed inline extents followed by a hole and more extents could get non-zero data in the hole as they were read. The fix was to add a memset in btrfs_get_extent to zero out the hole.

Commit `166ae5a418` ("btrfs: fix inline compressed read err corruption") fixed bug #2: compressed inline extents which contained non-zero bytes might be replaced with zero bytes in some cases. This patch removed an unhelpful memset from uncompress_inline, but the case where memset is required was missed.

There is also a memset in the decompression code, but this only covers decompressed data that is shorter than the ram_bytes from the extent ref record. This memset doesn't cover the region between the end of the decompressed data and the end of the page. It has also moved around a few times over the years, so there's no single patch to refer to.

This patch fixes bug #3: compressed inline extents followed by a hole and more extents could get non-zero data in the hole as they were read (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/). The fix is the same: zero out the hole in the compressed case too, by putting a memset back in uncompress_inline, but this time with correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline extent/hole/extent pattern. Bug #0 is a subtle and mostly-harmless quirk of behavior somewhere in the btrfs write code. In a few special cases, an inline extent and hole are allowed to persist where they normally would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below. A few offending extents are also created in the wild during large rsync transfers with the -S flag. A Linux kernel build (git checkout; make allyesconfig; make -j8) will produce a handful of offending files as well. Once an offending file is created, it can present different content to userspace each time it is read.

Bug #0 is at least 4 and possibly 8 years old. I verified every vX.Y kernel back to v3.5 has this behavior. There are fossil records of this bug's effects in commits all the way back to v2.6.32. I have no reason to believe bug #0 wasn't present at the beginning of btrfs compression support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing. A fix would likely require injecting extra reads into currently write-only paths, and most of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes are part of the btrfs de-facto disk format now, and we need to be able to read them without data corruption or an infoleak. So enough about bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in the wild:

        item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
                inode generation 50 transid 50 size 47424 nbytes 49141 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 flags 0x0(none)
        item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
                inode ref index 3 namelen 10 name: DB_File.so
        item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
                inline extent data size 1341 ram 4085 compress(zlib)
        item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
                extent data disk byte 5367308288 nr 20480
                extent data offset 0 nr 45056 ram 45056
                extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes between 4085 and 4096. The extent in item 63 is not long enough to fill the first page of the file, so a memset is required to fill the space between item 63 (ending at 4085) and item 64 (beginning at 4096) with zero.

Here is a reproducer from Liu Bo, which demonstrates another method of creating the same inline extent and hole pattern:

Using 'page_poison=on' kernel command line (or enable CONFIG_PAGE_POISONING) run the following:

        # touch foo
        # chattr +c foo
        # xfs_io -f -c "pwrite -W 0 1000" foo
        # xfs_io -f -c "falloc 4 8188" foo
        # od -x foo
        # echo 3 >/proc/sys/vm/drop_caches
        # od -x foo

This produces the following on my box:

Correct output: file contains 1000 data bytes followed by zeros:

        0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
        *
        0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
        0001760 0000 0000 0000 0000 0000 0000 0000 0000
        *
        0020000

Actual output: the data after the first 1000 bytes will be different each run:

        0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
        *
        0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
        0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
        0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
        (...)

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
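
The tail-zeroing step described in the commit message above can be illustrated with a small userspace sketch. This is not the kernel patch: `SKETCH_PAGE_SIZE`, `zero_inline_tail` and `read_inline_sketch` are hypothetical names invented for this illustration, the page is a plain byte buffer, and the 0xaa fill merely stands in for whatever stale data the page held before the read.

```c
#include <stddef.h>
#include <string.h>

/* Assumed page size for the sketch; matches the 4096-byte page in the example. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: zero everything between the end of the inline data
 * (pg_offset + max_size) and the end of the page.
 * Returns the number of bytes that were zeroed.
 */
static size_t zero_inline_tail(unsigned char *page, size_t pg_offset,
			       size_t max_size)
{
	size_t end = pg_offset + max_size;

	/* Nothing to do when the inline data already reaches the page end. */
	if (end >= SKETCH_PAGE_SIZE)
		return 0;

	memset(page + end, 0, SKETCH_PAGE_SIZE - end);
	return SKETCH_PAGE_SIZE - end;
}

/*
 * Hypothetical stand-in for reading an inline extent into a page:
 * the page starts out holding stale bytes, the (already decompressed)
 * data is copied to the start of the page, and the tail is zeroed.
 */
static size_t read_inline_sketch(unsigned char *page,
				 const unsigned char *data, size_t len)
{
	/* Simulate stale page contents that must not leak to userspace. */
	memset(page, 0xaa, SKETCH_PAGE_SIZE);

	if (len > SKETCH_PAGE_SIZE)
		len = SKETCH_PAGE_SIZE;
	memcpy(page, data, len);

	return zero_inline_tail(page, 0, len);
}
```

With the values from the on-disk example (4085 bytes of inline data in a 4096-byte page), `read_inline_sketch` zeroes the 11 bytes from offset 4085 up to 4096, which is the region the commit message reports as changing between reads. Without the call to `zero_inline_tail`, those bytes would keep the stale 0xaa fill.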
tests	btrfs: Make get_extent_t take btrfs_inode	2017-02-28 11:30:11 +01:00
acl.c	posix_acl: Clear SGID bit when setting file permissions	2016-09-22 10:55:32 +02:00
async-thread.c	btrfs: fix crash when tracepoint arguments are freed by wq callbacks	2017-01-09 11:24:50 +01:00
async-thread.h	btrfs: limit async_work allocation and worker func duration	2016-12-13 11:01:30 -08:00
backref.c	btrfs: remove unused parameter from __add_inline_refs	2017-02-17 12:03:54 +01:00
backref.h
btrfs_inode.h	btrfs: make btrfs_inode_resume_unlocked_dio take btrfs_inode	2017-02-28 11:30:12 +01:00
check-integrity.c	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
check-integrity.h	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
compression.c	btrfs: derive maximum output size in the compression implementation	2017-02-28 14:26:36 +01:00
compression.h	btrfs: derive maximum output size in the compression implementation	2017-02-28 14:26:36 +01:00
ctree.c	Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11	2017-02-28 14:35:09 -08:00
ctree.h	btrfs: Make btrfs_add_link take btrfs_inode	2017-02-28 11:30:11 +01:00
dedupe.h	btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize	2016-07-26 13:52:25 +02:00
delayed-inode.c	btrfs: Make btrfs_i_size_write take btrfs_inode	2017-02-28 11:30:06 +01:00
delayed-inode.h	btrfs: Make btrfs_inode_delayed_dir_index_count take btrfs_inode	2017-02-14 15:50:53 +01:00
delayed-ref.c	btrfs: qgroup: Move half of the qgroup accounting time out of commit trans	2017-02-17 12:03:55 +01:00
delayed-ref.h	Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head	2017-02-14 15:50:59 +01:00
dev-replace.c	btrfs: constify device path passed to relevant helpers	2017-02-28 14:26:07 +01:00
dev-replace.h	btrfs: constify device path passed to relevant helpers	2017-02-28 14:26:07 +01:00
dir-item.c	btrfs: do proper error handling in btrfs_insert_xattr_item	2017-02-28 14:27:11 +01:00
disk-io.c	Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11	2017-02-28 14:35:09 -08:00
disk-io.h	btrfs: constify input buffer of btrfs_csum_data	2017-02-28 14:26:07 +01:00
export.c	btrfs: Make btrfs_ino take a struct btrfs_inode	2017-02-14 15:50:51 +01:00
export.h
extent_io.c	Btrfs: fix regression in lock_delalloc_pages	2017-03-17 13:47:09 -07:00
extent_io.h	btrfs: add dummy callback for readpage_io_failed and drop checks	2017-02-28 14:29:24 +01:00
extent_map.c	btrfs: Fix slab accounting flags	2016-07-26 13:52:25 +02:00
extent_map.h	btrfs: cleanup, stop casting for extent_map->lookup everywhere	2016-01-15 19:22:28 +01:00
extent-tree.c	Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11	2017-02-28 14:35:09 -08:00
file-item.c	Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11	2017-02-28 14:35:09 -08:00
file.c	btrfs: Make get_extent_t take btrfs_inode	2017-02-28 11:30:11 +01:00
free-space-cache.c	btrfs: all btrfs_delalloc_release_metadata take btrfs_inode	2017-02-28 11:30:07 +01:00
free-space-cache.h	btrfs: free-space-cache, clean up unnecessary root arguments	2017-02-17 12:03:56 +01:00
free-space-tree.c	btrfs: remove unused parameter from clean_tree_block	2017-02-17 12:03:51 +01:00
free-space-tree.h	Btrfs: implement the free space B-tree	2015-12-17 12:16:47 -08:00
hash.c	btrfs: advertise which crc32c implementation is being used at module load	2016-06-06 14:08:28 +02:00
hash.h	btrfs: advertise which crc32c implementation is being used at module load	2016-06-06 14:08:28 +02:00
inode-item.c	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
inode-map.c	btrfs: all btrfs_delalloc_release_metadata take btrfs_inode	2017-02-28 11:30:07 +01:00
inode-map.h	Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots	2016-01-15 19:25:02 +01:00
inode.c	btrfs: add missing memset while reading compressed inline extents	2017-03-17 13:47:10 -07:00
ioctl.c	btrfs: constify name of subvolume in creation helpers	2017-02-28 14:26:08 +01:00
Kconfig
locking.c	btrfs: cleanup, remove stray return statements	2016-01-07 14:30:52 +01:00
locking.h
lzo.c	btrfs: derive maximum output size in the compression implementation	2017-02-28 14:26:36 +01:00
Makefile	Btrfs: add free space tree sanity tests	2015-12-17 12:16:47 -08:00
math.h
ordered-data.c	btrfs: Make btrfs_lookup_ordered_range take btrfs_inode	2017-02-28 11:30:08 +01:00
ordered-data.h	btrfs: Make btrfs_lookup_ordered_range take btrfs_inode	2017-02-28 11:30:08 +01:00
orphan.c
print-tree.c	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
print-tree.h	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
props.c	btrfs: Make btrfs_ino take a struct btrfs_inode	2017-02-14 15:50:51 +01:00
props.h
qgroup.c	btrfs: qgroup: Move half of the qgroup accounting time out of commit trans	2017-02-17 12:03:55 +01:00
qgroup.h	btrfs: qgroup: Move half of the qgroup accounting time out of commit trans	2017-02-17 12:03:55 +01:00
raid56.c	btrfs: raid56: Remove unused variable in lock_stripe_add	2017-02-14 15:50:59 +01:00
raid56.h	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c	btrfs: take an fs_info directly when the root is not used otherwise	2016-12-06 16:06:59 +01:00
relocation.c	btrfs: Make btrfs_orphan_add take btrfs_inode	2017-02-28 11:30:10 +01:00
root-tree.c	Btrfs: constify struct btrfs_{,disk_}key wherever possible	2017-02-14 15:50:58 +01:00
scrub.c	btrfs: Make check_extent_to_block take btrfs_inode	2017-02-28 11:30:11 +01:00
send.c	Btrfs: incremental send, fix unnecessary hole writes for sparse files	2017-02-24 00:39:21 +00:00
send.h	Btrfs: use linux/sizes.h to represent constants	2016-01-07 14:38:02 +01:00
struct-funcs.c	btrfs: fix string and comment grammatical issues and typos	2016-05-25 22:35:14 +02:00
super.c	btrfs: remove unused parameter from btrfs_fill_super	2017-02-17 12:03:53 +01:00
sysfs.c	btrfs: convert printk(KERN_* to use pr_* calls	2016-09-26 18:08:44 +02:00
sysfs.h	btrfs: sysfs: introduce helper for syncing bits with sysfs files	2016-01-21 18:50:40 +01:00
transaction.c	btrfs: Make btrfs_i_size_write take btrfs_inode	2017-02-28 11:30:06 +01:00
transaction.h	btrfs: remove root parameter from transaction commit/end routines	2016-12-06 16:07:00 +01:00
tree-defrag.c	Btrfs: fix locking bugs when defragging leaves	2015-12-18 02:51:32 +00:00
tree-log.c	Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11	2017-02-28 14:35:09 -08:00
tree-log.h	btrfs: Make btrfs_del_inode_ref take btrfs_inode	2017-02-14 15:50:54 +01:00
ulist.c	btrfs: ulist: rename ulist_fini to ulist_release	2017-02-17 12:03:50 +01:00
ulist.h	btrfs: ulist: rename ulist_fini to ulist_release	2017-02-17 12:03:50 +01:00
uuid-tree.c	btrfs: return the actual error value from from btrfs_uuid_tree_iterate	2016-12-19 18:08:15 +01:00
volumes.c	btrfs: handle allocation error in update_dev_stat_item	2017-02-28 14:27:11 +01:00
volumes.h	btrfs: constify device path passed to relevant helpers	2017-02-28 14:26:07 +01:00
xattr.c	btrfs: fix over-80 lines introduced by previous cleanups	2017-02-14 15:50:57 +01:00
xattr.h	btrfs: Switch to generic xattr handlers	2016-05-17 19:17:09 -04:00
zlib.c	btrfs: derive maximum output size in the compression implementation	2017-02-28 14:26:36 +01:00