kernel_optimize_test

Dave Chinner 709da6a61a xfs: fix split buffer vector log recovery support

A long time ago in a galaxy far away....

.. there was a commit made to fix some Linux-specific "fragmented buffer" log recovery problem:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=b29c0bece51da72fb3ff3b61391a391ea54e1603

That problem occurred when a contiguous dirty region of a buffer was split across two pages of an unmapped buffer. It's been a long time since that has been done in XFS, and the changes to log the entire inode buffers for CRC enabled filesystems have re-introduced that corner case.

And, of course, it turns out that the above commit didn't actually fix anything - it just ensured that log recovery is guaranteed to fail when this situation occurs.

And now for the gory details.

xfstest xfs/085 is failing with this assert:

    XFS (vdb): bad number of regions (0) in inode log format
    XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 1583

Largely undocumented factoid #1: Log recovery depends on all log buffer format items starting with this format:

    struct foo_log_format {
        __uint16_t type;
        __uint16_t size;
        ....

This is because recovery uses the size field and assumptions about 32 bit alignment when decoding format items. So don't pay much attention to the fact that log recovery thinks it is decoding an inode log format item - it just uses them to determine what the size of the item is.

But why would it see a log format item with a zero size? Well, luckily enough xfs_logprint uses the same code and gives the same error, so with a bit of gdb magic, it turns out that it isn't a log format that is being decoded.
What logprint tells us is this:

    Oper (130): tid: a0375e1a len: 28 clientid: TRANS flags: none
    BUF: #regs: 2 start blkno: 144 (0x90) len: 16 bmap size: 2 flags: 0x4000
    Oper (131): tid: a0375e1a len: 4096 clientid: TRANS flags: none
    BUF DATA
    ----------------------------------------------------------------------------
    Oper (132): tid: a0375e1a len: 4096 clientid: TRANS flags: none
    xfs_logprint: unknown log operation type (4e49)
    **********************************************************************
    * ERROR: data block=2 *
    **********************************************************************

That is, we've got a buffer format item (oper 130) that has two regions: the format item itself and one dirty region. The region that follows the buffer format item and its data is then what we are tripping over, and the first bytes of it are an inode magic number. Not a log opheader like there is supposed to be.

That means there's a problem with the buffer format item. Its dirty data region is 4096 bytes, and it contains - you guessed it - initialised inodes. But inode buffers are 8k, not 4k, and we log them in their entirety. So something is wrong here. The buffer format item contains:

    (gdb) p /x *(struct xfs_buf_log_format *)in_f
    $22 = {blf_type = 0x123c, blf_size = 0x2, blf_flags = 0x4000,
      blf_len = 0x10, blf_blkno = 0x90, blf_map_size = 0x2,
      blf_data_map = {0xffffffff, 0xffffffff, .... }}

Two regions, and a single dirty contiguous region of 64 bits. 64 * 128 = 8k, so this should be followed by a single 8k region of data. And the blf_flags tell us that the type of buffer is a XFS_BLFT_DINO_BUF. It contains inodes. And because it doesn't have the XFS_BLF_INODE_BUF flag set, that means it's an inode allocation buffer. So, it should be followed by 8k of inode data.

But we know that the next region has a header of:

    (gdb) p /x ohead
    $25 = {oh_tid = 0x1a5e37a0, oh_len = 0x100000, oh_clientid = 0x69,
      oh_flags = 0x0, oh_res2 = 0x0}

and so be32_to_cpu(oh_len) = 0x1000 = 4096 bytes.
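The size mismatch described above can be checked with a few lines of arithmetic. This is only a sketch: the helper names (count_dirty_bits, dirty_bytes, swap_be32) and the BLF_CHUNK constant are made up for illustration, with the 128-byte chunk size taken from the "64 * 128 = 8k" calculation in the message, and a little-endian host assumed for the byte swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes covered by one dirty bit; assumed from "64 * 128 = 8k" above. */
#define BLF_CHUNK 128u

/* Count the set bits in a dirty bitmap of nwords 32-bit words. */
static unsigned int count_dirty_bits(const uint32_t *map, size_t nwords)
{
	unsigned int bits = 0;

	assert(map != NULL);
	for (size_t i = 0; i < nwords; i++) {
		uint32_t w = map[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			bits++;
		}
	}
	return bits;
}

/* Number of data bytes the bitmap claims were logged. */
static size_t dirty_bytes(const uint32_t *map, size_t nwords)
{
	return (size_t)count_dirty_bits(map, nwords) * BLF_CHUNK;
}

/* Unconditional 32-bit byte swap: big-endian to little-endian host. */
static uint32_t swap_be32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

With the values from the gdb output, dirty_bytes() on {0xffffffff, 0xffffffff} gives 8192 while swap_be32(0x100000) gives 4096, which is the gap the message goes on to explain.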
It's simply not long enough to hold all the logged data. There must be another region. There is - there's a following opheader for another 4k of data that contains the other half of the inode cluster data - the one we assert fail on because it's not a log format header.

So why is the second part of the data not being accounted to the correct buffer log format structure? It took a little more work with gdb to work out that the buffer log format structure was expecting it to be there but hadn't accounted for it. It was at that point I went to the kernel code, as clearly this wasn't a bug in xfs_logprint and the kernel was writing bad stuff to the log.

First port of call was the buffer item formatting code, and the discontiguous memory/contiguous dirty region handling code immediately stood out. I've wondered for a long time why the code had this comment in it:

    vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
    vecp->i_len = nbits * XFS_BLF_CHUNK;
    vecp->i_type = XLOG_REG_TYPE_BCHUNK;
    /*
     * You would think we need to bump the nvecs here too, but we do not
     * this number is used by recovery, and it gets confused by the boundary
     * split here
     *	nvecs++;
     */
    vecp++;

And it didn't account for the extra vector pointer. The case being handled here is that a contiguous dirty region lies across a boundary that cannot be memcpy()d across, and so has to be split into two separate operations for xlog_write() to perform.

What this code assumes is that what is written to the log is two consecutive blocks of data that are accounted in the buf log format item as the same contiguous dirty region and so will get decoded as such by the log recovery code.

The thing is, xlog_write() knows nothing about this, and so just does its normal thing of adding an opheader for each vector. That means the 8k region gets written to the log as two separate regions of 4k each, but because nvecs has not been incremented, the buf log format item accounts for only one of them.
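The accounting problem on the formatting side can be modelled in isolation. The sketch below is not the kernel code: count_split_vectors() and format_item_size() are hypothetical helpers, and the model simply assumes that a dirty byte range must be cut at every page boundary and that the format item's size field should be one (for the format item itself) plus the number of data vectors.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many vectors a contiguous dirty range needs when it cannot
 * be copied across a page boundary. Hypothetical model, not kernel code.
 */
static unsigned int count_split_vectors(size_t offset, size_t len,
					size_t page_size)
{
	unsigned int nvecs = 0;

	assert(page_size > 0);
	while (len > 0) {
		/* bytes left before the next page boundary */
		size_t room = page_size - (offset % page_size);
		size_t take = len < room ? len : room;

		nvecs++;	/* every piece is a vector, so count it */
		offset += take;
		len -= take;
	}
	return nvecs;
}

/* Size field for the format item: itself plus every data vector. */
static unsigned int format_item_size(size_t offset, size_t len,
				     size_t page_size)
{
	return 1 + count_split_vectors(offset, len, page_size);
}
```

For the 8k inode buffer on 4k pages this model gives a size of 3, whereas the log in the message recorded blf_size = 0x2, one short of the regions actually written.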
Hence when we come to log recovery, we process the first 4k region and then expect to come across a new item that starts with a log format structure of some kind that tells us what the next data is going to be. Instead, we hit raw buffer data and things go bad real quick.

So, the commit from 2002 that commented out nvecs++ is just plain wrong. It breaks log recovery completely, and it would seem the only reason this hasn't been noticed since then is that we don't log large contiguous regions of multi-page unmapped buffers very often. Never would be a closer estimate, at least until the CRC code came along....

So, let's fix that by restoring the nvecs accounting for the extra region when we hit this case.....

.... and there's the problem in log recovery it is apparently working around:

    XFS: Assertion failed: i == item->ri_total, file: fs/xfs/xfs_log_recover.c, line: 2135

Yup, xlog_recover_do_reg_buffer() doesn't handle contiguous dirty regions being broken up into multiple regions by the log formatting code. That's an easy fix, though - if the number of contiguous dirty bits exceeds the length of the region being copied out of the log, only account for the number of dirty bits that region covers, and then loop again and copy more from the next region. It's a 2 line fix.

Now xfstests xfs/085 passes, we have one less piece of mystery code, and one more important piece of knowledge about how to structure new log format items..

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-30 12:48:33 -05:00
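The recovery-side fix described in the commit message above can be sketched as a small standalone loop. This is a simplified model rather than xlog_recover_do_reg_buffer() itself: replay_regions(), bit_is_set() and BLF_CHUNK are illustrative names, only data regions are indexed (the format item is left out), and the actual copy is replaced by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes covered by one dirty bit; assumed from "64 * 128 = 8k". */
#define BLF_CHUNK 128u

static int bit_is_set(const uint32_t *map, unsigned int bit)
{
	return (map[bit / 32] >> (bit % 32)) & 1u;
}

/*
 * Walk the dirty bitmap and consume one logged data region per step.
 * Returns the number of regions consumed, or -1 if the regions run out
 * or a region is too small to cover even one chunk.
 */
static int replay_regions(const uint32_t *map, unsigned int map_bits,
			  const size_t *region_len, int nregions)
{
	int i = 0;		/* next data region to consume */
	unsigned int bit = 0;

	assert(map != NULL && region_len != NULL);
	while (bit < map_bits) {
		unsigned int nbits = 0;

		if (!bit_is_set(map, bit)) {
			bit++;
			continue;
		}
		/* length of the contiguous run of dirty bits */
		while (bit + nbits < map_bits && bit_is_set(map, bit + nbits))
			nbits++;
		if (i >= nregions)
			return -1;
		/* the fix: a split run only covers what this region holds */
		if ((size_t)nbits * BLF_CHUNK > region_len[i])
			nbits = (unsigned int)(region_len[i] / BLF_CHUNK);
		if (nbits == 0)
			return -1;
		/* a real implementation would copy nbits * BLF_CHUNK bytes */
		i++;
		bit += nbits;	/* loop again for the rest of the run */
	}
	return i;
}
```

In this model a 64-bit dirty run backed by two 4096-byte regions consumes exactly two regions, which corresponds to the i == item->ri_total check passing once the format item counts both of them.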
..
Kconfig	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
kmem.c
kmem.h
Makefile	xfs: split remote attribute code out	2013-04-27 12:49:32 -05:00
mrlock.h	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
time.h
uuid.c
uuid.h	xfs: add CRC infrastructure	2012-11-19 20:11:24 -06:00
xfs_acl.c	userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr	2012-09-18 01:01:35 -07:00
xfs_acl.h
xfs_ag.h	xfs: add CRC checks to the AGI	2013-04-21 14:57:43 -05:00
xfs_alloc_btree.c	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_alloc_btree.h	xfs: add support for large btree blocks	2013-04-21 14:53:46 -05:00
xfs_alloc.c	xfs: Avoid pathological backwards allocation	2013-05-20 13:09:11 -05:00
xfs_alloc.h	xfs: convert buffer verifiers to an ops structure.	2012-11-15 21:35:12 -06:00
xfs_aops.c	xfs: fix sub-page blocksize data integrity writes	2013-05-20 14:14:25 -05:00
xfs_aops.h	Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision.	2012-07-22 11:00:55 -05:00
xfs_attr_leaf.c	xfs: rework remote attr CRCs	2013-05-23 18:04:06 -05:00
xfs_attr_leaf.h	xfs: add CRCs to attr leaf blocks	2013-04-27 12:45:01 -05:00
xfs_attr_remote.c	xfs: rework remote attr CRCs	2013-05-23 18:04:06 -05:00
xfs_attr_remote.h	xfs: rework remote attr CRCs	2013-05-23 18:04:06 -05:00
xfs_attr_sf.h
xfs_attr.c	xfs: split remote attribute code out	2013-04-27 12:49:32 -05:00
xfs_attr.h	xfs: split remote attribute code out	2013-04-27 12:49:32 -05:00
xfs_bit.c
xfs_bit.h
xfs_bmap_btree.c	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_bmap_btree.h	xfs: add support for large btree blocks	2013-04-21 14:53:46 -05:00
xfs_bmap.c	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_bmap.h	xfs: move allocation stack switch up to xfs_bmapi_allocate	2012-10-18 17:42:48 -05:00
xfs_btree.c	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_btree.h	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_buf_item.c	xfs: fix split buffer vector log recovery support	2013-05-30 12:48:33 -05:00
xfs_buf_item.h	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_buf.c	xfs: rework remote attr CRCs	2013-05-23 18:04:06 -05:00
xfs_buf.h	xfs: use b_maps[] for discontiguous buffers	2013-01-16 16:07:11 -06:00
xfs_cksum.h	xfs: add CRC infrastructure	2012-11-19 20:11:24 -06:00
xfs_da_btree.c	xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC	2013-05-20 16:32:30 -05:00
xfs_da_btree.h	xfs: add buffer types to directory and attribute buffers	2013-04-27 13:01:06 -05:00
xfs_dfrag.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2013-02-26 20:16:07 -08:00
xfs_dfrag.h
xfs_dinode.h	xfs: add version 3 inode format with CRCs	2013-04-21 15:03:33 -05:00
xfs_dir2_block.c	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_dir2_data.c	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_dir2_format.h	xfs: shortform directory offsets change for dir3 format	2013-04-27 12:24:32 -05:00
xfs_dir2_leaf.c	xfs: fix missing KM_NOFS tags to keep lockdep happy	2013-05-20 16:18:05 -05:00
xfs_dir2_node.c	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_dir2_priv.h	xfs: add buffer types to directory and attribute buffers	2013-04-27 13:01:06 -05:00
xfs_dir2_sf.c	xfs: shortform directory offsets change for dir3 format	2013-04-27 12:24:32 -05:00
xfs_dir2.c	xfs: remove struct xfs_dabuf and infrastructure	2012-07-01 14:50:07 -05:00
xfs_dir2.h
xfs_discard.c	xfs: check for possible overflow in xfs_ioc_trim	2012-08-23 14:48:44 -05:00
xfs_discard.h
xfs_dquot_item.c
xfs_dquot_item.h
xfs_dquot.c	xfs: add CRC checks for quota blocks	2013-04-21 14:58:22 -05:00
xfs_dquot.h	xfs: xfs_dquot prealloc throttling watermarks and low free space	2013-03-22 16:06:30 -05:00
xfs_error.c	xfs: increase hexdump output in xfs_corruption_error	2013-04-21 14:48:41 -05:00
xfs_error.h
xfs_export.c	fs: encode_fh: return FILEID_INVALID if invalid fid_type	2013-02-26 02:46:10 -05:00
xfs_export.h
xfs_extent_busy.c
xfs_extent_busy.h
xfs_extfree_item.c	xfs: Don't reference the EFI after it is freed	2013-05-20 14:29:34 -05:00
xfs_extfree_item.h	xfs: don't free EFIs before the EFDs are committed	2013-04-05 13:25:35 -05:00
xfs_file.c	aio: don't include aio.h in sched.h	2013-05-07 20:16:25 -07:00
xfs_filestream.c
xfs_filestream.h
xfs_fs.h	xfs: add minimum file size filtering to eofblocks scan	2012-11-08 15:32:29 -06:00
xfs_fsops.c	xfs: add CRC checks to the AGI	2013-04-21 14:57:43 -05:00
xfs_fsops.h
xfs_globals.c	xfs: add background scanning to clear eofblocks inodes	2012-11-08 15:34:59 -06:00
xfs_ialloc_btree.c	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_ialloc_btree.h	xfs: add support for large btree blocks	2013-04-21 14:53:46 -05:00
xfs_ialloc.c	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_ialloc.h	xfs: convert buffer verifiers to an ops structure.	2012-11-15 21:35:12 -06:00
xfs_icache.c	xfs: add background scanning to clear eofblocks inodes	2012-11-08 15:34:59 -06:00
xfs_icache.h	xfs: add background scanning to clear eofblocks inodes	2012-11-08 15:34:59 -06:00
xfs_inode_item.c	xfs: add version 3 inode format with CRCs	2013-04-21 15:03:33 -05:00
xfs_inode_item.h	xfs remove the XFS_TRANS_DEBUG routines	2012-12-17 16:29:00 -06:00
xfs_inode.c	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_inode.h	xfs: add version 3 inode format with CRCs	2013-04-21 15:03:33 -05:00
xfs_inum.h
xfs_ioctl.c	xfs: fallback to vmalloc for large buffers in xfs_attrlist_by_handle	2013-05-07 18:56:38 -05:00
xfs_ioctl.h
xfs_ioctl32.c	xfs: fallback to vmalloc for large buffers in xfs_compat_attrlist_by_handle	2013-05-07 19:00:10 -05:00
xfs_ioctl32.h
xfs_iomap.c	xfs: xfs_iomap_prealloc_size() tracepoint	2013-03-22 16:07:56 -05:00
xfs_iomap.h
xfs_iops.c	xfs: remove xfs_flush_pages	2012-11-14 15:12:45 -06:00
xfs_iops.h
xfs_itable.c	xfs: convert buffer verifiers to an ops structure.	2012-11-15 21:35:12 -06:00
xfs_itable.h
xfs_linux.h	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_log_cil.c	xfs: fix missing KM_NOFS tags to keep lockdep happy	2013-05-20 16:18:05 -05:00
xfs_log_priv.h	xfs: Remove the obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros	2013-04-16 13:18:33 -05:00
xfs_log_recover.c	xfs: fix split buffer vector log recovery support	2013-05-30 12:48:33 -05:00
xfs_log_recover.h
xfs_log.c	xfs: rename random32() to prandom_u32()	2013-03-07 12:33:57 -06:00
xfs_log.h	xfs: xfs_quiesce_attr() should quiesce the log like unmount	2012-10-17 13:39:14 -05:00
xfs_message.c	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_message.h	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_mount.c	xfs: don't emit v5 superblock warnings on write	2013-05-30 12:24:19 -05:00
xfs_mount.h	xfs: add CRC checks to the superblock	2013-04-27 13:03:12 -05:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_qm_bhv.c	xfs: Remove boolean_t typedef completely.	2013-01-17 17:32:57 -06:00
xfs_qm_syscalls.c	xfs: avoid nesting transactions in xfs_qm_scall_setqlim()	2013-05-21 13:57:05 -05:00
xfs_qm.c	xfs: add CRC checks for quota blocks	2013-04-21 14:58:22 -05:00
xfs_qm.h	xfs: add CRC checks for quota blocks	2013-04-21 14:58:22 -05:00
xfs_quota_priv.h
xfs_quota.h	xfs: add CRC checks for quota blocks	2013-04-21 14:58:22 -05:00
xfs_quotaops.c	userns: Convert qutoactl	2012-09-18 01:01:39 -07:00
xfs_rename.c
xfs_rtalloc.c	xfs: uncached buffer reads need to return an error	2012-11-15 21:34:05 -06:00
xfs_rtalloc.h
xfs_sb.h	xfs: implement extended feature masks	2013-04-27 13:05:18 -05:00
xfs_stats.c
xfs_stats.h
xfs_super.c	fs: Limit sys_mount to only request filesystem modules.	2013-03-03 19:36:31 -08:00
xfs_super.h	xfs: xfs_sync_data is redundant.	2012-10-17 12:01:25 -05:00
xfs_symlink.c	xfs: fix incorrect remote symlink block count	2013-05-30 12:37:04 -05:00
xfs_symlink.h	xfs: add CRC checks to remote symlinks	2013-04-27 11:49:28 -05:00
xfs_sysctl.c	xfs: add background scanning to clear eofblocks inodes	2012-11-08 15:34:59 -06:00
xfs_sysctl.h	xfs: add background scanning to clear eofblocks inodes	2012-11-08 15:34:59 -06:00
xfs_trace.c	xfs: add CRCs to dir2/da node blocks	2013-04-27 12:33:38 -05:00
xfs_trace.h	xfs: xfs_iomap_prealloc_size() tracepoint	2013-03-22 16:07:56 -05:00
xfs_trans_ail.c	xfs remove the XFS_TRANS_DEBUG routines	2012-12-17 16:29:00 -06:00
xfs_trans_buf.c	xfs: buffer type overruns blf_flags field	2013-04-27 13:01:58 -05:00
xfs_trans_dquot.c	xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t	2013-03-22 16:05:52 -05:00
xfs_trans_extfree.c
xfs_trans_inode.c	xfs remove the XFS_TRANS_DEBUG routines	2012-12-17 16:29:00 -06:00
xfs_trans_priv.h	xfs: re-enable xfsaild idle mode and fix associated races	2012-07-29 16:27:57 -05:00
xfs_trans_space.h
xfs_trans.c	xfs: refactor space log reservation for XFS_TRANS_ATTR_SET	2013-02-01 14:56:31 -06:00
xfs_trans.h	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00
xfs_types.h	xfs: Remove boolean_t typedef completely.	2013-01-17 17:32:57 -06:00
xfs_utils.c	xfs: remove the alloc_done argument to xfs_dialloc	2012-07-29 16:00:31 -05:00
xfs_utils.h
xfs_vnode.h
xfs_vnodeops.c	xfs: fix rounding in xfs_free_file_space	2013-05-20 14:25:50 -05:00
xfs_vnodeops.h	xfs: byte range granularity for XFS_IOC_ZERO_RANGE	2012-11-29 14:21:46 -06:00
xfs_xattr.c
xfs.h	xfs: introduce CONFIG_XFS_WARN	2013-05-07 18:45:36 -05:00