kernel_optimize_test/fs/xfs/Makefile
Christoph Hellwig 6bdcf26ade xfs: use a b+tree for the in-core extent list
Replace the current linear list and the indirection array for the in-core
extent list with a b+tree to avoid the need for larger memory allocations
for the indirection array when lots of extents are present.  The current
extent list implementations leads to heavy pressure on the memory
allocator when modifying files with a high extent count, and can lead
to high latencies because of that.

The replacement is a b+tree with a few quirks.  The leaf nodes directly
store the extent record in two u64 values.  The encoding is a little bit
different from the existing in-core extent records so that the start
offset and length which are required for lookups can be retreived with
simple mask operations.  The inner nodes store a 64-bit key containing
the start offset in the first half of the node, and the pointers to the
next lower level in the second half.  In either case we walk the node
from the beginninig to the end and do a linear search, as that is more
efficient for the low number of cache lines touched during a search
(2 for the inner nodes, 4 for the leaf nodes) than a binary search.
We store termination markers (zero length for the leaf nodes, an
otherwise impossible high bit for the inner nodes) to terminate the key
list / records instead of storing a count to use the available cache
lines as efficiently as possible.

One quirk of the algorithm is that while we normally split a node half and
half like usual btree implementations we just spill over entries added at
the very end of the list to a new node on its own.  This means we get a
100% fill grade for the common cases of bulk insertion when reading an
inode into memory, and when only sequentially appending to a file.  The
downside is a slightly higher chance of splits on the first random
insertions.

Both insert and removal manually recurse into the lower levels, but
the bulk deletion of the whole tree is still implemented as a recursive
function call, although one limited by the overall depth and with very
little stack usage in every iteration.

For the first few extents we dynamically grow the list from a single
extent to the next powers of two until we have a first full leaf block
and that building the actual tree.

The code started out based on the generic lib/btree.c code from Joern
Engel based on earlier work from Peter Zijlstra, but has since been
rewritten beyond recognition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-06 11:53:41 -08:00

167 lines
4.0 KiB
Makefile

#
# Copyright (c) 2000-2005 Silicon Graphics, Inc.
# All Rights Reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
ccflags-y += -I$(src) # needed for trace events
ccflags-y += -I$(src)/libxfs
ccflags-$(CONFIG_XFS_DEBUG) += -g
obj-$(CONFIG_XFS_FS) += xfs.o
# this one should be compiled first, as the tracing macros can easily blow up
xfs-y += xfs_trace.o
# build the libxfs code first
xfs-y += $(addprefix libxfs/, \
xfs_alloc.o \
xfs_alloc_btree.o \
xfs_attr.o \
xfs_attr_leaf.o \
xfs_attr_remote.o \
xfs_bit.o \
xfs_bmap.o \
xfs_bmap_btree.o \
xfs_btree.o \
xfs_da_btree.o \
xfs_da_format.o \
xfs_defer.o \
xfs_dir2.o \
xfs_dir2_block.o \
xfs_dir2_data.o \
xfs_dir2_leaf.o \
xfs_dir2_node.o \
xfs_dir2_sf.o \
xfs_dquot_buf.o \
xfs_ialloc.o \
xfs_ialloc_btree.o \
xfs_iext_tree.o \
xfs_inode_fork.o \
xfs_inode_buf.o \
xfs_log_rlimit.o \
xfs_ag_resv.o \
xfs_rmap.o \
xfs_rmap_btree.o \
xfs_refcount.o \
xfs_refcount_btree.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o \
)
# xfs_rtbitmap is shared with libxfs
xfs-$(CONFIG_XFS_RT) += $(addprefix libxfs/, \
xfs_rtbitmap.o \
)
# highlevel code
xfs-y += xfs_aops.o \
xfs_attr_inactive.o \
xfs_attr_list.o \
xfs_bmap_util.o \
xfs_buf.o \
xfs_dir2_readdir.o \
xfs_discard.o \
xfs_error.o \
xfs_export.o \
xfs_extent_busy.o \
xfs_file.o \
xfs_filestream.o \
xfs_fsmap.o \
xfs_fsops.o \
xfs_globals.o \
xfs_icache.o \
xfs_ioctl.o \
xfs_iomap.o \
xfs_iops.o \
xfs_inode.o \
xfs_itable.o \
xfs_message.o \
xfs_mount.o \
xfs_mru_cache.o \
xfs_reflink.o \
xfs_stats.o \
xfs_super.o \
xfs_symlink.o \
xfs_sysfs.o \
xfs_trans.o \
xfs_xattr.o \
kmem.o
# low-level transaction/log code
xfs-y += xfs_log.o \
xfs_log_cil.o \
xfs_bmap_item.o \
xfs_buf_item.o \
xfs_extfree_item.o \
xfs_icreate_item.o \
xfs_inode_item.o \
xfs_refcount_item.o \
xfs_rmap_item.o \
xfs_log_recover.o \
xfs_trans_ail.o \
xfs_trans_bmap.o \
xfs_trans_buf.o \
xfs_trans_extfree.o \
xfs_trans_inode.o \
xfs_trans_refcount.o \
xfs_trans_rmap.o \
# optional features
xfs-$(CONFIG_XFS_QUOTA) += xfs_dquot.o \
xfs_dquot_item.o \
xfs_trans_dquot.o \
xfs_qm_syscalls.o \
xfs_qm_bhv.o \
xfs_qm.o \
xfs_quotaops.o
# xfs_rtbitmap is shared with libxfs
xfs-$(CONFIG_XFS_RT) += xfs_rtalloc.o
xfs-$(CONFIG_XFS_POSIX_ACL) += xfs_acl.o
xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o
xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o
xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o
# online scrub/repair
ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
# Tracepoints like to blow up, so build that before everything else
xfs-y += $(addprefix scrub/, \
trace.o \
agheader.o \
alloc.o \
attr.o \
bmap.o \
btree.o \
common.o \
dabtree.o \
dir.o \
ialloc.o \
inode.o \
parent.o \
refcount.o \
rmap.o \
scrub.o \
symlink.o \
)
xfs-$(CONFIG_XFS_RT) += scrub/rtbitmap.o
xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o
endif