Commit Graph

47 Commits

Author SHA1 Message Date
Chris Mason
61b4944018 Btrfs: Fix streaming read performance with checksumming on
Large streaming reads make for large bios, which means each entry on the
list async work queues represents a large amount of data.  IO
congestion throttling on the device was kicking in before the async
worker threads decided a single thread was busy and needed some help.

The end result was that a streaming read would result in a single CPU
running at 100% instead of balancing the work off to other CPUs.

This patch also changes the pre-IO checksum lookup done by reads to
work on a per-bio basis instead of a per-page.  This results in many
extra btree lookups on large streaming reads.  Doing the checksum lookup
right before bio submit allows us to reuse searches while processing
adjacent offsets.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:05 -04:00
Yan
bcc63abbf3 Btrfs: implement memory reclaim for leaf reference cache
The memory reclaiming issue happens when snapshot exists. In that
case, some cache entries may not be used during old snapshot dropping,
so they will remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record create time. Besides,
the patch makes all dead roots of a given snapshot linked together in order of
create time. After a old snapshot was completely dropped, we check the dead
root list and remove all cache entries created before the oldest dead root in
the list.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:05 -04:00
Chris Mason
ed98b56a63 Btrfs: Take the csum mutex while reading checksums
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:05 -04:00
Chris Mason
e5a2217ef6 Fix btrfs_wait_ordered_extent_range to properly wait
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:05 -04:00
Chris Mason
7f3c74fb83 Btrfs: Keep extent mappings in ram until pending ordered extents are done
It was possible for stale mappings from disk to be used instead of the
new pending ordered extent.  This adds a flag to the extent map struct
to keep it pinned until the pending ordered extent is actually on disk.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:05 -04:00
Chris Mason
3edf7d33f4 Btrfs: Handle data checksumming on bios that span multiple ordered extents
Data checksumming is done right before the bio is sent down the IO stack,
which means a single bio might span more than one ordered extent.  In
this case, the checksumming data is split between two ordered extents.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:05 -04:00
Chris Mason
e6dcd2dc9c Btrfs: New data=ordered implementation
The old data=ordered code would force commit to wait until
all the data extents from the transaction were fully on disk.  This
introduced large latencies into the commit and stalled new writers
in the transaction for a long time.

The new code changes the way data allocations and extents work:

* When delayed allocation is filled, data extents are reserved, and
  the extent bit EXTENT_ORDERED is set on the entire range of the extent.
  A struct btrfs_ordered_extent is allocated an inserted into a per-inode
  rbtree to track the pending extents.

* As each page is written EXTENT_ORDERED is cleared on the bytes corresponding
  to that page.

* When all of the bytes corresponding to a single struct btrfs_ordered_extent
  are written, The previously reserved extent is inserted into the FS
  btree and into the extent allocation trees.  The checksums for the file
  data are also updated.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:04 -04:00
Sage Weil
f2eb0a241f Btrfs: Clone file data ioctl
Add a new ioctl to clone file data

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:02 -04:00
Chris Mason
e015640f9c Btrfs: Write bio checksumming outside the FS mutex
This significantly improves streaming write performance by allowing
concurrency in the data checksumming.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:01 -04:00
Chris Mason
eb20978f31 Btrfs: Use KM_USERN instead of KM_IRQ during data summing
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:00 -04:00
Chris Mason
2e1a992e31 Btrfs: Make sure bio pages are adjacent during bulk csumming
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:00 -04:00
Chris Mason
6e92f5e651 Btrfs: While doing checksums on bios, cache the extent_buffer mapping
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:00 -04:00
Chris Mason
065631f6dc Btrfs: checksum file data at bio submission time instead of during writepage
When we checkum file data during writepage, the checksumming is done one
page at a time, making it difficult to do bulk metadata modifications
to insert checksums for large ranges of the file at once.

This patch changes btrfs to checksum on a per-bio basis instead.  The
bios are checksummed before they are handed off to the block layer, so
each bio is contiguous and only has pages from the same inode.

Checksumming on a bio basis allows us to insert and modify the file
checksum items in large groups.  It also allows the checksumming to
be done more easily by async worker threads.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:04:00 -04:00
Chris Mason
aadfeb6e39 Btrfs: Add some extra debugging around file data checksum failures
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:59 -04:00
Chris Mason
179e29e488 Btrfs: Fix a number of inline extent problems that Yan Zheng reported.
The fixes do a number of things:

1) Most btrfs_drop_extent callers will try to leave the inline extents in
place.  It can truncate bytes off the beginning of the inline extent if
required.

2) writepage can now update the inline extent, allowing mmap writes to
go directly into the inline extent.

3) btrfs_truncate_in_transaction truncates inline extents

4) extent_map.c fixed to not merge inline extent mappings and hole
mappings together

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:57 -04:00
Yan
b56baf5bed Minor fix for btrfs_csum_file_block.
Execution should goto label 'insert' when 'btrfs_next_leaf' return a
non-zero value, otherwise the parameter 'slot' for
'btrfs_item_key_to_cpu' may be out of bounds. The original codes jump
to  label 'insert' only when 'btrfs_next_leaf' return a negative
value.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:57 -04:00
Chris Mason
f578d4bd7e Btrfs: Optimize csum insertion to create larger items when possible
This reduces the number of calls to btrfs_extend_item and greatly lowers
the cpu usage while writing large files.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:57 -04:00
Chris Mason
ff79f8190b Btrfs: Add back file data checksumming
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:57 -04:00
Chris Mason
db94535db7 Btrfs: Allow tree blocks larger than the page size
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:56 -04:00
Chris Mason
5f39d397df Btrfs: Create extent_buffer interface for large blocksizes
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25 11:03:56 -04:00
Zach Brown
ec6b910fb3 Btrfs: trivial include fixups
Almost none of the files including module.h need to do so,
remove them.

Include sched.h in extent-tree.c to silence a warning about cond_resched()
being undeclared.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-07-11 10:00:37 -04:00
Chris Mason
54aa1f4dfd Btrfs: Audit callers and return codes to make sure -ENOSPC gets up the stack
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-22 14:16:25 -04:00
Chris Mason
8c2383c3dd Subject: Rework btrfs_file_write to only allocate while page locks are held
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-18 09:57:58 -04:00
Chris Mason
9ebefb180b Btrfs: patch queue: page_mkwrite
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-15 13:50:00 -04:00
Aneesh
f1ace244c8 btrfs: Code cleanup
Attaching below is some of the code cleanups that i came across while
reading the code.

a) alloc_path already calls init_path.
b) Mention that btrfs_inode is the in memory copy.Ext4 have ext4_inode_info as
the in memory copy ext4_inode as the disk copy

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-13 16:18:26 -04:00
Chris Mason
6cbd557078 Btrfs: add GPLv2
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12 09:07:21 -04:00
Chris Mason
5af3981c18 Btrfs: printk fixes
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12 07:50:13 -04:00
Chris Mason
84f54cfa78 Btrfs: 64 bit div fixes
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12 07:43:08 -04:00
Chris Mason
1de037a43e Btrfs: fixup various fsx failures
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-05-29 15:17:08 -04:00
Chris Mason
3a68637562 Btrfs: sparse files!
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-05-24 13:35:57 -04:00
Chris Mason
509659cde5 Btrfs: switch to crc32c instead of sha256
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-05-10 12:36:17 -04:00
Chris Mason
236454dfff Btrfs: many file_write fixes, inline data
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-19 13:37:44 -04:00
Chris Mason
a429e51371 Btrfs: working file_write, reorganized key flags
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-18 16:15:28 -04:00
Chris Mason
70b2befd0c Btrfs: rework csums and extent item ordering
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-17 15:39:32 -04:00
Chris Mason
b18c668581 Btrfs: progress on file_write
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-17 13:26:50 -04:00
Chris Mason
6567e837df Btrfs: early work to file_write in big extents
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-16 09:22:45 -04:00
Chris Mason
d0dbc6245c Btrfs: drop owner and parentid
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-10 12:36:36 -04:00
Chris Mason
2da566edd8 Btrfs: csum_verify_file_block locking fix
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-02 15:43:21 -04:00
Chris Mason
5caf2a0029 Btrfs: dynamic allocation of path struct
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-02 11:20:42 -04:00
Chris Mason
d602557953 Btrfs: corruption hunt continues
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-30 14:27:56 -04:00
Chris Mason
f254e52c1c Btrfs: verify csums on read
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-29 15:15:27 -04:00
Chris Mason
9773a78868 Btrfs: byte offsets for file keys
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-27 11:26:26 -04:00
Chris Mason
71951f35a6 Btrfs: add generation field to file extent
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-27 09:16:29 -04:00
Chris Mason
dee26a9f7a btrfs_get_block, file read/write
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-26 16:00:06 -04:00
Chris Mason
2e635a2783 Btrfs: initial move to kernel module land
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-21 11:12:56 -04:00
Chris Mason
9f5fae2fe6 Btrfs: Add inode map, and the start of file extent items
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-20 14:38:32 -04:00
Chris Mason
1e1d27017c Btrfs: add inode item
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-15 19:03:33 -04:00