kernel_optimize_test

Author	SHA1	Message	Date
Jan Kara	d9b22cf9f5	ext4: fix stripe-unaligned allocations When a filesystem is created using: mkfs.ext4 -b 4096 -E stride=512 <dev> and we try to allocate 64MB extent, we will end up directly in ext4_mb_complex_scan_group(). This is because the request is detected as power-of-two allocation (so we start in ext4_mb_regular_allocator() with ac_criteria == 0) however the check before ext4_mb_simple_scan_group() refuses the direct buddy scan because the allocation request is too large. Since cr == 0, the check whether we should use ext4_mb_scan_aligned() fails as well and we fall back to ext4_mb_complex_scan_group(). Fix the problem by checking for upper limit on power-of-two requests directly when detecting them. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-10 00:50:56 -05:00
Christoph Hellwig	168316db35	dax: assert that i_rwsem is held exclusive for writes Make sure all callers follow the same locking protocol, given that DAX transparantly replaced the normal buffered I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2017-02-08 14:43:13 -05:00
Christoph Hellwig	ff5462e39c	ext4: fix DAX write locking Unlike O_DIRECT DAX is not an optional opt-in feature selected by the application, so we'll have to provide the traditional synchronіzation of overlapping writes as we do for buffered writes. This was broken historically for DAX, but got fixed for ext2 and XFS as part of the iomap conversion. Fix up ext4 as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2017-02-08 14:39:27 -05:00
Theodore Ts'o	783d948544	ext4: add EXT4_IOC_GOINGDOWN ioctl This ioctl is modeled after the xfs's XFS_IOC_GOINGDOWN ioctl. (In fact, it uses the same code points.) Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-05 19:47:14 -05:00
Theodore Ts'o	0db1ff222d	ext4: add shutdown bit and check for it Add a shutdown bit that will cause ext4 processing to fail immediately with EIO. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-05 01:28:48 -05:00
Theodore Ts'o	9549a168bd	ext4: rename s_resize_flags to s_ext4_flags We are currently using one bit in s_resize_flags; rename it in order to allow more of the bits in that unsigned long for other purposes. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-05 01:27:48 -05:00
Theodore Ts'o	4753d8a24d	ext4: return EROFS if device is r/o and journal replay is needed If the file system requires journal recovery, and the device is read-ony, return EROFS to the mount system call. This allows xfstests generic/050 to pass. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-02-05 01:26:48 -05:00
Theodore Ts'o	97abd7d4b5	ext4: preserve the needs_recovery flag when the journal is aborted If the journal is aborted, the needs_recovery feature flag should not be removed. Otherwise, it's the journal might not get replayed and this could lead to more data getting lost. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-02-04 23:38:06 -05:00
Theodore Ts'o	e112666b49	jbd2: don't leak modified metadata buffers on an aborted journal If the journal has been aborted, we shouldn't mark the underlying buffer head as dirty, since that will cause the metadata block to get modified. And if the journal has been aborted, we shouldn't allow this since it will almost certainly lead to a corrupted file system. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-02-04 23:14:19 -05:00
Theodore Ts'o	eb5efbcb76	ext4: fix inline data error paths The write_end() function must always unlock the page and drop its ref count, even on an error. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-02-04 23:04:00 -05:00
Jason A. Donenfeld	1c83a9aab8	ext4: move halfmd4 into hash.c directly The "half md4" transform should not be used by any new code. And fortunately, it's only used now by ext4. Since ext4 supports several hashing methods, at some point it might be desirable to move to something like SipHash. As an intermediate step, remove half md4 from cryptohash.h and lib, and make it just a local function in ext4's hash.c. There's precedent for doing this; the other function ext can use for its hashes -- TEA -- is also implemented in the same place. Also, by being a local function, this might allow gcc to perform some additional optimizations. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-02 11:52:14 -05:00
Eric Biggers	dd01b690f8	ext4: fix use-after-iput when fscrypt contexts are inconsistent In the case where the child's encryption context was inconsistent with its parent directory, we were using inode->i_sb and inode->i_ino after the inode had already been iput(). Fix this by doing the iput() in the correct places. Note: only ext4 had this bug, not f2fs and ubifs. Fixes: `d9cdc90331` ("ext4 crypto: enforce context consistency") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-01 21:07:11 -05:00
Sahitya Tummala	dbfcef6b0f	jbd2: fix use after free in kjournald2() Below is the synchronization issue between unmount and kjournald2 contexts, which results into use after free issue in kjournald2(). Fix this issue by using journal->j_state_lock to synchronize the wait_event() done in journal_kill_thread() and the wake_up() done in kjournald2(). TASK 1: umount cmd: \|--jbd2_journal_destroy() { \|--journal_kill_thread() { write_lock(&journal->j_state_lock); journal->j_flags \|= JBD2_UNMOUNT; ... write_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); TASK 2 wakes up here: kjournald2() { ... checks JBD2_UNMOUNT flag and calls goto end-loop; ... end_loop: write_unlock(&journal->j_state_lock); journal->j_task = NULL; --> If this thread gets pre-empted here, then TASK 1 wait_event will exit even before this thread is completely done. wait_event(journal->j_wait_done_commit, journal->j_task == NULL); ... write_lock(&journal->j_state_lock); write_unlock(&journal->j_state_lock); } \|--kfree(journal); } } wake_up(&journal->j_wait_done_commit); --> this step now results into use after free issue. } Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-01 20:49:35 -05:00
Jan Kara	3b136499e9	ext4: fix data corruption in data=journal mode ext4_journalled_write_end() did not propely handle all the cases when generic_perform_write() did not copy all the data into the target page and could mark buffers with uninitialized contents as uptodate and dirty leading to possible data corruption (which would be quickly fixed by generic_perform_write() retrying the write but still). Fix the problem by carefully handling the case when the page that is written to is not uptodate. CC: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-27 14:35:38 -05:00
Jan Kara	cd648b8a8f	ext4: trim allocation requests to group size If filesystem groups are artifically small (using parameter -g to mkfs.ext4), ext4_mb_normalize_request() can result in a request that is larger than a block group. Trim the request size to not confuse allocation code. Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-01-27 14:34:30 -05:00
Theodore Ts'o	43c73221b3	ext4: replace BUG_ON with WARN_ON in mb_find_extent() The last BUG_ON in mb_find_extent() is apparently triggering in some rare cases. Most of the time it indicates a bug in the buddy bitmap algorithms, but there are some weird cases where it can trigger when buddy bitmap is still in memory, but the block bitmap has to be read from disk, and there is disk or memory corruption such that the block bitmap and the buddy bitmap are out of sync. Google-Bug-Id: #33702157 Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-22 19:35:52 -05:00
Theodore Ts'o	01daf94525	ext4: propagate error values from ext4_inline_data_truncate() Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-22 19:35:49 -05:00
Theodore Ts'o	b907f2d519	ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphores There is no need to call ext4_mark_inode_dirty while holding xattr_sem or i_data_sem, so where it's easy to avoid it, move it out from the critical region. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-11 22:14:49 -05:00
Theodore Ts'o	c755e25135	ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() The xattr_sem deadlock problems fixed in commit `2e81a4eeed`: "ext4: avoid deadlock when expanding inode size" didn't include the use of xattr_sem in fs/ext4/inline.c. With the addition of project quota which added a new extra inode field, this exposed deadlocks in the inline_data code similar to the ones fixed by `2e81a4eeed`. The deadlock can be reproduced via: dmesg -n 7 mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768 mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc mkdir /vdc/a umount /vdc mount -t ext4 /dev/vdc /vdc echo foo > /vdc/a/foo and looks like this: [ 11.158815] [ 11.160276] ============================================= [ 11.161960] [ INFO: possible recursive locking detected ] [ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W [ 11.161960] --------------------------------------------- [ 11.161960] bash/2519 is trying to acquire lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] [ 11.161960] but task is already holding lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] other info that might help us debug this: [ 11.161960] Possible unsafe locking scenario: [ 11.161960] [ 11.161960] CPU0 [ 11.161960] ---- [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] [ 11.161960] * DEADLOCK * [ 11.161960] [ 11.161960] May be due to missing lock nesting notation [ 11.161960] [ 11.161960] 4 locks held by bash/2519: [ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e [ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a [ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622 [ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] stack backtrace: [ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160 [ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 11.161960] Call Trace: [ 11.161960] dump_stack+0x72/0xa3 [ 11.161960] __lock_acquire+0xb7c/0xcb9 [ 11.161960] ? kvm_clock_read+0x1f/0x29 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] lock_acquire+0x106/0x18a [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] down_write+0x39/0x72 [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ? _raw_read_unlock+0x22/0x2c [ 11.161960] ? jbd2_journal_extend+0x1e2/0x262 [ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60 [ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d [ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_try_add_inline_entry+0x69/0x152 [ 11.161960] ext4_add_entry+0xa3/0x848 [ 11.161960] ? __brelse+0x14/0x2f [ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f [ 11.161960] ext4_add_nondir+0x17/0x5b [ 11.161960] ext4_create+0xcf/0x133 [ 11.161960] ? ext4_mknod+0x12f/0x12f [ 11.161960] lookup_open+0x39e/0x3fb [ 11.161960] ? __wake_up+0x1a/0x40 [ 11.161960] ? lock_acquire+0x11e/0x18a [ 11.161960] path_openat+0x35c/0x67a [ 11.161960] ? sched_clock_cpu+0xd7/0xf2 [ 11.161960] do_filp_open+0x36/0x7c [ 11.161960] ? _raw_spin_unlock+0x22/0x2c [ 11.161960] ? __alloc_fd+0x169/0x173 [ 11.161960] do_sys_open+0x59/0xcc [ 11.161960] SyS_open+0x1d/0x1f [ 11.161960] do_int80_syscall_32+0x4f/0x61 [ 11.161960] entry_INT80_32+0x2f/0x2f [ 11.161960] EIP: 0xb76ad469 [ 11.161960] EFLAGS: 00000286 CPU: 0 [ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6 [ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0 [ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Cc: stable@vger.kernel.org # 3.10 (requires `2e81a4eeed` as a prereq) Reported-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-11 21:50:46 -05:00
Theodore Ts'o	670e9875eb	ext4: add debug_want_extra_isize mount option In order to test the inode extra isize expansion code, it is useful to be able to easily create file systems that have inodes with extra isize values smaller than the current desired value. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-11 15:32:22 -05:00
Roman Pen	03e916fa8b	ext4: do not polute the extents cache while shifting extents Inside ext4_ext_shift_extents() function ext4_find_extent() is called without EXT4_EX_NOCACHE flag, which should prevent cache population. This leads to oudated offsets in the extents tree and wrong blocks afterwards. Patch fixes the problem providing EXT4_EX_NOCACHE flag for each ext4_find_extents() call inside ext4_ext_shift_extents function. Fixes: `331573febb` Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org	2017-01-08 21:00:35 -05:00
Roman Pen	2a9b8cba62	ext4: Include forgotten start block on fallocate insert range While doing 'insert range' start block should be also shifted right. The bug can be easily reproduced by the following test: ptr = malloc(4096); assert(ptr); fd = open("./ext4.file", O_CREAT \| O_TRUNC \| O_RDWR, 0600); assert(fd >= 0); rc = fallocate(fd, 0, 0, 8192); assert(rc == 0); for (i = 0; i < 2048; i++) ((unsigned short )ptr + i) = 0xbeef; rc = pwrite(fd, ptr, 4096, 0); assert(rc == 4096); rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); for (block = 2; block < 1000; block++) { rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096); assert(rc == 0); for (i = 0; i < 2048; i++) ((unsigned short )ptr + i) = block; rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); } Because start block is not included in the range the hole appears at the wrong offset (just after the desired offset) and the following pwrite() overwrites already existent block, keeping hole untouched. Simple way to verify wrong behaviour is to check zeroed blocks after the test: $ hexdump ./ext4.file \| grep '0000 0000' The root cause of the bug is a wrong range (start, stop], where start should be inclusive, i.e. [start, stop]. This patch fixes the problem by including start into the range. But not to break left shift (range collapse) stop points to the beginning of the a block, not to the end. The other not obvious change is an iterator check on validness in a main loop. Because iterator is unsigned the following corner case should be considered with care: insert a block at 0 offset, when stop variables overflows and never becomes less than start, which is 0. To handle this special case iterator is set to NULL to indicate that end of the loop is reached. Fixes: `331573febb` Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org	2017-01-08 20:59:35 -05:00
Theodore Ts'o	56735be053	Merge branch 'fscrypt' into d	2017-01-08 20:57:35 -05:00
Eric Biggers	a5d431eff2	fscrypt: make fscrypt_operations.key_prefix a string There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 01:03:41 -05:00
Theodore Ts'o	173b8439e1	ext4: don't allow encrypted operations without keys While we allow deletes without the key, the following should not be permitted: # cd /vdc/encrypted-dir-without-key # ls -l total 4 -rw-r--r-- 1 root root 0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB -rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD # mv uRJ5vJh9gE7vcomYMqTAyD 6,LKNRJsp209FbXoSvJWzB Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 00:58:23 -05:00
Linus Torvalds	6989606a72	Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "Two small fixes relating to audit's use of fsnotify. The first patch plugs a leak and the second fixes some lock shenanigans. The patches are small and I banged on this for an afternoon with our testsuite and didn't see anything odd" * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit: audit: Fix sleep in atomic fsnotify: Remove fsnotify_duplicate_mark()	2017-01-05 23:06:06 -08:00
Linus Torvalds	e02003b515	Contained in this update: - Fixes for crashes and double-cleanup errors - XFS maintainership handover - Fix to prevent absurdly large block reservations - Fix broken sysfs getter/setters -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJYbH10AAoJEPh/dxk0SrTrozkQAIo1ikGMKI0x52izCJyKA+HP gUfqqFnFDOsz0pWBDhRMBBGXbQlgU6uPRMicTdkNSRq3BnKPyfC7EK8WkXlOGT8A JGQhz96sfr4gShuo4lul2nhgfThOyL7M3Vu+xHL7wgrrOV1Y4Haz2m2FKYzNentj 2ca2WeNCcuEQLdWwtwkeOnsjnC2gV9cA5pRsx59rktr/t6fU1Q+AcBSjpwDX43op cQ0uTqiBB51pWe2tbw+VHSsYzyakkjsNsiYCZNOghN+p/5g4QpgT7LgiO9r5CVvu UhAXv6285K4pVsD7cIy2yHvSFbYNz1khM1Tvv26npg72NoQxMXhxAaPkke03u2lT 4nakgqLpxybrrnviVsJy2VbnmVYN5mcXSV2XTmRdAtxZmrW9C6H2PCs9pQYSV1Ji H6UT7kqusqp4ceQUBb5dtFCaUndNGnLxKDrmNOz/It7PPpI0zSRMiv+IC+qrQ/ci oEExMUtRLLah5zXOXQej2IchZmP2SBXm/a2JhmQywTIyLiSPoDwKsNET9zT4CiXw CHQj3spGh8uE7rh0QbujWjD7odE1IWDvOZ9Zh5vXVrwqTIeVOlxjfBqO/9J0z2Ak z7WpRrQ2IhWMwD6pHg0oDUD9oB02LxnW24telQs35wgqgdZm5TfG6BaiKutzA7KH nVlR22SQ4m+eTD8+IFMO =TgC5 -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - fixes for crashes and double-cleanup errors - XFS maintainership handover - fix to prevent absurdly large block reservations - fix broken sysfs getter/setters * tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix max_retries _show and _store functions xfs: update MAINTAINERS xfs: fix crash and data corruption due to removal of busy COW extents xfs: use the actual AG length when reserving blocks xfs: fix double-cleanup when CUI recovery fails	2017-01-04 18:33:35 -08:00
Linus Torvalds	62f8c40592	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A set of fixes for the current series, one fixing a regression with block size < page cache size in the alias series from Jan. Outside of that, two small cleanups for wbt from Bart, a nvme pull request from Christoph, and a few small fixes of documentation updates" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix up io_poll documentation block: Avoid that sparse complains about context imbalance in __wbt_wait() block: Make wbt_wait() definition consistent with declaration clean_bdev_aliases: Prevent cleaning blocks that are not in block range genhd: remove dead and duplicated scsi code block: add back plugging in __blkdev_direct_IO nvmet/fcloop: remove some logically dead code performing redundant ret checks nvmet: fix KATO offset in Set Features nvme/fc: simplify error handling of nvme_fc_create_hw_io_queues nvme/fc: correct some printk information nvme/scsi: Remove START STOP emulation nvme/pci: Delete misleading queue-wrap comment nvme/pci: Fix whitespace problem nvme: simplify stripe quirk nvme: update maintainers information	2017-01-04 09:03:37 -08:00
Carlos Maiolino	ff97f2399e	xfs: fix max_retries _show and _store functions max_retries _show and _store functions should test against cfg->max_retries, not cfg->retry_timeout Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-03 20:34:17 -08:00
Christoph Hellwig	a1b7a4dea6	xfs: fix crash and data corruption due to removal of busy COW extents There is a race window between write_cache_pages calling clear_page_dirty_for_io and XFS calling set_page_writeback, in which the mapping for an inode is tagged neither as dirty, nor as writeback. If the COW shrinker hits in exactly that window we'll remove the delayed COW extents and writepages trying to write it back, which in release kernels will manifest as corruption of the bmap btree, and in debug kernels will trip the ASSERT about now calling xfs_bmapi_write with the COWFORK flag for holes. A complex customer load manages to hit this window fairly reliably, probably by always having COW writeback in flight while the cow shrinker runs. This patch adds another check for having the I_DIRTY_PAGES flag set, which is still set during this race window. While this fixes the problem I'm still not overly happy about the way the COW shrinker works as it still seems a bit fragile. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-03 18:39:33 -08:00
Darrick J. Wong	20e73b000b	xfs: use the actual AG length when reserving blocks We need to use the actual AG length when making per-AG reservations, since we could otherwise end up reserving more blocks out of the last AG than there are actual blocks. Complained-about-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-01-03 18:39:33 -08:00
Darrick J. Wong	7a21272b08	xfs: fix double-cleanup when CUI recovery fails Dan Carpenter reported a double-free of rcur if _defer_finish fails while we're recovering CUI items. Fix the error recovery to prevent this. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-01-03 18:39:32 -08:00
Linus Torvalds	c8b4ec8351	Two fscrypt bug fixes, one of which was unmasked by an update to the crypto tree during the merge window. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlhrCIIACgkQ8vlZVpUN gaP0rAf8DehnxAXTdGwCDKJ76Xgkd4C0vYwNYsWrwbEsD6dMXPmfhDVA40ZefFWY 4UQaPeoDSXQnIxw+6gi6LFCJeYs+dc9ZWHk++w5kEMclIUONomODDAQLMJbpG+5t pkEwOzjTaKbIQ5n4r3rMJtlBlrZX+ZVJmMt3sYAMWhIq7Bf7dRy6AC7+vyM5VTce AYvFpureLd7pJT0AcNvg5oPnXIFiPlKi6knlmAdJ32I4FQQO07aDA37mLPKdff4/ uKs4PGKTa9MCGw+blMDJ/208kBQPPn8JZ7yGQCdGw16CUaoSXLregqu6SNs2MKaQ WjmBFyEUssScTeAq8rYJVlU7FYxwdQ== =VVfb -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt fixes from Ted Ts'o: "Two fscrypt bug fixes, one of which was unmasked by an update to the crypto tree during the merge window" * tag 'fscrypt-for-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: fix renaming and linking special files fscrypt: fix the test_dummy_encryption mount option	2017-01-02 18:32:59 -08:00
Theodore Ts'o	5bbdcbbb39	fscrypt: make test_dummy_encryption require a keyring key Currently, the test_dummy_encryption ext4 mount option, which exists only to test encrypted I/O paths with xfstests, overrides all per-inode encryption keys with a fixed key. This change minimizes test_dummy_encryption-specific code path changes by supplying a fake context for directories which are not encrypted for use when creating new directories, files, or symlinks. This allows us to properly exercise the keyring lookup, derivation, and context inheritance code paths. Before mounting a file system using test_dummy_encryption, userspace must execute the following shell commands: mode='\x00\x00\x00\x00' raw="$(printf ""\\\\x%02x"" $(seq 0 63))" if lscpu \| grep "Byte Order" \| grep -q Little ; then size='\x40\x00\x00\x00' else size='\x00\x00\x00\x40' fi key="${mode}${raw}${size}" keyctl new_session echo -n -e "${key}" \| keyctl padd logon fscrypt:4242424242424242 @s Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-02 15:39:46 -05:00
Chandan Rajendra	6c006a9d94	clean_bdev_aliases: Prevent cleaning blocks that are not in block range The first block to be cleaned may start at a non-zero page offset. In such a scenario clean_bdev_aliases() will end up cleaning blocks that do not fall in the range of blocks to be cleaned. This commit fixes the issue by skipping blocks that do not fall in valid block range. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-02 09:35:14 -07:00
Richard Weinberger	58ae74683a	fscrypt: factor out bio specific functions That way we can get rid of the direct dependency on CONFIG_BLOCK. Fixes: `d475a50745` ("ubifs: Add skeleton for fscrypto") Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-01 16:18:49 -05:00
Eric Biggers	efee590e4a	fscrypt: pass up error codes from ->get_context() It was possible for the ->get_context() operation to fail with a specific error code, which was then not returned to the caller of FS_IOC_SET_ENCRYPTION_POLICY or FS_IOC_GET_ENCRYPTION_POLICY. Make sure to pass through these error codes. Also reorganize the code so that ->get_context() only needs to be called one time when setting an encryption policy, and handle contexts of unrecognized sizes more appropriately. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:21 -05:00
Eric Biggers	868e1bc64d	fscrypt: remove user-triggerable warning messages Several warning messages were not rate limited and were user-triggerable from FS_IOC_SET_ENCRYPTION_POLICY. These shouldn't really have been there in the first place, but either way they aren't as useful now that the error codes have been improved. So just remove them. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:21 -05:00
Eric Biggers	8488cd96ff	fscrypt: use EEXIST when file already uses different policy As part of an effort to clean up fscrypt-related error codes, make FS_IOC_SET_ENCRYPTION_POLICY fail with EEXIST when the file already uses a different encryption policy. This is more descriptive than EINVAL, which was ambiguous with some of the other error cases. I am not aware of any users who might be relying on the previous error code of EINVAL, which was never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:20 -05:00
Eric Biggers	dffd0cfa06	fscrypt: use ENOTDIR when setting encryption policy on nondirectory As part of an effort to clean up fscrypt-related error codes, make FS_IOC_SET_ENCRYPTION_POLICY fail with ENOTDIR when the file descriptor does not refer to a directory. This is more descriptive than EINVAL, which was ambiguous with some of the other error cases. I am not aware of any users who might be relying on the previous error code of EINVAL, which was never documented anywhere, and in some buggy kernels did not exist at all as the S_ISDIR() check was missing. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:20 -05:00
Eric Biggers	54475f531b	fscrypt: use ENOKEY when file cannot be created w/o key As part of an effort to clean up fscrypt-related error codes, make attempting to create a file in an encrypted directory that hasn't been "unlocked" fail with ENOKEY. Previously, several error codes were used for this case, including ENOENT, EACCES, and EPERM, and they were not consistent between and within filesystems. ENOKEY is a better choice because it expresses that the failure is due to lacking the encryption key. It also matches the error code returned when trying to open an encrypted regular file without the key. I am not aware of any users who might be relying on the previous inconsistent error codes, which were never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:20 -05:00
Eric Biggers	42d97eb0ad	fscrypt: fix renaming and linking special files Attempting to link a device node, named pipe, or socket file into an encrypted directory through rename(2) or link(2) always failed with EPERM. This happened because fscrypt_has_permitted_context() saw that the file was unencrypted and forbid creating the link. This behavior was unexpected because such files are never encrypted; only regular files, directories, and symlinks can be encrypted. To fix this, make fscrypt_has_permitted_context() always return true on special files. This will be covered by a test in my encryption xfstests patchset. Fixes: `9bd8212f98` ("ext4 crypto: add encryption policy and password salt support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 00:47:05 -05:00
Theodore Ts'o	fe4f6c801c	fscrypt: fix the test_dummy_encryption mount option Commit `f1c131b454`: "crypto: xts - Convert to skcipher" now fails the setkey operation if the AES key is the same as the tweak key. Previously this check was only done if FIPS mode is enabled. Now this check is also done if weak key checking was requested. This is reasonable, but since we were using the dummy key which was a constant series of 0x42 bytes, it now caused dummy encrpyption test mode to fail. Fix this by using 0x42... and 0x24... for the two keys, so they are different. Fixes: `f1c131b454` Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-27 19:46:27 -05:00
Jan Kara	1db175428e	ext4: Simplify DAX fault path Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we can use transaction starting in ext4_iomap_begin() and thus simplify ext4_dax_fault(). It also provides us proper retries in case of ENOSPC. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Jan Kara	9f141d6ef6	dax: Call ->iomap_begin without entry lock during dax fault Currently ->iomap_begin() handler is called with entry lock held. If the filesystem held any locks between ->iomap_begin() and ->iomap_end() (such as ext4 which will want to hold transaction open), this would cause lock inversion with the iomap_apply() from standard IO path which first calls ->iomap_begin() and only then calls ->actor() callback which grabs entry locks for DAX (if it faults when copying from/to user provided buffers). Fix the problem by nesting grabbing of entry lock inside ->iomap_begin() - ->iomap_end() pair. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Jan Kara	f449b936f1	dax: Finish fault completely when loading holes The only case when we do not finish the page fault completely is when we are loading hole pages into a radix tree. Avoid this special case and finish the fault in that case as well inside the DAX fault handler. It will allow us for easier iomap handling. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Jan Kara	e3fce68cdb	dax: Avoid page invalidation races and unnecessary radix tree traversals Currently dax_iomap_rw() takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore in current place the invalidation is racy wrt page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_iomap_actor() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:24 -08:00
Jan Kara	c6dcf52c23	mm: Invalidate DAX radix tree entries only if appropriate Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:24 -08:00
Jan Kara	e568df6b84	ext2: Return BH_New buffers for zeroed blocks So far we did not return BH_New buffers from ext2_get_blocks() when we allocated and zeroed-out a block for DAX inode to avoid racy zeroing in DAX code. This zeroing is gone these days so we can remove the workaround. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:24 -08:00
Thomas Gleixner	1f3a8e49d8	ktime: Get rid of ktime_equal() No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>	2016-12-25 17:21:23 +01:00

1 2 3 4 5 ...

47485 Commits