kernel_optimize_test

Author	SHA1	Message	Date
Linus Torvalds	7130098096	Add support for the CSUM_SEED feature which will allow future userspace utilities to change the file system's UUID without rewriting all of the file system metadata. A number of miscellaneous fixes, the most significant of which are in the ext4 encryption support. Anyone wishing to use the encryption feature should backport all of the ext4 crypto patches up to 4.4 to get fixes to a memory leak and file system corruption bug. There are also cleanups in ext4's feature test macros and in ext4's sysfs support code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJWPMOAAAoJEPL5WVaVDYGjDYAH/jVIz+WpPkZn/J1Oepy1WPDm Azxv+tUvKffXQ7sRwGZEU7snLaeamKxtrvHUXKLT4PIQB6m6o4/kk8o71ovv5PRT FvuMz1j7qsqEk5ZFQnqI7xosao/RovjfkvL5g+WsAN//C+vxO35bg4fVvZmf73GQ vloe6igM/qxiUAuMX0jlXjUI1Zyo+H3bXOC6LjnUCOPwZMRcQ9zMtoYjkWryTzo1 4udepGRSfcWeZkWXqt9KIe6slYRmq3BtXWJ0+Zvx6gKWrXVIisINLDHYAEVBXAf4 6VUiDKL6wIytkwt3vGwYSY11wNQC5ky3qV/tJlPnpbYfUP0vEvT4UXljoaR/YQc= =m3Q9 -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Add support for the CSUM_SEED feature which will allow future userspace utilities to change the file system's UUID without rewriting all of the file system metadata. A number of miscellaneous fixes, the most significant of which are in the ext4 encryption support. Anyone wishing to use the encryption feature should backport all of the ext4 crypto patches up to 4.4 to get fixes to a memory leak and file system corruption bug. There are also cleanups in ext4's feature test macros and in ext4's sysfs support code" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) fs/ext4: remove unnecessary new_valid_dev check ext4: fix abs() usage in ext4_mb_check_group_pa ext4: do not allow journal_opts for fs w/o journal ext4: explicit mount options parsing cleanup ext4, jbd2: ensure entering into panic after recording an error in superblock [PATCH] fix calculation of meta_bg descriptor backups ext4: fix potential use after free in __ext4_journal_stop jbd2: fix checkpoint list cleanup ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc ext4: make the bitmap read routines return real error codes jbd2: clean up feature test macros with predicate functions ext4: clean up feature test macros with predicate functions ext4: call out CRC and corruption errors with specific error codes ext4: store checksum seed in superblock ext4: reserve code points for the project quota feature ext4: promote ext4 over ext2 in the default probe order jbd2: gate checksum calculations on crc driver presence, not sb flags ext4: use private version of page_zero_new_buffers() for data=journal mode ext4 crypto: fix bugs in ext4_encrypted_zeroout() ext4 crypto: replace some BUG_ON()'s with error checks ...	2015-11-06 16:23:27 -08:00
Linus Torvalds	1873499e13	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem update from James Morris: "This is mostly maintenance updates across the subsystem, with a notable update for TPM 2.0, and addition of Jarkko Sakkinen as a maintainer of that" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits) apparmor: clarify CRYPTO dependency selinux: Use a kmem_cache for allocation struct file_security_struct selinux: ioctl_has_perm should be static selinux: use sprintf return value selinux: use kstrdup() in security_get_bools() selinux: use kmemdup in security_sid_to_context_core() selinux: remove pointless cast in selinux_inode_setsecurity() selinux: introduce security_context_str_to_sid selinux: do not check open perm on ftruncate call selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default KEYS: Merge the type-specific data with the payload data KEYS: Provide a script to extract a module signature KEYS: Provide a script to extract the sys cert list from a vmlinux file keys: Be more consistent in selection of union members used certs: add .gitignore to stop git nagging about x509_certificate_list KEYS: use kvfree() in add_key Smack: limited capability for changing process label TPM: remove unnecessary little endian conversion vTPM: support little endian guests char: Drop owner assignment from i2c_driver ...	2015-11-05 15:32:38 -08:00
Yaowei Bai	be69e1c19f	fs/ext4: remove unnecessary new_valid_dev check As new_valid_dev always returns 1, so !new_valid_dev check is not needed, remove it. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-29 14:18:13 -04:00
David Howells	146aa8b145	KEYS: Merge the type-specific data with the payload data Merge the type-specific data with the payload data into one four-word chunk as it seems pointless to keep them separate. Use user_key_payload() for accessing the payloads of overloaded user-defined keys. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cifs@vger.kernel.org cc: ecryptfs@vger.kernel.org cc: linux-ext4@vger.kernel.org cc: linux-f2fs-devel@lists.sourceforge.net cc: linux-nfs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-ima-devel@lists.sourceforge.net	2015-10-21 15:18:36 +01:00
John Stultz	16175039e6	ext4: fix abs() usage in ext4_mb_check_group_pa The ext4_fsblk_t type is a long long, which should not be used with abs(), as is done in ext4_mb_check_group_pa(). This patch modifies ext4_mb_check_group_pa() to use abs64() instead. Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-19 00:01:05 -04:00
Dmitry Monakhov	1e381f60da	ext4: do not allow journal_opts for fs w/o journal It is appeared that we can pass journal related mount options and such options be shown in /proc/mounts Example: #mkfs.ext4 -F /dev/vdb #tune2fs -O ^has_journal /dev/vdb #mount /dev/vdb /mnt/ -ocommit=20,journal_async_commit #cat /proc/mounts \| grep /mnt /dev/vdb /mnt ext4 rw,relatime,journal_checksum,journal_async_commit,commit=20,data=ordered 0 0 But options:"journal_checksum,journal_async_commit,commit=20,data=ordered" has nothing with reality because there is no journal at all. This patch disallow following options for journalless configurations: - journal_checksum - journal_async_commit - commit=%ld - data={writeback,ordered,journal} Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2015-10-18 23:50:26 -04:00
Dmitry Monakhov	c93cf2d757	ext4: explicit mount options parsing cleanup Currently MOPT_EXPLICIT treated as EXPLICIT_DELALLOC which may be changed in future. Let's fix it now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-18 23:35:32 -04:00
Daeho Jeong	4327ba52af	ext4, jbd2: ensure entering into panic after recording an error in superblock If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() \| -> journal->j_flags \|= JBD2_ABORT; \| \| __ext4_abort() \| -> jbd2_journal_abort() \| \| -> __journal_abort_soft() \| \| -> if (journal->j_flags & JBD2_ABORT) \| \| return; \| -> panic() \| -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-10-18 17:02:56 -04:00
Andy Leiserson	904dad4742	[PATCH] fix calculation of meta_bg descriptor backups "group" is the group where the backup will be placed, and is initialized to zero in the declaration. This meant that backups for meta_bg descriptors were erroneously written to the backup block group descriptors in groups 1 and (desc_per_block-1). Reproduction information: mke2fs -Fq -t ext4 -b 1024 -O ^resize_inode /tmp/foo.img 16G truncate -s 24G /tmp/foo.img losetup /dev/loop0 /tmp/foo.img mount /dev/loop0 /mnt resize2fs /dev/loop0 umount /dev/loop0 dd if=/dev/zero of=/dev/loop0 bs=1024 count=2 e2fsck -fy /dev/loop0 losetup -d /dev/loop0 Signed-off-by: Andy Leiserson <andy@leiserson.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-10-18 00:36:29 -04:00
Lukas Czerner	6934da9238	ext4: fix potential use after free in __ext4_journal_stop There is a use-after-free possibility in __ext4_journal_stop() in the case that we free the handle in the first jbd2_journal_stop() because we're referencing handle->h_err afterwards. This was introduced in `9705acd63b` and it is wrong. Fix it by storing the handle->h_err value beforehand and avoid referencing potentially freed handle. Fixes: `9705acd63b` Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org	2015-10-17 22:57:06 -04:00
Daeho Jeong	9c02ac9798	ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc When you repeatly execute xfstest generic/269 with bigalloc_1k option enabled using the below command: "./kvm-xfstests -c bigalloc_1k -m nodelalloc -C 1000 generic/269" you can easily see the below bug message. "JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);" This means that an already revoked buffer is erroneously revoked again and it is caused by doing revoke for the buffer at the wrong position in ext4_free_blocks(). We need to re-position the buffer revoke procedure for an unspecified buffer after checking the cluster boundary for bigalloc option. If not, some part of the cluster can be doubly revoked. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>	2015-10-17 22:28:21 -04:00
Darrick J. Wong	9008a58e5d	ext4: make the bitmap read routines return real error codes Make the bitmap reaading routines return real error codes (EIO, EFSCORRUPTED, EFSBADCRC) which can then be reflected back to userspace for more precise diagnosis work. In particular, this means that mballoc no longer claims that we're out of memory if the block bitmaps become corrupt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-17 21:33:24 -04:00
Darrick J. Wong	e2b911c535	ext4: clean up feature test macros with predicate functions Create separate predicate functions to test/set/clear feature flags, thereby replacing the wordy old macros. Furthermore, clean out the places where we open-coded feature tests. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2015-10-17 16:18:43 -04:00
Darrick J. Wong	6a797d2737	ext4: call out CRC and corruption errors with specific error codes Instead of overloading EIO for CRC errors and corrupt structures, return the same error codes that XFS returns for the same issues. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-17 16:16:04 -04:00
Darrick J. Wong	8c81bd8f58	ext4: store checksum seed in superblock Allow the filesystem to store the metadata checksum seed in the superblock and add an incompat feature to say that we're using it. This enables tune2fs to change the UUID on a mounted metadata_csum FS without having to (racy!) rewrite all disk metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-17 16:16:02 -04:00
Theodore Ts'o	8b4953e13f	ext4: reserve code points for the project quota feature Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-17 16:15:18 -04:00
Linus Torvalds	3d875182d7	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: sh: add copy_user_page() alias for __copy_user() lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE mm, dax: fix DAX deadlocks memcg: convert threshold to bytes builddeb: remove debian/files before build mm, fs: obey gfp_mapping for add_to_page_cache()	2015-10-16 11:42:37 -07:00
Michal Hocko	063d99b4fa	mm, fs: obey gfp_mapping for add_to_page_cache() Commit `6afdb859b7` ("mm: do not ignore mapping_gfp_mask in page cache allocation paths") has caught some users of hardcoded GFP_KERNEL used in the page cache allocation paths. This, however, wasn't complete and there were others which went unnoticed. Dave Chinner has reported the following deadlock for xfs on loop device: : With the recent merge of the loop device changes, I'm now seeing : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073. : : The deadlocked is as follows: : : kloopd1: loop_queue_read_work : xfs_file_iter_read : lock XFS inode XFS_IOLOCK_SHARED (on image file) : page cache read (GFP_KERNEL) : radix tree alloc : memory reclaim : reclaim XFS inodes : log force to unpin inodes : <wait for log IO completion> : : xfs-cil/loop1: <does log force IO work> : xlog_cil_push : xlog_write : <loop issuing log writes> : xlog_state_get_iclog_space() : <blocks due to all log buffers under write io> : <waits for IO completion> : : kloopd1: loop_queue_write_work : xfs_file_write_iter : lock XFS inode XFS_IOLOCK_EXCL (on image file) : <wait for inode to be unlocked> : : i.e. the kloopd, with it's split read and write work queues, has : introduced a dependency through memory reclaim. i.e. that writes : need to be able to progress for reads make progress. : : The problem, fundamentally, is that mpage_readpages() does a : GFP_KERNEL allocation, rather than paying attention to the inode's : mapping gfp mask, which is set to GFP_NOFS. : : The didn't used to happen, because the loop device used to issue : reads through the splice path and that does: : : error = add_to_page_cache_lru(page, mapping, index, : GFP_KERNEL & mapping_gfp_mask(mapping)); This has changed by commit `aa4d86163e` ("block: loop: switch to VFS ITER_BVEC"). This patch changes mpage_readpage{s} to follow gfp mask set for the mapping. There are, however, other places which are doing basically the same. lustre:ll_dir_filler is doing GFP_KERNEL from the function which apparently uses GFP_NOFS for other allocations so let's make this consistent. cifs:readpages_get_pages is called from cifs_readpages and __cifs_readpages_from_fscache called from the same path obeys mapping gfp. ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well regardless it uses mapping_gfp_mask for the page allocation. ext4_mpage_readpages is the called from the page cache allocation path same as read_pages and read_cache_pages As I've noticed in my previous post I cannot say I would be happy about sprinkling mapping_gfp_mask all over the place and it sounds like we should drop gfp_mask argument altogether and use it internally in __add_to_page_cache_locked that would require all the filesystems to use mapping gfp consistently which I am not sure is the case here. From a quick glance it seems that some file system use it all the time while others are selective. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Dave Chinner <david@fromorbit.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Ming Lei <ming.lei@canonical.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-10-16 11:42:28 -07:00
Theodore Ts'o	b90197b655	ext4: use private version of page_zero_new_buffers() for data=journal mode If there is a error while copying data from userspace into the page cache during a write(2) system call, in data=journal mode, in ext4_journalled_write_end() were using page_zero_new_buffers() from fs/buffer.c. Unfortunately, this sets the buffer dirty flag, which is no good if journalling is enabled. This is a long-standing bug that goes back for years and years in ext3, but a combination of (a) data=journal not being very common, (b) in many case it only results in a warning message. and (c) only very rarely causes the kernel hang, means that we only really noticed this as a problem when commit `998ef75ddb` caused this failure to happen frequently enough to cause generic/208 to fail when run in data=journal mode. The fix is to have our own version of this function that doesn't call mark_dirty_buffer(), since we will end up calling ext4_handle_dirty_metadata() on the buffer head(s) in questions very shortly afterwards in ext4_journalled_write_end(). Thanks to Dave Hansen and Linus Torvalds for helping to identify the root cause of the problem. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.com>	2015-10-15 10:29:05 -04:00
Theodore Ts'o	36086d43f6	ext4 crypto: fix bugs in ext4_encrypted_zeroout() Fix multiple bugs in ext4_encrypted_zeroout(), including one that could cause us to write an encrypted zero page to the wrong location on disk, potentially causing data and file system corruption. Fortunately, this tends to only show up in stress tests, but even with these fixes, we are seeing some test failures with generic/127 --- but these are now caused by data failures instead of metadata corruption. Since ext4_encrypted_zeroout() is only used for some optimizations to keep the extent tree from being too fragmented, and ext4_encrypted_zeroout() itself isn't all that optimized from a time or IOPS perspective, disable the extent tree optimization for encrypted inodes for now. This prevents the data corruption issues reported by generic/127 until we can figure out what's going wrong. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-10-03 10:49:29 -04:00
Theodore Ts'o	687c3c36e7	ext4 crypto: replace some BUG_ON()'s with error checks Buggy (or hostile) userspace should not be able to cause the kernel to crash. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-10-03 10:49:27 -04:00
Theodore Ts'o	3684de8ca2	ext4 crypto: ext4_page_crypto() doesn't need a encryption context Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number function signature and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-03 10:49:26 -04:00
Theodore Ts'o	cccd147a57	ext4: optimize ext4_writepage() for attempted 4k delalloc writes In cases where the file system block size is the same as the page size, and ext4_writepage() is asked to write out a page which is either has the unwritten bit set in the extent tree, or which does not yet have a block assigned due to delayed allocation, we can bail out early and, unlocking the page earlier and avoiding a round trip through ext4_bio_write_page() with the attendant calls to set_page_writeback() and redirty_page_for_writeback(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-10-03 10:49:23 -04:00
Theodore Ts'o	937d7b84dc	ext4 crypto: fix memory leak in ext4_bio_write_page() There are times when ext4_bio_write_page() is called even though we don't actually need to do any I/O. This happens when ext4_writepage() gets called by the jbd2 commit path when an inode needs to force its pages written out in order to provide data=ordered guarantees --- and a page is backed by an unwritten (e.g., uninitialized) block on disk, or if delayed allocation means the page's backing store hasn't been allocated yet. In that case, we need to skip the call to ext4_encrypt_page(), since in addition to wasting CPU, it leads to a bounce page and an ext4 crypto context getting leaked. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-10-02 23:54:58 -04:00
Jean Delvare	d4eb6dee47	ext4: Update EXT4_USE_FOR_EXT2 description Configuration option EXT4_USE_FOR_EXT2 has no effect on ext3 support. Support for ext3 is always included now. Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: `c290ea01ab` ("fs: Remove ext3 filesystem driver") Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Jan Kara <jack@suse.com>	2015-09-24 13:27:47 +02:00
Theodore Ts'o	ebd173beb8	ext4: move procfs registration code to fs/ext4/sysfs.c This allows us to refactor the procfs code, which saves a bit of compiled space. More importantly it isolates most of the procfs support code into a single file, so it's easier to #ifdef it out if the proc file system has been disabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-09-23 12:46:17 -04:00
Theodore Ts'o	76d33bca55	ext4: refactor sysfs support code Make the code more easily extensible as well as taking up less compiled space. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-09-23 12:45:17 -04:00
Theodore Ts'o	b579901882	ext4: move sysfs code from super.c to fs/ext4/sysfs.c Also statically allocate the ext4_kset and ext4_feat objects, since we only need exactly one of each, and it's simpler and less code if we drop the dynamic allocation and deallocation when it's not needed. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-09-23 12:44:17 -04:00
Matthew Wilcox	01a33b4ace	ext4: start transaction before calling into DAX Jan Kara pointed out that in the case where we are writing to a hole, we can end up with a lock inversion between the page lock and the journal lock. We can avoid this by starting the transaction in ext4 before calling into DAX. The journal lock nests inside the superblock pagefault lock, so we have to duplicate that code from dax_fault, like XFS does. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
Matthew Wilcox	ed923b5776	ext4: add ext4_get_block_dax() DAX wants different semantics from any currently-existing ext4 get_block callback. Unlike ext4_get_block_write(), it needs to honour the 'create' flag, and unlike ext4_get_block(), it needs to be able to return unwritten extents. So introduce a new ext4_get_block_dax() which has those semantics. We could also change ext4_get_block_write() to honour the 'create' flag, but that might have consequences on other users that I do not currently understand. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
Matthew Wilcox	e676a4c191	ext4: use ext4_get_block_write() for DAX DAX relies on the get_block function either zeroing newly allocated blocks before they're findable by subsequent calls to get_block, or marking newly allocated blocks as unwritten. ext4_get_block() cannot create unwritten extents, but ext4_get_block_write() can. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Reported-by: Andy Rudoff <andy.rudoff@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
Matthew Wilcox	11bd1a9ecd	ext4: huge page fault support Use DAX to provide support for huge pages. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
Matthew Wilcox	c94c2acf84	dax: move DAX-related functions to a new header In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is defined in linux/mm.h. Given that we don't want to include <linux/mm.h> in <linux/fs.h>, the easiest solution is to move the DAX-related functions to a new header, <linux/dax.h>. We could also have moved VM_FAULT_* definitions to a new header, or a different header that isn't quite such a boil-the-ocean header as <linux/mm.h>, but this felt like the best option. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
Kees Cook	a068acf2ee	fs: create and use seq_show_option for escaping Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Linus Torvalds	ea814ab9aa	Pretty much all bug fixes and clean ups for 4.3, after a lot of features and other churn going into 4.2. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJV55TlAAoJEPL5WVaVDYGjyzYH/1WtZpIzRjp7o+3H4/vFqONg R1Fsw785C1w8WX2QuIK/m31u4XO+VeCV4jWA9DuqnSzWm9w9C/4kTqITd4Hwp416 /9gJvYoZHHaDikxpWWADptDi8IoLohTlcFVCHIvvf53cGehVEEsc2WOijUZo7Cgv O454Nm3tK0CQ3yrCIlf5SyvkUZSMTiawLLJJzd4GCyvU13C1SnABNQj8UxKisBA5 cP8q4O2nPg/S9rkYxnFAifQyZppd3jMvorUaq9eHiWMjl95o6e/6+wYGnHhoFUvr /P1dNjJYbzk+TUzlsDkq2zANK2UsB3iNNi8YwLFOpfFcuYopmUAYRIWOgIZWYUQ= =ijuI -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Pretty much all bug fixes and clean ups for 4.3, after a lot of features and other churn going into 4.2" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Revert "ext4: remove block_device_ejected" ext4: ratelimit the file system mounted message ext4: silence a format string false positive ext4: simplify some code in read_mmp_block() ext4: don't manipulate recovery flag when freezing no-journal fs jbd2: limit number of reserved credits ext4 crypto: remove duplicate header file ext4: update c/mtime on truncate up jbd2: avoid infinite loop when destroying aborted journal ext4, jbd2: add REQ_FUA flag when recording an error in the superblock ext4 crypto: fix spelling typo in comment ext4 crypto: exit cleanly if ext4_derive_key_aes() fails ext4: reject journal options for ext2 mounts ext4: implement cgroup writeback support ext4: replace ext4_io_submit->io_op with ->io_wbc ext4 crypto: check for too-short encrypted file names ext4 crypto: use a jbd2 transaction when adding a crypto policy jbd2: speedup jbd2_journal_dirty_metadata()	2015-09-03 12:52:19 -07:00
Linus Torvalds	e31fb9e005	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 removal, quota & udf fixes from Jan Kara: "The biggest change in the pull is the removal of ext3 filesystem driver (~28k lines removed). Ext4 driver is a full featured replacement these days and both RH and SUSE use it for several years without issues. Also there are some workarounds in VM & block layer mainly for ext3 which we could eventually get rid of. Other larger change is addition of proper error handling for dquot_initialize(). The rest is small fixes and cleanups" [ I wasn't convinced about the ext3 removal and worried about things falling through the cracks for legacy users, but ext4 maintainers piped up and were all unanimously in favor of removal, and maintaining all legacy ext3 support inside ext4. - Linus ] * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Don't modify filesystem for read-only mounts quota: remove an unneeded condition ext4: memory leak on error in ext4_symlink() mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition ext4: Improve ext4 Kconfig test block: Remove forced page bouncing under IO fs: Remove ext3 filesystem driver doc: Update doc about journalling layer jfs: Handle error from dquot_initialize() reiserfs: Handle error from dquot_initialize() ocfs2: Handle error from dquot_initialize() ext4: Handle error from dquot_initialize() ext2: Handle error from dquot_initalize() quota: Propagate error from ->acquire_dquot()	2015-09-03 12:28:30 -07:00
Theodore Ts'o	bdfe0cbd74	Revert "ext4: remove block_device_ejected" This reverts commit `08439fec26`. Unfortunately we still need to test for bdi->dev to avoid a crash when a USB stick is yanked out while a file system is mounted: usb 2-2: USB disconnect, device number 2 Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write JBD2: Error -5 detected when updating journal superblock for sdb1-8. BUG: unable to handle kernel paging request at 34beb000 IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 pdpt = 0000000023db9001 pde = 0000000000000000 Oops: 0000 [#1] SMP CPU: 0 PID: 4083 Comm: umount Tainted: G U OE 4.1.1-040101-generic #201507011435 Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011 task: ebf06b50 ti: ebebc000 task.ti: ebebc000 EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0 EIP is at __percpu_counter_add+0x18/0xc0 EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001 ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0 Stack: c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160 ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160 f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000 Call Trace: [<c1160454>] account_page_dirtied+0x74/0x110 [<c11e613f>] __set_page_dirty+0x3f/0xb0 [<c11e6203>] mark_buffer_dirty+0x53/0xc0 [<c124a0cb>] ext4_commit_super+0x17b/0x250 [<c124ac71>] ext4_put_super+0xc1/0x320 [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0 [<c11cfeda>] ? evict_inodes+0xca/0xe0 [<c11b925a>] generic_shutdown_super+0x6a/0xe0 [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0 [<c1165a50>] ? unregister_shrinker+0x40/0x50 [<c11b92f6>] kill_block_super+0x26/0x70 [<c11b94f5>] deactivate_locked_super+0x45/0x80 [<c11ba007>] deactivate_super+0x47/0x60 [<c11d2b39>] cleanup_mnt+0x39/0x80 [<c11d2bc0>] __cleanup_mnt+0x10/0x20 [<c1080b51>] task_work_run+0x91/0xd0 [<c1011e3c>] do_notify_resume+0x7c/0x90 [<c1720da5>] work_notify Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89 EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40 CR2: 0000000034beb000 ---[ end trace dd564a7bea834ecd ]--- Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101011 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-08-16 10:03:57 -04:00
Theodore Ts'o	e294a5371b	ext4: ratelimit the file system mounted message The xfstests ext4/305 will mount and unmount the same file system over 4,000 times, and each one of these will cause a system log message. Ratelimit this message since if we are getting more than a few dozen of these messages, they probably aren't going to be helpful. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-08-15 14:59:44 -04:00
Dan Carpenter	da0b5e40ab	ext4: silence a format string false positive Static checkers complain that the format string should be "%s". It does not make a difference for the current code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-08-15 11:38:13 -04:00
Dan Carpenter	9810446836	ext4: simplify some code in read_mmp_block() My static check complains because we have: if (!bh) return -ENOMEM; if (bh) { The second check is unnecessary. I've simplified this code by moving the "if (!*bh)" checks around. Also Andreas Dilger says we should probably print a warning if sb_getblk() fails. [ Restructured the code so that we print a warning message as well if the mmp block doesn't check out, and to print the error code to disambiguate between the error cases. - TYT ] Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-08-15 11:30:31 -04:00
Eric Sandeen	c642dc9e1a	ext4: don't manipulate recovery flag when freezing no-journal fs At some point along this sequence of changes: `f6e63f9` ext4: fold ext4_nojournal_sops into ext4_sops `bb04457` ext4: support freezing ext2 (nojournal) file systems `9ca9238` ext4: Use separate super_operations structure for no_journal filesystems ext4 started setting needs_recovery on filesystems without journals when they are unfrozen. This makes no sense, and in fact confuses blkid to the point where it doesn't recognize the filesystem at all. (freeze ext2; unfreeze ext2; run blkid; see no output; run dumpe2fs, see needs_recovery set on fs w/ no journal). To fix this, don't manipulate the INCOMPAT_RECOVER feature on filesystems without journals. Reported-by: Stu Mark <smark@datto.com> Reviewed-by: Jan Kara <jack@suse.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2015-08-15 10:45:06 -04:00
Kent Overstreet	b54ffb73ca	block: remove bio_get_nr_vecs() We can always fill up the bio now, no need to estimate the possible size based on queue parameters. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [hch: rebased and wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-13 12:32:04 -06:00
Christoph Hellwig	4246a0b63b	block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-29 08:55:15 -06:00
zilong.liu	923ae47177	ext4 crypto: remove duplicate header file Remove key.h which is included twice in crypto_fname.c Signed-off-by: zilong.liu <liuziloong@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-07-28 15:12:18 -04:00
Eryu Guan	911af577de	ext4: update c/mtime on truncate up Commit `3da40c7b08` ("ext4: only call ext4_truncate when size <= isize") introduced a bug that c/mtime is not updated on truncate up. Fix the issue by setting c/mtime explicitly in the truncate up case. Note that ftruncate(2) is not affected, so you won't see this bug using truncate(1) and xfs_io(1). Signed-off-by: Zirong Lang <zorro.lang@gmail.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-07-28 15:08:41 -04:00
Dan Carpenter	926631c201	ext4: memory leak on error in ext4_symlink() We should release "sd" before returning. Fixes: 0fa12ad1b285 ('ext4: Handle error from dquot_initialize()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.com>	2015-07-27 14:30:45 +02:00
Jan Kara	c8962f4be4	ext4: Improve ext4 Kconfig test Now that ext4 driver must be used to access ext3 filesystems, improve the Kconfig help text to better explain that using ext4 driver to access the filesystem is fully compatible with the old ext3 driver. Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.com>	2015-07-23 20:59:40 +02:00
Jan Kara	c290ea01ab	fs: Remove ext3 filesystem driver The functionality of ext3 is fully supported by ext4 driver. Major distributions (SUSE, RedHat) already use ext4 driver to handle ext3 filesystems for quite some time. There is some ugliness in mm resulting from jbd cleaning buffers in a dirty page without cleaning page dirty bit and also support for buffer bouncing in the block layer when stable pages are required is there only because of jbd. So let's remove the ext3 driver. This saves us some 28k lines of duplicated code. Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2015-07-23 20:59:40 +02:00
Jan Kara	a7cdadee0e	ext4: Handle error from dquot_initialize() dquot_initialize() can now return error. Handle it where possible. Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.com>	2015-07-23 20:59:37 +02:00
Daeho Jeong	564bc40252	ext4, jbd2: add REQ_FUA flag when recording an error in the superblock When an error condition is detected, an error status should be recorded into superblocks of EXT4 or JBD2. However, the write request is submitted now without REQ_FUA flag, even in "barrier=1" mode, which is followed by panic() function in "errors=panic" mode. On mobile devices which make whole system reset as soon as kernel panic occurs, this write request containing an error flag will disappear just from storage cache without written to the physical cells. Therefore, when next start, even forever, the error flag cannot be shown in both superblocks, and e2fsck cannot fix the filesystem problems automatically, unless e2fsck is executed in force checking mode. [ Changed use test_opt(sb, BARRIER) of checking the journal flags -- TYT ] Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2015-07-23 09:46:11 -04:00

1 2 3 4 5 ...

2599 Commits