kernel_optimize_test/fs
Al Viro 8aef188452 VFS: Fix vfsmount overput on simultaneous automount
[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that, when lock_mount() finds something already mounted on the
mountpoint, it transits to the mounted filesystem: it drops the caller's
reference to the mountpoint's vfsmount and replaces path->mnt with the new
mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that a reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such a reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in the case of
racing automounts and in an even harder to hit race between following a
mountpoint and a couple of mount --move.  The thing is, we don't need to make
that assumption at all - after the end of the loop in follow_managed() we can
check if path->mnt has ended up unchanged and do mntput() if needed.
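
The post-loop check described above can be illustrated with a small userspace
model.  This is only a sketch under stated assumptions, not the kernel code:
every toy_* name and both struct layouts are invented for illustration, and
the real follow_managed() does considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfsmount: only the refcount matters. */
struct toy_mount {
	int refs;
};

/* Hypothetical stand-in for struct path. */
struct toy_path {
	struct toy_mount *mnt;
};

static void toy_mntget(struct toy_mount *m)
{
	m->refs++;
}

static void toy_mntput(struct toy_mount *m)
{
	assert(m->refs > 0);	/* an overput would trip here */
	m->refs--;
}

/*
 * Model of one automount step.  A reference on the starting mount is
 * always taken first.  If 'transit' is non-NULL the walk crosses onto
 * that mount: the reference on the old mount is dropped and path->mnt
 * is replaced by a referenced new mount.  Otherwise path->mnt stays
 * put and the extra reference is still held on return.
 * Returns true to report that a reference was grabbed.
 */
static bool toy_automount_step(struct toy_path *path,
			       struct toy_mount *transit)
{
	toy_mntget(path->mnt);
	if (transit) {
		toy_mntput(path->mnt);
		toy_mntget(transit);
		path->mnt = transit;
	}
	return true;
}

/*
 * Model of the fixed behaviour: make no assumption about whether the
 * step changed path->mnt.  Compare against the starting mount after
 * the fact and drop the extra reference only if nothing changed.
 */
static void toy_follow_managed(struct toy_path *path,
			       struct toy_mount *transit)
{
	struct toy_mount *start = path->mnt;
	bool need_mntput = toy_automount_step(path, transit);

	if (need_mntput && path->mnt == start)
		toy_mntput(path->mnt);
}
```

In this model the caller's contract holds in both cases: if path->mnt is
unchanged, the starting mount's count is back where it began; if path->mnt
changed, the new mount carries exactly one reference on behalf of the path.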

The BUG can be reproduced with the following test program:

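	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int ws;
		pid_t pid;
		struct stat buf;
		if (argc < 2) {
			fprintf(stderr, "usage: %s <path>\n", argv[0]);
			return 1;
		}
		pid = fork();
		/* parent and child both stat() the path at about the same time */
		stat(argv[1], &buf);
		/* the parent reaps the child before exiting */
		if (pid > 0) wait(&ws);
		return 0;
	}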

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:28:16 -04:00
9p 9p: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
adfs Fix common misspellings 2011-03-31 11:26:23 -03:00
affs affs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
afs afs: fix sget() races, close leak on umount 2011-06-12 17:45:36 -04:00
autofs4 autofs4: bogus dentry_unhash() added in ->unlink() 2011-05-30 01:50:53 -04:00
befs Fix common misspellings 2011-03-31 11:26:23 -03:00
bfs bfs: remove unnecessary dentry_unhash on dir rename 2011-05-28 01:02:50 -04:00
btrfs Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-06-07 18:36:59 -07:00
cachefiles Fix common misspellings 2011-03-31 11:26:23 -03:00
ceph ceph: remove unnecessary dentry_unhash calls 2011-05-26 07:26:53 -04:00
cifs cifs: trivial: add space in fsc error message 2011-06-08 16:03:29 +00:00
coda coda: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
configfs configfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
cramfs cramfs: generate unique inode number for better inode cache usage 2011-01-13 08:03:23 -08:00
debugfs debugfs: move to new strtobool 2011-05-19 16:55:28 +09:30
devpts fs/devpts/inode.c: correctly check d_alloc_name() return code in devpts_pty_new() 2011-03-22 17:44:17 -07:00
dlm Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-05-26 13:19:00 -07:00
ecryptfs eCryptfs: Remove ecryptfs_header_cache_2 2011-05-29 14:24:25 -05:00
efs block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
exofs exofs: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:57 -04:00
exportfs vfs: Add open by file handle support 2011-03-15 02:21:44 -04:00
ext2 ext2: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:56 -04:00
ext3 fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
ext4 fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
fat fat: Fix corrupt inode flags when remove ATTR_SYS flag 2011-05-31 19:42:24 +09:00
freevxfs treewide: fix a few typos in comments 2011-05-10 10:16:21 +02:00
fscache fscache: remove dead code under CONFIG_WORKQUEUE_DEBUGFS 2011-05-25 08:39:44 -07:00
fuse more conservative S_NOSEC handling 2011-06-03 18:24:58 -04:00
gfs2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes 2011-06-07 18:44:10 -07:00
hfs hfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hfsplus hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hostfs hostfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hpfs hpfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
hppfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
hugetlbfs mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
isofs Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
jbd jbd: Fix comment to match the code in journal_start() 2011-05-24 00:27:53 +02:00
jbd2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2011-05-26 09:53:20 -07:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-28 13:03:41 -07:00
jfs lmLogOpen() broken failure exit 2011-06-07 08:50:59 -04:00
lockd NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" 2011-01-25 15:24:47 -05:00
logfs logfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
minix minix: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
ncpfs ncpfs: fix rename over directory with dangling references 2011-05-28 01:02:53 -04:00
nfs Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd 2011-05-29 14:10:13 -07:00
nfs_common Fix common misspellings 2011-03-31 11:26:23 -03:00
nfsd Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux 2011-05-29 11:21:12 -07:00
nilfs2 nilfs2: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
nls
notify Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
ntfs Fix common misspellings 2011-03-31 11:26:23 -03:00
ocfs2 more conservative S_NOSEC handling 2011-06-03 18:24:58 -04:00
omfs omfs: remove unnecessary dentry_unhash on rmdir, dir rneame 2011-05-28 01:02:52 -04:00
openpromfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
partitions Revert "block: Remove extra discard_alignment from hd_struct." 2011-05-30 07:42:51 +02:00
proc fix leak in proc_set_super() 2011-06-12 17:45:28 -04:00
pstore pstore: fix pstore filesystem mount/remount issue 2011-05-16 11:05:00 -07:00
qnx4 block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
quota vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
ramfs ramfs: fix memleak on no-mmu arch 2011-04-14 16:06:56 -07:00
reiserfs reiserfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
romfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2011-05-29 11:19:45 -07:00
sysfs Delay struct net freeing while there's a sysfs instance refering to it 2011-06-12 17:45:41 -04:00
sysv sysv: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:50 -04:00
ubifs ubifs: fix sget races 2011-06-12 17:45:34 -04:00
udf udf: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:52 -04:00
ufs ufs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
xfs fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
aio.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
anon_inodes.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
attr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00
bad_inode.c fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c brk: COMPAT_BRK: fix detection of randomized brk 2011-04-14 16:06:55 -07:00
binfmt_em86.c
binfmt_flat.c CRED: Fix load_flat_shared_library() to initialise bprm correctly 2011-05-03 10:10:51 +10:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Require subsystems to explicitly allocate bio_set integrity mempool 2011-03-17 11:11:05 +01:00
bio.c block: improve the bio_add_page() and bio_add_pc_page() descriptions 2011-05-28 14:44:46 +02:00
block_dev.c block: blkdev_get() should access ->bd_disk only after success 2011-06-01 08:28:47 +02:00
buffer.c fs: block_page_mkwrite should wait for writeback to finish 2011-05-28 01:03:21 -04:00
char_dev.c Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 2011-01-07 14:39:20 -08:00
compat.c exec: unify do_execve/compat_do_execve code 2011-04-09 15:53:56 +02:00
dcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
dcookies.c
direct-io.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
drop_caches.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
eventfd.c Docbook: add fs/eventfd.c and fix typos in it 2011-02-21 15:07:04 -08:00
eventpoll.c Fix common misspellings 2011-03-31 11:26:23 -03:00
exec.c exec: delay address limit change until point of no return 2011-06-09 12:50:05 -07:00
fcntl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
fhandle.c fs/fhandle.c: add <linux/personality.h> for ia64 2011-04-14 16:06:56 -07:00
fifo.c Filesystem: fifo: Fixed coding style issue. 2011-03-21 00:16:09 -04:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-03-16 13:26:17 -07:00
file.c vfs: avoid large kmalloc()s for the fdtable 2011-04-28 11:28:20 -07:00
filesystems.c fs: synchronize_rcu when unregister_filesystem success not failure 2011-04-17 10:42:01 -07:00
fs_struct.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
fs-writeback.c fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
generic_acl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
inode.c fs: cosmetic inode.c cleanups 2011-05-27 09:43:00 -04:00
internal.h fs: move i_wb_list out from under inode_lock 2011-03-24 21:17:51 -04:00
ioctl.c vfs: cleanup do_vfs_ioctl() 2011-03-21 00:16:08 -04:00
ioprio.c
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-26 09:52:14 -07:00
Kconfig.binfmt
libfs.c libfs: drop unneeded dentry_unhash 2011-05-26 07:26:50 -04:00
locks.c Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux 2011-03-24 08:20:39 -07:00
Makefile Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2011-03-16 19:01:29 -07:00
mbcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
mpage.c mm/fs: add hooks to support cleancache 2011-05-26 10:01:43 -06:00
namei.c VFS: Fix vfsmount overput on simultaneous automount 2011-06-16 11:28:16 -04:00
namespace.c fs/namespace.c: bound mount propagation fix 2011-05-26 07:26:44 -04:00
nfsctl.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
no-block.c
open.c fs: Use BUG_ON(!mnt) at dentry_open(). 2011-03-21 01:10:41 -04:00
pipe.c Fix broken "pipe: use event aware wakeups" optimization 2011-01-20 16:21:59 -08:00
pnode.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
pnode.h
posix_acl.c NFS: Prevent memory allocation failure in nfsacl_encode() 2011-01-25 15:24:47 -05:00
read_write.c fix signedness mess in rw_verify_area() on 64bit architectures 2011-01-12 20:06:58 -05:00
read_write.h
readdir.c
select.c select: remove unused MAX_SELECT_SECONDS 2011-03-21 00:16:08 -04:00
seq_file.c
signalfd.c
splice.c splice: add wakeup_pipe_readers() 2011-05-23 19:58:53 +02:00
stack.c
stat.c readlinkat(), fchownat() and fstatat() with empty relative pathnames 2011-03-15 02:21:45 -04:00
statfs.c clean statfs-like syscalls up 2011-03-14 09:15:28 -04:00
super.c more conservative S_NOSEC handling 2011-06-03 18:24:58 -04:00
sync.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
timerfd.c timerfd: Manage cancelable timers in timerfd 2011-05-23 13:59:53 +02:00
utimes.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
xattr_acl.c
xattr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00