currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c
Signed-off-by: Eric Paris <eparis@redhat.com>
All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call. Take the lock inside
fsnotify_find_mark_entry() instead.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy. This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases. We pass a vfsmount to this function so groups can make
their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_obtain_group was intended to be able to find an already existing
group. Nothing uses that functionality. This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Signed-off-by: Eric Paris <eparis@redhat.com>
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added. I no longer think this is a
necessary thing to do and so we remove the group_num.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify only wishes to merge a new event with the last event on the
notification fifo. fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together. This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code. This allows each use of fsnotify to provide
their own merge functionality.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is going to need to look at file->private_data to know if an event
should be sent or not. This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener. Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify was supposed to have a dmesg printk ratelimitor which would cause
inotify to only emit one message per boot. The static bool was never set
so it kept firing messages. This patch correctly limits warnings in multiple
places.
Signed-off-by: Eric Paris <eparis@redhat.com>
Prior to 2.6.31 inotify would not reuse watch descriptors until all of
them had been used at least once. After the rewrite inotify would reuse
watch descriptors. The selinux utility 'restorecond' was found to have
problems when watch descriptors were reused. This patch reverts to the
pre inotify rewrite behavior to not reuse watch descriptors.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify_free_mark casts directly from an fsnotify_mark_entry to an
inotify_inode_mark_entry. This works, but should use container_of instead
for future proofing.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch allows a task to add a second fsnotify mark to an inode for the
same group. This mark will be added to the end of the inode's list and
this will never be found by the stand fsnotify_find_mark() function. This
is useful if a user wants to add a new mark before removing the old one.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch moves all of the idr editing operations into their own idr
functions. It makes it easier to prove locking correctness and to to
understand the code flow.
Signed-off-by: Eric Paris <eparis@redhat.com>
Make sure that s_umount is acquired *before* we drop the final
active reference; we still have the fast path (atomic_dec_unless)
and we have gotten rid of the window between the moment when
s_active hits zero and s_umount is acquired. Which simplifies
the living hell out of grab_super() and inotify pin_to_kill()
stuff.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: don't leak user struct on inotify release
inotify: race use after free/double free in inotify inode marks
inotify: clean up the inotify_add_watch out path
Inotify: undefined reference to `anon_inode_getfd'
Manual merge to remove duplicate "select ANON_INODES" from Kconfig file
inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user. The problem is that free_uid() is
never called on it.
Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
There is a race in the inotify add/rm watch code. A task can find and
remove a mark which doesn't have all of it's references. This can
result in a use after free/double free situation.
Task A Task B
------------ -----------
inotify_new_watch()
allocate a mark (refcnt == 1)
add it to the idr
inotify_rm_watch()
inotify_remove_from_idr()
fsnotify_put_mark()
refcnt hits 0, free
take reference because we are on idr
[at this point it is a use after free]
[time goes on]
refcnt may hit 0 again, double free
The fix is to take the reference BEFORE the object can be found in the
idr.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: <stable@kernel.org>
inotify_add_watch explictly frees the unused inode mark, but it can just
use the generic code. Just do that.
Signed-off-by: Eric Paris <eparis@redhat.com>
Fix:
fs/built-in.o: In function `sys_inotify_init1':
summary.c:(.text+0x347a4): undefined reference to `anon_inode_getfd'
found by kautobuild with arms bcmring_defconfig, which ends up with
INOTIFY_USER enabled (through the 'default y') but leaves ANON_INODES
unset. However, inotify_user.c uses anon_inode_getfd().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Eric Paris <eparis@redhat.com>
CONFIG_INOTIFY_USER defined but CONFIG_ANON_INODES undefined will result
in the following build failure:
LD vmlinux
fs/built-in.o: In function 'sys_inotify_init1':
(.text.sys_inotify_init1+0x22c): undefined reference to 'anon_inode_getfd'
fs/built-in.o: In function `sys_inotify_init1':
(.text.sys_inotify_init1+0x22c): relocation truncated to fit: R_MIPS_26 against 'anon_inode_getfd'
make[2]: *** [vmlinux] Error 1
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inotify will WARN() if it finds that the idr and the fsnotify internals
somehow got out of sync. It was only supposed to do this once but due
to this stupid bug it would warn every single time a problem was
detected.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 7e790dd5fc ("inotify: fix
error paths in inotify_update_watch") inotify changed the manor in which
it gave watch descriptors back to userspace. Previous to this commit
inotify acted like the following:
inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 2
but after this patch inotify would return watch descriptors like so:
inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 1
which I saw as equivalent to opening an fd where
open(file) = 1;
close(1);
open(file) = 1;
seemed perfectly reasonable. The issue is that quite a bit of userspace
apparently relies on the behavior in which watch descriptors will not be
quickly reused. KDE relies on it, I know some selinux packages rely on
it, and I have heard complaints from other random sources such as debian
bug 558981.
Although the man page implies what we do is ok, we broke userspace so
this patch almost reverts us to the old behavior. It is still slightly
racey and I have patches that would fix that, but they are rather large
and this will fix it for all real world cases. The race is as follows:
- task1 creates a watch and blocks in idr_new_watch() before it updates
the hint.
- task2 creates a watch and updates the hint.
- task1 updates the hint with it's older wd
- task removes the watch created by task2
- task adds a new watch and will reuse the wd originally given to task2
it requires moving some locking around the hint (last_wd) but this should
solve it for the real world and be -stable safe.
As a side effect this patch papers over a bug in the lib/idr code which
is causing a large number WARN's to pop on people's system and many
reports in kerneloops.org. I'm working on the root cause of that idr
bug seperately but this should make inotify immune to that issue.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Seperating the addition and update of marks in inotify resulted in a
regression in that inotify never gets events. The inotify group mask is
always 0. This mask should be updated any time a new mark is added.
Signed-off-by: Eric Paris <eparis@redhat.com>
0db501bd06 introduced a regresion in that it now sends a nul
terminator but the length accounting when checking for space or
reporting to userspace did not take this into account. This corrects
all of the rounding logic.
Signed-off-by: Eric Paris <eparis@redhat.com>
When an event has no pathname, there's no need to pad it with a null byte and
therefore generate an inotify_event sized block of zeros. This fixes a
regression introduced by commit 0db501bd06 where
my system wouldn't finish booting because some process was being confused by
this.
Signed-off-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename. Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size. Ouch!
So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event. I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal. Move the
locking and the refcnt'ing so that these have to happen atomically.
Signed-off-by: Eric Paris <eparis@redhat.com>
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.
Signed-off-by: Eric Paris <eparis@redhat.com>
There is nothing known wrong with the inotify watch addition/modification
but this patch seperates the two code paths to make them each easy to
verify as correct.
Signed-off-by: Eric Paris <eparis@redhat.com>
The inotify_add_watch man page specifies that inotify_add_watch() will
return a non-negative integer. However, historically the inotify
watches started at 1, not at 0.
Turns out that the inotifywait program provided by the inotify-tools
package doesn't properly handle a 0 watch descriptor. In 7e790dd5 we
changed from starting at 1 to starting at 0. This patch starts at 1,
just like in previous kernels, but also just like in previous kernels
it's possible for it to wrap back to 0. This preserves the kernel
functionality exactly like it was before the patch (neither method broke
the spec)
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inotify decides if private data it passed to get added to an event was
used by checking list_empty(). But it's possible that the event may
have been dequeued and the private event removed so it would look empty.
The fix is to use the return code from fsnotify_add_notify_event rather
than looking at the list.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inotify can have a watchs removed under filesystem reclaim.
=================================
[ INFO: inconsistent lock state ]
2.6.31-rc2 #16
---------------------------------
inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
(iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
{IN-RECLAIM_FS-W} state was registered at:
[<c10536ab>] __lock_acquire+0x2c9/0xac4
[<c1053f45>] lock_acquire+0x9f/0xc2
[<c1308872>] __mutex_lock_common+0x2d/0x323
[<c1308c00>] mutex_lock_nested+0x2e/0x36
[<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
[<c108bfb6>] shrink_slab+0xe2/0x13c
[<c108c3e1>] kswapd+0x3d1/0x55d
[<c10449b5>] kthread+0x66/0x6b
[<c1003fdf>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff
Two things are needed to fix this. First we need a method to tell
fsnotify_create_event() to use GFP_NOFS and second we need to stop using
one global IN_IGNORED event and allocate them one at a time. This solves
current issues with multiple IN_IGNORED on a queue having tail drop
problems and simplifies the allocations since we don't have to worry about
two tasks opperating on the IGNORED event concurrently.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify doens't give the user anything. If someone chooses inotify or
dnotify it should build fsnotify, if they don't select one it shouldn't be
built. This patch changes fsnotify to be a def_bool=n and makes everything
else select it. Also fixes the issue people complained about on lwn where
gdm hung because they didn't have inotify and they didn't get the inotify
build option.....
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify_update_watch could leave things in a horrid state on a number of
error paths. We could try to remove idr entries that didn't exist, we
could send an IN_IGNORED to userspace for watches that don't exist, and a
bit of other stupidity. Clean these up by doing the idr addition before we
put the mark on the inode since we can clean that up on error and getting
off the inode's mark list is hard.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify_add_watch had a couple of problems. The biggest being that if
inotify_add_watch was called on the same inode twice (to update or change the
event mask) a refence was taken on the original inode mark by
fsnotify_find_mark_entry but was not being dropped at the end of the
inotify_add_watch call. Thus if inotify_rm_watch was called although the mark
was removed from the inode, the refcnt wouldn't hit zero and we would leak
memory.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The inotify rewrite forgot to drop the inotify watch use cound when a watch
was removed. This means that a single inotify fd can only ever register a
maximum of /proc/sys/fs/max_user_watches even if some of those had been
freed.
Signed-off-by: Eric Paris <eparis@redhat.com>