kernel_optimize_test/kernel
Kees Cook 800179c9b8 fs: add link restrictions
This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:37:58 +04:00
..
debug KGDB/KDB regression fixes 2012-04-04 17:26:08 -07:00
events perf: Use css_tryget() to avoid propping up css refcount 2012-06-18 11:45:57 +02:00
gcov
irq merge task_work and rcu_head, get rid of separate allocation for keyring case 2012-07-22 23:57:56 +04:00
power PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format 2012-05-18 20:44:59 +02:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-06-08 14:59:29 -07:00
time timekeeping: Provide hrtimer update function 2012-07-11 23:34:39 +02:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2012-07-03 15:45:10 -07:00
.gitignore
acct.c
async.c
audit_tree.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
audit_watch.c get rid of kern_path_parent() 2012-07-14 16:35:02 +04:00
audit.c
audit.h
auditfilter.c
auditsc.c seccomp: remove duplicated failure logging 2012-04-14 11:13:20 +10:00
backtracetest.c
bounds.c
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup_freezer.c cgroup: convert all non-memcg controllers to the new cftype interface 2012-04-01 12:09:55 -07:00
cgroup.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
compat.c new helper: sigsuspend() 2012-05-21 23:52:30 -04:00
configs.c
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpu.c kernel/cpu.c: document clear_tasks_mm_cpumask() 2012-05-31 17:49:30 -07:00
cpuset.c Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-05-22 17:40:19 -07:00
crash_dump.c
cred.c keys: kill task_struct->replacement_session_keyring 2012-05-23 22:11:41 -04:00
delayacct.c
dma.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
elfcore.c
exec_domain.c
exit.c move exit_task_work() past exit_files() et.al. 2012-07-22 23:57:57 +04:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c trim task_work: get rid of hlist 2012-07-22 23:57:55 +04:00
freezer.c
futex_compat.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
futex.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
Kconfig.preempt locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
kexec.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-28 17:19:28 -07:00
kfifo.c [media] kernel:kfifo: export __kfifo_max_r symbol 2012-04-11 18:24:37 -03:00
kmod.c kmod.c: fix kernel-doc warning 2012-05-31 17:49:28 -07:00
kprobes.c
ksysfs.c
kthread.c
latencytop.c
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
module.c Guard check in module loader against integer overflow 2012-05-23 22:28:53 +09:30
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c
padata.c padata: Fix cpu hotplug 2012-03-29 19:52:46 +08:00
panic.c kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() 2012-05-18 14:02:10 +02:00
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid_namespace.c pidns: guarantee that the pidns init will be the last pidns process reaped 2012-06-20 14:39:36 -07:00
pid.c mm: add a low limit to alloc_large_system_hash 2012-05-24 00:28:21 -04:00
posix-cpu-timers.c
posix-timers.c
printk.c kmsg: merge continuation records while printing 2012-07-09 12:15:42 -07:00
profile.c
ptrace.c userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids 2012-05-03 03:28:51 -07:00
range.c
rcu.h
rcupdate.c rcu: Make exit_rcu() more precise and consolidate 2012-05-02 14:48:27 -07:00
rcutiny_plugin.h rcu: Make exit_rcu() more precise and consolidate 2012-05-02 14:48:27 -07:00
rcutiny.c
rcutorture.c rcu: Add rcutorture test for call_srcu() 2012-04-30 10:48:26 -07:00
rcutree_plugin.h rcu: Precompute RCU_FAST_NO_HZ timer offsets 2012-06-06 20:43:28 -07:00
rcutree_trace.c rcu: Make rcu_barrier() less disruptive 2012-05-09 14:27:54 -07:00
rcutree.c rcu: Stop rcu_do_batch() from multiplexing the "count" variable 2012-06-25 12:35:25 -07:00
rcutree.h rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure 2012-06-06 20:43:28 -07:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c rescounters: add res_counter_uncharge_until() 2012-05-29 16:22:27 -07:00
resource.c kernel/resource.c: correct the comment of allocate_resource() 2012-05-31 17:49:26 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
seccomp.c seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER 2012-04-18 12:24:52 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c signal: make sure we don't get stopped with pending task_work 2012-07-22 23:57:54 +04:00
smp.c smp: Implement kick_all_cpus_sync() 2012-05-08 12:35:06 +02:00
smpboot.c smpboot, idle: Fix comment mismatch over idle_threads_init() 2012-05-24 22:58:08 +02:00
smpboot.h smp: Fix idle_thread_init() inline stub 2012-05-04 12:52:25 +02:00
softirq.c
spinlock.c locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
srcu.c rcu: Implement per-domain single-threaded call_srcu() state machine 2012-04-30 10:48:25 -07:00
stacktrace.c
stop_machine.c
sys_ni.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
sys.c c/r: prctl: less paranoid prctl_set_mm_exe_file() 2012-07-11 16:04:43 -07:00
sysctl_binary.c
sysctl.c fs: add link restrictions 2012-07-29 21:37:58 +04:00
task_work.c deal with task_work callbacks adding more work 2012-07-22 23:57:57 +04:00
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
tracepoint.c
tsacct.c
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c
user_namespace.c userns: Store uid and gid values in struct cred with kuid_t and kgid_t types 2012-05-03 03:28:38 -07:00
user-return-notifier.c
user.c userns: Silence silly gcc warning. 2012-05-19 15:44:40 -06:00
utsname_sysctl.c
utsname.c userns: Use cred->user_ns instead of cred->user->user_ns 2012-04-07 16:55:51 -07:00
wait.c
watchdog.c watchdog: Quiet down the boot messages 2012-06-14 12:20:50 +02:00
workqueue_sched.h
workqueue.c lockdep: fix oops in processing workqueue 2012-05-15 08:08:31 -07:00