kernel_optimize_test/arch/Kconfig
Josh Triplett 3033f14ab7 clone: support passing tls argument via C rather than pt_regs magic
clone has some of the quirkiest syscall handling in the kernel, with a
pile of special cases, historical curiosities, and architecture-specific
calling conventions.  In particular, clone with CLONE_SETTLS accepts a
parameter "tls" that the C entry point completely ignores and some
assembly entry points overwrite; instead, the low-level arch-specific
code pulls the tls parameter out of the arch-specific register captured
as part of pt_regs on entry to the kernel.  That's a massive hack, and
it makes the arch-specific code only work when called via the specific
existing syscall entry points; because of this hack, any new clone-like
system call would have to accept an identical tls argument in exactly
the same arch-specific position, rather than providing a unified system
call entry point across architectures.

The first patch allows architectures to handle the tls argument via
normal C parameter passing, if they opt in by selecting
HAVE_COPY_THREAD_TLS.  The second patch makes 32-bit and 64-bit x86 opt
into this.

These two patches came out of the clone4 series, which isn't ready for
this merge window, but these first two cleanup patches were entirely
uncontroversial and have acks.  I'd like to go ahead and submit these
two so that other architectures can begin building on top of this and
opting into HAVE_COPY_THREAD_TLS.  However, I'm also happy to wait and
send these through the next merge window (along with v3 of clone4) if
anyone would prefer that.

This patch (of 2):

clone with CLONE_SETTLS accepts an argument to set the thread-local
storage area for the new thread.  sys_clone declares an int argument
tls_val in the appropriate point in the argument list (based on the
various CLONE_BACKWARDS variants), but doesn't actually use or pass along
that argument.  Instead, sys_clone calls do_fork, which calls
copy_process, which calls the arch-specific copy_thread, and copy_thread
pulls the corresponding syscall argument out of the pt_regs captured at
kernel entry (knowing what argument of clone that architecture passes tls
in).

Apart from being awful and inscrutable, that also only works because only
one code path into copy_thread can pass the CLONE_SETTLS flag, and that
code path comes from sys_clone with its architecture-specific
argument-passing order.  This prevents introducing a new version of the
clone system call without propagating the same architecture-specific
position of the tls argument.

However, there's no reason to pull the argument out of pt_regs when
sys_clone could just pass it down via C function call arguments.

Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
and a new copy_thread_tls that accepts the tls parameter as an additional
unsigned long (syscall-argument-sized) argument.  Change sys_clone's tls
argument to an unsigned long (which does not change the ABI), and pass
that down to copy_thread_tls.

Architectures that don't opt into copy_thread_tls will continue to ignore
the C argument to sys_clone in favor of the pt_regs captured at kernel
entry, and thus will be unable to introduce new versions of the clone
syscall.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:38 -07:00

556 lines
16 KiB
Plaintext

#
# General architecture dependent options
#
config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
depends on HAVE_OPROFILE
select RING_BUFFER
select RING_BUFFER_ALLOW_SWAP
help
OProfile is a profiling system capable of profiling the
whole system, include the kernel, kernel modules, libraries,
and applications.
If unsure, say N.
config OPROFILE_EVENT_MULTIPLEX
bool "OProfile multiplexing support (EXPERIMENTAL)"
default n
depends on OPROFILE && X86
help
The number of hardware counters is limited. The multiplexing
feature enables OProfile to gather more events than counters
are provided by the hardware. This is realized by switching
between events at an user specified time interval.
If unsure, say N.
config HAVE_OPROFILE
bool
config OPROFILE_NMI_TIMER
def_bool y
depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64
config KPROBES
bool "Kprobes"
depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
help
Kprobes allows you to trap at almost any kernel address and
execute a callback function. register_kprobe() establishes
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
config JUMP_LABEL
bool "Optimize very unlikely/likely branches"
depends on HAVE_ARCH_JUMP_LABEL
help
This option enables a transparent branch optimization that
makes certain almost-always-true or almost-always-false branch
conditions even cheaper to execute within the kernel.
Certain performance-sensitive kernel code, such as trace points,
scheduler functionality, networking code and KVM have such
branches and include support for this optimization technique.
If it is detected that the compiler has support for "asm goto",
the kernel will compile such branches with just a nop
instruction. When the condition flag is toggled to true, the
nop will be converted to a jump instruction to execute the
conditional block of instructions.
This technique lowers overhead and stress on the branch prediction
of the processor and generally makes the kernel faster. The update
of the condition is slower, but those are always very rare.
( On 32-bit x86, the necessary options added to the compiler
flags may increase the size of the kernel slightly. )
config OPTPROBES
def_bool y
depends on KPROBES && HAVE_OPTPROBES
depends on !PREEMPT
config KPROBES_ON_FTRACE
def_bool y
depends on KPROBES && HAVE_KPROBES_ON_FTRACE
depends on DYNAMIC_FTRACE_WITH_REGS
help
If function tracer is enabled and the arch supports full
passing of pt_regs to function tracing, then kprobes can
optimize on top of function tracing.
config UPROBES
def_bool n
select PERCPU_RWSEM
help
Uprobes is the user-space counterpart to kprobes: they
enable instrumentation applications (such as 'perf probe')
to establish unintrusive probes in user-space binaries and
libraries, by executing handler functions when the probes
are hit by user-space applications.
( These probes come in the form of single-byte breakpoints,
managed by the kernel and kept transparent to the probed
application. )
config HAVE_64BIT_ALIGNED_ACCESS
def_bool 64BIT && !HAVE_EFFICIENT_UNALIGNED_ACCESS
help
Some architectures require 64 bit accesses to be 64 bit
aligned, which also requires structs containing 64 bit values
to be 64 bit aligned too. This includes some 32 bit
architectures which can do 64 bit accesses, as well as 64 bit
architectures without unaligned access.
This symbol should be selected by an architecture if 64 bit
accesses are required to be 64 bit aligned in this way even
though it is not a 64 bit architecture.
See Documentation/unaligned-memory-access.txt for more
information on the topic of unaligned memory accesses.
config HAVE_EFFICIENT_UNALIGNED_ACCESS
bool
help
Some architectures are unable to perform unaligned accesses
without the use of get_unaligned/put_unaligned. Others are
unable to perform such accesses efficiently (e.g. trap on
unaligned access and require fixing it up in the exception
handler.)
This symbol should be selected by an architecture if it can
perform unaligned accesses efficiently to allow different
code paths to be selected for these cases. Some network
drivers, for example, could opt to not fix up alignment
problems with received packets if doing so would not help
much.
See Documentation/unaligned-memory-access.txt for more
information on the topic of unaligned memory accesses.
config ARCH_USE_BUILTIN_BSWAP
bool
help
Modern versions of GCC (since 4.4) have builtin functions
for handling byte-swapping. Using these, instead of the old
inline assembler that the architecture code provides in the
__arch_bswapXX() macros, allows the compiler to see what's
happening and offers more opportunity for optimisation. In
particular, the compiler will be able to combine the byteswap
with a nearby load or store and use load-and-swap or
store-and-swap instructions if the architecture has them. It
should almost *never* result in code which is worse than the
hand-coded assembler in <asm/swab.h>. But just in case it
does, the use of the builtins is optional.
Any architecture with load-and-swap or store-and-swap
instructions should set this. And it shouldn't hurt to set it
on architectures that don't have such instructions.
config KRETPROBES
def_bool y
depends on KPROBES && HAVE_KRETPROBES
config USER_RETURN_NOTIFIER
bool
depends on HAVE_USER_RETURN_NOTIFIER
help
Provide a kernel-internal notification when a cpu is about to
switch to user mode.
config HAVE_IOREMAP_PROT
bool
config HAVE_KPROBES
bool
config HAVE_KRETPROBES
bool
config HAVE_OPTPROBES
bool
config HAVE_KPROBES_ON_FTRACE
bool
config HAVE_NMI_WATCHDOG
bool
#
# An arch should select this if it provides all these things:
#
# task_pt_regs() in asm/processor.h or asm/ptrace.h
# arch_has_single_step() if there is hardware single-step support
# arch_has_block_step() if there is hardware block-step support
# asm/syscall.h supplying asm-generic/syscall.h interface
# linux/regset.h user_regset interfaces
# CORE_DUMP_USE_REGSET #define'd in linux/elf.h
# TIF_SYSCALL_TRACE calls tracehook_report_syscall_{entry,exit}
# TIF_NOTIFY_RESUME calls tracehook_notify_resume()
# signal delivery calls tracehook_signal_handler()
#
config HAVE_ARCH_TRACEHOOK
bool
config HAVE_DMA_ATTRS
bool
config HAVE_DMA_CONTIGUOUS
bool
config GENERIC_SMP_IDLE_THREAD
bool
config GENERIC_IDLE_POLL_SETUP
bool
# Select if arch init_task initializer is different to init/init_task.c
config ARCH_INIT_TASK
bool
# Select if arch has its private alloc_task_struct() function
config ARCH_TASK_STRUCT_ALLOCATOR
bool
# Select if arch has its private alloc_thread_info() function
config ARCH_THREAD_INFO_ALLOCATOR
bool
config HAVE_REGS_AND_STACK_ACCESS_API
bool
help
This symbol should be selected by an architecure if it supports
the API needed to access registers and stack entries from pt_regs,
declared in asm/ptrace.h
For example the kprobes-based event tracer needs this API.
config HAVE_CLK
bool
help
The <linux/clk.h> calls support software clock gating and
thus are a key power management tool on many systems.
config HAVE_DMA_API_DEBUG
bool
config HAVE_HW_BREAKPOINT
bool
depends on PERF_EVENTS
config HAVE_MIXED_BREAKPOINTS_REGS
bool
depends on HAVE_HW_BREAKPOINT
help
Depending on the arch implementation of hardware breakpoints,
some of them have separate registers for data and instruction
breakpoints addresses, others have mixed registers to store
them but define the access type in a control register.
Select this option if your arch implements breakpoints under the
latter fashion.
config HAVE_USER_RETURN_NOTIFIER
bool
config HAVE_PERF_EVENTS_NMI
bool
help
System hardware can generate an NMI using the perf event
subsystem. Also has support for calculating CPU cycle events
to determine how many clock cycles in a given period.
config HAVE_PERF_REGS
bool
help
Support selective register dumps for perf events. This includes
bit-mapping of each registers and a unique architecture id.
config HAVE_PERF_USER_STACK_DUMP
bool
help
Support user stack dumps for perf event samples. This needs
access to the user stack pointer which is not unified across
architectures.
config HAVE_ARCH_JUMP_LABEL
bool
config HAVE_RCU_TABLE_FREE
bool
config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
config HAVE_ALIGNED_STRUCT_PAGE
bool
help
This makes sure that struct pages are double word aligned and that
e.g. the SLUB allocator can perform double word atomic operations
on a struct page for better performance. However selecting this
might increase the size of a struct page by a word.
config HAVE_CMPXCHG_LOCAL
bool
config HAVE_CMPXCHG_DOUBLE
bool
config ARCH_WANT_IPC_PARSE_VERSION
bool
config ARCH_WANT_COMPAT_IPC_PARSE_VERSION
bool
config ARCH_WANT_OLD_COMPAT_IPC
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
bool
config HAVE_ARCH_SECCOMP_FILTER
bool
help
An arch should select this symbol if it provides all of these things:
- syscall_get_arch()
- syscall_get_arguments()
- syscall_rollback()
- syscall_set_return_value()
- SIGSYS siginfo_t support
- secure_computing is called from a ptrace_event()-safe context
- secure_computing return value is checked and a return value of -1
results in the system call being skipped immediately.
- seccomp syscall wired up
For best performance, an arch should use seccomp_phase1 and
seccomp_phase2 directly. It should call seccomp_phase1 for all
syscalls if TIF_SECCOMP is set, but seccomp_phase1 does not
need to be called from a ptrace-safe context. It must then
call seccomp_phase2 if seccomp_phase1 returns anything other
than SECCOMP_PHASE1_OK or SECCOMP_PHASE1_SKIP.
As an additional optimization, an arch may provide seccomp_data
directly to seccomp_phase1; this avoids multiple calls
to the syscall_xyz helpers for every syscall.
config SECCOMP_FILTER
def_bool y
depends on HAVE_ARCH_SECCOMP_FILTER && SECCOMP && NET
help
Enable tasks to build secure computing environments defined
in terms of Berkeley Packet Filter programs which implement
task-defined system call filtering polices.
See Documentation/prctl/seccomp_filter.txt for details.
config HAVE_CC_STACKPROTECTOR
bool
help
An arch should select this symbol if:
- its compiler supports the -fstack-protector option
- it has implemented a stack canary (e.g. __stack_chk_guard)
config CC_STACKPROTECTOR
def_bool n
help
Set when a stack-protector mode is enabled, so that the build
can enable kernel-side support for the GCC feature.
choice
prompt "Stack Protector buffer overflow detection"
depends on HAVE_CC_STACKPROTECTOR
default CC_STACKPROTECTOR_NONE
help
This option turns on the "stack-protector" GCC feature. This
feature puts, at the beginning of functions, a canary value on
the stack just before the return address, and validates
the value just before actually returning. Stack based buffer
overflows (that need to overwrite this return address) now also
overwrite the canary, which gets detected and the attack is then
neutralized via a kernel panic.
config CC_STACKPROTECTOR_NONE
bool "None"
help
Disable "stack-protector" GCC feature.
config CC_STACKPROTECTOR_REGULAR
bool "Regular"
select CC_STACKPROTECTOR
help
Functions will have the stack-protector canary logic added if they
have an 8-byte or larger character array on the stack.
This feature requires gcc version 4.2 or above, or a distribution
gcc with the feature backported ("-fstack-protector").
On an x86 "defconfig" build, this feature adds canary checks to
about 3% of all kernel functions, which increases kernel code size
by about 0.3%.
config CC_STACKPROTECTOR_STRONG
bool "Strong"
select CC_STACKPROTECTOR
help
Functions will have the stack-protector canary logic added in any
of the following conditions:
- local variable's address used as part of the right hand side of an
assignment or function argument
- local variable is an array (or union containing an array),
regardless of array type or length
- uses register local variables
This feature requires gcc version 4.9 or above, or a distribution
gcc with the feature backported ("-fstack-protector-strong").
On an x86 "defconfig" build, this feature adds canary checks to
about 20% of all kernel functions, which increases the kernel code
size by about 2%.
endchoice
config HAVE_CONTEXT_TRACKING
bool
help
Provide kernel/user boundaries probes necessary for subsystems
that need it, such as userspace RCU extended quiescent state.
Syscalls need to be wrapped inside user_exit()-user_enter() through
the slow path using TIF_NOHZ flag. Exceptions handlers must be
wrapped as well. Irqs are already protected inside
rcu_irq_enter/rcu_irq_exit() but preemption or signal handling on
irq exit still need to be protected.
config HAVE_VIRT_CPU_ACCOUNTING
bool
config HAVE_VIRT_CPU_ACCOUNTING_GEN
bool
default y if 64BIT
help
With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit.
Before enabling this option, arch code must be audited
to ensure there are no races in concurrent read/write of
cputime_t. For example, reading/writing 64-bit cputime_t on
some 32-bit arches may require multiple accesses, so proper
locking is needed to protect against concurrent accesses.
config HAVE_IRQ_TIME_ACCOUNTING
bool
help
Archs need to ensure they use a high enough resolution clock to
support irq time accounting and then call enable_sched_clock_irqtime().
config HAVE_ARCH_TRANSPARENT_HUGEPAGE
bool
config HAVE_ARCH_HUGE_VMAP
bool
config HAVE_ARCH_SOFT_DIRTY
bool
config HAVE_MOD_ARCH_SPECIFIC
bool
help
The arch uses struct mod_arch_specific to store data. Many arches
just need a simple module loader without arch specific data - those
should not enable this.
config MODULES_USE_ELF_RELA
bool
help
Modules only use ELF RELA relocations. Modules with ELF REL
relocations will give an error.
config MODULES_USE_ELF_REL
bool
help
Modules only use ELF REL relocations. Modules with ELF RELA
relocations will give an error.
config HAVE_UNDERSCORE_SYMBOL_PREFIX
bool
help
Some architectures generate an _ in front of C symbols; things like
module loading and assembly files need to know about this.
config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
Architecture doesn't only execute the irq handler on the irq stack
but also irq_exit(). This way we can process softirqs on this irq
stack instead of switching to a new one when we call __do_softirq()
in the end of an hardirq.
This spares a stack switch and improves cache usage on softirq
processing.
config PGTABLE_LEVELS
int
default 2
config ARCH_HAS_ELF_RANDOMIZE
bool
help
An architecture supports choosing randomized locations for
stack, mmap, brk, and ET_DYN. Defined functions:
- arch_mmap_rnd()
- arch_randomize_brk()
config HAVE_COPY_THREAD_TLS
bool
help
Architecture provides copy_thread_tls to accept tls argument via
normal C parameter passing, rather than extracting the syscall
argument from pt_regs.
#
# ABI hall of shame
#
config CLONE_BACKWARDS
bool
help
Architecture has tls passed as the 4th argument of clone(2),
not the 5th one.
config CLONE_BACKWARDS2
bool
help
Architecture has the first two arguments of clone(2) swapped.
config CLONE_BACKWARDS3
bool
help
Architecture has tls passed as the 3rd argument of clone(2),
not the 5th one.
config ODD_RT_SIGACTION
bool
help
Architecture has unusual rt_sigaction(2) arguments
config OLD_SIGSUSPEND
bool
help
Architecture has old sigsuspend(2) syscall, of one-argument variety
config OLD_SIGSUSPEND3
bool
help
Even weirder antique ABI - three-argument sigsuspend(2)
config OLD_SIGACTION
bool
help
Architecture has old sigaction(2) syscall. Nope, not the same
as OLD_SIGSUSPEND | OLD_SIGSUSPEND3 - alpha has sigsuspend(2),
but fairly different variant of sigaction(2), thanks to OSF/1
compatibility...
config COMPAT_OLD_SIGACTION
bool
source "kernel/gcov/Kconfig"