kernel_optimize_test/init/Kconfig
Christoph Lameter 81819f0fc8 SLUB core
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object
   queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
   each allocating CPU and use objects from a slab directly instead of
   queueing them up.

B. Storage overhead of object queues

   SLAB Object queues exist per node, per CPU. The alien cache queue even
   has a queue array that contain a queue for each processor on each
   node. For very large systems the number of queues and the number of
   objects that may be caught in those queues grows exponentially. On our
   systems with 1k nodes / processors we have several gigabytes just tied up
   for storing references to objects for those queues  This does not include
   the objects that could be on those queues. One fears that the whole
   memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data
   cannot be naturally aligned at the beginning of a slab block. SLUB keeps
   all meta data in the corresponding page_struct. Objects can be naturally
   aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
   boundaries and can fit tightly into a 4k page with no bytes left over.
   SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems
   the per CPU slab may be pushed back into partial list but that
   operation is simple and does not require an iteration over a list
   of objects. SLAB expires per CPU, shared and alien object queues
   during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator. This means that
   allocation is coarser (SLUB does interleave on a page level) but that
   situation was also present before 2.6.13. SLABs application of
   policies to individual slab objects allocated in SLAB is
   certainly a performance concern due to the frequent references to
   memory policies which may lead a sequence of objects to come from
   one node after another. SLUB will get a slab full of objects
   from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per node partial lists. This means that over time a large
   number of partial slabs may accumulate on those lists. These can
   only be reused if allocator occur on specific nodes. SLUB has a global
   pool of partial slabs and will consume slabs from that pool to
   decrease fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can
   manipulate the queue sizes in detail. However, filling the queues still
   requires the uses of the spin lock to check out slabs. SLUB has a global
   parameter (min_slab_order) for tuning. Increasing the minimum slab
   order can decrease the locking overhead. The bigger the slab order the
   less motions of pages between per CPU and partial lists occur and the
   better SLUB will be scaling.

G. Slab merging

   We often have slab caches with similar parameters. SLUB detects those
   on boot up and merges them into the corresponding general caches. This
   leads to more effective memory use. About 50% of all caches can
   be eliminated through slab merging. This will also decrease
   slab fragmentation because partial allocated slabs can be filled
   up again. Slab merging can be switched off by specifying
   slub_nomerge on boot up.

   Note that merging can expose heretofore unknown bugs in the kernel
   because corrupted objects may now be placed differently and corrupt
   differing neighboring objects. Enable sanity checks to find those.

H. Diagnostics

   The current slab diagnostics are difficult to use and require a
   recompilation of the kernel. SLUB contains debugging code that
   is always available (but is kept out of the hot code paths).
   SLUB diagnostics can be enabled via the "slab_debug" option.
   Parameters can be specified to select a single or a group of
   slab caches for diagnostics. This means that the system is running
   with the usual performance and it is much more likely that
   race conditions can be reproduced.

I. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting
   common error conditions and recover as best as possible to allow the
   system to continue.

J. Tracing

   Tracing can be enabled via the slab_debug=T,<slabcache> option
   during boot. SLUB will then protocol all actions on that slabcache
   and dump the object contents on free.

K. On demand DMA cache creation.

   Generally DMA caches are not needed. If a kmalloc is used with
   __GFP_DMA then just create this single slabcache that is needed.
   For systems that have no ZONE_DMA requirement the support is
   completely eliminated.

L. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the
   range of 5-10%. The locking overhead of slub is based on the
   underlying base allocation size. If we can reliably allocate
   larger order pages then it is possible to increase slub
   performance much further. The anti-fragmentation patches may
   enable further performance increases.

Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge		Disable merging of slabs
slub_min_order=x	Require a minimum order for slab caches. This
			increases the managed chunk size and therefore
			reduces meta data and locking overhead.
slub_min_objects=x	Mininum objects per slab. Default is 8.
slub_max_order=x	Avoid generating slabs larger than order specified.
slub_debug		Enable all diagnostics for all caches
slub_debug=<options>	Enable selective options for all caches
slub_debug=<o>,<cache>	Enable selective options for a certain set of
			caches

Available Debug options
F		Double Free checking, sanity and resiliency
R		Red zoning
P		Object / padding poisoning
U		Track last free / alloc
T		Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00

632 lines
22 KiB
Plaintext

config DEFCONFIG_LIST
string
depends on !UML
option defconfig_list
default "/lib/modules/$UNAME_RELEASE/.config"
default "/etc/kernel-config"
default "/boot/config-$UNAME_RELEASE"
default "arch/$ARCH/defconfig"
menu "Code maturity level options"
config EXPERIMENTAL
bool "Prompt for development and/or incomplete code/drivers"
---help---
Some of the various things that Linux supports (such as network
drivers, file systems, network protocols, etc.) can be in a state
of development where the functionality, stability, or the level of
testing is not yet high enough for general use. This is usually
known as the "alpha-test" phase among developers. If a feature is
currently in alpha-test, then the developers usually discourage
uninformed widespread use of this feature by the general public to
avoid "Why doesn't this work?" type mail messages. However, active
testing and use of these systems is welcomed. Just be aware that it
may not meet the normal level of reliability or it may fail to work
in some special cases. Detailed bug reports from people familiar
with the kernel internals are usually welcomed by the developers
(before submitting bug reports, please read the documents
<file:README>, <file:MAINTAINERS>, <file:REPORTING-BUGS>,
<file:Documentation/BUG-HUNTING>, and
<file:Documentation/oops-tracing.txt> in the kernel source).
This option will also make obsoleted drivers available. These are
drivers that have been replaced by something else, and/or are
scheduled to be removed in a future kernel release.
Unless you intend to help test and develop a feature or driver that
falls into this category, or you have a situation that requires
using these features, you should probably say N here, which will
cause the configurator to present you with fewer choices. If
you say Y here, you will be offered the choice of using features or
drivers that are currently considered to be in the alpha-test phase.
config BROKEN
bool
config BROKEN_ON_SMP
bool
depends on BROKEN || !SMP
default y
config LOCK_KERNEL
bool
depends on SMP || PREEMPT
default y
config INIT_ENV_ARG_LIMIT
int
default 32 if !UML
default 128 if UML
help
Maximum of each of the number of arguments and environment
variables passed to init from the kernel command line.
endmenu
menu "General setup"
config LOCALVERSION
string "Local version - append to kernel release"
help
Append an extra string to the end of your kernel version.
This will show up when you type uname, for example.
The string you set here will be appended after the contents of
any files with a filename matching localversion* in your
object and source tree, in that order. Your total string can
be a maximum of 64 characters.
config LOCALVERSION_AUTO
bool "Automatically append version information to the version string"
default y
help
This will try to automatically determine if the current tree is a
release tree by looking for git tags that belong to the current
top of tree revision.
A string of the format -gxxxxxxxx will be added to the localversion
if a git-based tree is found. The string generated by this will be
appended after any matching localversion* files, and after the value
set in CONFIG_LOCALVERSION.
(The actual string used here is the first eight characters produced
by running the command:
$ git rev-parse --verify HEAD
which is done within the script "scripts/setlocalversion".)
config SWAP
bool "Support for paging of anonymous memory (swap)"
depends on MMU && BLOCK
default y
help
This option allows you to choose whether you want to have support
for so called swap devices or swap files in your kernel that are
used to provide more virtual memory than the actual RAM present
in your computer. If unsure say Y.
config SYSVIPC
bool "System V IPC"
---help---
Inter Process Communication is a suite of library functions and
system calls which let processes (running programs) synchronize and
exchange information. It is generally considered to be a good thing,
and some programs won't run unless you say Y here. In particular, if
you want to run the DOS emulator dosemu under Linux (read the
DOSEMU-HOWTO, available from <http://www.tldp.org/docs.html#howto>),
you'll need to say Y here.
You can find documentation about IPC with "info ipc" and also in
section 6.4 of the Linux Programmer's Guide, available from
<http://www.tldp.org/guides.html>.
config IPC_NS
bool "IPC Namespaces"
depends on SYSVIPC
default n
help
Support ipc namespaces. This allows containers, i.e. virtual
environments, to use ipc namespaces to provide different ipc
objects for different servers. If unsure, say N.
config SYSVIPC_SYSCTL
bool
depends on SYSVIPC
depends on SYSCTL
default y
config POSIX_MQUEUE
bool "POSIX Message Queues"
depends on NET && EXPERIMENTAL
---help---
POSIX variant of message queues is a part of IPC. In POSIX message
queues every message has a priority which decides about succession
of receiving it by a process. If you want to compile and run
programs written e.g. for Solaris with use of its POSIX message
queues (functions mq_*) say Y here. To use this feature you will
also need mqueue library, available from
<http://www.mat.uni.torun.pl/~wrona/posix_ipc/>
POSIX message queues are visible as a filesystem called 'mqueue'
and can be mounted somewhere if you want to do filesystem
operations on message queues.
If unsure, say Y.
config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
help
If you say Y here, a user level program will be able to instruct the
kernel (via a special system call) to write process accounting
information to a file: whenever a process exits, information about
that process will be appended to the file by the kernel. The
information includes things such as creation time, owning user,
command name, memory usage, controlling terminal etc. (the complete
list is in the struct acct in <file:include/linux/acct.h>). It is
up to the user level program to do useful things with this
information. This is generally a good idea, so say Y.
config BSD_PROCESS_ACCT_V3
bool "BSD Process Accounting version 3 file format"
depends on BSD_PROCESS_ACCT
default n
help
If you say Y here, the process accounting information is written
in a new file format that also logs the process IDs of each
process and it's parent. Note that this file format is incompatible
with previous v0/v1/v2 file formats, so you will need updated tools
for processing it. A preliminary version of these tools is available
at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>.
config TASKSTATS
bool "Export task/process statistics through netlink (EXPERIMENTAL)"
depends on NET
default n
help
Export selected statistics for tasks/processes through the
generic netlink interface. Unlike BSD process accounting, the
statistics are available during the lifetime of tasks/processes as
responses to commands. Like BSD accounting, they are sent to user
space on task exit.
Say N if unsure.
config TASK_DELAY_ACCT
bool "Enable per-task delay accounting (EXPERIMENTAL)"
depends on TASKSTATS
help
Collect information on time spent by a task waiting for system
resources like cpu, synchronous block I/O completion and swapping
in pages. Such statistics can help in setting a task's priorities
relative to other tasks for cpu, io, rss limits etc.
Say N if unsure.
config TASK_XACCT
bool "Enable extended accounting over taskstats (EXPERIMENTAL)"
depends on TASKSTATS
help
Collect extended task accounting data and send the data
to userland for processing over the taskstats interface.
Say N if unsure.
config TASK_IO_ACCOUNTING
bool "Enable per-task storage I/O accounting (EXPERIMENTAL)"
depends on TASK_XACCT
help
Collect information on the number of bytes of storage I/O which this
task has caused.
Say N if unsure.
config UTS_NS
bool "UTS Namespaces"
default n
help
Support uts namespaces. This allows containers, i.e.
vservers, to use uts namespaces to provide different
uts info for different servers. If unsure, say N.
config AUDIT
bool "Auditing support"
depends on NET
help
Enable auditing infrastructure that can be used with another
kernel subsystem, such as SELinux (which requires this for
logging of avc messages output). Does not do system-call
auditing without CONFIG_AUDITSYSCALL.
config AUDITSYSCALL
bool "Enable system-call auditing support"
depends on AUDIT && (X86 || PPC || PPC64 || S390 || IA64 || UML || SPARC64)
default y if SECURITY_SELINUX
help
Enable low-overhead system-call auditing infrastructure that
can be used independently or with another kernel subsystem,
such as SELinux. To use audit's filesystem watch feature, please
ensure that INOTIFY is configured.
config IKCONFIG
tristate "Kernel .config support"
---help---
This option enables the complete Linux kernel ".config" file
contents to be saved in the kernel. It provides documentation
of which kernel options are used in a running kernel or in an
on-disk kernel. This information can be extracted from the kernel
image file with the script scripts/extract-ikconfig and used as
input to rebuild the current kernel or to build another kernel.
It can also be extracted from a running kernel by reading
/proc/config.gz if enabled (below).
config IKCONFIG_PROC
bool "Enable access to .config through /proc/config.gz"
depends on IKCONFIG && PROC_FS
---help---
This option enables access to the kernel configuration file
through /proc/config.gz.
config CPUSETS
bool "Cpuset support"
depends on SMP
help
This option will let you create and manage CPUSETs which
allow dynamically partitioning a system into sets of CPUs and
Memory Nodes and assigning tasks to run only within those sets.
This is primarily useful on large SMP or NUMA systems.
Say N if unsure.
config SYSFS_DEPRECATED
bool "Create deprecated sysfs files"
default y
help
This option creates deprecated symlinks such as the
"device"-link, the <subsystem>:<name>-link, and the
"bus"-link. It may also add deprecated key in the
uevent environment.
None of these features or values should be used today, as
they export driver core implementation details to userspace
or export properties which can't be kept stable across kernel
releases.
If enabled, this option will also move any device structures
that belong to a class, back into the /sys/class heirachy, in
order to support older versions of udev.
If you are using a distro that was released in 2006 or later,
it should be safe to say N here.
config RELAY
bool "Kernel->user space relay support (formerly relayfs)"
help
This option enables support for relay interface support in
certain file systems (such as debugfs).
It is designed to provide an efficient mechanism for tools and
facilities to relay large amounts of data from kernel space to
user space.
If unsure, say N.
config BLK_DEV_INITRD
bool "Initial RAM filesystem and RAM disk (initramfs/initrd) support"
depends on BROKEN || !FRV
help
The initial RAM filesystem is a ramfs which is loaded by the
boot loader (loadlin or lilo) and that is mounted as root
before the normal boot procedure. It is typically used to
load modules needed to mount the "real" root file system,
etc. See <file:Documentation/initrd.txt> for details.
If RAM disk support (BLK_DEV_RAM) is also included, this
also enables initial RAM disk (initrd) support and adds
15 Kbytes (more on some other architectures) to the kernel size.
If unsure say Y.
if BLK_DEV_INITRD
source "usr/Kconfig"
endif
config CC_OPTIMIZE_FOR_SIZE
bool "Optimize for size (Look out for broken compilers!)"
default y
depends on ARM || H8300 || EXPERIMENTAL
help
Enabling this option will pass "-Os" instead of "-O2" to gcc
resulting in a smaller kernel.
WARNING: some versions of gcc may generate incorrect code with this
option. If problems are observed, a gcc upgrade may be needed.
If unsure, say N.
config SYSCTL
bool
menuconfig EMBEDDED
bool "Configure standard kernel features (for small systems)"
help
This option allows certain base kernel options and settings
to be disabled or tweaked. This is for specialized
environments which can tolerate a "non-standard" kernel.
Only use this if you really know what you are doing.
config UID16
bool "Enable 16-bit UID system calls" if EMBEDDED
depends on ARM || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && SPARC32_COMPAT) || UML || (X86_64 && IA32_EMULATION)
default y
help
This enables the legacy 16-bit UID syscall wrappers.
config SYSCTL_SYSCALL
bool "Sysctl syscall support" if EMBEDDED
default y
select SYSCTL
---help---
sys_sysctl uses binary paths that have been found challenging
to properly maintain and use. The interface in /proc/sys
using paths with ascii names is now the primary path to this
information.
Almost nothing using the binary sysctl interface so if you are
trying to save some space it is probably safe to disable this,
making your kernel marginally smaller.
If unsure say Y here.
config KALLSYMS
bool "Load all symbols for debugging/ksymoops" if EMBEDDED
default y
help
Say Y here to let the kernel print out symbolic crash information and
symbolic stack backtraces. This increases the size of the kernel
somewhat, as all symbols have to be loaded into the kernel image.
config KALLSYMS_ALL
bool "Include all symbols in kallsyms"
depends on DEBUG_KERNEL && KALLSYMS
help
Normally kallsyms only contains the symbols of functions, for nicer
OOPS messages. Some debuggers can use kallsyms for other
symbols too: say Y here to include all symbols, if you need them
and you don't care about adding 300k to the size of your kernel.
Say N.
config KALLSYMS_EXTRA_PASS
bool "Do an extra kallsyms pass"
depends on KALLSYMS
help
If kallsyms is not working correctly, the build will fail with
inconsistent kallsyms data. If that occurs, log a bug report and
turn on KALLSYMS_EXTRA_PASS which should result in a stable build.
Always say N here unless you find a bug in kallsyms, which must be
reported. KALLSYMS_EXTRA_PASS is only a temporary workaround while
you wait for kallsyms to be fixed.
config HOTPLUG
bool "Support for hot-pluggable devices" if EMBEDDED
default y
help
This option is provided for the case where no hotplug or uevent
capabilities is wanted by the kernel. You should only consider
disabling this option for embedded systems that do not use modules, a
dynamic /dev tree, or dynamic device discovery. Just say Y.
config PRINTK
default y
bool "Enable support for printk" if EMBEDDED
help
This option enables normal printk support. Removing it
eliminates most of the message strings from the kernel image
and makes the kernel more or less silent. As this makes it
very difficult to diagnose system problems, saying N here is
strongly discouraged.
config BUG
bool "BUG() support" if EMBEDDED
default y
help
Disabling this option eliminates support for BUG and WARN, reducing
the size of your kernel image and potentially quietly ignoring
numerous fatal conditions. You should only consider disabling this
option for embedded systems with no facilities for reporting errors.
Just say Y.
config ELF_CORE
default y
bool "Enable ELF core dumps" if EMBEDDED
help
Enable support for generating core dumps. Disabling saves about 4k.
config BASE_FULL
default y
bool "Enable full-sized data structures for core" if EMBEDDED
help
Disabling this option reduces the size of miscellaneous core
kernel data structures. This saves memory on small machines,
but may reduce performance.
config FUTEX
bool "Enable futex support" if EMBEDDED
default y
select RT_MUTEXES
help
Disabling this option will cause the kernel to be built without
support for "fast userspace mutexes". The resulting kernel may not
run glibc-based applications correctly.
config EPOLL
bool "Enable eventpoll support" if EMBEDDED
default y
help
Disabling this option will cause the kernel to be built without
support for epoll family of system calls.
config SHMEM
bool "Use full shmem filesystem" if EMBEDDED
default y
depends on MMU
help
The shmem is an internal filesystem used to manage shared memory.
It is backed by swap and manages resource limits. It is also exported
to userspace as tmpfs if TMPFS is enabled. Disabling this
option replaces shmem and tmpfs with the much simpler ramfs code,
which may be appropriate on small systems without swap.
config VM_EVENT_COUNTERS
default y
bool "Enable VM event counters for /proc/vmstat" if EMBEDDED
help
VM event counters are needed for event counts to be shown.
This option allows the disabling of the VM event counters
on EMBEDDED systems. /proc/vmstat will only show page counts
if VM event counters are disabled.
choice
prompt "Choose SLAB allocator"
default SLAB
help
This option allows to select a slab allocator.
config SLAB
bool "SLAB"
help
The regular slab allocator that is established and known to work
well in all environments. It organizes chache hot objects in
per cpu and per node queues. SLAB is the default choice for
slab allocator.
config SLUB
depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT
bool "SLUB (Unqueued Allocator)"
help
SLUB is a slab allocator that minimizes cache line usage
instead of managing queues of cached objects (SLAB approach).
Per cpu caching is realized using slabs of objects instead
of queues of objects. SLUB can use memory efficiently
way and has enhanced diagnostics.
config SLOB
#
# SLOB cannot support SMP because SLAB_DESTROY_BY_RCU does not work
# properly.
#
depends on EMBEDDED && !SMP && !SPARSEMEM
bool "SLOB (Simple Allocator)"
help
SLOB replaces the SLAB allocator with a drastically simpler
allocator. SLOB is more space efficient that SLAB but does not
scale well (single lock for all operations) and is more susceptible
to fragmentation. SLOB it is a great choice to reduce
memory usage and code size for embedded systems.
endchoice
endmenu # General setup
config RT_MUTEXES
boolean
select PLIST
config TINY_SHMEM
default !SHMEM
bool
config BASE_SMALL
int
default 0 if BASE_FULL
default 1 if !BASE_FULL
menu "Loadable module support"
config MODULES
bool "Enable loadable module support"
help
Kernel modules are small pieces of compiled code which can
be inserted in the running kernel, rather than being
permanently built into the kernel. You use the "modprobe"
tool to add (and sometimes remove) them. If you say Y here,
many parts of the kernel can be built as modules (by
answering M instead of Y where indicated): this is most
useful for infrequently used options which are not required
for booting. For more information, see the man pages for
modprobe, lsmod, modinfo, insmod and rmmod.
If you say Y here, you will need to run "make
modules_install" to put the modules under /lib/modules/
where modprobe can find them (you may need to be root to do
this).
If unsure, say Y.
config MODULE_UNLOAD
bool "Module unloading"
depends on MODULES
help
Without this option you will not be able to unload any
modules (note that some modules may not be unloadable
anyway), which makes your kernel slightly smaller and
simpler. If unsure, say Y.
config MODULE_FORCE_UNLOAD
bool "Forced module unloading"
depends on MODULE_UNLOAD && EXPERIMENTAL
help
This option allows you to force a module to unload, even if the
kernel believes it is unsafe: the kernel will remove the module
without waiting for anyone to stop using it (using the -f option to
rmmod). This is mainly for kernel developers and desperate users.
If unsure, say N.
config MODVERSIONS
bool "Module versioning support"
depends on MODULES
help
Usually, you have to use modules compiled with your kernel.
Saying Y here makes it sometimes possible to use modules
compiled for different kernels, by adding enough information
to the modules to (hopefully) spot any changes which would
make them incompatible with the kernel you are running. If
unsure, say N.
config MODULE_SRCVERSION_ALL
bool "Source checksum for all modules"
depends on MODULES
help
Modules which contain a MODULE_VERSION get an extra "srcversion"
field inserted into their modinfo section, which contains a
sum of the source files which made it. This helps maintainers
see exactly which source was used to build a module (since
others sometimes change the module source without updating
the version). With this option, such a "srcversion" field
will be created for all modules. If unsure, say N.
config KMOD
bool "Automatic kernel module loading"
depends on MODULES
help
Normally when you have selected some parts of the kernel to
be created as kernel modules, you must load them (using the
"modprobe" command) before you can use them. If you say Y
here, some parts of the kernel will be able to load modules
automatically: when a part of the kernel needs a module, it
runs modprobe with the appropriate arguments, thereby
loading the module if it is available. If unsure, say Y.
config STOP_MACHINE
bool
default y
depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
help
Need stop_machine() primitive.
endmenu
menu "Block layer"
source "block/Kconfig"
endmenu