Reimplement the balance on exec balancing to be sched-domains aware. Use this
to also do balance on fork balancing. Make x86_64 do balance on fork over the
NUMA domain.
The problem that the non sched domains aware blancing became apparent on dual
core, multi socket opterons. What we want is for the new tasks to be sent to
a different socket, but more often than not, we would first load up our
sibling core, or fill two cores of a single remote socket before selecting a
new one.
This gives large improvements to STREAM on such systems.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
going against the direction we are trying to go. Hopefully we can regain
performance through other methods.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do CPU load averaging over a number of different intervals. Allow each
interval to be chosen by sending a parameter to source_load and target_load.
0 is instantaneous, idx > 0 returns a decaying average with the most recent
sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased).
So generally a higher number will result in more conservative balancing.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not
all) in comments. This patch changes these synchronize_kernel() calls (and
comments) to synchronize_rcu() or synchronize_sched() as follows:
- arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to
handle races with machine-check exceptions (synchronize_rcu() would not cut
it given RCU implementations intended for hardcore realtime use.
- drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to
handle races with i8042_interrupt() interrupt handler. Again,
synchronize_rcu() would not cut it given RCU implementations intended for
hardcore realtime use.
- include/*/kdebug.h comments: change to synchronize_sched() to handle races
with NMIs. As before, synchronize_rcu() would not cut it...
- include/linux/list.h comment: change to synchronize_rcu(), since this
comment is for list_del_rcu().
- security/keys/key.c unregister_key_type(): change to synchronize_rcu(),
since this is interacting with RCU read side.
- security/keys/process_keys.c install_session_keyring(): change to
synchronize_rcu(), since this is interacting with RCU read side.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes the memory allocation method for the s390 debug feature.
Trace buffers had been allocated using the get_free_pages() function before.
Therefore it was not possible to get big memory areas in a running system due
to memory fragmentation. Now the trace buffers are subdivided into several
subbuffers with pagesize. Therefore it is now possible to allocate more
memory for the trace buffers and more trace records can be written.
In addition to that, dynamic specification of the size of the trace buffers is
implemented. It is now possible to change the size of a trace buffer using a
new debugfs file instance. When writing a number into this file, the trace
buffer size is changed to 'number * pagesize'.
In the past all the traces could be obtained from userspace by accessing files
in the "proc" filesystem. Now with debugfs we have a new filesystem which
should be used for debugging purposes. This patch moves the debug feature
from procfs to debugfs.
Since the interface of debug_register() changed, all device drivers, which use
the debug feature had to be adjusted.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add interface to issue VM control program commands.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improved machine check handling. Kernel is now able to receive machine checks
while in kernel mode (system call, interrupt and program check handling).
Also register validation is now performed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cope with a conditional i386 definition, which is wrong for UML. Before we
just used that one, but it wasn't defined for CONFIG_SMP, so in that case
we got link errors.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes register saving so that each register is only saved once,
and adds missing saving of %cr8 on x86-64. Some reordering so that
save/restore is more logical/safer (segment registers should be restored
after gdt).
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4
SMP patch.
Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Broadcast IPI's provide un-expected behaviour for cpu hotplug. CPU's in
offline state also end up receiving the IPI. Once the cpus become online they
receive these stale IPI's which are bad and introduce unexpected behaviour.
This is easily avoided by not sending a broadcast and addressing just the
CPU's in online map. Doing prelim cycle counts it appears there is no big
overhead and numbers seem around 0x3000-0x3900 on an average on x86 and x86_64
systems with CPUS running 3G, both for broadcast and mask version of the
API's.
The shortcuts are useful only for flat mode (where the perf shows no
degradation), and in cluster mode, its unicast anyway. Its simpler to just
not use broadcast anymore.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Experimental CPU hotplug patch for x86_64
-----------------------------------------
This supports logical CPU online and offline.
- Test with maxcpus=1, and then kick other cpu's off to test if init code
is all cleaned up. CONFIG_SCHED_SMT works as well.
- idle threads are forked on demand from keventd threads for clean startup
TBD:
1. Not tested on a real NUMA machine (tested with numa=fake=2)
2. Handle ACPI pieces for physical hotplug support.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Shaohua.li<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds __cpuinit and __cpuinitdata sections that need to exist past
boot to support cpu hotplug.
Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
Andi Kleen. Much of the generic hotplug code in kernel, and none of the other
archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont
mark sections with __cpuinit, but only mark them as __devinit, and
__devinitdata.
If someone is motivated to change generic code, we need to make sure all
existing hotplug code does not break, on other arch's that dont use __cpuinit,
and __cpudevinit.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I really wish smp_prepare_cpu() would disappear eventually. In the interim
this is ideally a weak function, so we dont end up changing several places
to define this dummy in headers.
Today since the dummy declaration is done only in drivers/base/cpu.c but
the function is called in kernel/power/smp.c i get undefined reference in
my cpu hotplug code for x86_64 under development.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean CPU states in order to reuse smp boot code for CPU hotplug.
Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make SEP init per-cpu, so it is hotplug safe.
Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces a startup parameter no_broadcast. When we enable
CONFIG_HOTPLUG_CPU, we dont want to use broadcast shortcut as it has ill
effects on a offline cpu. If we issue broadcast, the IPI is also delivered
to offline cpus, or partially up cpu causing stale IPI's to be handled,
which is a problem and can cause undesirable effects.
Introduces a new startup cmdline option no_ipi_broadcast, that can be
switched at cmdline if necessary.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(The i386 CPU hotplug patch provides infrastructure for some work which Pavel
is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
<shaohua.li@intel.com> is doing)
The following provides i386 architecture support for safely unregistering and
registering processors during runtime, updated for the current -mm tree. In
order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the
cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference being
that on cpu offline, fixup_irqs() is called before we clear the cpu from
cpu_online_map and a long delay in order to ensure that we never have any
queued external interrupts on the APICs. There are additional changes to s390
and ppc64 to account for this change.
1) Add CONFIG_HOTPLUG_CPU
2) disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.
Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Support for the OCP device model on Freescale (FSL) PPC's is no longer used.
All FSL PPC's that were using OCP have be converted to using the platform
device model.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The e200 core is a Book-E core (similar to e500) that has a unified L1 cache
and is not cache coherent on the bus. The e200 core also adds a separate
exception level for debug exceptions. Part of this patch helps to cleanup a
few cases that are true for all Freescale Book-E parts, not just e500.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.12-git6 doesn't boot on some MIPS machines. They need the support of flat
memory and discontig memory.
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I8K: Change to use stock dmi infrastructure instead of homegrown
parsing code. The driver now requires box's DMI data to match
list of supported models so driver can be safely compiled-in
by default without fear of it poking into random SMM BIOS
code. DMI checks can be ignored with i8k.ignore_dmi option.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for Dynamic Tick Timer for ARM. Dynamic Tick is
also known as VST (Variable Scheduling Timeouts).
Dynamic Tick has been in use in the OMAP tree since last October. The
patch is not intrusive, and does not do anything unless CONFIG_NO_IDLE_HZ
is defined. This patch has the following fixed based on comments from
RMK:
- Time is updated before calling interrupt handlers.
- Added new interrupt flag SA_TIMER to avoid duplicate timer interrupts
- Moved struct dyn_tick_timer to time.h until we at some point probably
have an arch independent dyn-tick.h
- Cleaned up testing for DYN_TICK_ENABLED in irq.c
I've cleaned up this patch to fix some remaining issues:
- Call the timer tick handler with irqs disabled, as it would be from
a normal interrupt
- if we have a dyn_tick, we better implement all methods.
- generic timer_dyn_reprogram() call, to be called before sleeping
- added command line option - "dyntick=" to allow boot-time control
of this feature
-- rmk
Signed-off-by: Tony Lindgren
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
The later ixp2000 models don't need the PCI I/O workaround that we
currently perform. Add a config option to disable the workaround,
and panic on boot if a kernel without the workaround is booted on a
buggy chip. As only pre-production ixp2000s need the workaround,
the default is for it not to be configured in.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Deepak Saxena
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
A number of ixp2000 models have a bug where the byte lanes for PCI I/O
transactions are swapped. We already work around this in our versions
of {in,out}{b,w,l}, but we also need to perform these workarounds in a
custom implementation of the new iomap API, provided in this patch.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Deepak Saxena
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
This patch conditionalises the io{read,write}{8,16,32} defines and the
prototypes for ioport_map/ioport_unmap in asm-arm/io.h on ioread8 not
already having been defined. This is done so that platforms can provide
their own implementation of the iomap API, ixp2000 for example needs
this.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Deepak Saxena
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes sparse warnings in the qnx4fs (and might even make
qnx4fs work on big-endian boxes)
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Deepak Saxena
Current IXP4xx debug macros do not work in the small window between
the MMU being enabled and the call to map_io() b/c the standard
peripheral mapping is not properly setup for use with the low-level
debug code. This patch creates a new section-aligned mapping for the
UART specifically for use with the debug macros.
Signed-off-by: Deepak Saxena
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
This patch cleans up the ixp2000 gpio irq code and implements the
set_irq_type method for gpio irqs so that users can select for which
events (falling edge/rising edge/level low/level high) on the gpio
pin they want the corresponding gpio irq to be triggered.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Deepak Saxena
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Two macros that are used on the ixp2000 to fixup byte lane enables
for I/O space accesses, align{b,w}, use their arguments without
parenthesizing them.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
They don't actually clobber memory, but gcc doesn't even know they
_read_ memory, so can apparently re-order memory accesses around them.
Which obviously does the wrong thing if the memory access happens to
change the memory that the compare function is accessing..
Verified to fix a strange boot problem by Jens Axboe.
Another rollup of patches which give various symbols static scope
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reworks filemap_xip.c with the goal to reduce code duplication
from mm/filemap.c. It applies agains 2.6.12-rc6-mm1. Instead of
implementing the aio functions, this one implements the synchronous
read/write functions only. For readv and writev, the generic fallback is
used. For aio, we rely on the application doing the fallback. Since our
"synchronous" function does memcpy immediately anyway, there is no
performance difference between using the fallbacks or implementing each
operation.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the ext2 related parts. Ext2 now uses the xip_* file operations
along with the get_xip_page aop when mounted with -o xip.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- generic_file* file operations do no longer have a xip/non-xip split
- filemap_xip.c implements a new set of fops that require get_xip_page
aop to work proper. all new fops are exported GPL-only (don't like to
see whatever code use those except GPL modules)
- __xip_unmap now uses page_check_address, which is no longer static
in rmap.c, and defined in linux/rmap.h
- mm/filemap.h is now much more clean, plainly having just Linus'
inline funcs moved here from filemap.c
- fix includes in filemap_xip to make it build cleanly on i386
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the block device related part. The block device operation
direct_access now has a struct block_device as first parameter.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, which determines if your system has a "good" version of a
driver (i.e. if the one provided by DKMS has a newer verson than that
provided by the kernel package installed), and to automatically compile
and install a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for various SAA7134 cards and brings some fixes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Fabrice Aeschbacher <fabrice.aeschbacher@laposte.net>
Signed-off-by: Hermann Pitton <hermann.pitton@onlinehome.de>.
Signed-off-by: Nickolay V Shmyrev <nshmyrev@yandex.ru>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tuner improvements and additions. TEA5767 FM tuner added. Several small
fixes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Nickolay V Shmyrev <nshmyrev@yandex.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch adds support for the PixelView Ultra Pro video capture
card in v4l.
- It removes the remote control key definitions from ir-kbd-gpio.c and
moves them to ir-common.c so that they can be shared between bt878 and
cx88 based cards.
- The patch also moves the FUSIONHDTV_3_GOLD_Q card from number 27 to 28
to regain compatibility with the V4L cvs.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir.
It may be changed any time, but is used only on startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the code to create and remove client subdirectories from the
recovery directory, as described in the previous patch comment.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFSv4 clients are required to know what state they have on the server so that
they can reclaim it on server reboot. However, it is possible for
pathalogical combinations of server reboots and network partitions to leave a
client in a state where it cannot know whether it has lost its state on the
server.
For this reason, rfc3530 requires that we store some information about clients
to stable storage.
So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for
each client with active state. We leave open the possibility of including
files underneath each such subdirectory with information about the client, but
for now the subdirectories are empty.
We create a client subdirectory whenever a client makes its first non-reclaim
open_confirm.
We remove a client subdirectory whenever either
a) its lease expires, or
b) the grace period ends without it reclaiming anything.
When handling reclaims, we allow the reclaim if and only if the client doing
the reclaim has a subdirectory.
This patch adds just the code to scan the recovery directory on nfsd startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>