Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking tree changes from Ingo Molnar: "Two changes: a documentation update and a ticket locks live lock fix" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ticketlock: Fix spin_unlock_wait() livelock locking/lglocks: Add documentation of current lglocks implementation
This commit is contained in:
commit
9c37f95936
166
Documentation/locking/lglock.txt
Normal file
166
Documentation/locking/lglock.txt
Normal file
@ -0,0 +1,166 @@
|
||||
lglock - local/global locks for mostly local access patterns
|
||||
------------------------------------------------------------
|
||||
|
||||
Origin: Nick Piggin's VFS scalability series introduced during
|
||||
2.6.35++ [1] [2]
|
||||
Location: kernel/locking/lglock.c
|
||||
include/linux/lglock.h
|
||||
Users: currently only the VFS and stop_machine related code
|
||||
|
||||
Design Goal:
|
||||
------------
|
||||
|
||||
Improve scalability of globally used large data sets that are
|
||||
distributed over all CPUs as per_cpu elements.
|
||||
|
||||
To manage global data structures that are partitioned over all CPUs
|
||||
as per_cpu elements but can be mostly handled by CPU local actions
|
||||
lglock will be used where the majority of accesses are cpu local
|
||||
reading and occasional cpu local writing with very infrequent
|
||||
global write access.
|
||||
|
||||
|
||||
* deal with things locally whenever possible
|
||||
- very fast access to the local per_cpu data
|
||||
- reasonably fast access to specific per_cpu data on a different
|
||||
CPU
|
||||
* while making global action possible when needed
|
||||
- by expensive access to all CPUs locks - effectively
|
||||
resulting in a globally visible critical section.
|
||||
|
||||
Design:
|
||||
-------
|
||||
|
||||
Basically it is an array of per_cpu spinlocks with the
|
||||
lg_local_lock/unlock accessing the local CPUs lock object and the
|
||||
lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object
|
||||
the lg_local_lock has to disable preemption as migration protection so
|
||||
that the reference to the local CPUs lock does not go out of scope.
|
||||
Due to the lg_local_lock/unlock only touching cpu-local resources it
|
||||
is fast. Taking the local lock on a different CPU will be more
|
||||
expensive but still relatively cheap.
|
||||
|
||||
One can relax the migration constraints by acquiring the current
|
||||
CPUs lock with lg_local_lock_cpu, remember the cpu, and release that
|
||||
lock at the end of the critical section even if migrated. This should
|
||||
give most of the performance benefits without inhibiting migration
|
||||
though needs careful considerations for nesting of lglocks and
|
||||
consideration of deadlocks with lg_global_lock.
|
||||
|
||||
The lg_global_lock/unlock locks all underlying spinlocks of all
|
||||
possible CPUs (including those off-line). The preemption disable/enable
|
||||
are needed in the non-RT kernels to prevent deadlocks like:
|
||||
|
||||
on cpu 1
|
||||
|
||||
task A task B
|
||||
lg_global_lock
|
||||
got cpu 0 lock
|
||||
<<<< preempt <<<<
|
||||
lg_local_lock_cpu for cpu 0
|
||||
spin on cpu 0 lock
|
||||
|
||||
On -RT this deadlock scenario is resolved by the arch_spin_locks in the
|
||||
lglocks being replaced by rt_mutexes which resolve the above deadlock
|
||||
by boosting the lock-holder.
|
||||
|
||||
|
||||
Implementation:
|
||||
---------------
|
||||
|
||||
The initial lglock implementation from Nick Piggin used some complex
|
||||
macros to generate the lglock/brlock in lglock.h - they were later
|
||||
turned into a set of functions by Andi Kleen [7]. The change to functions
|
||||
was motivated by the presence of multiple lock users and also by them
|
||||
being easier to maintain than the generating macros. This change to
|
||||
functions is also the basis to eliminated the restriction of not
|
||||
being initializeable in kernel modules (the remaining problem is that
|
||||
locks are not explicitly initialized - see lockdep-design.txt)
|
||||
|
||||
Declaration and initialization:
|
||||
-------------------------------
|
||||
|
||||
#include <linux/lglock.h>
|
||||
|
||||
DEFINE_LGLOCK(name)
|
||||
or:
|
||||
DEFINE_STATIC_LGLOCK(name);
|
||||
|
||||
lg_lock_init(&name, "lockdep_name_string");
|
||||
|
||||
on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note
|
||||
also that as of 3.18-rc6 all declaration in use are of the _STATIC_
|
||||
variant (and it seems that the non-static was never in use).
|
||||
lg_lock_init is initializing the lockdep map only.
|
||||
|
||||
Usage:
|
||||
------
|
||||
|
||||
From the locking semantics it is a spinlock. It could be called a
|
||||
locality aware spinlock. lg_local_* behaves like a per_cpu
|
||||
spinlock and lg_global_* like a global spinlock.
|
||||
No surprises in the API.
|
||||
|
||||
lg_local_lock(*lglock);
|
||||
access to protected per_cpu object on this CPU
|
||||
lg_local_unlock(*lglock);
|
||||
|
||||
lg_local_lock_cpu(*lglock, cpu);
|
||||
access to protected per_cpu object on other CPU cpu
|
||||
lg_local_unlock_cpu(*lglock, cpu);
|
||||
|
||||
lg_global_lock(*lglock);
|
||||
access all protected per_cpu objects on all CPUs
|
||||
lg_global_unlock(*lglock);
|
||||
|
||||
There are no _trylock variants of the lglocks.
|
||||
|
||||
Note that the lg_global_lock/unlock has to iterate over all possible
|
||||
CPUs rather than the actually present CPUs or a CPU could go off-line
|
||||
with a held lock [4] and that makes it very expensive. A discussion on
|
||||
these issues can be found at [5]
|
||||
|
||||
Constraints:
|
||||
------------
|
||||
|
||||
* currently the declaration of lglocks in kernel modules is not
|
||||
possible, though this should be doable with little change.
|
||||
* lglocks are not recursive.
|
||||
* suitable for code that can do most operations on the CPU local
|
||||
data and will very rarely need the global lock
|
||||
* lg_global_lock/unlock is *very* expensive and does not scale
|
||||
* on UP systems all lg_* primitives are simply spinlocks
|
||||
* in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but
|
||||
does not change the tasks state while sleeping [6].
|
||||
* in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
|
||||
is downgraded to a migrate_disable/enable, the other
|
||||
preempt_disable/enable are downgraded to barriers [6].
|
||||
The deadlock noted for non-RT above is resolved due to rt_mutexes
|
||||
boosting the lock-holder in this case which arch_spin_locks do
|
||||
not do.
|
||||
|
||||
lglocks were designed for very specific problems in the VFS and probably
|
||||
only are the right answer in these corner cases. Any new user that looks
|
||||
at lglocks probably wants to look at the seqlock and RCU alternatives as
|
||||
her first choice. There are also efforts to resolve the RCU issues that
|
||||
currently prevent using RCU in place of view remaining lglocks.
|
||||
|
||||
Note on brlock history:
|
||||
-----------------------
|
||||
|
||||
The 'Big Reader' read-write spinlocks were originally introduced by
|
||||
Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They
|
||||
later were introduced by the VFS scalability patch set in 2.6 series
|
||||
again as the "big reader lock" brlock [2] variant of lglock which has
|
||||
been replaced by seqlock primitives or by RCU based primitives in the
|
||||
3.13 kernel series as was suggested in [3] in 2003. The brlock was
|
||||
entirely removed in the 3.13 kernel series.
|
||||
|
||||
Link: 1 http://lkml.org/lkml/2010/8/2/81
|
||||
Link: 2 http://lwn.net/Articles/401738/
|
||||
Link: 3 http://lkml.org/lkml/2003/3/9/205
|
||||
Link: 4 https://lkml.org/lkml/2011/8/24/185
|
||||
Link: 5 http://lkml.org/lkml/2011/12/18/189
|
||||
Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
|
||||
patch series - lglocks-rt.patch.patch
|
||||
Link: 7 http://lkml.org/lkml/2012/3/5/26
|
@ -183,8 +183,20 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
|
||||
|
||||
static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
|
||||
{
|
||||
while (arch_spin_is_locked(lock))
|
||||
__ticket_t head = ACCESS_ONCE(lock->tickets.head);
|
||||
|
||||
for (;;) {
|
||||
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
|
||||
/*
|
||||
* We need to check "unlocked" in a loop, tmp.head == head
|
||||
* can be false positive because of overflow.
|
||||
*/
|
||||
if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) ||
|
||||
tmp.head != head)
|
||||
break;
|
||||
|
||||
cpu_relax();
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
|
Loading…
Reference in New Issue
Block a user