Linus noticed that atomic64_xchg() uses atomic_read(), which
happens to work because atomic_read() is a macro so the
.counter value gets u64-read on 32-bit too - but this is really
bogus and serious bugs are waiting to happen.
Fix atomic64_xchg() to use __atomic64_read() instead.
No code changed:
arch/x86/lib/atomic64_32.o:
text data bss dec hex filename
435 0 0 435 1b3 atomic64_32.o.before
435 0 0 435 1b3 atomic64_32.o.after
md5:
bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.after.asm
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus noticed that atomic64_xchg() uses atomic_read(), which
happens to work because atomic_read() is a macro so the
.counter value gets u64-read on 32-bit too - but this is really
bogus and serious bugs are waiting to happen.
Change atomic_read() to be a type-safe inline, and this exposes
the atomic64 bogosity as well:
arch/x86/lib/atomic64_32.c: In function ‘atomic64_xchg’:
arch/x86/lib/atomic64_32.c:39: warning: passing argument 1 of ‘atomic_read’ from incompatible pointer type
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cmpxchg8b is a huge instruction in terms of register footprint,
we almost never want to inline it, not even within the same
code module.
GCC 4.3 still messes up for two functions, under-judging the
true cost of this instruction - so annotate two key functions
to reduce the bloat:
arch/x86/lib/atomic64_32.o:
text data bss dec hex filename
1763 0 0 1763 6e3 atomic64_32.o.before
435 0 0 435 1b3 atomic64_32.o.after
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus noted (based on Eric Dumazet's numbers) that we would
probably be better off not trying an atomic_read() in
atomic64_add_return() but intead intentionally let the first
cmpxchg8b fail - to get a cache-friendly 'give me ownership
of this cacheline' transaction. That can then be followed
by the real cmpxchg8b which sets the value local to the CPU.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rewrite cmpxchg8b() to not use %edi register but a generic "+m"
constraint, to increase compiler freedom in code generation and
possibly better code.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus noticed that the 32-bit version of atomic64_read() was
being overly complex with re-reading the value and doing a
retry loop over that.
Instead we can just rely on cmpxchg8b returning either the new
value or returning the current value.
We can use any 'old' value, which will be faster as it can be
loaded via immediates. Using some value that is not equal to
the real value in memory the instruction gets faster.
This also has the advantage that the CPU could avoid dirtying
the cacheline.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus noted that the atomic64_t primitives are all inlines
currently which is crazy because these functions have a large
register footprint anyway.
Move them to a separate file: arch/x86/lib/atomic64_32.c
Also, while at it, rename all uses of 'unsigned long long' to
the much shorter u64.
This makes the appearance of the prototypes a lot nicer - and
it also uncovered a few bugs where (yet unused) API variants
had 'long' as their return type instead of u64.
[ More intrusive changes are not yet done in this patch. ]
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Locked instructions on two cache lines at once are painful. If
atomic64_t uses two cache lines, my test program is 10x slower.
The chance for that is significant: 4/32 or 12.5%.
Make sure an atomic64_t is 8 bytes aligned.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
[ changed it to __aligned(8) as per Andrew's suggestion ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Certain versions of GCC dont see the initialization that is done here:
builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function
So annotate it with a NULL initialization.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar wrote:
> i just bisected a 'perf report' bug that would cause us to not
> resolve all user-space symbols in a 'git gc' run to:
>
> f5812a7a33 is first bad commit
> commit f5812a7a33
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Tue Jun 30 11:43:17 2009 -0300
>
> perf_counter tools: Adjust only prelinked symbol's addresses
Rename ->prelinked to ->adjust_symbols and making what was done
only for prelinked libraries also to ET_EXEC binaries, such as
/usr/bin/git:
[acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
Type: EXEC (Executable file)
[acme@doppio pahole]$
And after installing the 'git-debuginfo' package, I get correct results:
[acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20
#
# (1139614 samples)
#
# Overhead Command Shared Object Symbol
# ........ ................ ......................... ......
#
34.98% git /usr/bin/git [.] send_sideband
33.39% git /usr/bin/git [.] enter_repo
6.81% git /usr/bin/git [.] diff_opt_parse
4.95% git /usr/bin/git [.] is_repository_shallow
3.24% git /usr/bin/git [.] odb_mkstemp
1.39% git /usr/bin/git [.] output
1.34% git /usr/bin/git [.] xmmap
1.25% git /usr/bin/git [.] receive_pack_config
1.16% git /usr/bin/git [.] git_pathdup
0.90% git /usr/bin/git [.] read_object_with_reference
0.86% git /usr/bin/git [.] show_patch_diff
0.85% git /usr/bin/git 0x00000000095e2e
0.69% git /usr/bin/git [.] display
[acme@doppio linux-2.6-tip]$
I'll check what are the last cases where we can't resolve symbols, like
this 0x00000000095e2e later.
And I guess this will fix the problems Mike were seeing too:
[acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
Type: EXEC (Executable file)
[acme@doppio linux-2.6-tip]$
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds the use of colors to signal at a glance the important
overhead thresholds in callchains hit rates.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Among perf annotate, perf report and perf top, we can find the
common colored printing of percents according to the following
rules:
High overhead = > 5%, colored in red
Mid overhead = > 0.5%, colored in green
Low overhead = < 0.5%, default color
Factorize these multiple checks in a single function named
percent_color_fprintf() and also provide a get_percent_color()
for sites which print percentages and other things at the same
time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Callchains output may become a burden on a trace because even
rarely hit site are exposed. This can be too much information.
Let the user set a threshold as a minimum percent of hits using
the new pattern for the -c option:
-c mode,min_percent
Example:
$ perf report -s sym -c flat,4
8.25% [k] copy_user_generic_string
4.19%
copy_user_generic_string
generic_file_aio_read
do_sync_read
vfs_read
sys_pread64
system_call_fastpath
pread64
5.39% [k] search_by_key
4.63% 0x00000000009e0a
2.36% [k] memcpy_c
[...]
$ perf report -s sym -c graph,2
8.25% [k] copy_user_generic_string
|
|--4.31%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| --4.19%-- sys_pread64
| system_call_fastpath
| pread64
|
--3.24%-- generic_file_buffered_write
__generic_file_aio_write_nolock
generic_file_aio_write
do_sync_write
reiserfs_file_write
vfs_write
|
--3.14%-- sys_pwrite64
system_call_fastpath
__pwrite64
5.39% [k] search_by_key
|
--2.23%-- reiserfs_update_sd_size
4.63% 0x00000000009e0a
2.36% [k] memcpy_c
[...]
You can also omit it and it will default to 0.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, the printing of callchains is done in a single
vertical level, this is the "flat" mode:
8.25% [k] copy_user_generic_string
4.19%
copy_user_generic_string
generic_file_aio_read
do_sync_read
vfs_read
sys_pread64
system_call_fastpath
pread64
This patch introduces a new "graph" mode which provides a
hierarchical output of factorized paths recursively sorted:
8.25% [k] copy_user_generic_string
|
|--4.31%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| |--4.19%-- sys_pread64
| | system_call_fastpath
| | pread64
| |
| --0.12%-- sys_read
| system_call_fastpath
| __read
|
|--3.24%-- generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| do_sync_write
| reiserfs_file_write
| vfs_write
| |
| |--3.14%-- sys_pwrite64
| | system_call_fastpath
| | __pwrite64
| |
| --0.10%-- sys_write
[...]
The command line has then changed.
By providing the -c option, the callchain will output in the
flat mode by default.
But you can override it:
perf report -c graph
or
perf report -c flat
You can also pass the abreviated mode:
perf report -c g
or
perf report -c gra
will both make use of the graph mode.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no predefined macro to create an option that can have
a custom value or a default one if none is given.
This patch provides a new helper OPT_CALLBACK_DEFAULT() which
defines such kind of option.
For example, considering an option -c, we want to get the
default value in the following cases:
perf command -c -d
perf command -d -c
And the foo value when it's given:
perf command -c foo -d
perf command -d -c foo
That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
support default values whatever the position of the option, not
only in the end.
Should it now be renamed to PARSE_OPT_ARG_DEFAULT ?
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: git@vger.kernel.org
LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Iterating through children of a node in the callchain tree
shows something that may be quite confusing at a first glance.
The head is the children field of the parent and the list nodes
are in the brothers field of the children.
This is because the childs are linked to the parent as a list
of "brothers" using the "children" list of the parent as a
head:
---------------
| Parent (head) |-------------------------------------
--------------- |
| |
children |
| |
----------- ----------- |
| 1st child |---brother---| 2nd child |---brother-----
----------- -----------
This makes the following strange pattern often occuring:
list_for_each_entry(child, &parent->children, brothers) {
// do something with children
}
Abstract it to chain_for_each_child() to factorize and simplify
this pattern.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the -m/--modules option to perf report and perf annotate,
which enables live module symbol/image loading. To be used
with -k/--vmlinux.
(Also give perf annotate a -P/--full-paths option.)
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514986.13293.48.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter tools: Make symbol loading consistently return number of loaded symbols.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514758.13293.42.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Building builtin-stat.c reports the following errors:
cc1: warnings being treated as errors
builtin-stat.c: In function ‘run_perf_stat’:
builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
make: *** [builtin-stat.o] Erreur 1
This patch handles the possible pipe read failures.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
About every callchains recorded with perf record are filled up
including the internal perfcounter nmi frame:
perf_callchain
perf_counter_overflow
intel_pmu_handle_irq
perf_counter_nmi_handler
notifier_call_chain
atomic_notifier_call_chain
notify_die
do_nmi
nmi
We want ignore this frame as it's not interesting for
instrumentation. To solve this, we simply ignore every frames
from nmi context.
New example of "perf report -s sym -c" after this patch:
9.59% [k] search_by_key
4.88%
search_by_key
reiserfs_read_locked_inode
reiserfs_iget
reiserfs_lookup
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
vfs_fstatat
vfs_lstat
sys_newlstat
system_call_fastpath
__lxstat
0x406fb1
3.19%
search_by_key
search_by_entry_key
reiserfs_find_entry
reiserfs_lookup
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
vfs_fstatat
vfs_lstat
sys_newlstat
system_call_fastpath
__lxstat
0x406fb1
[...]
For now this patch only solves the problem in x86-64.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The copy we were using came from another copy I did for the dwarves
(pahole) package, that came from the kernel years ago.
The only function that is used by the perf tools and that isn't in the
kernel is list_del_range, that I'm leaving in the perf tools only for
now.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090701174608.GA5823@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The tools/perf/util/rbtree.c copy already drifted by three
csets:
4b324126e04c6011781116c047add3
So remove the copy and use the lib/rbtree.c directly, sharing
the source code while still generating a separate object file,
since tools/perf uses a far more agressive -O6 switch.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090701152837.GG15682@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
MATCH_EVENT is useful:
1. for multiple attrs checking
2. avoid repetition of PERF_TYPE_ and PERF_COUNT_ and save space
3. avoids line breakage
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1246440909.3403.5.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Enable -Wextra. This found a few real bugs plus a number
of signed/unsigned type mismatches/uncleanlinesses. It
also required a few annotations
All things considered it was still worth it so lets try with
this enabled for now.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix:
builtin-report.c: In function ‘hist_entry__add’:
builtin-report.c:1015: error: case label not within a switch statement
builtin-report.c:1017: error: break statement not within loop or switch
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reworks the parser for event descriptors to make it more
consistent in what it accepts. It is now structured as a
recursive descent parser for the following grammar:
events ::= event ( ("," | space) space* event )*
event ::= ( raw_event | numeric_event | symbolic_event |
generic_hw_event ) [ event_modifier ]
raw_event ::= "r" hex_number
numeric_event ::= number ":" number
number ::= decimal_number | "0x" hex_number | "0" octal_number
symbolic_event ::= string_from_event_symbols_array
generic_hw_event::= cache_type ( "-" ( cache_op | cache_result ) )*
event_modifier ::= ":" ( "u" | "k" | "h" )+
with the extra restriction that you can have at most one
cache_op and at most one cache_result.
We pass the current string pointer by reference (i.e. as a
const char **) to the various parsing functions so that they
can advance the pointer to indicate how much they consumed.
They return 0 if they didn't recognize the thing at the pointer
or 1 if they did (and advance the pointer past it).
This also fixes parse_aliases to take the longest matching
alias from the table, not the first one. Otherwise "l1-data"
would match the "l1-d" alias and the "ata" would not be
consumed.
This allows event modifiers indicating what processor modes to
count in to be applied to any event, not just numeric events,
and adds a ":h" modifier to indicate counting in hypervisor
mode. Specifying ":u" now sets both exclude_kernel and
exclude_hv, and so on. Multiple modes can be specified, e.g.
":uk" will count in user or hypervisor mode (i.e. only
exclude_kernel will be set).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <19018.53826.843815.189847@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
POWER7 has the same PR/HV bit layout as POWER6, so set the flag.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: a.p.zijlstra@chello.nl
Cc: benh@kernel.crashing.org
LKML-Reference: <20090701030701.GI3563@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The symbol resolving has of course revealed some bugs in the
callchain tree handling. This patch fixes some of them,
including:
- inherit the children from the parents while splitting a node
- fix list range moving
- fix indexes setting in callchains
- create a child on the current node if the path doesn't match in
the existent children (was only done on the root)
- compare using symbols when possible so that we can match a function
using any ip inside by referring to its start address.
The practical effects are:
- remove double callchains
- fix upside down or any random order of callchains
- fix wrong paths
- fix bad hits and percentage accounts
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a confusion while giving the size of a callchain list
during its allocation. We are using the wrong structure size.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: this branch was on a .30-ish base before, update
it to an almost-.31-rc2 upstream base to pick up fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Inform kmemleak about pid_hash
kmemleak: Do not warn if an unknown object is freed
kmemleak: Do not report new leaked objects if the scanning was stopped
kmemleak: Slightly change the policy on newly allocated objects
kmemleak: Do not trigger a scan when reading the debug/kmemleak file
kmemleak: Simplify the reports logged by the scanning thread
kmemleak: Enable task stacks scanning by default
kmemleak: Allow the early log buffer to be configurable.
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm table: fix blk_stack_limits arg to use bytes not sectors
dm exception store: really fix type lookup
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
perf report: Add --symbols parameter
perf report: Add --comms parameter
perf report: Add --dsos parameter
perf_counter tools: Adjust only prelinked symbol's addresses
perf_counter: Provide a way to enable counters on exec
perf_counter tools: Reduce perf stat measurement overhead/skew
perf stat: Use percentages for scaling output
perf_counter, x86: Update x86_pmu after WARN()
perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
perf stat: Improve output
perf stat: Fix multi-run stats
perf stat: Add -n/--null option to run without counters
perf_counter tools: Remove dead code
perf_counter: Complete counter swap
perf report: Print sorted callchains per histogram entries
perf_counter tools: Prepare a small callchain framework
perf record: Fix unhandled io return value
perf_counter tools: Add alias for 'l1d' and 'l1i'
perf-report: Add bare minimum PERF_EVENT_READ parsing
perf-report: Add modes for inherited stats and no-samples
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
Add Fenghua Yu as temporary co-maintainer for ia64
[IA64] address compiler warnings perfmon.c/salinfo.c
[IA64] Remove unnecessary semicolons
[IA64] sprintf should not be used with same source & destination address
Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13531
Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A crappy macro prevents us unlocking on a fail path.
Expand the macro and unlock appropriatelly.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sure we do not actually request the RTC IRQ until the device driver
is fully ready to handle and process any interrupt. This way a spurious
interrupt won't crash the system (which may happen if the bootloader was
poking the RTC right before booting Linux).
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Block writes require 64 byte alignment. Since block writes could be used
with SGRAM or WRAM also refine the memory type detection to check for
either type before deciding to use the 64 byte alignment.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apparently HP OmniBook 500's BIOS doesn't like the way atyfb reprograms
the hardware. The BIOS will simply hang after a reboot. Fix the problem
by restoring the hardware to it's original state on reboot.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IRQ handling is wrong for any GPIO >= PL061_GPIO_NR.
Fix this by implementing and using a proper .to_irq method.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Note that IRQ has not been initialized when kmalloc() fails.
Also, use DECLARE_BITMAP() to make the code clearer.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan reported that
| commit 73d60b7f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contains nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems can not be used.
Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
and need to do save/restore for that in find_zone_movable_pfn
Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By writing a tasks's pid to the file, a process adds that task to that
cgroup/cpuset. But to add a cpu/mem to a cpuset, the new list of cpus
should be written to the cpuset.mems file which would replace the old list
of cpus. Make this clearer in the documentation.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.
balance_dirty_pages makes its decision to throttle based on the number of
dirty plus writeback pages that are over the calculated limit,so it will
continue to move pages even when there are plenty of pages in writeback
and less than the threshold still dirty.
This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the writeback
list.
A simple fio test easily demonstrates this problem.
fio --name=f1 --directory=/disk1 --size=2G -rw=write --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement on
my desktop machine writing to 2 disks.
Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio ?
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>