kernel_optimize_test/arch/mips/kernel/perf_event.c

70 lines
1.8 KiB
C
Raw Normal View History

/*
* Linux performance counter support for MIPS.
*
* Copyright (C) 2010 MIPS Technologies, Inc.
* Author: Deng-Cheng Zhu
*
* This code is based on the implementation for ARM, which is in turn
* based on the sparc64 perf event code and the x86 code. Performance
* counter access is based on the MIPS Oprofile code. And the callchain
* support references the code of MIPS stacktrace.c.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/perf_event.h>
#include <asm/stacktrace.h>
MIPS: Add support for hardware performance events (mipsxx) This patch adds the mipsxx Perf-events support based on the skeleton. Generic hardware events and cache events are now fully implemented for the 24K/34K/74K/1004K cores. To support other cores in mipsxx (such as R10000/SB1), the generic hardware event tables and cache event tables need to be filled out. To support other CPUs which have different PMU than mipsxx, such as RM9000 and LOONGSON2, the additional files perf_event_$cpu.c need to be created. Raw event is an important part of Perf-events. It helps the user collect performance data for events that are not listed as the generic hardware events and cache events but ARE supported by the CPU's PMU. This patch also adds this feature for mipsxx 24K/34K/74K/1004K. For how to use it, please refer to processor core software user's manual and the comments for mipsxx_pmu_map_raw_event() for more details. Please note that this is a "precise" implementation, which means the kernel will check whether the requested raw events are supported by this CPU and which hardware counters can be assigned for them. To test the functionality of Perf-event, you may want to compile the tool "perf" for your MIPS platform. You can refer to the following URL: http://www.linux-mips.org/archives/linux-mips/2010-10/msg00126.html You also need to customize the CFLAGS and LDFLAGS in tools/perf/Makefile for your libs, includes, etc. In case you encounter the boot failure in SMVP kernel on multi-threading CPUs, you may take a look at: http://www.linux-mips.org/git?p=linux-mti.git;a=commitdiff;h=5460815027d802697b879644c74f0e8365254020 Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> To: linux-mips@linux-mips.org Cc: a.p.zijlstra@chello.nl Cc: paulus@samba.org Cc: mingo@elte.hu Cc: acme@redhat.com Cc: jamie.iles@picochip.com Cc: ddaney@caviumnetworks.com Cc: matt@console-pimps.org Patchwork: https://patchwork.linux-mips.org/patch/1689/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> create mode 100644 arch/mips/kernel/perf_event_mipsxx.c
2010-10-12 19:37:24 +08:00
/* Callchain handling code. */
/*
* Leave userspace callchain empty for now. When we find a way to trace
* the user stack callchains, we will add it here.
*/
static void save_raw_perf_callchain(struct perf_callchain_entry *entry,
unsigned long reg29)
{
unsigned long *sp = (unsigned long *)reg29;
unsigned long addr;
while (!kstack_end(sp)) {
addr = *sp++;
if (__kernel_text_address(addr)) {
MIPS, Perf-events: Work with the new callchain interface This is the MIPS part of the following commits by Frederic Weisbecker: - f72c1a931e311bb7780fee19e41a89ac42cab50e perf: Factorize callchain context handling Store the kernel and user contexts from the generic layer instead of archs, this gathers some repetitive code. - 56962b4449af34070bb1994621ef4f0265eed4d8 perf: Generalize some arch callchain code - Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). - 70791ce9ba68a5921c9905ef05d23f62a90bc10c perf: Generalize callchain_store() callchain_store() is the same on every archs, inline it in perf_event.h and rename it to perf_callchain_store() to avoid any collision. This removes repetitive code. - c1a65932fd7216fdc9a0db8bbffe1d47842f862c perf: Drop unappropriate tests on arch callchains Drop the TASK_RUNNING test on user tasks for callchains as this check doesn't seem to make any sense. Also remove the tests for !current that is not supposed to happen and current->pid as this should be handled at the generic level, with exclude_idle attribute. Reported-by: Wu Zhangjin <wuzhangjin@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> To: a.p.zijlstra@chello.nl To: will.deacon@arm.com Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: paulus@samba.org Cc: mingo@elte.hu Cc: acme@redhat.com Cc: dengcheng.zhu@gmail.com Cc: matt@console-pimps.org Cc: sshtylyov@mvista.com Patchwork: http://patchwork.linux-mips.org/patch/2014/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-01-21 16:19:20 +08:00
perf_callchain_store(entry, addr);
perf core: Allow setting up max frame stack depth via sysctl The default remains 127, which is good for most cases, and not even hit most of the time, but then for some cases, as reported by Brendan, 1024+ deep frames are appearing on the radar for things like groovy, ruby. And in some workloads putting a _lower_ cap on this may make sense. One that is per event still needs to be put in place tho. The new file is: # cat /proc/sys/kernel/perf_event_max_stack 127 Chaging it: # echo 256 > /proc/sys/kernel/perf_event_max_stack # cat /proc/sys/kernel/perf_event_max_stack 256 But as soon as there is some event using callchains we get: # echo 512 > /proc/sys/kernel/perf_event_max_stack -bash: echo: write error: Device or resource busy # Because we only allocate the callchain percpu data structures when there is a user, which allows for changing the max easily, its just a matter of having no callchain users at that point. Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-21 23:28:50 +08:00
if (entry->nr >= sysctl_perf_event_max_stack)
break;
}
}
}
MIPS, Perf-events: Work with the new callchain interface This is the MIPS part of the following commits by Frederic Weisbecker: - f72c1a931e311bb7780fee19e41a89ac42cab50e perf: Factorize callchain context handling Store the kernel and user contexts from the generic layer instead of archs, this gathers some repetitive code. - 56962b4449af34070bb1994621ef4f0265eed4d8 perf: Generalize some arch callchain code - Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). - 70791ce9ba68a5921c9905ef05d23f62a90bc10c perf: Generalize callchain_store() callchain_store() is the same on every archs, inline it in perf_event.h and rename it to perf_callchain_store() to avoid any collision. This removes repetitive code. - c1a65932fd7216fdc9a0db8bbffe1d47842f862c perf: Drop unappropriate tests on arch callchains Drop the TASK_RUNNING test on user tasks for callchains as this check doesn't seem to make any sense. Also remove the tests for !current that is not supposed to happen and current->pid as this should be handled at the generic level, with exclude_idle attribute. Reported-by: Wu Zhangjin <wuzhangjin@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> To: a.p.zijlstra@chello.nl To: will.deacon@arm.com Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: paulus@samba.org Cc: mingo@elte.hu Cc: acme@redhat.com Cc: dengcheng.zhu@gmail.com Cc: matt@console-pimps.org Cc: sshtylyov@mvista.com Patchwork: http://patchwork.linux-mips.org/patch/2014/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-01-21 16:19:20 +08:00
void perf_callchain_kernel(struct perf_callchain_entry *entry,
struct pt_regs *regs)
{
unsigned long sp = regs->regs[29];
#ifdef CONFIG_KALLSYMS
unsigned long ra = regs->regs[31];
unsigned long pc = regs->cp0_epc;
if (raw_show_trace || !__kernel_text_address(pc)) {
unsigned long stack_page =
(unsigned long)task_stack_page(current);
if (stack_page && sp >= stack_page &&
sp <= stack_page + THREAD_SIZE - 32)
save_raw_perf_callchain(entry, sp);
return;
}
do {
MIPS, Perf-events: Work with the new callchain interface This is the MIPS part of the following commits by Frederic Weisbecker: - f72c1a931e311bb7780fee19e41a89ac42cab50e perf: Factorize callchain context handling Store the kernel and user contexts from the generic layer instead of archs, this gathers some repetitive code. - 56962b4449af34070bb1994621ef4f0265eed4d8 perf: Generalize some arch callchain code - Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). - 70791ce9ba68a5921c9905ef05d23f62a90bc10c perf: Generalize callchain_store() callchain_store() is the same on every archs, inline it in perf_event.h and rename it to perf_callchain_store() to avoid any collision. This removes repetitive code. - c1a65932fd7216fdc9a0db8bbffe1d47842f862c perf: Drop unappropriate tests on arch callchains Drop the TASK_RUNNING test on user tasks for callchains as this check doesn't seem to make any sense. Also remove the tests for !current that is not supposed to happen and current->pid as this should be handled at the generic level, with exclude_idle attribute. Reported-by: Wu Zhangjin <wuzhangjin@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> To: a.p.zijlstra@chello.nl To: will.deacon@arm.com Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: paulus@samba.org Cc: mingo@elte.hu Cc: acme@redhat.com Cc: dengcheng.zhu@gmail.com Cc: matt@console-pimps.org Cc: sshtylyov@mvista.com Patchwork: http://patchwork.linux-mips.org/patch/2014/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-01-21 16:19:20 +08:00
perf_callchain_store(entry, pc);
perf core: Allow setting up max frame stack depth via sysctl The default remains 127, which is good for most cases, and not even hit most of the time, but then for some cases, as reported by Brendan, 1024+ deep frames are appearing on the radar for things like groovy, ruby. And in some workloads putting a _lower_ cap on this may make sense. One that is per event still needs to be put in place tho. The new file is: # cat /proc/sys/kernel/perf_event_max_stack 127 Chaging it: # echo 256 > /proc/sys/kernel/perf_event_max_stack # cat /proc/sys/kernel/perf_event_max_stack 256 But as soon as there is some event using callchains we get: # echo 512 > /proc/sys/kernel/perf_event_max_stack -bash: echo: write error: Device or resource busy # Because we only allocate the callchain percpu data structures when there is a user, which allows for changing the max easily, its just a matter of having no callchain users at that point. Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-21 23:28:50 +08:00
if (entry->nr >= sysctl_perf_event_max_stack)
break;
pc = unwind_stack(current, &sp, pc, &ra);
} while (pc);
#else
save_raw_perf_callchain(entry, sp);
#endif
}