Fix interrupt emulation code in kretprobe-booster according to
pt_regs update (es/ds change and gs adding).
This issue has been reported on systemtap-bugzilla:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9965
| On a -tip kernel on x86_32, kretprobe_example (from samples) triggers the
| following backtrace when its retprobing a class of functions that cause a
| copy_from/to_user().
|
| BUG: sleeping function called from invalid context at mm/memory.c:3196
| in_atomic(): 0, irqs_disabled(): 1, pid: 2286, name: cat
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: systemtap-ml <systemtap@sources.redhat.com>
LKML-Reference: <49C7995C.2010601@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch move the timestamp from happening in the arch specific
code into the general code. This allows for better control by the tracer
to time manipulation.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] make page table upgrade work again
[S390] make page table walking more robust
[S390] Dont check for pfn_valid() in uaccess_pt.c
[S390] ftrace/mcount: fix kernel stack backchain
[S390] topology: define SD_MC_INIT to fix performance regression
[S390] __div64_31 broken for CONFIG_MARCH_G5
When I review the sensitive code ftrace_nmi_enter(), I found
the atomic variable nmi_running does protect NMI VS do_ftrace_mod_code(),
but it can not protects NMI(entered nmi) VS NMI(ftrace_nmi_enter()).
cpu#1 | cpu#2 | cpu#3
ftrace_nmi_enter() | do_ftrace_mod_code() |
not modify | |
------------------------|-----------------------|--
executing | set mod_code_write = 1|
executing --|-----------------------|--------------------
executing | | ftrace_nmi_enter()
executing | | do modify
------------------------|-----------------------|-----------------
ftrace_nmi_exit() | |
cpu#3 may be being modified the code which is still being executed on cpu#1,
it will have undefined results and possibly take a GPF, this patch
prevents it occurred.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49C0B411.30003@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
After TASK_SIZE now gives the current size of the address space the
upgrade of a 64 bit process from 3 to 4 levels of page table needs
to use the arch_mmap_check hook to catch large mmap lengths. The
get_unmapped_area* functions need to check for -ENOMEM from the
arch_get_unmapped_area*, upgrade the page table and retry.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make page table walking on s390 more robust. The current code requires
that the pgd/pud/pmd/pte loop is only done for address ranges that are
below the end address of the last vma of the address space. But this
is not always true, e.g. the generic page table walker does not guarantee
this. Change TASK_SIZE/TASK_SIZE_OF to reflect the current size of the
address space. This makes the generic page table walker happy but it
breaks the upgrade of a 3 level page table to a 4 level page table.
To make the upgrade work again another fix is required.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
pfn_valid() actually checks for a valid struct page and not for a
valid pfn. Using xip mappings w/o struct pages, this will result in
-EFAULT returned by the (page table walk) user copy functions,
even though there is valid memory. Those user copy functions don't
need a struct page, so this patch just removes the pfn_valid() check.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With packed stack the backchain is at a different location.
Just use __SF_BACKCHAIN as an offset to store the backchain.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The default values for SD_MC_INIT cause an additional cpu usage of up
to 40% on some network benchmarks compared to the plain SD_CPU_INIT
values. So just define SD_MC_INIT to SD_CPU_INIT.
More tuning needs to be done.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The implementation of __div64_31 for G5 machines is broken. The comments
in __div64_31 are correct, only the code does not do what the comments
say. The part "If the remainder has overflown subtract base and increase
the quotient" is only partially realized, the base is subtracted correctly
but the quotient is only increased if the dividend had the last bit set.
Using the correct instruction fixes the problem.
Cc: stable@kernel.org
Reported-by: Frans Pop <elendil@planet.nl>
Tested-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Don't boost at the addresses which are listed on exception tables,
because major page fault will occur on those addresses. In that case,
kprobes can not ensure that when instruction buffer can be freed since
some processes will sleep on the buffer.
kprobes-ia64 already has same check.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we now set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing
it out before we setup the SW TLB. Today all the SW TLB machines
(603/e300) that we support are non-SMP, however there are some errata on
some devices that cause us to set _PAGE_COHERENT via CPU_FTR_NEED_COHERENT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
BestComm, a DMA engine in MPC52xx SoC, requires snooping when
CPU caches are enabled to work properly.
Adding CPU_FTR_NEED_COHERENT fixes NFS problems on MPC52xx machines
introduced by 'powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup
code' (sha1: 4c456a67f5).
Signed-off-by: Piotr Ziecik <kosmo@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
In order for ntpd to correctly synchronize the clocks, the frequency of
the system clock must not be off by more than 500 ppm (or, put another
way, 1:2000), or ntpd will end up giving up on trying to synchronize
properly, and ends up reseting the clock in jumps instead.
The fast TSC PIT calibration sometimes failed this test - it was
assuming that the PIT reads always took about one microsecond each (2us
for the two reads to get a 16-bit timer), and that calibrating TSC to
the PIT over 15ms should thus be sufficient to get much closer than
500ppm (max 2us error on both sides giving 4us over 15ms: a 270 ppm
error value).
However, that assumption does not always hold: apparently some hardware
is either very much slower at reading the PIT registers, or there was
other noise causing at least one machine to get 700+ ppm errors.
So instead of using a fixed 15ms timing loop, this changes the fast PIT
calibration to read the TSC delta over the individual PIT timer reads,
and use the result to calculate the error bars on the PIT read timing
properly. We then successfully calibrate the TSC only if the maximum
error bars fall below 500ppm.
In the process, we also relax the timing to allow up to 25ms for the
calibration, although it can happen much faster depending on hardware.
Reported-and-tested-by: Jesper Krogh <jesper@krogh.cc>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During bootup, when we reprogram the PIT (programmable interval timer)
to start counting down from 0xffff in order to use it for the fast TSC
calibration, we should also make sure to delay a bit afterwards to allow
the PIT hardware to actually start counting with the new value.
That will happens at the next CLK pulse (1.193182 MHz), so the easiest
way to do that is to just wait at least one microsecond after
programming the new PIT counter value. We do that by just reading the
counter value back once - which will take about 2us on PC hardware.
Reported-and-tested-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/home/rmk/linux-2.6-arm: (23 commits)
[ARM] Fix virtual to physical translation macro corner cases
[ARM] update mach-types
[ARM] 5421/1: ftrace: fix crash due to tracing of __naked functions
MX1 fix include
[ARM] 5419/1: ep93xx: fix build warnings about struct i2c_board_info
[ARM] 5418/1: restore lr before leaving mcount
ARM: OMAP: board-omap3beagle: set i2c-3 to 100kHz
ARM: OMAP: Allow I2C bus driver to be compiled as a module
ARM: OMAP: sched_clock() corrected
ARM: OMAP: Fix compile error if pm.h is included
[ARM] orion5x: pass dram mbus data to xor driver
[ARM] S3C64XX: Fix s3c64xx_setrate_clksrc
[ARM] S3C64XX: sparse warnings in arch/arm/plat-s3c64xx/irq.c
[ARM] S3C64XX: sparse warnings in arch/arm/plat-s3c64xx/s3c6400-clock.c
[ARM] S3C64XX: Fix USB host clock mux list
[ARM] S3C64XX: Fix name of USB host clock.
[ARM] S3C64XX: Rename IRQ_UHOST to IRQ_USBH
[ARM] S3C64XX: Do gpiolib configuration earlier
[ARM] S3C64XX: Staticise s3c64xx_init_irq_eint()
[ARM] SMDK6410: Declare iodesc table static
...
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Mark Eins: Fix configuration.
MIPS: Fix TIF_32BIT undefined problem when seccomp is disabled
Fix:
arch/x86/kernel/entry_32.S:446: Warning: 00000000080001d1 shortened to 00000000000001d1
arch/x86/kernel/entry_32.S:457: Warning: 000000000800feff shortened to 000000000000feff
arch/x86/kernel/entry_32.S:527: Warning: 00000000080001d1 shortened to 00000000000001d1
arch/x86/kernel/entry_32.S:541: Warning: 000000000800feff shortened to 000000000000feff
arch/x86/kernel/entry_32.S:676: Warning: 0000000008000091 shortened to 0000000000000091
TIF_SYSCALL_FTRACE is 0x08000000 and until now we checked the
first 16 bits of the work mask - bit 27 falls outside of that.
Update the entry_32.S code to check the full 32-bit mask.
[ %cx => %ecx fix from Cyrill Gorcunov <gorcunov@gmail.com> ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "H. Peter Anvin" <hpa@kernel.org>
LKML-Reference: <1237012693.18733.3.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
kernel/built-in.o: In function `ftrace_syscall_exit':
(.text+0x76667): undefined reference to `syscall_nr_to_meta'
ftrace.o is built:
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
But now a CONFIG_FTRACE_SYSCALLS dependency is needed too.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Provide the x86 trace callbacks to trace syscalls.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, update to new cpumask API
Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Add braces around the macro arguments, else for example
"shl %r1, 5-3, %r2" would not expand to what you would assume.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Fix those compile warnings:
uaccess.h:244: warning: `struct pt_regs' declared inside parameter list
uaccess.h:244: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
- convert a few "if (xx) BUG();" to BUG_ON(xx)
- remove a few printk()s, as we get a backtrace with BUG_ON() anyway
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Convert the PS3 Video RAM Storage Driver from an MTD driver to a plain block
device driver.
The ps3vram driver exposes unused video RAM on the PS3 as a block device
suitable for storage or swap. Fast data transfer is achieved using a local
cache in system RAM and DMA transfers via the GPU.
The new driver is ca. 50% faster for reading, and ca. 10% for writing.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the following warning on x86_64:
LD vmlinux.o
MODPOST vmlinux.o
WARNING: vmlinux: 'memcpy' exported twice. Previous export was in vmlinux
For x86_64, this symbol is already exported from arch/um/sys-x86_64/ksyms.c.
Reported-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Tested-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is currently impossible to run a user-mode linux machine inside another
user-mode linux (UML on UML). It breaks after a few instructions. When
it tries to check whether SYSEMU is installed (the inner) UML receives an
inconsistent result (from the outer UML).
This is the output of a broken attempt:
$ ./linux mem=256m ubd0=cow
Locating the bottom of the address space ... 0x0
Locating the top of the address space ... 0xc0000000
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...OK
Checking ptrace new tags for syscall emulation...unsupported
Checking syscall emulation patch for ptrace...check_sysemu : expected SIGTRAP, got status = 256
$
The problem is the following:
PTRACE_SYSCALL/SINGLESTEP is currently managed inside arch_ptrace for ARCH=um.
PTRACE_SYSEMU/SUSEMU_SINGLESTEP is not captured in arch_ptrace's switch,
therefore it is erroneously passed back to ptrace_request (in
kernel/ptrace).
This simple patch simply forces ptrace to return an error on
PTRACE_SYSEMU/SUSEMU_SINGLESTEP as it is unsupported on ARCH=um, and fixes
the problem.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current use of these macros works well when the conversion is
entirely linear. In this case, we can be assured that the following
holds true:
__va(p + s) - s = __va(p)
However, this is not always the case, especially when there is a
non-linear conversion (eg, when there is a 3.5GB hole in memory.)
In this case, if 's' is the size of the region (eg, PAGE_SIZE) and
'p' is the final page, the above is most definitely not true.
So, we must ensure that __va() and __pa() are only used with valid
kernel direct mapped RAM addresses. This patch tweaks the code
to achieve this.
Tested-by: Charles Moschel <fred99@carolina.rr.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>