It looks like most of the hugetlb code is doing the correct thing if
hugepages are not supported, but the mmap code is not. If we get into
the mmap code when hugepages are not supported, such as in an LPAR
which is running Active Memory Sharing, we can oops the kernel. This
fixes the oops being seen in this path.
oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N)
dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N)
scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
Supported: No
NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
REGS: c000000077e7b830 TRAP: 0300 Tainted: G
(2.6.27.5-bz50170-2-ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44000448 XER: 20000001
DAR: c000002000af90a8, DSISR: 0000000040000000
TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
Call Trace:
[c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
[c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
[c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
[c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
[c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
Instruction dump:
fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 5348/1: fix documentation wrt location of the alignment trap interface
[ARM] Ensure linux/hardirqs.h is included where required
[ARM] fix kernel-doc syntax
[ARM] arch/arm/common/sa1111.c: Correct error handling code
[ARM] 5341/2: there is no copy_page on nommu ARM
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
Phonet: keep TX queue disabled when the device is off
SCHED: netem: Correct documentation comment in code.
netfilter: update rwlock initialization for nat_table
netlabel: Compiler warning and NULL pointer dereference fix
e1000e: fix double release of mutex
IA64: HP_SIMETH needs to depend upon NET
netpoll: fix race on poll_list resulting in garbage entry
ipv6: silence log messages for locally generated multicast
sungem: improve ethtool output with internal pcs and serdes
tcp: tcp_vegas cong avoid fix
sungem: Make PCS PHY support partially work again.
Otherwise those using it in transition patches (eg. kvm) can't compile
with CONFIG_SMP=n:
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function 'make_all_cpus_request':
arch/x86/kvm/../../../virt/kvm/kvm_main.c:380: error: implicit declaration of function 'smp_call_function_many'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a cgroup is removed, it's unlinked from its parent's children list,
but not actually freed until the last dentry on it is released (at which
point cgrp->root->number_of_cgroups is decremented).
Currently rebind_subsystems checks for the top cgroup's child list being
empty in order to rebind subsystems into or out of a hierarchy - this can
result in the set of subsystems bound to a hierarchy being
removed-but-not-freed cgroup.
The simplest fix for this is to forbid remounts that change the set of
subsystems on a hierarchy that has removed-but-not-freed cgroups. This
bug can be reproduced via:
mkdir /mnt/cg
mount -t cgroup -o ns,freezer cgroup /mnt/cg
mkdir /mnt/cg/foo
sleep 1h < /mnt/cg/foo &
rmdir /mnt/cg/foo
mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg
kill $!
Though the above will cause oops in -mm only but not mainline, but the bug
can cause memory leak in mainline (and even oops)
Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tyler Hicks and Dustin Kirkland are now the primary contact points for
eCryptfs issues that may arise from this point forward.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Acked-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kmem_cache_create() function in the slob allocator passes the SLAB
flags as GFP flags to the slob_alloc() function. The patch changes this
call to pass GFP_KERNEL as the other allocators seem to do.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The netem simulator is no longer limited by Linux timer resolution HZ.
Not since Patrick McHardy changed the QoS system to use hrtimer.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit e099a17357
(netfilter: netns nat: per-netns NAT table) renamed the
nat_table from __nat_table to nat_table without updating the
__RW_LOCK_UNLOCKED(__nat_table.lock).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VMI initialiation can relocate the fixmap, causing early_ioremap to
malfunction if it is initialized before the relocation. To fix this,
VMI activation is split into two phases; the detection, which must
happen before setting up ioremap, and the activation, which must happen
after parsing early boot parameters.
This fixes a crash on boot when VMI is enabled under VMware.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 5b7dba4ff8, which
caused a regression in hibernate, reported by and bisected by Fabio
Comolli.
This revert fixes
http://bugzilla.kernel.org/show_bug.cgi?id=12155http://bugzilla.kernel.org/show_bug.cgi?id=12149
Bisected-by: Fabio Comolli <fabio.comolli@gmail.com>
Requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc notation to use correct syntax. Even though this should be
moved to where the function is actually implemented...
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If it is reasonable to apply PTR_ERR to the result of calling clk_get, then
that result should first be tested with IS_ERR, not with !.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E,E1;
@@
if (
- E == NULL
+ IS_ERR(E)
) { <+... when != E = E1
PTR_ERR(E)
...+> }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
According to http://bugzilla.kernel.org/show_bug.cgi?id=12206, Freecom
FireWire Hard Drive 1TB reports max_rom=2 but returns garbage if block
read requests are used to read the config ROM. Force max_rom=0 to limit
them to quadlet read requests.
Reported-by: Christian Mueller <cm1@mumac.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
An example calling sequence which we did see:
copy_user_highpage -> kmap_atomic -> flush_tlb_page -> _tlbil_va
We got interrupted after setting up the MAS registers before the
tlbwe and the interrupt handler that caused the interrupt also did
a kmap_atomic (ide code) and thus on returning from the interrupt
the MAS registers no longer contained the proper values.
Since we dont save/restore MAS registers for normal interrupts we
need to disable interrupts in _tlbil_va to ensure atomicity.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
It's called under that lock everywhere else and it does alter the
request state, so it should be.
This one occurance in scsi_requeue_command() could open a window where
req->special is set to NULL while the requests is going through either
timeout or completion processing leading to NULL pointer derefs of the
sort complained of in bugzillas 12020 and 12195.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
For the console, there is a 1:1 mapping of glyphs which cannot be found
in the current font. This seems to be meant as a kind of 'emergency
fallback' for fonts without unicode mapping which otherwise would
display nothing readable on the screen.
At the moment it affects all chars for which no substitution character
is defined. In particular this means that for all chars (>= 128) where
there is no iso88591-1/unicode character (e.g. control character area)
you'll get the very strange 1:1 mapping of the (cp437) graphics card
glyphs.
I'm pretty sure that the 1:1 mapping should only affect strict ASCII
code characters, i.e. chars < 128.
The patch limits the mapping as it probably was meant anyway.
Signed-off-by: Ingo Brueckl <ib@wupperonline.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Egmont Koblinger <egmont@uhulinux.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a major bug in the cp437 to unicode translation table. Char
0x7c is mapped to U+00a5 which is the Yen sign and wrong. The right
mapping is U+00a6 (broken bar).
Furthermore, a mapping for U+00b4 (a widely used character) is missing
even though easily possible.
The patch fixes these, as well as it provides a few other useful
mappings.
The changes are as follows:
0x0f (enhancement) enables a sort of currency symbol
0x27 (bug) enables a sort of acute accent which is a widely used character
0x44 (enhancement) enables a sort of icelandic capital letter eth
0x7c (major bug) corrects mapping
0xeb (enhancement) enables a sort of icelandic small letter eth
0xee (enhancement) enables a sort of math 'element of'
Signed-off-by: Ingo Brueckl <ib@wupperonline.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_free_noncoherent() and dma_free_coherent() are missing calls to
plat_unmap_dma_mem(). This patch adds them.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The header path in the help text for the RUNTIME_DEBUG config option is
obsolete and needs to be updated to match the new location of
architecture-specific header files. While at it, fix the spelling mistake.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.fi>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
For MIPS R2, use the EI and DI instructions to enable and disable
interrupts.
Signed-off-by: Tomaso Paoletti <tpaoletti@caviumnetworks.com>
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The test-unit-ready portion of this patch was causing boots to fail on
my test machine (as in http://lkml.org/lkml/2008/12/5/161). With this
patch in place, the system is booting reliably.
Mike Anderson found the same problem in the hp_hw_start_stop code,
and I applied the same solution in cdrom_read_cdda_bpc.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix the two compiler warnings show below. Thanks to Geert Uytterhoeven for
finding and reporting the problem.
net/netlabel/netlabel_unlabeled.c:567: warning: 'entry' may be used
uninitialized in this function
net/netlabel/netlabel_unlabeled.c:629: warning: 'entry' may be used
uninitialized in this function
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During a reset, releasing the swflag after it failed to be acquired would
cause a double unlock of the mutex. Instead, test whether acquisition of
the swflag was successful and if not, do not release the swflag. The reset
must still be done to bring the device to a quiescent state.
This resolves [BUG 12200] BUG: bad unlock balance detected! e1000e
http://bugzilla.kernel.org/show_bug.cgi?id=12200
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cuboot-acadia.c wrapper can cause assembler errors on some
toolchains due to the lack of the proper BOOTCFLAGS. This adds
the proper flags for the file.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
tmp is used as host-endian and is loaded from a be64, fix the cast and the
endian accessor used.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
... as it is defined with memcpy, therefore no copy_page symbol to
export.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This reverts commit b1ee26bab1, along with
the "fixes" for it that all just caused problems:
- c4c6fa9891 "radeonfb: fix problem with
color expansion & alignment"
- f3179748a1 "radeonfb: Disable new color
expand acceleration unless explicitely enabled"
because even when disabled, it breaks for people. See
http://bugzilla.kernel.org/show_bug.cgi?id=12191
for the latest example.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: James Cloos <cloos@jhcloos.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
test in 2.6.7! Bad bad bug, good good find.
The i_mmap_writable count must be incremented for VM_SHARED (just as
i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
when dup_mmap() copies the vma for fork: it has its own more optimal
version of __vma_link_file(), and I missed this out. So the count
was later going down to 0 (dangerous) when one end unmapped, then
wrapping negative (inefficient) when the other end unmapped.
The only impact on x86 would have been that setting a mandatory lock on
a file which has at some time been opened O_RDWR and mapped MAP_SHARED
(but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN
when it should succeed, or succeed when it should fail.
But those architectures which rely on flush_dcache_page() to flush
userspace modifications back into the page before the kernel reads it,
may in some cases have skipped the flush after such a fork - though any
repetitive test will soon wrap the count negative, in which case it will
flush_dcache_page() unnecessarily.
Fix would be a two-liner, but mapping variable added, and comment moved.
Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last patch to lib/idr.c caused a bug if idr_get_new_above() was
called on an empty idr.
Usually, nodes stay on the same layer. New layers are added to the top
of the tree.
The exception is idr_get_new_above() on an empty tree: In this case, the
new root node is first added on layer 0, then moved upwards. p->layer
was not updated.
As usual: You shall never rely on the source code comments, they will
only mislead you.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Give the correct size when reserving the interrupt vector table. It should be
a page not a single byte.
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix __put_user_asm8() by jumping to the end label (3:) from the exception
handler, rather than jumping back to retry the second store instruction (label
2:).
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the preemption resume_kernel() routine by inverting the test to see
whether interrupts are off (IM7 is all enabled, not all disabled).
Furthermore, interrupts should be disabled on entry to resume_kernel() so that
they're correctly set for jumping to restore_all() and doing the need
reschedule test.
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Discard low-prioriy Tx interrupts when closing an MN10300 on-chip serial port.
The MN10300 on-chip serial port uses three interrupts to manage its serial
ports:
(1) A very high priority interrupt that drives virtual DMA for Rx.
(2) A very high priority interrupt that drives virtual DMA for Tx.
(3) A normal priority virtual interrupt that does the normal UART interrupt
stuff and is shared between Rx and Tx.
mn10300_serial_stop_tx() only disables the high priority Tx interrupt. It
doesn't also disable the normal priority one because it is shared with Rx.
However, the high priority interrupt may interrupt local_irq_disabled()
sections, and so may have queued up a low priority virtual interrupt whilst the
UART driver is asking for the Tx interrupt to be disabled.
The result of this can be an oops when we try to process the interrupt in
mn10300_serial_transmit_interrupt() as port->uart.info and port->uart.info->tty
may have gone away.
To deal with this, if either of those pointers is NULL, we make sure the
high-priority Tx interrupt is disabled and discard the interrupt. The low
priority interrupt is disabled by the mn10300_serial_pic irq_chip table.
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>