Commit Graph

497 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
7dafd239ab Merge commit 'origin/master' into next 2009-06-15 10:36:54 +10:00
Sankar P
5cdcd9d691 trivial: spelling fix in ppc code comments
Fixes a trivial spelling error in powerpc code comments.

Signed-off-by: Sankar P <sankar.curiosity@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:47 +02:00
Benjamin Herrenschmidt
bc47ab0241 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/kernel/asm-offsets.c
2009-06-12 16:53:38 +10:00
Peter Zijlstra
f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Benjamin Herrenschmidt
944916858a powerpc: Shield code specific to 64-bit server processors
This is a random collection of added ifdef's around portions of
code that only mak sense on server processors. Using either
CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.

This is meant to make the future merging of Book3E 64-bit support
easier.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt
d3f6204a7d powerpc: Set init_bootmem_done on NUMA platforms as well
For some obscure reason, we only set init_bootmem_done after initializing
bootmem when NUMA isn't enabled. We even document this next to the declaration
of that global in system.h which of course I didn't read before I had to
debug why some WIP code wasn't working properly...

This patch changes it so that we always set it after bootmem is initialized
which should have always been the case... go figure !

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt
b46b6942b3 powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
The MMU context_lock can be taken from switch_mm() while the
rq->lock is held. The rq->lock can also be taken from interrupts,
thus if we get interrupted in destroy_context() with the context
lock held and that interrupt tries to take the rq->lock, there's
a possible deadlock scenario with another CPU having the rq->lock
and calling switch_mm() which takes our context lock.

The fix is to always ensure interrupts are off when taking our
context lock. The switch_mm() path is already good so this fixes
the destroy_context() path.

While at it, turn the context lock into a new style spinlock.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt
3035c8634f powerpc/mm: Fix some SMP issues with MMU context handling
This patch fixes a couple of issues that can happen as a result
of steal_context() dropping the context_lock when all possible
PIDs are ineligible for stealing (hopefully an extremely hard to
hit occurence).

This case exposes the possibility of a stale context_mm[] entry
to be seen since destroy_context() doesn't clear it and the free
map isn't re-tested. It also means steal_context() will not notice
a context freed while the lock was help, thus possibly trying to
steal a context when a free one was available.

This fixes it by always returning to the caller from steal_context
when it dropped the lock with a return value that causes the
caller to re-samble the number of free contexts, along with
properly clearing the context_mm[] array for destroyed contexts.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:42:21 +10:00
Ingo Molnar
23db9f430b Merge branch 'linus' into perfcounters/core
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
              based - to pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 10:01:39 +02:00
Benjamin Herrenschmidt
435462c6e6 Merge branch 'merge' into next 2009-05-29 13:54:52 +10:00
Benjamin Herrenschmidt
8b31e49d1d powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency.
The implementation we just revived has issues, such as using a
Kconfig-defined virtual address area in kernel space that nothing
actually carves out (and thus will overlap whatever is there),
or having some dependencies on being self contained in a single
PTE page which adds unnecessary constraints on the kernel virtual
address space.

This fixes it by using more classic PTE accessors and automatically
locating the area for consistent memory, carving an appropriate hole
in the kernel virtual address space, leaving only the size of that
area as a Kconfig option. It also brings some dma-mask related fixes
from the ARM implementation which was almost identical initially but
grew its own fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:33:59 +10:00
Benjamin Herrenschmidt
f637a49e50 powerpc: Minor cleanups of kernel virt address space definitions
Make FIXADDR_TOP a compile time constant and cleanup a
couple of definitions relative to the layout of the kernel
address space on ppc32. We also print out that layout at
boot time for debugging purposes.

This is a pre-requisite for properly fixing non-coherent
DMA allocactions.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:32:50 +10:00
Benjamin Herrenschmidt
b16e7766d6 powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm
(pre-requisite to make the next patches more palatable)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:32:05 +10:00
Hideo Saito
8e35961b57 powerpc/mm: Fix broken MMU PID stealing on !SMP
The recent rework of the MMU PID handling for non-hash CPUs has a
subtle bug in the !SMP "optimized" variant of the PID stealing
function.  It clears the PID in the mm context before it calls
local_flush_tlb_mm(). However, the later will not flush anything
if the PID in the context is clear...

Signed-off-by: Hideo Saito <hsaito.ppc@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-26 13:46:49 +10:00
Milton Miller
60dbf43851 powerpc: Add 2.06 tlbie mnemonics
This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards
compatibilty for CPUs before 2.06.

Only useful for bare metal systems.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:21 +10:00
Ingo Molnar
dc3f81b129 Merge commit 'v2.6.30-rc6' into perfcounters/core
Merge reason: this branch was on an -rc4 base, merge it up to -rc6
              to get the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 07:37:49 +02:00
Mel Gorman
af3e4aca47 powerpc: Do not assert pte_locked for hugepage PTE entries
With CONFIG_DEBUG_VM, an assertion is made when changing the protection
flags of a PTE that the PTE is locked. Huge pages use a different pagetable
format and the assertion is bogus and will always trigger with a bug looking
something like

 Unable to handle kernel paging request for data at address 0xf1a00235800006f8
 Faulting instruction address: 0xc000000000034a80
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=32 NUMA Maple
 Modules linked in: dm_snapshot dm_mirror dm_region_hash
  dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
  pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
  windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
  windfarm_cpufreq_clamp windfarm_core i2c_powermac
 NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
 REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
 MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
 DAR: f1a00235800006f8, DSISR: 0000000040010000
 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 Call Trace:
 [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
 [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
 Instruction dump:
 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000

This patch fixes the problem by not asseting the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18 15:19:04 +10:00
Becky Bruce
49a8496525 powerpc: Allow mem=x cmdline to work with 4G+
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15 16:43:41 +10:00
Ingo Molnar
e7fd5d4b3d Merge branch 'linus' into perfcounters/core
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
              the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:47:05 +02:00
Stephen Rothwell
b62c31ae40 powerpc: fix for long standing bug noticed by gcc 4.4.0
Previous gcc versions didn't notice this because one of the preceding
#ifs always evaluated to true.

gcc 4.4.0 produced this error:

arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:52:16 -05:00
Kumar Gala
323d23aeac Revert "powerpc: Add support for early tlbilx opcode"
This reverts commit e996557740.  Our HW
guys were able to fix this so it never sees the light of day.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:51:22 -05:00
Michael Ellerman
24f1ce803c powerpc: Fix crash on CPU hotplug
early_init_mmu_secondary() is called at CPU hotplug time, so it
must be marked as __cpuinit, not __init.

Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit").

Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-22 14:56:34 +10:00
Peter Zijlstra
78f13e9525 perf_counter: allow for data addresses to be recorded
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.

For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.

x86 doesn't seem capable of providing this atm.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 19:05:56 +02:00
Kumar Gala
52ce67f157 powerpc/mm: Fix compile warning
arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm':
arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 22:11:10 -05:00
Kumar Gala
e996557740 powerpc: Add support for early tlbilx opcode
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode.  Add support for a MMU_FTR
fixup to deal with this.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 01:36:30 -05:00
Peter Zijlstra
ac17dc8e58 perf_counter: provide major/minor page fault software events
Provide separate sw counters for major and minor page faults.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:29:40 +02:00
Peter Zijlstra
7dd1fcc258 perf_counter: provide pagefault software events
We use the generic software counter infrastructure to provide
page fault events.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:29:37 +02:00
Benjamin Herrenschmidt
757c74d298 powerpc/mm: Introduce early_init_mmu() on 64-bit
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt
ff7c660092 powerpc/mm: Fix printk type warning in mmu_context_nohash
We need to use %zu instead of %d when printing a sizeof()

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt
d62cbf45a8 powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c
This file is only useful on 64-bit, so we name it accordingly.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:33 +11:00
Benjamin Herrenschmidt
8d1cf34e7a powerpc/mm: Tweak PTE bit combination definitions
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variant become almost identical and that will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.

The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.

Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrot.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:33 +11:00
Rusty Russell
56aa4129e8 cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask
Makes code futureproof against the impending change to mm->cpu_vm_mask.

It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:29 +11:00
Benjamin Herrenschmidt
9e5efaa936 powerpc/mm: Properly wire up get_user_pages_fast() on 32-bit
While we did add support for _PAGE_SPECIAL on some 32-bit platforms,
we never actually built get_user_pages_fast() on them. This fixes
it which requires a little bit of ifdef'ing around.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:34 +11:00
Benjamin Herrenschmidt
1cdab55d8a powerpc: Wire up /proc/vmallocinfo to our ioremap()
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:10:14 +11:00
Kumar Gala
c3071951d0 powerpc/fsl-booke: Add support for tlbilx instructions
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-03-09 09:25:38 -05:00
Anton Blanchard
002b0ec73d powerpc: Increase stack gap on 64bit binaries
On 64bit there is a possibility our stack and mmap randomisation will put
the two close enough such that we can't expand our stack to match the ulimit
specified.

To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
a5adc91a4b powerpc: Ensure random space between stack and mmaps
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.

To fix this, shift the randomness we use for the mmap region by 1 bit.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
9f14c42d75 powerpc: Randomise mmap start address
Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks.
Until ppc32 uses the mmap.c functionality, this is ppc64 specific.

Before:

# ./test & cat /proc/${!}/maps|tail -2|head -1
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0

After:
# ./test & cat /proc/${!}/maps|tail -2|head -1
f718b000-f7b8c000 rw-p f718b000 00:00 0
f7551000-f7f52000 rw-p f7551000 00:00 0
f6ee7000-f78e8000 rw-p f6ee7000 00:00 0
f74d4000-f7ed5000 rw-p f74d4000 00:00 0
f6e9d000-f789e000 rw-p f6e9d000 00:00 0

Similar for 64bit, but with 1GB of scatter:
# ./test & cat /proc/${!}/maps|tail -2|head -1
fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0
fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0
fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0
fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0
fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:07 +11:00
Anton Blanchard
13a2cb3694 powerpc: Rearrange mmap.c
Rearrange mmap.c to better match the x86 version.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Nathan Fontenot
0f16ef7fd3 powerpc/numa: Cleanup hot_add_scn_to_nid
This patch reworks the hot_add_scn_to_nid and its supporting functions
to make them easier to understand.  There are no functional changes in
this patch and has been tested on machine with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.

My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible.  This
will hopefully makes things easier to understand.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Anton Blanchard
13870b6575 powerpc/mm: Reduce hashtable size when using 64kB pages
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.

Grab the real page size and size the hashtable based on that

Note: This only has effect on non hypervisor machines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt
3b7faeb49e Merge commit 'kumar/next' into next 2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt
82a0a1cc8f Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/pgtable-ppc32.h
2009-02-18 13:19:25 +11:00
Dave Hansen
06eccea6c3 powerpc/mm: Fix numa reserve bootmem page selection
Fix the powerpc NUMA reserve bootmem page selection logic.

commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions.  As the folowing discussion reports,
the new logic was not correct.

mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area.  It searches for
active regions that intersect with that LMB and are on the
specified node.  It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect.  We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.

We base the size of the bootmem reservation on two possible
things.  Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.

However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions.  Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions.  So, the bootmem reservation must be trimmed to
fit inside the current active region.

That's all fine and dandy, but we trim the reservation
in a page-aligned fashion.  That's bad because we start the
reservation at a non-page-aligned address: physbase.

The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.

Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000.  You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
and then call

  reserve_bootmem_node(node, physbase=0xfff, size=0x1000);

0x1000 may not be on the same node as 0xfff.  Oops.

In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range.  Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.

We also need to do math for the reserved size with physbase
instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node.  However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area.  That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.

From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Kumar Gala
96a8bac589 powerpc/fsl-booke: Fix compile warning
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:54:53 -06:00
Kumar Gala
d66c82ea45 powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture.  Its done it such a way to be code compatiable with the
existing HW.  Made the minor code changes to support both power of two
and power of four page sizes.  Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA.  Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.

Note, its still invalid to try and use a page size that isn't supported
by cpu.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:37:11 -06:00
Kumar Gala
f99fb8a2cb powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW
The following commit:

commit 64b3d0e812
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date:   Thu Dec 18 19:13:51 2008 +0000

    powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED

broke setting of the _PAGE_COHERENT bit in the PPC HW PTE.  Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propogate it to the PPC HW PTE.

Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt
8d30c14cab powerpc/mm: Rework I$/D$ coherency (v3)
This patch reworks the way we do I and D cache coherency on PowerPC.

The "old" way was split in 3 different parts depending on the processor type:

   - Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.

   - Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all page going to user space in update_mmu_cache().

   - Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.

That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow trying to execute. This is hard to hit but
I think it has bitten us in the past.

Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.

This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c

The base idea for Embedded with per-page exec support, is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).

If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
this guys also perform the I/D cache sync for exec faults now. This second path
is the catch all for things that weren't cleaned at set_pte_at() time.

For cpus without per-pag exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.

For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.

This is also a first step for adding _PAGE_EXEC support for embedded platforms

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:10 +11:00
Anton Blanchard
91b0f5ec53 powerpc/mm: Move 64-bit unmapped_area to top of address space
We currently place mmaps just below the stack on 32bit, but leave them
in the middle of the address space on 64bit:

00100000-00120000 r-xp 00100000 00:00 0                    [vdso]
10000000-10010000 r-xp 00000000 08:06 179534               /tmp/sleep
10010000-10020000 rw-p 00000000 08:06 179534               /tmp/sleep
10020000-10130000 rw-p 10020000 00:00 0                    [heap]
40000000000-40000030000 r-xp 00000000 08:06 440743         /lib64/ld-2.9.so
40000030000-40000040000 rw-p 00020000 08:06 440743         /lib64/ld-2.9.so
40000050000-400001f0000 r-xp 00000000 08:06 440671         /lib64/libc-2.9.so
400001f0000-40000200000 r--p 00190000 08:06 440671         /lib64/libc-2.9.so
40000200000-40000220000 rw-p 001a0000 08:06 440671         /lib64/libc-2.9.so
40000220000-40008230000 rw-p 40000220000 00:00 0
fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0           [stack]

Right now it isn't an issue, but at some stage we will run into mmap or
hugetlb allocation issues. Using the same layout as 32bit gives us a
some breathing room. This matches what x86-64 is doing too.

00100000-00103000 r-xp 00100000 00:00 0                    [vdso]
10000000-10001000 r-xp 00000000 08:06 554894               /tmp/test
10010000-10011000 r--p 00000000 08:06 554894               /tmp/test
10011000-10012000 rw-p 00001000 08:06 554894               /tmp/test
10012000-10113000 rw-p 10012000 00:00 0                    [heap]
fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0
ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591         /lib64/libc-2.9.so
ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591         /lib64/libc-2.9.so
ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591         /lib64/libc-2.9.so
ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591         /lib64/libc-2.9.so
ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0
ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663         /lib64/ld-2.9.so
ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0
ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0
ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663         /lib64/ld-2.9.so
ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663         /lib64/ld-2.9.so
ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0
fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0           [stack]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:07 +11:00
Milton Miller
8b16cd238d powerpc/numa: Remove redundant find_cpu_node()
Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in
a less restrictive section (text vs cpuinit).

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:37:59 +11:00