kernel_optimize_test/include
Anton Blanchard 7a0268fa1a [PATCH] powerpc/64: per cpu data optimisations
The current ppc64 per cpu data implementation is quite slow, e.g.:

        lhz 11,18(13)           /* smp_processor_id() */
        ld 9,.LC63-.LCTOC1(30)  /* per_cpu__variable_name */
        ld 8,.LC61-.LCTOC1(30)  /* __per_cpu_offset */
        sldi 11,11,3            /* form index into __per_cpu_offset */
        mr 10,9
        ldx 9,11,8              /* __per_cpu_offset[smp_processor_id()] */
        ldx 0,10,9              /* load per cpu data */

5 loads for something that is supposed to be fast, pretty awful. One
reason for the large number of loads is that we have to synthesize 2
64bit constants (per_cpu__variable_name and __per_cpu_offset).

By putting __per_cpu_offset into the paca we can avoid the 2 loads
associated with it:

        ld 11,56(13)            /* paca->data_offset */
        ld 9,.LC59-.LCTOC1(30)  /* per_cpu__variable_name */
        ldx 0,9,11              /* load per cpu data */
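
The two access paths above can be sketched in plain C. This is only a
user-space illustration, not the kernel code: the arena layout, the
offset_table name and the cut-down struct paca_sketch (with an assumed
paca_index field) are hypothetical stand-ins; only data_offset is named
in the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS    4   /* hypothetical, kept small for illustration */
#define AREA_WORDS 8   /* hypothetical size of one per cpu area, in longs */

/* Area 0 is the template copy; area (cpu + 1) is that cpu's private copy. */
static long arena[(NR_CPUS + 1) * AREA_WORDS];

/* A "per cpu variable" is identified by its address in the template area. */
static long *const counter_template = &arena[2];

/* Old scheme: a global table of byte offsets, indexed by cpu number. */
static ptrdiff_t offset_table[NR_CPUS];

/* Cut-down stand-in for the paca; field names other than data_offset
 * are assumptions made for this sketch. */
struct paca_sketch {
    uint16_t  paca_index;   /* what smp_processor_id() would read */
    ptrdiff_t data_offset;  /* this cpu's offset, cached in the paca */
};
static struct paca_sketch pacas[NR_CPUS];

static void setup_offsets(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        ptrdiff_t off = (ptrdiff_t)((cpu + 1) * AREA_WORDS * sizeof(long));
        offset_table[cpu] = off;
        pacas[cpu].paca_index = (uint16_t)cpu;
        pacas[cpu].data_offset = off;
    }
}

/* Old path: read the cpu number, index the global table, add the offset. */
static long *per_cpu_ptr_old(const struct paca_sketch *paca, long *var)
{
    ptrdiff_t off = offset_table[paca->paca_index];
    return (long *)((char *)var + off);
}

/* New path: the offset comes straight out of the paca. */
static long *per_cpu_ptr_new(const struct paca_sketch *paca, long *var)
{
    return (long *)((char *)var + paca->data_offset);
}
```

Both helpers return the same address for a given cpu; the second simply
skips the cpu number lookup and the table access.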

Longer term we should be able to do even better than 3 loads.
If per_cpu__variable_name wasn't a 64bit constant and paca->data_offset
was in a register we could cut it down to one load. A suggestion from
Rusty is to use gcc's __thread extension here. In order to do this we
would need to free up r13 (the __thread register and where the paca
currently is). So far I've had a few unsuccessful attempts at doing that :)
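
For readers unfamiliar with the __thread extension mentioned above, here
is a minimal user-space sketch of what it provides: one private instance
of a variable per thread, addressed relative to the thread register. It
uses POSIX threads purely for illustration; the names hits, worker and
run_tls_demo are hypothetical, and nothing here reflects how a kernel
implementation would look.

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own instance of this variable. */
static __thread long hits;

/* Increment this thread's private counter n times, then report it back
 * through the argument. */
static void *worker(void *arg)
{
    long *slot = arg;
    long n = *slot;

    for (long i = 0; i < n; i++)
        hits++;
    *slot = hits;
    return NULL;
}

/* Run two threads with different iteration counts.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_tls_demo(long out[2])
{
    pthread_t t[2];
    int created = 0;
    int err = 0;

    out[0] = 3;
    out[1] = 5;
    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, worker, &out[i]) != 0) {
            err = -1;
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return err;
}
```

Each thread sees only its own count, and the calling thread's copy of
hits is never touched. Depending on the toolchain, building this may
need the -pthread flag.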

The patch also allocates per cpu memory node local on NUMA machines.
This patch from Rusty has been sitting in my queue _forever_ but stalled
when I hit the compiler bug. Sorry about that.

Finally I also only allocate per cpu data for possible cpus, which comes
straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
and 4 possible cpus we see some nice gains:

             total       used       free     shared    buffers     cached
Mem:       4012228     212860    3799368          0          0     162424

             total       used       free     shared    buffers     cached
Mem:       4016200     212984    3803216          0          0     162424

A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
to be careful of per cpu users that touch data for !possible cpus.
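
The possible cpu optimisation can be sketched roughly as follows. This
is a user-space approximation under stated assumptions: malloc stands in
for the kernel's early, node aware allocator, the possible map is a
plain bitmask, and struct percpu_areas, cpu_is_possible and
setup_per_cpu_areas_sketch are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 128

struct percpu_areas {
    void  *area[NR_CPUS];     /* NULL for cpus that are not possible */
    size_t bytes_allocated;   /* total memory spent on per cpu copies */
};

/* Test bit 'cpu' in a byte-array bitmask (stand-in for the possible map). */
static int cpu_is_possible(const unsigned char *mask, int cpu)
{
    return (mask[cpu / 8] >> (cpu % 8)) & 1;
}

/* Copy the template area once per possible cpu and skip all others.
 * Returns 0 on success, -1 on allocation failure. */
static int setup_per_cpu_areas_sketch(struct percpu_areas *p,
                                      const unsigned char *possible_mask,
                                      const void *template_area,
                                      size_t size)
{
    p->bytes_allocated = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        p->area[cpu] = NULL;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_is_possible(possible_mask, cpu))
            continue;   /* no memory is spent on this cpu */

        /* In the kernel this allocation would come from memory local
         * to the cpu's NUMA node; malloc is only a placeholder. */
        p->area[cpu] = malloc(size);
        if (p->area[cpu] == NULL)
            return -1;
        memcpy(p->area[cpu], template_area, size);
        p->bytes_allocated += size;
    }
    return 0;
}
```

With 4 of 128 cpus possible, only 4 copies are allocated. Any caller
that walks all NR_CPUS entries has to check for a missing area first,
which is the hazard the note above warns about.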

At this stage it might be worth making the NUMA and possible cpu
optimisations generic, but per cpu init is done so early we have to be
careful that all architectures have their possible map set up correctly.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-11 14:49:45 +11:00
..
acpi
asm-alpha [PATCH] mutex subsystem, add default include/asm-*/mutex.h files 2006-01-09 15:59:19 -08:00
asm-arm [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-arm26 [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-cris [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-frv [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-generic [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-h8300 [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-i386 [PATCH] fix i386 mutex fastpath on FRAME_POINTER && !DEBUG_MUTEXES 2006-01-10 13:20:47 -08:00
asm-ia64 [PATCH] kprobes: fix build breakage 2006-01-10 08:01:40 -08:00
asm-m32r [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-m68k [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-m68knommu [PATCH] m68knommu: save reg a5 on context change 2006-01-10 09:31:27 -08:00
asm-mips MIPS: R2: Try to bulletproof instruction_hazard against miss-compilation. 2006-01-10 13:39:08 +00:00
asm-parisc [PATCH] mutex subsystem, add default include/asm-*/mutex.h files 2006-01-09 15:59:19 -08:00
asm-powerpc [PATCH] powerpc/64: per cpu data optimisations 2006-01-11 14:49:45 +11:00
asm-ppc [PATCH] powerpc: pci_address_to_pio fix 2006-01-09 15:05:56 +11:00
asm-s390 [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-sh [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-sh64 [PATCH] include/asm-sh64/: "extern inline" -> "static inline" 2006-01-10 08:02:02 -08:00
asm-sparc [PATCH] mutex subsystem, add default include/asm-*/mutex.h files 2006-01-09 15:59:19 -08:00
asm-sparc64 [PATCH] kprobes: fix build breakage 2006-01-10 08:01:40 -08:00
asm-um [PATCH] dump_thread() cleanup 2006-01-10 08:01:25 -08:00
asm-v850 [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
asm-x86_64 [PATCH] kprobes: fix build breakage 2006-01-10 08:01:40 -08:00
asm-xtensa [PATCH] Generic ioctl.h 2006-01-10 08:01:34 -08:00
keys [PATCH] Keys: Remove key duplication 2006-01-06 08:33:29 -08:00
linux [NETFILTER]: Remove unused function from NAT protocol helpers 2006-01-10 12:54:34 -08:00
math-emu
media V4L/DVB (3325): WSS output interface for av7110 2006-01-09 18:21:37 -02:00
mtd
net [INET]: congestion and af_ops can be const 2006-01-10 12:54:26 -08:00
pcmcia [PATCH] pcmcia: unify attach, EVENT_CARD_INSERTION handlers into one probe callback 2006-01-06 00:03:24 +01:00
rdma
rxrpc
scsi Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-block 2006-01-06 09:01:25 -08:00
sound [PATCH] DocBook: fix kernel-doc comments 2006-01-10 08:01:53 -08:00
video [PATCH] include/video/newport.h: "extern inline" -> "static inline" 2006-01-10 08:01:50 -08:00