Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option. This
is however not as precise as a kernel based test to measure the real
speed of each method.
This adds a simple test bench to show the time needed for each method.
With this, the optimal size treshold for the alternative implementation
can be determined with more confidence. It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not a surprise knowing that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.
One might suggest that such test be used to determine the optimal
threshold at run time instead, but results are near enough to 64 on
tested targets concerned by this alternative copy_to_user implementation,
so adding some overhead associated with a variable threshold is probably
not worth it for now.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Because the alternate copy_to_user implementation has a higher setup cost
than the standard implementation, the size of the memory area to copy
is tested and the standard implementation invoked instead when that size
is too small. Still, that test is made after the processor has preserved
a bunch of registers on the stack which have to be reloaded right away
needlessly in that case, causing a measurable performance regression
compared to plain usage of the standard implementation only.
To make the size test overhead negligible, let's factorize it out of
the alternate copy_to_user function where it is clear to the compiler
that no stack frame is needed. Thanks to CONFIG_ARM_UNWIND allowing
for frame pointers to be disabled and tail call optimization to kick in,
the overhead in the small copy case becomes only 3 assembly instructions.
A similar trick is applied to clear_user as well.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
This implements {copy_to,clear}_user() by faulting in the userland
pages and then using the regular kernel mem{cpy,set}() to copy the
data (while holding the page table lock). This is a win if the regular
mem{cpy,set}() implementations are faster than the user copy functions,
which is the case e.g. on Feroceon, where 8-word STMs (which memcpy()
uses under the right conditions) give significantly higher memory write
throughput than a sequence of individual 32bit stores.
Here are numbers for page sized buffers on some Feroceon cores:
- copy_to_user on Orion5x goes from 51 MB/s to 83 MB/s
- clear_user on Orion5x goes from 89MB/s to 314MB/s
- copy_to_user on Kirkwood goes from 240 MB/s to 356 MB/s
- clear_user on Kirkwood goes from 367 MB/s to 1108 MB/s
- copy_to_user on Disco-Duo goes from 248 MB/s to 398 MB/s
- clear_user on Disco-Duo goes from 328 MB/s to 1741 MB/s
Because the setup cost is non negligible, this is worthwhile only if
the amount of data to copy is large enough. The operation falls back
to the standard implementation when the amount of data is below a certain
threshold. This threshold was determined empirically, however some targets
could benefit from a lower runtime determined value for optimal results
eventually.
In the copy_from_user() case, this technique does not provide any
worthwhile performance gain due to the fact that any kind of read access
allocates the cache and subsequent 32bit loads are just as fast as the
equivalent 8-word LDM.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Martin Michlmayr <tbm@cyrius.com>
This allows for optional alternative implementations of __copy_to_user
and __clear_user, with a possible runtime fallback to the standard
version when the alternative provides no gain over that standard
version. This is done by making the standard __copy_to_user into a weak
alias for the symbol __copy_to_user_std. Same thing for __clear_user.
Those two functions are particularly good candidates to have alternative
implementations for, since they rely on the STRT instruction which has
lower performances than STM instructions on some CPU cores such as
the ARM1176 and Marvell Feroceon.
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Remove the __initdata annotation for the clock lookups, since they
will be needed when loading modules which use clk_get().
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the __initdata annotation for the clock lookups, since they
will be needed when loading modules which use clk_get().
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
pfn_valid() is meant to be able to tell if a given PFN has valid memmap
associated with it or not. In FLATMEM, it is expected that holes always
have valid memmap as long as there is valid PFNs either side of the hole.
In SPARSEMEM, it is assumed that a valid section has a memmap for the
entire section.
However, ARM and maybe other embedded architectures in the future free
memmap backing holes to save memory on the assumption the memmap is never
used. The page_zone linkages are then broken even though pfn_valid()
returns true. A walker of the full memmap must then do this additional
check to ensure the memmap they are looking at is sane by making sure the
zone and PFN linkages are still valid. This is expensive, but walkers of
the full memmap are extremely rare.
This was caught before for FLATMEM and hacked around but it hits again for
SPARSEMEM because the page_zone linkages can look ok where the PFN linkages
are totally screwed. This looks like a hatchet job but the reality is that
any clean solution would end up consumning all the memory saved by punching
these unexpected holes in the memmap. For example, we tried marking the
memmap within the section invalid but the section size exceeds the size of
the hole in most cases so pfn_valid() starts returning false where valid
memmap exists. Shrinking the size of the section would increase memory
consumption offsetting the gains.
This patch identifies when an architecture is punching unexpected holes
in the memmap that the memory model cannot automatically detect and sets
ARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx
which is the model sub-architecture this has been reported on but may expand
later. When set, walkers of the full memmap must call memmap_valid_within()
for each PFN and passing in what it expects the page and zone to be for
that PFN. If it finds the linkages to be broken, it assumes the memmap is
invalid for that PFN.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Having discussed broadcast tick support with Thomas Glexiner, the
broadcast tick devices should be registered with a higher rating
than the global tick device, and it should have the ONESHOT and
PERIODIC feature flags set.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Thomas Glexiner <tglx@linutronix.de>
smp_cross_call_done() is a no-op for MPCore, and since it's only
used by platform code, there's no point in having it unless it's
doing something.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The ARM SMP code wasn't properly updated for the cpumask changes, which
results in smp_timer_broadcast() broadcasting ticks to non-online CPUs.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Compilation for this board yields the following errors:
arch/arm/mach-pxa/viper.c:511: error: 'FFUART' undeclared here (not in a function)
arch/arm/mach-pxa/viper.c:520: error: 'BTUART' undeclared here (not in a function)
arch/arm/mach-pxa/viper.c:529: error: 'STUART' undeclared here (not in a function)
Fix them by including the necessary header.
Signed-off-by: Ricardo Martins <rasm@fe.up.pt>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix the clkdev API support for the ep93xx uart clocks.
The uarts available in the ep93xx have individual clock controls.
The current implementation assumes that the bootloader has enabled
the clocks before the kernel has booted. It also assumes that the
bootloader has set the UARTBAUD bit indicating that the uarts are
running off the 14.7456MHz external crystal.
This fixes both issues. It also allows the uart clocks to be stopped
when there are no users.
Tested-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This makes the framebuffer work on omap3.
Also fix the clk_get usage for checkpatch.pl
"ERROR: do not use assignment in if condition".
Cc: Imre Deak <imre.deak@nokia.com>
Cc: linux-fbdev-devel@lists.sourceforge.net
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The OMAP3430ES2_SAVEANDRESTORE_SHIFT macro is used
by powerdomain code in
"1 << OMAP3430ES2_SAVEANDRESTORE_SHIFT" manner, but
the definition was also (1 << 4), meaning we actually
modified bit 16. So the definition needs to be 4.
This fixes also a cold reset HW bug in OMAP3430 ES3.x
where some of the efuse bits are not isolated during
wake-up from off mode. This can cause randomish
cold resets with off mode. Enabling the USBTLL hardware
SAVEANDRESTORE causes the core power up assert to be
delayed in a way that we will not get faulty values
when boot ROM is reading the unisolated registers.
Signed-off-by: Kalle Jokiniemi <kalle.jokiniemi@digia.com>
Acked-by: Kevin Hilman <khilman@deeprootsystems.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
As per 3430 TRM, there are 6 banks [0 to 191]
Signed-off-by: Tom Rix <Tom.Rix@windriver.com>
Signed-off-by: Vikram Pandita <vikram.pandita@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The s3c24xx_register_clock() function has been doing a test
on clk->owner to see if it is NULL, and then setting itself
as the owner if clk->owner == NULL.
This is not needed, arch/arm/plat-s3c/clock.c cannot be
compiled as a module, and even if it was, it should not be
playing with this field if it being registered from somewhere
else.
The best course of action is to remove this bit of
code completely.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
The BAST support code is calling s3c_i2c0_set_platdata() from
the map_io() entry, instead of the bast_init() code. This causes
the registration to fail due to kmalloc() not being available
at the time.
This fixes the following error:
s3c_i2c0_set_platdata: no memory for platform data
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Fix unused code warning in arch/arm/plat-s3c24xx/dma.c if there
is no PM support enabled. The function to_dma_chan() should
be marked inline so that the compiler will eliminate it without
warning if it isn't used.
arch/arm/plat-s3c24xx/dma.c:1239: warning: 'to_dma_chan' defined but not used
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Fix compilation bug when debug was enabled
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cleanup arm/plat-s3c64xx/include/plat/gpio-bank-h.h include file.
Using shift-left operation with value >32 is a bad habit.
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
The symbol 'floatx80_is_nan' prototype was defined
locally in fpa11_cprt.c when it was built outside the
file in softfloat-specialisze.
Move this into softfloat.h to fix the following sparse
warning:
softfloat-specialize:276:6: warning: symbol 'floatx80_is_nan' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Add header file decleration for 'ExtendedCPDO' in fpa11.h
to stop the following sparse warning:
extended_cpdo.c:90:14: warning: symbol 'ExtendedCPDO' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
This is a build fix, resyncing the DaVinci EVM ASoC board code
with the version in the DaVinci tree. That resync includes
support for the DM355 EVM, although that board isn't yet in
mainline.
(NOTE: also includes a bugfix to the platform_add_resources
call, recently sent by Chaithrika U S <chaithrika@ti.com> but
not yet merged into the DaVinci tree.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
With the clkdev, musb_core.c needs to register clock with name "ick".
Once all the platforms using the musb driver have been converted
to use clockdev, the clock name does not need to be passed
from the low-level init code.
Signed-off-by: Tony Lindgren <tony@atomide.com>
SPI driver will do unhandled fault on OMAP2420 if trying to probe
non-existing SPI busses. Register those additional busses runtime only
for cpus having them.
Signed-off-by: Jarkko Nikula <jhnikula@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fix "tusb6010 init error 5, -19" and compilation warning from function
tusb6010_platform_retime "warning: 'sysclk_ps' is used uninitialized in this
function".
I suppose commit c094ba34b8f780885d029ce3c2715a194b780e5d was meant to test
for zero fclk_ps instead of sysclk_ps.
Signed-off-by: Jarkko Nikula <jarkko.nikula@nokia.com>
Cc: Roel Kluin <roel.kluin@gmail.com>
Tested-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
GPIO de-bounce clocks don't have any impact on the module idle state, so
the clock code should not wait for the module to enable after the de-bounce
clocks are enabled.
Problem found by Kevin Hilman <khilman@deeprootsystems.com>.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Print reserved memory only if it was actually reserved.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
bonding: fix panic if initialization fails
IXP4xx: complete Ethernet netdev setup before calling register_netdev().
IXP4xx: use "ENODEV" instead of "ENOSYS" in module initialization.
ipvs: Fix IPv4 FWMARK virtual services
ipv4: Make INET_LRO a bool instead of tristate.
net: remove stale reference to fastroute from Kconfig help text
net: update skb_recycle_check() for hardware timestamping changes
bnx2: Fix panic in bnx2_poll_work().
net-sched: fix bfifo default limit
igb: resolve panic on shutdown when SR-IOV is enabled
wimax: oops: wimax_dev_add() is the only one that can initialize the state
wimax: fix oops if netlink fails to add attribute
Bluetooth: Move dev_set_name() to a context that can sleep
netfilter: ctnetlink: fix wrong message type in user updates
netfilter: xt_cluster: fix use of cluster match with 32 nodes
netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE
netfilter: add missing linux/types.h include to xt_LED.h
mac80211: pid, fix memory corruption
mac80211: minstrel, fix memory corruption
cfg80211: fix comment on regulatory hint processing
...
From: Bruce Ashfield <bruce.ashfield@windriver.com>
To fully support the armv7-a instruction set/optimizations, support
for the R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS relocation types is
required.
The MOVW and MOVT are both load-immediate instructions, MOVW loads 16
bits into the bottom half of a register, and MOVT loads 16 bits into the
top half of a register.
The relocation information for these instructions has a full 32 bit
value, plus an addend which is stored in the 16 immediate bits in the
instruction itself. The immediate bits in the instruction are not
contiguous (the register # splits it into a 4 bit and 12 bit value),
so the addend has to be extracted accordingly and added to the value.
The value is then split and put into the instruction; a MOVW uses the
bottom 16 bits of the value, and a MOVT uses the top 16 bits.
Signed-off-by: David Borman <david.borman@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As per commit 284901a90a, use
DMA_BIT_MASK(n)
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The i.MX31 ARM11 core is not a v6K core. Disable this option as it
is incompatible with non v6K cores.
Signed-off-by: Magnus Lilja <lilja.magnus@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Before this patch I got the following line in my dmesg:
[ 0.000000] BUG: mapping for 0xd4000000 at 0xeb000000 overlaps vmalloc space
VMALLOC_END is 0xf4000000 and there are the following other mappings
defined for mx27ads:
(0xa0500000,+0x00001000) maps to 0xffff0000
(0x10000000,+0x00100000) maps to 0xf4000000
(0x80000000,+0x00100000) maps to 0xf4100000
(0xd8000000,+0x00100000) maps to 0xf4200000
So map PBC to 0xf4300000.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
On i.MX31 I sometimes get spurious interrupts. There is no need
to crash the whole system when this happens. Instead, silently
ignore it.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
We want to have a mx31_defconfig file that builds a kernel that is able
to boot on all support mx31 systems and thus also can be better tested
by automatic build scripts. For these reasons, this config file is not
needed anymore.
Signed-off-by: Valentin Longchamp <valentin.longchamp@epfl.ch>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
All i.MX platforms support <linux/clk.h> calls and should select HAVE_CLK.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
On MX2 platforms imx_dma_request() calls request_irq() which may sleep
with interrupts disabled.
Signed-off-by: Martin Fuzzey <mfuzzey@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
The sequence
imx_dma_request()
imx_dma_enable()
imx_dma_free()
left the dma channel in_use mode and did not release the timer.
Signed-off-by: Martin Fuzzey <mfuzzey@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>