Mike Galbraith reported finding a lockup ("perma-spin bug") where the
cpumask passed to smp_call_function_many was cleared by other cpu(s)
while a cpu was preparing its call_data block, resulting in no cpu to
clear the last ref and unlock the block.
Having cpus clear their bit asynchronously could be useful on a mask of
cpus that might have a translation context, or cpus that need a push to
complete an rcu window.
Instead of adding a BUG_ON and requiring yet another cpumask copy, just
detect the race and handle it.
Note: arch_send_call_function_ipi_mask must still handle an empty
cpumask because the data block is globally visible before the that arch
callback is made. And (obviously) there are no guarantees to which cpus
are notified if the mask is changed during the call; only cpus that were
online and had their mask bit set during the whole call are guaranteed
to be called.
Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Jan Beulich <JBeulich@novell.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: stable@kernel.org
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul McKenney's review pointed out two problems with the barriers in the
2.6.38 update to the smp call function many code.
First, a barrier that would force the func and info members of data to
be visible before their consumption in the interrupt handler was
missing. This can be solved by adding a smp_wmb between setting the
func and info members and setting setting the cpumask; this will pair
with the existing and required smp_rmb ordering the cpumask read before
the read of refs. This placement avoids the need a second smp_rmb in
the interrupt handler which would be executed on each of the N cpus
executing the call request. (I was thinking this barrier was present
but was not).
Second, the previous write to refs (establishing the zero that we the
interrupt handler was testing from all cpus) was performed by a third
party cpu. This would invoke transitivity which, as a recient or
concurrent addition to memory-barriers.txt now explicitly states, would
require a full smp_mb().
However, we know the cpumask will only be set by one cpu (the data
owner) and any preivous iteration of the mask would have cleared by the
reading cpu. By redundantly writing refs to 0 on the owning cpu before
the smp_wmb, the write to refs will follow the same path as the writes
that set the cpumask, which in turn allows us to keep the barrier in the
interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
will be be executed by N cpus for each of the possible M elements on the
list).
I moved and expanded the comment about our (ab)use of the rcu list
primitives for the concurrent walk earlier into this function. I
considered moving the first two paragraphs to the queue list head and
lock, but felt it would have been too disconected from the code.
Cc: Paul McKinney <paulmck@linux.vnet.ibm.com>
Cc: stable@kernel.org (2.6.32 and later)
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter pointed out there was nothing preventing the list_del_rcu in
smp_call_function_interrupt from running before the list_add_rcu in
smp_call_function_many.
Fix this by not setting refs until we have gotten the lock for the list.
Take advantage of the wmb in list_add_rcu to save an explicit additional
one.
I tried to force this race with a udelay before the lock & list_add and
by mixing all 64 online cpus with just 3 random cpus in the mask, but
was unsuccessful. Still, inspection shows a valid race, and the fix is
a extension of the existing protection window in the current code.
Cc: stable@kernel.org (v2.6.32 and later)
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the _mapcount value indicating PageBuddy from -2 to -128 for
more robusteness against page_mapcount() undeflows.
Use reset_page_mapcount instead of __ClearPageBuddy in bad_page to
ignore the previous retval of PageBuddy().
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use __ffs() to find the pending interrupt source instead of looping 32
times.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Alek Du <alek.du@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Convert to the new irq function names.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Alek Du <alek.du@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
commit 0766d20fd (langwell_gpio: modify EOI handling following change
of kernel irq subsystem) changes
- desc->chip->eoi(irq);
+
+ if (desc->chip->irq_eoi)
+ desc->chip->irq_eoi(irq_get_irq_data(irq));
+ else
+ dev_warn(pg->chip.dev, "missing EOI handler for irq %d\n", irq);
With the following explanation:
"Latest kernel has many changes in IRQ subsystem and its interfaces,
like adding irq_eoi" for struct irq_chip, this patch will make it
support both the new and old interface."
This is completely bogus.
#1) The changelog does not match the patch at all
#2) This driver relies on the assumption that it sits behind an eoi
capable interrupt line. If the implementation of the underlying
chip changes from eoi to irq_eoi then this driver has to follow
that change and not add a total bogosity.
#3) Just mechanically changing eoi to irq_eoi without checking the
background of that change is sloppy at best.
Remove the sillyness and retrieve the interrupt data from irq_desc
directly. No need to go through a sparse irq lookup.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Alek Du <alek.du@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Nothing outside of x86 can use that code.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds support for power regulators.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
This patch is related to re-init processing on suspend/resume.
When card is resuming, some register is reset. If card is removable,
maybe controller should be rescan for card. But if assume card is
non-removable, need to restore the old value at registers.
We store the value of FIFOTH at probe time and then restore it in
dw_mci_resume().
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
This patch adds quirks and capabilities to platdata.
Some cards don't use the CDn pin; in that case, we assume the card's
inserted. Some boards need other capabilities. So, we add capabilities
in the board's platdata.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Currently kunmap_atomic() doesn't take into account the offset, used
with kmap_atomic(). On platforms, where kunmap_atomic() is not a NOP,
this will lead to problems, when offset != 0.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
At power off, reset OCR mask to be the highest possible voltage
supported for the current mmc host.
This solves the re-initialization during the power up sequence.
The voltage may have been decreased due to the card accepts a lower
voltage than the voltage used during the initialization sequence.
We need to reset the voltage to by the host highest possible value
since according to specification the initialization must always be
done at high voltage.
Reviewed-by: Jonas Aberg <jonas.aberg@stericsson.com>
Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
When using mmc_try_claim_host the corresponding release
function is mmc_do_release_host, which then also must
be exported.
Reviewed-by: Jonas Aberg <jonas.aberg@stericsson.com>
Reviewed-by: Sebastian Rasmussen <sebastian.rasmussen@stericsson.com>
Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
During redetection of a SDIO card, a request for a new card RCA
was submitted to the card, but was then overwritten by the old RCA.
This caused the card to be deselected instead of selected when using
the incorrect RCA. This bug's been present since the "oldcard"
handling was introduced in 2.6.32.
Signed-off-by: Stefan Nilsson XK <stefan.xk.nilsson@stericsson.com>
Reviewed-by: Ulf Hansson <ulf.hansson@stericsson.com>
Reviewed-by: Pawel Wieczorkiewicz <pawel.wieczorkiewicz@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
count is only ever used by assigning to old_len if count == 0, and
then old_len isn't ever used at all. So, both are redundant. Fixes:
drivers/mmc/host/dw_mmc.c: In function ‘dw_mci_read_data_pio’:
drivers/mmc/host/dw_mmc.c:1034:32: warning: variable ‘old_len’ set but
not used [-Wunused-but-set-variable]
Signed-off-by: Chris Ball <cjb@laptop.org>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Marc Reilly <marc@cpdesign.com.au>
Tested-by: Eric Benard <eric@eukrea.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
It can be worked around using a GPIO which will be done for i.MX later.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Tested-by: Marc Reilly <marc@cpdesign.com.au>
Tested-by: Eric Benard <eric@eukrea.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Marc Reilly <marc@cpdesign.com.au>
Tested-by: Eric Benard <eric@eukrea.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
dma_unmap_sg() already flushes the cache, I don't get what this
code is doing here.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Fix section mismatch by annotating using variable name suffix.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This change supports building the kernel with newer binutils where
a shift of greater than the word size is no longer interpreted
silently as modulo the word size, but instead generates a warning.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/epip/linux-2.6-unicore32: (40 commits)
unicore32: rewrite arch-specific tlb.h to use asm-generic version
unicore32: modify io_p2v and io_v2p macros, and adjust PKUNITY_mmio_BASEs
unicore32: replace unicore32-specific iomap functions with generic lib implementation
unicore32 machine related: add frame buffer driver for pkunity-v3 soc
unicore32 machine related files: add i2c bus drivers for pkunity-v3 soc
unicore32 io: redefine __REG(x) and re-use readl/writel funcs
unicore32 i8042 upgrade and bugfix: adjust resource request region type
unicore32 upgrade to v2.6.38-rc5: add one more paramter for pte_alloc_map call
unicore32 i8042: adjust io funcs of i8042-unicore32io.h
unicore32: rename PKUNITY_IOSPACE_BASE to PKUNITY_MMIO_BASE
unicore32: modify function names and parameters for irq_chips
unicore32: remove unused lines in arch/unicore32/include/asm/irq.h
unicore32 time.c: change calculate method for clock_event_device
unicore32: ADD MAINTAINER for unicore32 architecture
unicore32 machine related files: ps2 driver
unicore32 machine related files: pci bus handling
unicore32 machine related files: hardware registers
unicore32 machine related files: core files
unicore32 additional architecture files: boot process
unicore32 additional architecture files: low-level lib: misc
...
Acked-by: Arnd Bergmann <arnd@arndb.de>
* 'sh-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (34 commits)
sh: Convert to generic show_interrupts.
sh: Wire up new fhandle and clock_adjtime syscalls.
sh: modify platform_device for sh_eth driver
sh: add GETHER's platform_device in board-sh7757lcr
sh: update sh7757lcr_defconfig
sh: add platform_device of tmio_mmc and sh_mmcif to sh7757lcr
sh: dmaengine support for SH7757
sh: add mmc clock in clock-sh7757
sh: add spi_board_info in sh7757lcr
sh: add platform_device for SPI
sh: add USB_ARCH_HAS_EHCI and OHCI for SH7757
sh: Rename cpuidle states to fit general conventions
serial: sh-sci: fix deadlock when resuming from S3 sleep
sh: Enable CONFIG_GCOV_PROFILE_ALL for sh
sh: Fix up async PCIe probing on SMP.
serial: sh-sci: Kill off the special earlyprintk device.
serial: sh-sci: Use dev_name() for region reservations.
serial: sh-sci: Fix up earlyprintk port mapping.
serial: sh-sci: Limit early console to one device.
serial: sh-sci: Fix up break timer scheduling race.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6:
fbdev: sh_mobile_lcdc: Add YUV framebuffer support
viafb: split pll configs up
viafb: remove duplicated clock storage
viafb: always return the best possible clock
viafb: remove duplicated clock information
fbdev: sh_mobile_lcdcfb: add backlight support
viafb: factor lcd scaling parameters out
viafb: strip some structures
viafb: remove unused data_mode and device_type
viafb: kill lcd_panel_id
video via: make local variables static
video via: fix iomem access
video/via: drop deprecated (and unused) i2c_adapter.id
RPC task RPC_TASK_QUEUED bit is set must be checked before trying to wake up
task rpc_killall_tasks() because task->tk_waitqueue can not be set (equal to
NULL).
Also, as Trond Myklebust mentioned, such approach (instead of checking
tk_waitqueue to NULL) allows us to "optimise away the call to
rpc_wake_up_queued_task() altogether for those
tasks that aren't queued".
Here is an example of dereferencing of tk_waitqueue equal to NULL:
CPU 0 CPU 1 CPU 2
-------------------- --------------------- --------------------------
nfs4_run_open_task
rpc_run_task
rpc_execute
rpc_set_active
rpc_make_runnable
(waiting)
rpc_async_schedule
nfs4_open_prepare
nfs_wait_on_sequence
nfs_umount_begin
rpc_killall_tasks
rpc_wake_up_task
rpc_wake_up_queued_task
spin_lock(tk_waitqueue == NULL)
BUG()
rpc_sleep_on
spin_lock(&q->lock)
__rpc_sleep_on
task->tk_waitqueue = q
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This fixes a race in which the task->tk_callback() puts the rpc_task
to sleep, setting a new callback. Under certain circumstances, the current
code may end up executing the task->tk_action before it gets round to the
callback.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
The recently increased type checking in platform_get_drvdata() reveals a few
offenders:
drivers/rtc/rtc-ds1390.c:161: warning: passing argument 1 of ‘platform_get_drvdata’ from incompatible pointer type
drivers/rtc/rtc-ds3234.c:161: warning: passing argument 1 of ‘platform_get_drvdata’ from incompatible pointer type
drivers/rtc/rtc-m41t94.c:139: warning: passing argument 1 of ‘platform_get_drvdata’ from incompatible pointer type
Use spi_get_drvdata() instead of platform_get_drvdata().
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (177 commits)
drm/radeon: fixup refcounts in radeon dumb create ioctl.
drm: radeon: *_cs_packet_parse_vline() cleanup
radeon: merge list_del()/list_add_tail() to list_move_tail()
drm: Retry i2c transfer of EDID block after failure
drm/radeon/kms: fix typo in atom overscan setup
drm: Hold the mode mutex whilst probing for sysfs status
drm/nouveau: fix __nouveau_fence_wait performance
drm/nv40: attempt to reserve just enough vram for all 32 channels
drm/nv50: check for vm traps on every gr irq
drm/nv50: decode vm faults some more
drm/nouveau: add nouveau_enum_find() util function
drm/nouveau: properly handle pushbuffer check failures
drm/nvc0: remove vm hack forcing large/small pages to not share a PDE
drm/i915: disable opregion lid detection for now.
drm/i915: Only wait on a pending flip if we intend to write to the buffer
drm/i915/dp: Sanity check eDP existence
drm: add cap bit to denote if dumb ioctl is available or not.
drm/core: add ioctl to query device/driver capabilities
drm/radeon/kms: allow max clock of 340 Mhz on hdmi 1.3+
drm/radeon/kms: add cayman pci ids
...
Commit 6440e5967bc broke old userspaces that do not set tss address
before entering vcpu. Unbreak it by setting tss address to a safe
value on the first vcpu entry. New userspaces should set tss address,
so print warning in case it doesn't.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch does:
- call vcpu->arch.mmu.update_pte directly
- use gfn_to_pfn_atomic in update_pte path
The suggestion is from Avi.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cleanup the code of pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu),
breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function
to proper place.
Signed-off-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently if io port + len crosses 8bit boundary in io permission bitmap the
check may allow IO that otherwise should not be allowed. The patch fixes that.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Current implementation truncates upper 32bit of TR base address during IO
permission bitmap check. The patch fixes this.
Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
With CONFIG_CC_STACKPROTECTOR, we need a valid %gs at all times, so disable
lazy reload and do an eager reload immediately after the vmexit.
Reported-by: IVAN ANGELOV <ivangotoy@gmail.com>
Acked-By: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Access to this page is mostly done through the regs member which holds
the address to this page. The exceptions are in vmx_vcpu_reset() and
kvm_free_lapic() and these both can easily be converted to using regs.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
The RCU use in kvm_irqfd_deassign is tricky: we have rcu_assign_pointer
but no synchronize_rcu: synchronize_rcu is done by kvm_irq_routing_update
which we share a spinlock with.
Fix up a comment in an attempt to make this clearer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Using __get_free_page instead of alloc_page and page_address,
using free_page instead of __free_page and virt_to_page
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
No need to record the gfn to verifier the pte has the same mode as
current vcpu, it's because we only speculatively update the pte only
if the pte and vcpu have the same mode
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by
kvm->slots_lock, so move it out of mmu spinlock
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Set spte accessed bit only if guest_initiated == 1 that means the really
accessed
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>