Commit Graph

232268 Commits

Author SHA1 Message Date
Linus Torvalds
8d99641f6c Merge branch 'akpm'
* akpm:
  kernel/smp.c: consolidate writes in smp_call_function_interrupt()
  kernel/smp.c: fix smp_call_function_many() SMP race
  memcg: correctly order reading PCG_USED and pc->mem_cgroup
  backlight: fix 88pm860x_bl macro collision
  drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
  MAINTAINERS: update Atmel AT91 entry
  mm: fix truncate_setsize() comment
  memcg: fix rmdir, force_empty with THP
  memcg: fix LRU accounting with THP
  memcg: fix USED bit handling at uncharge in THP
  memcg: modify accounting function for supporting THP better
  fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
  mm: compaction: prevent division-by-zero during user-requested compaction
  mm/vmscan.c: remove duplicate include of compaction.h
  memblock: fix memblock_is_region_memory()
  thp: keep highpte mapped until it is no longer needed
  kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
2011-01-20 17:02:14 -08:00
Milton Miller
225c8e010f kernel/smp.c: consolidate writes in smp_call_function_interrupt()
We have to test the cpu mask in the interrupt handler before checking the
refs, otherwise we can start to follow an entry before its deleted and
find it partially initailzed for the next trip.  Presently we also clear
the cpumask bit before executing the called function, which implies
getting write access to the line.  After the function is called we then
decrement refs, and if they go to zero we then unlock the structure.

However, this implies getting write access to the call function data
before and after another the function is called.  If we can assert that no
smp_call_function execution function is allowed to enable interrupts, then
we can move both writes to after the function is called, hopfully allowing
both writes with one cache line bounce.

On a 256 thread system with a kernel compiled for 1024 threads, the time
to execute testcase in the "smp_call_function_many race" changelog was
reduced by about 30-40ms out of about 545 ms.

I decided to keep this as WARN because its now a buggy function, even
though the stack trace is of no value -- a simple printk would give us the
information needed.

Raw data:

Without patch:
  ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
  ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
  ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
  ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
  ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
  ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
  ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
  ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

With patch:
  ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
  ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
  ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
  ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
  ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
  ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
  ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

	#include <linux/module.h>
	#include <linux/init.h>
	#include <linux/sched.h> /* sched clock */

	#define ITERATIONS 100

	static void do_nothing_ipi(void *dummy)
	{
	}

	static void do_ipis(struct work_struct *dummy)
	{
		int i;

		for (i = 0; i < ITERATIONS; i++)
			smp_call_function(do_nothing_ipi, NULL, 1);

		printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
	}

	static struct work_struct work[NR_CPUS];

	static int __init testcase_init(void)
	{
		int cpu;
		u64 start, started, done;

		start = local_clock();
		for_each_online_cpu(cpu) {
			INIT_WORK(&work[cpu], do_ipis);
			schedule_work_on(cpu, &work[cpu]);
		}
		started = local_clock();
		for_each_online_cpu(cpu)
			flush_work(&work[cpu]);
		done = local_clock();
		pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
			started-start, done-started, done-start);

		return 0;
	}

	static void __exit testcase_exit(void)
	{
	}

	module_init(testcase_init)
	module_exit(testcase_exit)
	MODULE_LICENSE("GPL");
	MODULE_AUTHOR("Anton Blanchard");

Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Anton Blanchard
6dc1989995 kernel/smp.c: fix smp_call_function_many() SMP race
I noticed a failure where we hit the following WARN_ON in
generic_smp_call_function_interrupt:

                if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
                        continue;

                data->csd.func(data->csd.info);

                refs = atomic_dec_return(&data->refs);
                WARN_ON(refs < 0);      <-------------------------

We atomically tested and cleared our bit in the cpumask, and yet the
number of cpus left (ie refs) was 0.  How can this be?

It turns out commit 54fdade1c3
("generic-ipi: make struct call_function_data lockless") is at fault.  It
removes locking from smp_call_function_many and in doing so creates a
rather complicated race.

The problem comes about because:

 - The smp_call_function_many interrupt handler walks call_function.queue
   without any locking.
 - We reuse a percpu data structure in smp_call_function_many.
 - We do not wait for any RCU grace period before starting the next
   smp_call_function_many.

Imagine a scenario where CPU A does two smp_call_functions back to back,
and CPU B does an smp_call_function in between.  We concentrate on how CPU
C handles the calls:

CPU A            CPU B                  CPU C              CPU D

smp_call_function
                                        smp_call_function_interrupt
                                            walks
					call_function.queue sees
					data from CPU A on list

                 smp_call_function

                                        smp_call_function_interrupt
                                            walks

                                        call_function.queue sees
                                          (stale) CPU A on list
							   smp_call_function int
							   clears last ref on A
							   list_del_rcu, unlock
smp_call_function reuses
percpu *data A
                                         data->cpumask sees and
                                         clears bit in cpumask
                                         might be using old or new fn!
                                         decrements refs below 0

set data->refs (too late!)

The important thing to note is since the interrupt handler walks a
potentially stale call_function.queue without any locking, then another
cpu can view the percpu *data structure at any time, even when the owner
is in the process of initialising it.

The following test case hits the WARN_ON 100% of the time on my PowerPC
box (having 128 threads does help :)

#include <linux/module.h>
#include <linux/init.h>

#define ITERATIONS 100

static void do_nothing_ipi(void *dummy)
{
}

static void do_ipis(struct work_struct *dummy)
{
	int i;

	for (i = 0; i < ITERATIONS; i++)
		smp_call_function(do_nothing_ipi, NULL, 1);

	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
}

static struct work_struct work[NR_CPUS];

static int __init testcase_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		INIT_WORK(&work[cpu], do_ipis);
		schedule_work_on(cpu, &work[cpu]);
	}

	return 0;
}

static void __exit testcase_exit(void)
{
}

module_init(testcase_init)
module_exit(testcase_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anton Blanchard");

I tried to fix it by ordering the read and the write of ->cpumask and
->refs.  In doing so I missed a critical case but Paul McKenney was able
to spot my bug thankfully :) To ensure we arent viewing previous
iterations the interrupt handler needs to read ->refs then ->cpumask then
->refs _again_.

Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

[miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
[miltonm@bga.com: remove excess tests]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.32+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Johannes Weiner
713735b423 memcg: correctly order reading PCG_USED and pc->mem_cgroup
The placement of the read-side barrier is confused: the writer first
sets pc->mem_cgroup, then PCG_USED.  The read-side barrier has to be
between testing PCG_USED and reading pc->mem_cgroup.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Randy Dunlap
2550326ac7 backlight: fix 88pm860x_bl macro collision
Fix collision with kernel-supplied #define:

  drivers/video/backlight/88pm860x_bl.c:24:1: warning: "CURRENT_MASK" redefined
  arch/x86/include/asm/page_64_types.h:6:1: warning: this is the location of the previous definition

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Haojian Zhuang <haojian.zhuang@marvell.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Janusz Krzysztofik
cc587ece12 drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
Replicate changes made to drivers/leds/ledtrig-backlight.c.

Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Nicolas Ferre
c1fc8675c9 MAINTAINERS: update Atmel AT91 entry
Add two co-maintainers and update the entry with new information.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Andrew Victor <linux@maxim.org.za>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Jan Kara
382e27daa5 mm: fix truncate_setsize() comment
Contrary to what the comment says, truncate_setsize() should be called
*before* filesystem truncated blocks.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
KAMEZAWA Hiroyuki
987eba66e0 memcg: fix rmdir, force_empty with THP
Now, when THP is enabled, memcg's rmdir() function is broken because
move_account() for THP page is not supported.

This will cause account leak or -EBUSY issue at rmdir().
This patch fixes the issue by supporting move_account() THP pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
KAMEZAWA Hiroyuki
ece35ca810 memcg: fix LRU accounting with THP
memory cgroup's LRU stat should take care of size of pages because
Transparent Hugepage inserts hugepage into LRU.  If this value is the
number wrong, memory reclaim will not work well.

Note: only head page of THP's huge page is linked into LRU.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
KAMEZAWA Hiroyuki
ca3e021417 memcg: fix USED bit handling at uncharge in THP
Now, under THP:

at charge:
  - PageCgroupUsed bit is set to all page_cgroup on a hugepage.
    ....set to 512 pages.
at uncharge
  - PageCgroupUsed bit is unset on the head page.

So, some pages will remain with "Used" bit.

This patch fixes that Used bit is set only to the head page.
Used bits for tail pages will be set at splitting if necessary.

This patch adds this lock order:
   compound_lock() -> page_cgroup_move_lock().

[akpm@linux-foundation.org: fix warning]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
KAMEZAWA Hiroyuki
e401f1761c memcg: modify accounting function for supporting THP better
mem_cgroup_charge_statisics() was designed for charging a page but now, we
have transparent hugepage.  To fix problems (in following patch) it's
required to change the function to get the number of pages as its
arguments.

The new function gets following as argument.
  - type of page rather than 'pc'
  - size of page which is accounted.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
David Dillow
20d9600cb4 fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
When using devices that support max_segments > BIO_MAX_PAGES (256), direct
IO tries to allocate a bio with more pages than allowed, which leads to an
oops in dio_bio_alloc().  Clamp the request to the supported maximum, and
change dio_bio_alloc() to reflect that bio_alloc() will always return a
bio when called with __GFP_WAIT and a valid number of vectors.

[akpm@linux-foundation.org: remove redundant BUG_ON()]
Signed-off-by: David Dillow <dillowda@ornl.gov>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Johannes Weiner
82478fb7bc mm: compaction: prevent division-by-zero during user-requested compaction
Up until 3e7d344 ("mm: vmscan: reclaim order-0 and use compaction instead
of lumpy reclaim"), compaction skipped calculating the fragmentation index
of a zone when compaction was explicitely requested through the procfs
knob.

However, when compaction_suitable was introduced, it did not come with an
extra check for order == -1, set on explicit compaction requests, and
passed this order on to the fragmentation index calculation, where it
overshifts the number of requested pages, leading to a division by zero.

This patch makes sure that order == -1 is recognized as the flag it is
rather than passing it along as valid order parameter.

[akpm@linux-foundation.org: add comment, per Mel]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Jesper Juhl
3305de51bf mm/vmscan.c: remove duplicate include of compaction.h
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Tomi Valkeinen
abb65272a1 memblock: fix memblock_is_region_memory()
memblock_is_region_memory() uses reserved memblocks to search for the
given region, while it should use the memory memblocks.

I encountered the problem with OMAP's framebuffer ram allocation.
Normally the ram is allocated dynamically, and this function is not
called.  However, if we want to pass the framebuffer from the bootloader
to the kernel (to retain the boot image), this function is used to check
the validity of the kernel parameters for the framebuffer ram area.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Johannes Weiner
453c719261 thp: keep highpte mapped until it is no longer needed
Two users reported THP-related crashes on 32-bit x86 machines.  Their oops
reports indicated an invalid pte, and subsequent code inspection showed
that the highpte is actually used after unmap.

The fix is to unmap the pte only after all operations against it are
finished.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Reported-by: werner <w.landgraf@ru.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Linus Torvalds
fc887b15d9 Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty: update MAINTAINERS file due to driver movement
  tty: move drivers/serial/ to drivers/tty/serial/
  tty: move hvc drivers to drivers/tty/hvc/
2011-01-20 16:39:23 -08:00
Linus Torvalds
466c19063b Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched, cgroup: Use exit hook to avoid use-after-free crash
  sched: Fix signed unsigned comparison in check_preempt_tick()
  sched: Replace rq->bkl_count with rq->rq_sched_info.bkl_count
  sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure
  sched: Display autogroup names in /proc/sched_debug
  sched: Reinstate group names in /proc/sched_debug
  sched: Update effective_load() to use global share weights
2011-01-20 16:37:55 -08:00
Linus Torvalds
67290f41b2 Merge branch 'xen/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'xen/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xenbus: Fix memory leak on release
  xenbus: avoid zero returns from read()
  xenbus: add missing wakeup in concurrent read/write
  xenbus: allow any xenbus command over /proc/xen/xenbus
  xenfs/xenbus: report partial reads/writes correctly
2011-01-20 16:37:28 -08:00
Linus Torvalds
5cdec1fca2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: mangle existing header for SMB_COM_NT_CANCEL
  cifs: remove code for setting timeouts on requests
  [CIFS] cifs: reconnect unresponsive servers
  cifs: set up recurring workqueue job to do SMB echo requests
  cifs: add ability to send an echo request
  cifs: add cifs_call_async
  cifs: allow for different handling of received response
  cifs: clean up sync_mid_result
  cifs: don't reconnect server when we don't get a response
  cifs: wait indefinitely for responses
  cifs: Use mask of ACEs for SID Everyone to calculate all three permissions user, group, and other
  cifs: Fix regression during share-level security mounts (Repost)
  [CIFS] Update cifs version number
  cifs: move mid result processing into common function
  cifs: move locked sections out of DeleteMidQEntry and AllocMidQEntry
  cifs: clean up accesses to midCount
  cifs: make wait_for_free_request take a TCP_Server_Info pointer
  cifs: no need to mark smb_ses_list as cifs_demultiplex_thread is exiting
  cifs: don't fail writepages on -EAGAIN errors
  CIFS: Fix oplock break handling (try #2)
2011-01-20 16:36:38 -08:00
Linus Torvalds
e55fdbd741 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: remove virtio-pci root device
  LGUEST_GUEST: fix unmet direct dependencies (VIRTUALIZATION && VIRTIO)
  lguest: compile fixes
  lguest: Use this_cpu_ops
  lguest: document --rng in example Launcher
  lguest: example launcher to use guard pages, drop PROT_EXEC, fix limit logic
  lguest: --username and --chroot options
2011-01-20 16:31:20 -08:00
Linus Torvalds
c522682d74 Merge branch 'for-38-rc2' of git://codeaurora.org/quic/kernel/davidb/linux-msm
* 'for-38-rc2' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
  msm: qsd8x50: Platform data isn't init data
2011-01-20 16:30:22 -08:00
Linus Torvalds
069b614337 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  trusted-keys: avoid scattring va_end()
  trusted-keys: check for NULL before using it
  trusted-keys: another free memory bugfix
  trusted-keys: free memory bugfix
2011-01-20 16:29:43 -08:00
Linus Torvalds
b06ae1ead2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Fix error path in gfs2_lookup_by_inum()
  GFS2: remove iopen glocks from cache on failed deletes
2011-01-20 16:29:04 -08:00
Linus Torvalds
e589501cb9 Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPICA: Update version to 20110112
  ACPICA: Update all ACPICA copyrights and signons to 2011
  ACPICA: Fix issues/fault with automatic "serialized" method support
  ACPICA: Debugger: Lock namespace for duration of a namespace dump
  ACPICA: Fix namespace race condition
  ACPICA: Fix memory leak in acpi_ev_asynch_execute_gpe_method().
2011-01-20 16:28:34 -08:00
Linus Torvalds
28e58ee8ce Fix broken "pipe: use event aware wakeups" optimization
Commit e462c448fd ("pipe: use event aware wakeups") optimized the pipe
event wakeup calls to avoid wakeups if the events do not match the
requested set.

However, the optimization was buggy, in that it didn't actually use the
correct sets for the events: when we make room for more data to be
written, the pipe poll() routine will return both the POLLOUT _and_
POLLWRNORM bits.  Similarly for read.

And most critically, when a pipe is released, that will potentially
result in POLLHUP|POLLERR (depending on whether it was the last reader
or writer), not just the regular POLLIN|POLLOUT.

This bug showed itself as a hung gnome-screensaver-dialog process, stuck
forever (or at least until it was poked by a signal or by being traced)
in a poll() system call.

Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 16:21:59 -08:00
Linus Torvalds
d7b9935a34 i915: Fix i915 suspend delay
During system suspend, the "wait for ring buffer to empty" loop would
always time out after three seconds, because the faster cached ring
buffer head read would always return zero.  Force the slow-and-careful
PIO read on all but the first iterations of the loop to fix it.

This also removes the unused (and useless) 'actual_head' variable that
tried to approximate doing this, but did it incorrectly.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: DRI mailing list <dri-devel@lists.freedesktop.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 16:18:25 -08:00
Benjamin Herrenschmidt
50f4df4e6a Merge remote branch 'kumar/next' into merge 2011-01-21 11:00:44 +11:00
Stefan Richter
324719978d firewire: net: is not experimental anymore
thanks to Clemens' and Maxim's fixes to firewire-ohci and -net in the
last two kernel releases.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2011-01-21 00:36:00 +01:00
Maxim Levitsky
74a1450499 firewire: net: invalidate ARP entries of removed nodes
This makes it possible to resume communication with a node that dropped
off the bus for a brief period.  Otherwise communication will only be
possible after ARP cache entry timeouts.

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (rebased)
2011-01-21 00:36:00 +01:00
Stefan Richter
6044565af4 firewire: core: fix unstable I/O with Canon camcorder
Regression since commit 1038953674, "firewire: core: check for 1394a
compliant IRM, fix inaccessibility of Sony camcorder":

The camcorder Canon MV5i generates lots of bus resets when asynchronous
requests are sent to it (e.g. Config ROM read requests or FCP Command
write requests) if the camcorder is not root node.  This causes drop-
outs in videos or makes the camcorder entirely inaccessible.
https://bugzilla.redhat.com/show_bug.cgi?id=633260

Fix this by allowing any Canon device, even if it is a pre-1394a IRM
like MV5i are, to remain root node (if it is at least Cycle Master
capable).  With the FireWire controller cards that I tested, MV5i always
becomes root node when plugged in and left to its own devices.

Reported-by: Ralf Lange
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org> # 2.6.32.y and newer
2011-01-21 00:27:46 +01:00
Jeff Layton
84cdf74e80 cifs: fix unaligned accesses in cifsConvertToUCS
Move cifsConvertToUCS to cifs_unicode.c where all of the other unicode
related functions live. Have it store mapped characters in 'temp' and
then use put_unaligned_le16 to copy it to the target buffer. Also fix
the comments to match kernel coding style.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:48:00 +00:00
Jeff Layton
ba2dbf30df cifs: clean up unaligned accesses in cifs_unicode.c
Make sure we use get/put_unaligned routines when accessing wide
character strings.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:47:57 +00:00
Jeff Layton
26ec254869 cifs: fix unaligned access in check2ndT2 and coalesce_t2
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:47:54 +00:00
Jeff Layton
12df83c9b9 cifs: clean up unaligned accesses in validate_t2
...and clean up function to reduce indentation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:46:33 +00:00
Jeff Layton
690c522fa5 cifs: use get/put_unaligned functions to access ByteCount
It's possible that when we access the ByteCount that the alignment
will be off. Most CPUs deal with that transparently, but there's
usually some performance impact. Some CPUs raise an exception on
unaligned accesses.

Fix this by accessing the byte count using the get_unaligned and
put_unaligned inlined functions. While we're at it, fix the types
of some of the variables that end up getting returns from these
functions.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:46:29 +00:00
Jeff Layton
aae62fdb6b cifs: move time field in cifsInodeInfo
...and remove length qualifiers from bools.

Before:

	/* size: 1176, cachelines: 19, members: 13 */
	/* sum members: 1165, holes: 2, sum holes: 11 */
	/* bit holes: 1, sum bit holes: 4 bits */
	/* last cacheline: 24 bytes */

After:

	/* size: 1168, cachelines: 19, members: 13 */
	/* last cacheline: 16 bytes */

...savings of 8 bytes per inode.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:46:26 +00:00
Jeff Layton
c3dccf4817 cifs: TCP_Server_Info diet
Remove fields that are completely unused, and rearrange struct
according to recommendations by "pahole".

Before:

	/* size: 1112, cachelines: 18, members: 49 */
	/* sum members: 1086, holes: 8, sum holes: 26 */
	/* bit holes: 1, sum bit holes: 7 bits */
	/* last cacheline: 24 bytes */

After:

	/* size: 1072, cachelines: 17, members: 42 */
	/* sum members: 1065, holes: 3, sum holes: 7 */
	/* last cacheline: 48 bytes */

...savings of 40 bytes per struct on x86_64. 21 bytes by field removal,
and 19 by reorganizing to eliminate holes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:46:22 +00:00
Pavel Shilovsky
a70307eeeb CIFS: Implement cifs_strict_readv (try #4)
Read from the cache if we have at least Level II oplock - otherwise
read from the server. Add cifs_user_readv to let the client read into
iovec buffers.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:42:29 +00:00
Pavel Shilovsky
7a6a19b17a CIFS: Implement cifs_file_strict_mmap (try #2)
Invalidate inode mapping if we don't have at least Level II oplock.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:42:25 +00:00
Pavel Shilovsky
8be7e6ba14 CIFS: Implement cifs_strict_fsync
Invalidate inode mapping if we don't have at least Level II oplock in
cifs_strict_fsync. Also remove filemap_write_and_wait call from cifs_fsync
because it is previously called from vfs_fsync_range. Add file operations'
structures for strict cache mode.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:42:21 +00:00
Pavel Shilovsky
4f8ba8a0c0 CIFS: Make cifsFileInfo_put work with strict cache mode
On strict cache mode when we close the last file handle of the inode we
should set invalid_mapping flag on this inode to prevent data coherency
problem when we open it again but it has been modified on the server.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 21:42:17 +00:00
Linus Torvalds
b23fffd778 ACPI / Battery: remove battery refresh on resume
This partially reverts commit da8aeb92d4
("ACPI / Battery: Update information on info notification and resume"),
which causes a hang on resume on at least some machines.

This bug was bisected on an ASUS EeePC 901, which hangs at resume time
if we do that "acpi_battery_refresh(battery)" in the battery resume
function.

Rafael suspects we'll still need to refresh the sysfs files upon resume,
but that that can be done from a PM notifier (that will run after
thawing user space).

Bisected-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 13:14:10 -08:00
Randy Dunlap
7d81c3b9e2 xen: fix non-ANSI function warning in irq.c
Fix sparse warning for non-ANSI function declaration:

arch/x86/xen/irq.c:129:30: warning: non-ANSI function declaration of function 'xen_init_irq_ops'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-20 14:52:13 -05:00
Jeff Layton
76dcc26f1d cifs: mangle existing header for SMB_COM_NT_CANCEL
The NT_CANCEL command looks just like the original command, except for a
few small differences. The send_nt_cancel function however currently takes
a tcon, which we don't have in SendReceive and SendReceive2.

Instead of "respinning" the entire header for an NT_CANCEL, just mangle
the existing header by replacing just the fields we need. This means we
don't need a tcon and allows us to call it from other places.

Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 18:08:36 +00:00
Jeff Layton
7749981ec3 cifs: remove code for setting timeouts on requests
Since we don't time out individual requests anymore, remove the code
that we used to use for setting timeouts on different requests.

Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 18:07:55 +00:00
Steve French
fda3594362 [CIFS] cifs: reconnect unresponsive servers
If the server isn't responding to echoes, we don't want to leave tasks
hung waiting for it to reply. At that point, we'll want to reconnect
so that soft mounts can return an error to userspace quickly.

If the client hasn't received a reply after a specified number of echo
intervals, assume that the transport is down and attempt to reconnect
the socket.

The number of echo_intervals to wait before attempting to reconnect is
tunable via a module parameter. Setting it to 0, means that the client
will never attempt to reconnect. The default is 5.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2011-01-20 18:06:34 +00:00
Jeff Layton
c74093b694 cifs: set up recurring workqueue job to do SMB echo requests
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-20 17:48:10 +00:00