kernel_optimize_test

Author	SHA1	Message	Date
Linus Torvalds	0999d978dc	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix compat-vdso x86/mm: unify init task OOM handling x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault	2008-10-16 15:08:45 -07:00
Ingo Molnar	3a1dfe6eef	x86/mm: unify init task OOM handling Linus noticed that the "again:" versus "survive:" OOM logic for the init task was arbitrarily different. The 64-bit codepath is the better one, because it correctly re-lookups the vma after having dropped the ->mmap_sem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 18:11:13 +02:00
Linus Torvalds	891cffbd6b	x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault Arjan reported a spike in the following bug pattern in v2.6.27: http://www.kerneloops.org/searchweek.php?search=lock_page which happens because hwclock started triggering warnings due to a (correct) might_sleep() check in the MM code. The warning occurs because hwclock uses this dubious sequence of code to run "atomic" code: static unsigned long atomic(const char name, unsigned long (op)(unsigned long), unsigned long arg) { unsigned long v; __asm__ volatile ("cli"); v = (*op)(arg); __asm__ volatile ("sti"); return v; } Then it pagefaults in that "atomic" section, triggering the warning. There is no way the kernel could provide "atomicity" in this path, a page fault is a cannot-continue machine event so the kernel has to wait for the page to be filled in. Even if it was just a minor fault we'd have to take locks and might have to spend quite a bit of time with interrupts disabled - not nice to irq latencies in general. So instead just enable interrupts in the pagefault path unconditionally if we come from user-space, and handle the fault. Also, while touching this code, unify some trivial parts of the x86 VM paths at the same time. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-13 17:46:39 +02:00
Alexander van Heukelum	69c89b5bf7	traps: x86: remove trace_hardirqs_fixup from pagefault handler The last use of trace_hardirqs_fixup is unnecessary, because the trap is taken with interrupt off on i386 as well as x86_64, and the irq-tracer is notified of this from the assembly code. trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed from include/asm-x86/irqflags.h as they are no longer used. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-13 10:22:04 +02:00
Ingo Molnar	365d46dc9b	Merge branch 'linus' into x86/xen Conflicts: arch/x86/kernel/cpu/common.c arch/x86/kernel/process_64.c arch/x86/xen/enlighten.c	2008-10-12 12:37:32 +02:00
Jan Beulich	cc643d4687	x86: adjust vmalloc_sync_all() for Xen (2nd try) Since the fourth PDPT entry cannot be shared under Xen, vmalloc_sync_all() must iterate over pmd-s rather than pgd-s here. Luckily, the code isn't used for native PAE (SHARED_KERNEL_PMD is 1) and the change is benign to non-PAE. Also do a little more cleanup in that function. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2008-09-06 19:48:01 +02:00
Jaswinder Singh	70ef56414e	x86: mm/fault.c declare do_page_fault before they get used declared do_page_fault() in asm-x86/trap.h for both X86_32 and X86_64 removed do_invalid_op declaration from mm/fault.c as it is already declared in asm-x86/trap.h Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>	2008-07-23 17:36:37 +05:30
Ingo Molnar	5806b81ac1	Merge branch 'auto-ftrace-next' into tracing/for-linus Conflicts: arch/x86/kernel/entry_32.S arch/x86/kernel/process_32.c arch/x86/kernel/process_64.c arch/x86/lib/Makefile include/asm-x86/irqflags.h kernel/Makefile kernel/sched.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-14 16:11:52 +02:00
Jeremy Fitzhardinge	67350a5c45	x86: simplify vmalloc_sync_all vmalloc_sync_all() is only called from register_die_notifier and alloc_vm_area. Neither is on any performance-critical paths, so vmalloc_sync_all() itself is not on any hot paths. Given that the optimisations in vmalloc_sync_all add a fair amount of code and complexity, and are fairly hard to evaluate for correctness, it's better to just remove them to simplify the code rather than worry about its absolute performance. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:59 +02:00
Ingo Molnar	58cf35228f	Merge branches 'x86/mmio', 'x86/delay', 'x86/idle', 'x86/oprofile', 'x86/debug', 'x86/ptrace' and 'x86/amd-iommu' into x86/devel	2008-07-08 09:46:15 +02:00
Ingo Molnar	d763d5edf9	Merge branch 'linus' into tracing/mmiotrace	2008-07-07 08:07:35 +02:00
Gustavo Fernando Padovan	95c60b08c6	x86: remove unnecessary #ifdef CONFIG_X86_32...#else Remove the #ifdef conditional because this comparison is already done in user_mode_vm(). Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br> Cc: akpm@osdl.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-03 15:09:10 +02:00
Vegard Nossum	f294a8ce21	x86: small unifications of address printing 'man 3 printf' tells me that %p should be printed as if by %#x, but this is not true for the kernel, which does not use the '0x' prefix for the %p conversion specifier. A small cast to (void *) is also prettier than #ifdef/#else/#endif. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-01 16:21:41 +02:00
Henry Nestler	b29c701dea	x86: fix endless page faults in mount_block_root for Linux 2.6 Page faults in kernel address space between PAGE_OFFSET up to VMALLOC_START should not try to map as vmalloc. Fix rarely endless page faults inside mount_block_root for root filesystem at boot time. All 32bit kernels up to 2.6.25 can fail into this hole. I can not present this under native linux kernel. I see, that the 64bit has fixed the problem. I copied the same lines into 32bit part. Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation): http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt The physicaly memory was trimmed down to 192MB to better catch the bug. More memory gets the bug more rarely. Details, how every x86 32bit system can fail: Start from "mount_block_root", http://lxr.linux.no/linux/init/do_mounts.c#L297 There the variable "fs_names" got one memory page with 4096 bytes. Variable "p" walks through the existing file system types. The first string is no problem. But, with the second loop in mount_block_root the offset of "p" is not at beginning of page, the offset is for example +9, if "reiserfs" is the first in list. Than calls do_mount_root, and lands in sys_mount. Remember: Variable "type_page" contains now "fs_type+9" and not contains a full page. The sys_mount copies 4096 bytes with function "exact_copy_from_user()": http://lxr.linux.no/linux/fs/namespace.c#L1540 Mostly exist pages after the buffer "fs_names+4096+9" and the page fault handler was not called. No problem. In the case, if the page after "fs_names+4096" is not mapped, the page fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320 The do_page_fault gots an address 0xc03b4000. It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's from "__getname()" alias "kmem_cache_alloc". The "error_code" is 0. "vmalloc_fault" will be call: http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332 "vmalloc_fault" tryed to find the physical page for a non existing virtual memory area. The macro "pte_present" in vmalloc_fault() got a next page fault for 0xc0000ed0 at: http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282 No PTE exist for such virtual address. The page fault handler was trying to sync the physical page for the PTE lockup. This called vmalloc_fault() again for address 0xc000000, and that also was not existing. The endless began... In normal case the cpu would still loop with disabled interrrupts. Under coLinux this was catched by a stack overflow inside printk debugs. Signed-off-by: Henry Nestler <henry.nestler@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-06-12 21:26:07 +02:00
Pekka Paalanen	0fd0e3da45	x86: mmiotrace full patch, preview 1 kmmio.c handles the list of mmio probes with callbacks, list of traced pages, and attaching into the page fault handler and die notifier. It arms, traps and disarms the given pages, this is the core of mmiotrace. mmio-mod.c is a user interface, hooking into ioremap functions and registering the mmio probes. It also decodes the required information from trapped mmio accesses via the pre and post callbacks in each probe. Currently, hooking into ioremap functions works by redefining the symbols of the target (binary) kernel module, so that it calls the traced versions of the functions. The most notable changes done since the last discussion are: - kmmio.c is a built-in, not part of the module - direct call from fault.c to kmmio.c, removing all dynamic hooks - prepare for unregistering probes at any time - make kmmio re-initializable and accessible to more than one user - rewrite kmmio locking to remove all spinlocks from page fault path Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe() or is there a better way? The function called via call_rcu() itself calls call_rcu() again, will this work or break? There I need a second grace period for RCU after the first grace period for page faults. Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack that next. At some point I will start looking into how to make mmiotrace a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should make the user space part of mmiotracing as simple as 'cat /debug/trace/mmio > dump.txt'. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-24 11:22:12 +02:00
Pekka Paalanen	10c43d2eb5	x86: explicit call to mmiotrace in do_page_fault() The custom page fault handler list is replaced with a single function pointer. All related functions and variables are renamed for mmiotrace. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Christoph Hellwig <hch@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: pq@iki.fi Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-24 11:21:55 +02:00
Pekka Paalanen	86069782d6	x86: add a list for custom page fault handlers. Provides kernel modules a way to register custom page fault handlers. On every page fault this will call a list of registered functions. The functions may handle the fault and force do_page_fault() to return immediately. This functionality is similar to the now removed page fault notifiers. Custom page fault handlers are used by debugging and reverse engineering tools. Mmiotrace is one such tool and a patch to add it into the tree will follow. The custom page fault handlers are called earlier in do_page_fault() than the page fault notifiers were. Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-23 21:16:38 +02:00
gorcunov@gmail.com	6b6891f9c5	x86: cleanup - rename VM_MASK to X86_VM_MASK This patch renames VM_MASK to X86_VM_MASK (which in turn defined as alias to X86_EFLAGS_VM) to better distinguish from virtual memory flags. We can't just use X86_EFLAGS_VM instead because it is also used for conditional compilation Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:41:33 +02:00
Ingo Molnar	b4e0409a36	x86: check vmlinux limits, 64-bit these build-time and link-time checks would have prevented the vmlinux size regression. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:40:45 +02:00
Ingo Molnar	3085354de6	x86: prefetch fix #2 Linus noticed a second bug and an uncleanliness: - we'd return on any instruction fetch fault - we'd use both the value of 16 and the PF_INSTR symbol which are the same and make no sense the cleanup nicely unifies this piece of logic. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-27 22:00:16 +01:00
Ingo Molnar	bc713dcf35	x86: fix prefetch workaround some early Athlon XP's and Opterons generate bogus faults on prefetch instructions. The workaround for this regressed over .24 - reinstate it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-27 16:08:44 +01:00
Adrian Bunk	cae30f8270	x86: make dump_pagetable() static dump_pagetable() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-14 23:30:19 +01:00
Ingo Molnar	58d5d0d8dd	x86: fix deadlock, make pgd_lock irq-safe lockdep just caught this one: ================================= [ INFO: inconsistent lock state ] 2.6.24 #38 --------------------------------- inconsistent {in-softirq-W} -> {softirq-on-W} usage. swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes: (pgd_lock){-+..}, at: [<ffffffff8022a9ea>] mm_init+0x1da/0x250 {in-softirq-W} state was registered at: [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 394559 hardirqs last enabled at (394559): [<ffffffff80267f0a>] get_page_from_freelist+0x30a/0x4c0 hardirqs last disabled at (394558): [<ffffffff80267d25>] get_page_from_freelist+0x125/0x4c0 softirqs last enabled at (393952): [<ffffffff80232f8e>] __do_softirq+0xce/0xe0 softirqs last disabled at (393945): [<ffffffff8020c57c>] call_softirq+0x1c/0x30 other info that might help us debug this: no locks held by swapper/1. stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24 #38 Call Trace: [<ffffffff8024e1fb>] print_usage_bug+0x18b/0x190 [<ffffffff8024f55d>] mark_lock+0x53d/0x560 [<ffffffff8024fffa>] __lock_acquire+0x3ca/0xed0 [<ffffffff80250ba8>] lock_acquire+0xa8/0xe0 [<ffffffff8022a9ea>] ? mm_init+0x1da/0x250 [<ffffffff809bcd10>] _spin_lock+0x30/0x70 [<ffffffff8022a9ea>] mm_init+0x1da/0x250 [<ffffffff8022aa99>] mm_alloc+0x39/0x50 [<ffffffff8028b95a>] bprm_mm_init+0x2a/0x1a0 [<ffffffff8028d12b>] do_execve+0x7b/0x220 [<ffffffff80209776>] sys_execve+0x46/0x70 [<ffffffff8020c214>] kernel_execve+0x64/0xd0 [<ffffffff8020901e>] ? _stext+0x1e/0x20 [<ffffffff802090ba>] init_post+0x9a/0xf0 [<ffffffff809bc5f6>] ? trace_hardirqs_on_thunk+0x35/0x3a [<ffffffff8024f75a>] ? trace_hardirqs_on+0xba/0xd0 [<ffffffff8020c1a8>] ? child_rip+0xa/0x12 [<ffffffff8020bcbc>] ? restore_args+0x0/0x44 [<ffffffff8020c19e>] ? child_rip+0x0/0x12 turns out that pgd_lock has been used on 64-bit x86 in an irq-unsafe way for almost two years, since commit `8c914cb704`. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-06 22:39:45 +01:00
Thomas Gleixner	d8b57bb700	x86: make spurious fault handler aware of large mappings In very rare cases, on certain CPUs, we could end up in the spurious fault handler and ignore a large pud/pmd mapping. The resulting pte pointer points into the mapped physical space and dereferencing it will fault recursively. Make the code aware of large mappings and do the permission check on the pmd/pud entry, when a large pud/pmd mapping is detected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-06 22:39:43 +01:00
Andi Kleen	b536022227	x86: support gbpages in pagetable dump Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-04 16:48:09 +01:00
Harvey Harrison	cf89ec924d	x86: reduce ifdef sections in fault.c Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-04 16:47:56 +01:00
Harvey Harrison	93809be8b1	x86: fixes for lookup_address args Signedness mismatches in level argument. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-01 17:49:43 +01:00
Jeremy Fitzhardinge	e3ed910db2	x86: use the same pgd_list for PAE and 64-bit Use a standard list threaded through page->lru for maintaining the pgd list on PAE. This is the same as 64-bit, and seems saner than using a non-standard list via page->index. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:34:11 +01:00
Harvey Harrison	fd40d6e318	x86: shrink some ifdefs in fault.c The change from current to tsk in do_page_fault is safe as this is set at the very beginning of the function. Removes a likely() annotation from the 64-bit version, this could have instead been added to 32-bit. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:34:11 +01:00
Jeremy Fitzhardinge	5b727a3b01	x86: ignore spurious faults When changing a kernel page from RO->RW, it's OK to leave stale TLB entries around, since doing a global flush is expensive and they pose no security problem. They can, however, generate a spurious fault, which we should catch and simply return from (which will have the side-effect of reloading the TLB to the current PTE). This can occur when running under Xen, because it frequently changes kernel pages from RW->RO->RW to implement Xen's pagetable semantics. It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids doing a global TLB flush after changing page permissions. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:34:11 +01:00
Harvey Harrison	b406ac61e9	x86: remove nx_enabled from fault.c On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always return early. The test is sufficient on PAE as __supported_pte_mask is updated in the same places as nx_enabled in init_32.c which also takes disable_nx into account. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:34:11 +01:00
Harvey Harrison	c61e211d99	x86: unify fault_32\|64.c Unify includes in moved fault.c. Modify Makefiles to pick up unified file. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:34:11 +01:00

32 Commits