kernel_optimize_test/arch
Henry Nestler b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
..
alpha
arm [ARM] pxa: fix tosa.c build error 2008-06-02 20:38:15 +01:00
avr32 avr32: Fix cpufreq oops when ondemand governor is default 2008-05-27 09:37:42 +02:00
blackfin Blackfin arch: protect only the SPI bus controller with CONFIG_SPI_BFIN 2008-06-07 15:03:01 +08:00
cris
frv Fix various old email addresses for dwmw2 2008-06-06 11:29:10 -07:00
h8300
ia64 [IA64] Workaround for RSE issue 2008-05-27 13:24:39 -07:00
m32r
m68k m68k: enable CONFIG_COMPAT_BRK by default 2008-06-06 11:29:09 -07:00
m68knommu
mips Fix divide by zero error in build_clear_page() and build_copy_page() 2008-06-05 18:13:16 +01:00
mn10300
parisc
powerpc [POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=n 2008-06-09 11:32:41 +10:00
ppc [POWERPC] Export empty_zero_page and copy_page in arch/ppc 2008-05-31 17:08:28 +10:00
s390 [S390] Update default configuration. 2008-05-30 10:03:36 +02:00
sh sh: Add -mno-fdpic to default flags. 2008-06-09 16:49:43 +09:00
sparc sparc: switch /proc/led to seq_file 2008-06-03 15:21:21 -07:00
sparc64
um uml: PATH_MAX needs limits.h 2008-06-06 11:29:10 -07:00
v850
x86 x86: fix endless page faults in mount_block_root for Linux 2.6 2008-06-12 21:26:07 +02:00
xtensa
.gitignore
Kconfig