kernel_optimize_test/mm
Mike Kravetz 4643d67e8c hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS
Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS in
kernel v5.2.3 testing.  This is caused by a race between hugetlb
page migration and page fault.

If a hugetlb page cannot be allocated to satisfy a page fault, the task
is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
mutex exists to prevent two tasks from trying to instantiate the same
page.  This protects against the situation where there is only one
hugetlb page, and both tasks would try to allocate.  Without the mutex,
one would fail and SIGBUS even though the other fault would be
successful.
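
The serialization described above can be sketched in userspace.  This is
a minimal model, not kernel code: it assumes a pool holding exactly one
huge page, and the names fault_mutex, free_pages, page_instantiated and
fault_in_page are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Hypothetical stand-in for the hugetlb fault mutex. */
static pthread_mutex_t fault_mutex = PTHREAD_MUTEX_INITIALIZER;

static int free_pages = 1;              /* only one huge page in the pool */
static bool page_instantiated = false;  /* has some task faulted it in?  */

/*
 * Model of a fault on the shared page.  Because the check and the
 * allocation happen under one mutex, the second task to arrive finds
 * the page already instantiated instead of failing to allocate.
 */
enum fault_result fault_in_page(void)
{
    enum fault_result ret = FAULT_OK;

    pthread_mutex_lock(&fault_mutex);
    if (!page_instantiated) {
        if (free_pages > 0) {
            free_pages--;
            page_instantiated = true;
        } else {
            ret = FAULT_SIGBUS;     /* genuinely out of huge pages */
        }
    }
    assert(free_pages >= 0);        /* the pool never goes negative */
    pthread_mutex_unlock(&fault_mutex);

    return ret;
}
```

Two faults on the same page both succeed here even though only one page
is ever allocated, which is the behavior the mutex is meant to guarantee.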

There is a similar race between hugetlb page migration and fault.
Migration code will allocate a page for the target of the migration.  It
will then unmap the original page from all page tables.  It does this
unmap by first clearing the pte and then writing a migration entry.  The
page table lock is held for the duration of this clear and write
operation.  However, the beginning of the hugetlb page fault code
optimistically checks the pte without taking the page table lock.  If
the pte is clear (as it can be during the migration unmap operation), a
hugetlb page allocation is attempted to satisfy the fault.  Note that the page
which will eventually satisfy this fault was already allocated by the
migration code.  However, the allocation within the fault path could
fail, which would result in the task incorrectly being sent SIGBUS.
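
The interleaving can be replayed deterministically in a small userspace
model.  This is only a sketch under stated assumptions: the pool holds a
single free huge page, locking is left out because the steps run in a
fixed order, and every name (struct model, replay_race and the helper
functions) is hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum pte_state { PTE_NONE, PTE_PRESENT, PTE_MIGRATION };
enum fault_result { FAULT_OK, FAULT_SIGBUS };

struct model {
    enum pte_state pte;   /* the one page table entry being modeled */
    int free_pages;       /* free huge pages left in the pool */
};

/* Migration allocates the target page before unmapping the original. */
static void migration_alloc_target(struct model *m)
{
    assert(m->free_pages > 0);
    m->free_pages--;
}

/* The unmap is done in two steps: clear, then write a migration entry. */
static void migration_clear_pte(struct model *m)   { m->pte = PTE_NONE; }
static void migration_write_entry(struct model *m) { m->pte = PTE_MIGRATION; }

/* The fault path peeks at the pte without taking the page table lock. */
static bool fault_sees_none(const struct model *m)
{
    return m->pte == PTE_NONE;
}

/* Fault path as it behaved before the fix: a failed allocation is fatal. */
static enum fault_result fault_allocate_unfixed(struct model *m)
{
    if (m->free_pages == 0)
        return FAULT_SIGBUS;
    m->free_pages--;
    return FAULT_OK;      /* later recheck and install are omitted */
}

/* Replay the problematic ordering of migration and fault steps. */
enum fault_result replay_race(void)
{
    struct model m = { .pte = PTE_PRESENT, .free_pages = 1 };
    bool none;

    migration_alloc_target(&m);     /* takes the last free page */
    migration_clear_pte(&m);
    none = fault_sees_none(&m);     /* fault looks inside the window */
    migration_write_entry(&m);

    if (!none)
        return FAULT_OK;
    return fault_allocate_unfixed(&m);
}
```

In this ordering the fault observes the cleared pte, finds the pool
empty, and reports SIGBUS even though the migration target page is about
to back the mapping.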

Ideally, we could take the hugetlb fault mutex in the migration code
when modifying the page tables.  However, locks must be taken in the
order of hugetlb fault mutex, page lock, page table lock.  This would
require significant rework of the migration code.  Instead, the issue is
addressed in the hugetlb fault code.  After failing to allocate a huge
page, take the page table lock and check for huge_pte_none before
returning an error.  This is the same check that must be made further in
the code even if page allocation is successful.
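
The recheck described above might look roughly like the following
userspace sketch.  It is an illustration, not the patch itself: a pthread
mutex stands in for the page table lock, a two-value enum stands in for
the pte, and the names page_table_lock and handle_alloc_failure are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum pte_state { PTE_NONE, PTE_MIGRATION };
enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Hypothetical stand-in for the page table lock. */
static pthread_mutex_t page_table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called only after the huge page allocation in the fault path failed.
 * Before reporting an error, look at the pte again, this time under the
 * lock.  If it is no longer empty, the earlier lockless check raced with
 * another writer (for example the migration unmap), so no error is
 * returned and the fault can simply be retried.
 */
enum fault_result handle_alloc_failure(const enum pte_state *pte)
{
    enum fault_result ret = FAULT_SIGBUS;

    assert(pte != NULL);

    pthread_mutex_lock(&page_table_lock);
    if (*pte != PTE_NONE)
        ret = FAULT_OK;
    pthread_mutex_unlock(&page_table_lock);

    return ret;
}
```

Only the failure path pays for the extra lock acquisition, so the common
case of a successful allocation is unchanged in this model.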

Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@oracle.com
Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13 16:06:53 -07:00
kasan
backing-dev.c
balloon_compaction.c balloon: fix up comments 2019-07-22 11:19:26 -04:00
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm: compaction: avoid 100% CPU usage during compaction when a task is killed 2019-08-03 07:02:00 -07:00
debug_page_ref.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup_benchmark.c
gup.c
highmem.c
hmm.c mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} 2019-07-25 16:14:39 -03:00
huge_memory.c Revert "mm, thp: restore node-local hugepage allocations" 2019-08-13 16:06:52 -07:00
hugetlb_cgroup.c
hugetlb.c hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS 2019-08-13 16:06:53 -07:00
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig
Kconfig.debug
khugepaged.c
kmemleak-test.c
kmemleak.c mm: kmemleak: disable early logging in case of error 2019-08-13 16:06:52 -07:00
ksm.c
list_lru.c
maccess.c
madvise.c
Makefile memremap: move from kernel/ to mm/ 2019-08-03 07:02:01 -07:00
memblock.c
memcontrol.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
memfd.c
memory_hotplug.c mm/memory_hotplug.c: remove unneeded return for void function 2019-08-03 07:02:01 -07:00
memory-failure.c
memory.c
mempolicy.c Revert "mm, thp: restore node-local hugepage allocations" 2019-08-13 16:06:52 -07:00
mempool.c
memremap.c mm/hmm: fix ZONE_DEVICE anon page mapping reuse 2019-08-13 16:06:52 -07:00
memtest.c
migrate.c mm/migrate.c: initialize pud_entry in migrate_vma() 2019-08-03 07:02:01 -07:00
mincore.c
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page_alloc.c
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_vma_mapped.c
page-writeback.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm/hmm: fix bad subpage pointer in try_to_unmap_one 2019-08-13 16:06:52 -07:00
rodata_test.c
shmem.c Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" 2019-08-13 16:06:52 -07:00
shuffle.c
shuffle.h
slab_common.c
slab.c
slab.h
slob.c
slub.c mm: slub: Fix slab walking for init_on_free 2019-07-31 13:16:06 -07:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swapfile.c
truncate.c
usercopy.c mm/usercopy: use memory range to be accessed for wraparound check 2019-08-13 16:06:52 -07:00
userfaultfd.c
util.c
vmacache.c
vmalloc.c mm/vmalloc.c: fix percpu free VM area search criteria 2019-08-13 16:06:52 -07:00
vmpressure.c
vmscan.c mm, vmscan: do not special-case slab reclaim when watermarks are boosted 2019-08-13 16:06:53 -07:00
vmstat.c
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold.c: fix z3fold_destroy_pool() race condition 2019-08-13 16:06:52 -07:00
zbud.c
zpool.c
zsmalloc.c
zswap.c