kernel_optimize_test/mm
Vlastimil Babka 9050d7eba4 mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
Daniel Borkmann reported a VM_BUG_ON assertion failing:

  ------------[ cut here ]------------
  kernel BUG at mm/mlock.c:528!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ccm arc4 iwldvm [...]
   video
  CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
  Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
  task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
  RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
  Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
  RIP   munlock_vma_pages_range+0x2e0/0x2f0
  ---[ end trace a0088dcf07ae10f2 ]---

because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page.  This can be reproduced with default config since 3.11
kernels.  A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.

The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

The checks for THP in munlock came with commit ff6a6da60b ("mm:
accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
not trigger a bug.  It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page.  This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.

Since commit 7225522bb4 ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.

This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable.  The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway.  Related Lkml discussion can be
found in [2].

 [1] tools/testing/selftests/net/psock_tpacket
 [2] https://lkml.org/lkml/2014/1/10/427

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [3.11.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:48 -08:00
..
backing-dev.c
balloon_compaction.c
bootmem.c
bounce.c
cleancache.c
compaction.c
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c fix O_SYNC|O_APPEND syncing the wrong range on write() 2014-02-09 15:18:09 -05:00
fremap.c
frontswap.c
highmem.c
huge_memory.c mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking 2014-03-04 07:55:48 -08:00
hugetlb_cgroup.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c memcg: reparent charges of children before processing parent 2014-03-04 07:55:48 -08:00
memory_hotplug.c
memory-failure.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
memory.c mm, thp: fix infinite loop on memcg OOM 2014-02-25 15:25:44 -08:00
mempolicy.c mm/mempolicy.c: fix mempolicy printing in numa_maps 2014-01-30 16:56:56 -08:00
mempool.c
migrate.c
mincore.c
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit 2014-02-17 11:19:36 +11:00
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c mm, oom: base root bonus on current usage 2014-01-30 16:56:56 -08:00
page_alloc.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
page_cgroup.c
page_io.c
page_isolation.c
page-writeback.c mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() 2014-02-06 13:48:51 -08:00
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
shmem.c
slab_common.c
slab.c mm: Fix warning on make htmldocs caused by slab.c 2014-01-31 13:52:25 +02:00
slab.h
slob.c
slub.c slub: do not assert not having lock in removing freed partial 2014-02-10 16:01:42 -08:00
sparse-vmemmap.c
sparse.c
swap_state.c swap: add a simple detector for inappropriate swapin readahead 2014-02-06 13:48:51 -08:00
swap.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
swapfile.c mm/swap: fix race on swap_info reuse between swapoff and swapon 2014-02-06 13:48:51 -08:00
truncate.c
util.c
vmalloc.c
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c
vmstat.c
zbud.c
zsmalloc.c
zswap.c