kernel_optimize_test

History

Sergey Senozhatsky 44f43e99fe zsmalloc: fix zs_can_compact() integer overflow zs_can_compact() has two race conditions in its core calculation: unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) - zs_stat_get(class, OBJ_USED); 1) classes are not locked, so the numbers of allocated and used objects can change by the concurrent ops happening on other CPUs 2) shrinker invokes it from preemptible context Depending on the circumstances, thus, OBJ_ALLOCATED can become less than OBJ_USED, which can result in either very high or negative `total_scan' value calculated later in do_shrink_slab(). do_shrink_slab() has some logic to prevent those cases: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 However, due to the way `total_scan' is calculated, not every shrinker->count_objects() overflow can be spotted and handled. To demonstrate the latter, I added some debugging code to do_shrink_slab() (x86_64) and the results were: vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615] vmscan: but total_scan > 0: 92679974445502 vmscan: resulting total_scan: 92679974445502 [..] vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615] vmscan: but total_scan > 0: 22634041808232578 vmscan: resulting total_scan: 22634041808232578 Even though shrinker->count_objects() has returned an overflowed value, the resulting `total_scan' is positive, and, what is more worrisome, it is insanely huge. This value is getting used later on in shrinker->scan_objects() loop: while (total_scan >= batch_size \|\| total_scan >= freeable) { unsigned long ret; unsigned long nr_to_scan = min(batch_size, total_scan); shrinkctl->nr_to_scan = nr_to_scan; ret = shrinker->scan_objects(shrinker, shrinkctl); if (ret == SHRINK_STOP) break; freed += ret; count_vm_events(SLABS_SCANNED, nr_to_scan); total_scan -= nr_to_scan; cond_resched(); } `total_scan >= batch_size' is true for a very-very long time and 'total_scan >= freeable' is also true for quite some time, because `freeable < 0' and `total_scan' is large enough, for example, 22634041808232578. The only break condition, in the given scheme of things, is shrinker->scan_objects() == SHRINK_STOP test, which is a bit too weak to rely on, especially in heavy zsmalloc-usage scenarios. To fix the issue, take a pool stat snapshot and use it instead of racy zs_stat_get() calls. Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>		2016-05-09 17:40:59 -07:00
..
kasan
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c	mm: fix kcompactd hang during memory offlining	2016-05-05 17:38:53 -07:00
debug_page_ref.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup.c
highmem.c
huge_memory.c	mm: thp: correct split_huge_pages file permission	2016-05-05 17:38:53 -07:00
hugetlb_cgroup.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c
memory_hotplug.c
memory-failure.c	mm/memory-failure: fix race with compound page split/merge	2016-04-28 19:34:04 -07:00
memory.c	huge pagecache: mmap_sem is unlocked when truncation splits pmd	2016-05-05 17:38:53 -07:00
mempolicy.c
mempool.c
memtest.c
migrate.c	mm/hwpoison: fix wrong num_poisoned_pages accounting	2016-04-28 19:34:04 -07:00
mincore.c
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c
page_alloc.c	mm: update min_free_kbytes from khugepaged after core initialization	2016-05-05 17:38:53 -07:00
page_counter.c
page_ext.c
page_idle.c
page_io.c	mm: call swap_slot_free_notify() with page lock held	2016-04-28 19:34:04 -07:00
page_isolation.c
page_owner.c
page_poison.c
page-writeback.c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	2016-05-06 13:08:35 -07:00
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
shmem.c
slab_common.c
slab.c
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap_cgroup.c
swap_state.c
swap.c
swapfile.c
truncate.c
userfaultfd.c
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c	mm: wake kcompactd before kswapd's short sleep	2016-04-28 19:34:04 -07:00
vmstat.c
workingset.c
zbud.c
zpool.c
zsmalloc.c	zsmalloc: fix zs_can_compact() integer overflow	2016-05-09 17:40:59 -07:00
zswap.c	mm/zswap: provide unique zpool name	2016-05-05 17:38:53 -07:00