kernel_optimize_test

Johannes Weiner dd0a41bc17 Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"

commit e82553c10b0899994153f9bf0af333c0a1550fd7 upstream.

This reverts commit `536d3bf261`, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles.

Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely.

This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload.

To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable.

The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished.

However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit `b3ff92916a` ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below.

With the patch breaking userspace, and the root cause addressed by other means already, revert it again.

Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: `536d3bf261` ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2021-02-13 13:55:17 +01:00
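
The difference between the two orderings described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not the code in memcontrol.c: the struct `toy_cgroup`, the helpers `reclaim_step()` and `workload_step()`, the two `write_high_*()` functions, the fixed per-step rates, and the `max_steps` cap are all hypothetical names and simplifications introduced here. The model assumes the workload only grows while it is below the limit currently in force, and that the writer reclaims a fixed number of pages per loop iteration.

```c
#include <assert.h>

/* Toy model only. Every name below is hypothetical, not a kernel symbol. */
struct toy_cgroup {
	long usage;	/* pages currently charged to the group */
	long high;	/* memory.high limit currently in force, in pages */
};

/* The writer frees up to reclaim_rate pages of the excess over target. */
static void reclaim_step(struct toy_cgroup *cg, long target, long reclaim_rate)
{
	long excess = cg->usage - target;

	if (excess > 0)
		cg->usage -= excess < reclaim_rate ? excess : reclaim_rate;
}

/* Assumption: the workload only grows while below the limit in force. */
static void workload_step(struct toy_cgroup *cg, long alloc_rate)
{
	long room = cg->high - cg->usage;

	if (room > 0)
		cg->usage += room < alloc_rate ? room : alloc_rate;
}

/*
 * Limit first, then reclaim (the sequence the revert restores).
 * Returns the number of loop iterations, or -1 if max_steps was hit.
 */
static long write_high_limit_first(struct toy_cgroup *cg, long new_high,
				   long alloc_rate, long reclaim_rate,
				   long max_steps)
{
	long steps = 0;

	assert(reclaim_rate > 0 && alloc_rate >= 0);
	cg->high = new_high;		/* new limit curbs the workload at once */
	while (cg->usage > new_high) {
		if (steps == max_steps)
			return -1;
		reclaim_step(cg, new_high, reclaim_rate);
		workload_step(cg, alloc_rate);
		steps++;
	}
	return steps;
}

/*
 * Reclaim first, then limit (the reverted sequence).
 * The workload is still governed by the old limit while the writer loops.
 */
static long write_high_reclaim_first(struct toy_cgroup *cg, long new_high,
				     long alloc_rate, long reclaim_rate,
				     long max_steps)
{
	long steps = 0;

	assert(reclaim_rate > 0 && alloc_rate >= 0);
	while (cg->usage > new_high) {
		if (steps == max_steps)
			return -1;	/* writer never converged */
		reclaim_step(cg, new_high, reclaim_rate);
		workload_step(cg, alloc_rate);	/* refills up to the old limit */
		steps++;
	}
	cg->high = new_high;
	return steps;
}
```

In this model, starting from a usage and old limit of 1000 pages, a new limit of 500 pages, and both rates set to 10 pages per step, the limit-first variant returns after 50 iterations, while the reclaim-first variant runs into the step cap because the workload refills every page the writer frees. The real kernel has no such cap in the reverted path, which is the indefinite write() the commit message describes.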
..
kasan	kasan: fix incorrect arguments passing in kasan_add_zero_shadow	2021-01-27 11:55:23 +01:00
backing-dev.c
balloon_compaction.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c	mm, compaction: move high_pfn to the for loop scope	2021-02-10 09:29:21 +01:00
debug_page_ref.c
debug_vm_pgtable.c	mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.	2020-10-16 11:11:14 -07:00
debug.c	mm, dump_page: rename head_mapcount() --> head_compound_mapcount()	2020-10-13 18:38:29 -07:00
dmapool.c	mm/dmapool.c: replace hard coded function name with __func__	2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c	mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED	2020-10-13 18:38:29 -07:00
failslab.c
filemap.c	mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()	2021-02-10 09:29:21 +01:00
frame_vector.c
frontswap.c
gup_benchmark.c	mm/gup_benchmark: take the mmap lock around GUP	2020-10-18 09:27:09 -07:00
gup.c	mm/gup: combine put_compound_head() and unpin_user_page()	2020-12-30 11:53:54 +01:00
highmem.c	mm/highmem.c: clean up endif comments	2020-10-16 11:11:18 -07:00
hmm.c
huge_memory.c	mm: thp: fix MADV_REMOVE deadlock on shmem THP	2021-02-10 09:29:21 +01:00
hugetlb_cgroup.c	hugetlb_cgroup: fix offline of hugetlb cgroup with reservations	2020-12-06 10:19:07 -08:00
hugetlb.c	mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active	2021-02-10 09:29:21 +01:00
hwpoison-inject.c	mm,hwpoison-inject: don't pin for hwpoison_filter	2020-10-16 11:11:16 -07:00
init-mm.c	mm/gup: prevent gup_fast from racing with COW during fork	2020-12-30 11:53:54 +01:00
internal.h	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
interval_tree.c
ioremap.c
Kconfig	mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING	2020-12-06 10:19:07 -08:00
Kconfig.debug
khugepaged.c	mm: remove the now-unnecessary mmget_still_valid() hack	2020-10-16 11:11:22 -07:00
kmemleak.c	mm/kmemleak: rely on rcu for task stack scanning	2020-10-13 18:38:27 -07:00
ksm.c	docs: get rid of :c:type explicit declarations for structs	2020-10-15 07:49:40 +02:00
list_lru.c	mm: list_lru: set shrinker map bit when child nr_items is not zero	2020-12-06 10:19:07 -08:00
maccess.c
madvise.c	mm,memory_failure: always pin the page in madvise_inject_error	2020-12-30 11:53:55 +01:00
Makefile	mm,kmemleak-test.c: move kmemleak-test.c to samples dir	2020-10-13 18:38:27 -07:00
mapping_dirty_helpers.c
memblock.c	memblock: do not start bottom-up allocations with kernel_end	2021-02-10 09:29:15 +01:00
memcontrol.c	Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"	2021-02-13 13:55:17 +01:00
memfd.c
memory_hotplug.c	mm: memmap defer init doesn't work as expected	2021-01-06 14:56:50 +01:00
memory-failure.c	mm,memory_failure: always pin the page in madvise_inject_error	2020-12-30 11:53:55 +01:00
memory.c	mm/gup: prevent gup_fast from racing with COW during fork	2020-12-30 11:53:54 +01:00
mempolicy.c	mm: mempolicy: fix potential pte_unmap_unlock pte error	2020-11-02 12:14:19 -08:00
mempool.c	mm/mempool: add 'else' to split mutually exclusive case	2020-10-13 18:38:34 -07:00
memremap.c	mm/mremap_pages: fix static key devmap_managed_key updates	2020-11-02 12:14:18 -08:00
memtest.c
migrate.c	mm: fix numa stats for thp migration	2021-01-27 11:55:14 +01:00
mincore.c	mm: factor find_get_incore_page out of mincore_page	2020-10-13 18:38:29 -07:00
mlock.c
mm_init.c
mmap.c	mm/mmap.c: fix mmap return value when vma is merged after call_mmap()	2020-12-06 10:19:07 -08:00
mmu_gather.c
mmu_notifier.c	mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert	2020-10-16 11:11:17 -07:00
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c	mm: remove alloc_vm_area	2020-10-18 09:27:10 -07:00
oom_kill.c	mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary	2020-10-13 18:38:35 -07:00
page_alloc.c	mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint	2021-01-30 13:55:19 +01:00
page_counter.c	mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge()	2020-10-13 18:38:30 -07:00
page_ext.c
page_idle.c
page_io.c	mm/page_io.c: remove useless out label in __swap_writepage()	2020-10-13 18:38:30 -07:00
page_isolation.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
page_owner.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
page_poison.c	mm/page_poison.c: replace bool variable with static key	2020-10-16 11:11:17 -07:00
page_reporting.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c
page-writeback.c	mm: make wait_on_page_writeback() wait for multiple pending writebacks	2021-01-12 20:18:22 +01:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c	percpu: convert flexible array initializers to use struct_size()	2020-10-30 23:02:28 +00:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c	mm/process_vm_access.c: include compat.h	2021-01-19 18:27:21 +01:00
ptdump.c
readahead.c	mm: use limited read-ahead to satisfy read	2020-10-17 13:49:08 -06:00
rmap.c	mm/rmap: always do TTU_IGNORE_ACCESS	2020-12-30 11:53:55 +01:00
rodata_test.c
shmem.c	fs: add a filesystem flag for THPs	2020-10-16 11:11:15 -07:00
shuffle.c	mm: rename page_order() to buddy_order()	2020-10-16 11:11:19 -07:00
shuffle.h
slab_common.c
slab.c	mm: fix some comments formatting	2020-10-16 11:11:19 -07:00
slab.h	mm: memcg/slab: fix obj_cgroup_charge() return value handling	2020-12-06 10:19:07 -08:00
slob.c
slub.c	Revert "mm/slub: fix a memory leak in sysfs_slab_add()"	2021-01-30 13:55:16 +01:00
sparse-vmemmap.c
sparse.c	mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG	2020-10-16 11:11:18 -07:00
swap_cgroup.c
swap_slots.c	mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()	2020-10-13 18:38:30 -07:00
swap_state.c	mm: fix some broken comments	2020-10-16 11:11:19 -07:00
swap.c	mm: move call to compound_head() in release_pages()	2020-10-13 18:38:33 -07:00
swapfile.c	mm: fix a race on nr_swap_pages	2021-01-30 13:55:19 +01:00
truncate.c	mm/truncate.c: make __invalidate_mapping_pages() static	2020-11-02 12:14:19 -08:00
usercopy.c
userfaultfd.c
util.c	mm/util.c: update the kerneldoc for kstrdup_const()	2020-10-16 11:11:17 -07:00
vmacache.c
vmalloc.c	mm/vmalloc.c: fix potential memory leak	2021-01-19 18:27:21 +01:00
vmpressure.c
vmscan.c	mm: don't put pinned pages into the swap cache	2021-01-19 18:27:29 +01:00
vmstat.c	mm/vmstat.c: use helper macro abs()	2020-10-16 11:11:17 -07:00
workingset.c	XArray updates for 5.9	2020-10-20 14:39:37 -07:00
z3fold.c	z3fold: stricter locking and more careful reclaim	2020-12-30 11:54:10 +01:00
zbud.c	mm/zbud: remove redundant initialization	2020-10-13 18:38:34 -07:00
zpool.c
zsmalloc.c	mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING	2020-12-06 10:19:07 -08:00
zswap.c