kernel_optimize_test/mm
Greg Thelen 0610c25daa memcg: fix dirty page migration
The problem starts with a file backed dirty page which is charged to a
memcg.  Then page migration is used to move oldpage to newpage.

Migration:
 - copies the oldpage's data to newpage
 - clears oldpage.PG_dirty
 - sets newpage.PG_dirty
 - uncharges oldpage from memcg
 - charges newpage to memcg

Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
count.

However, because newpage is not yet charged, setting newpage.PG_dirty
does not increment the memcg's dirty page count.  After migration
completes newpage.PG_dirty is eventually cleared, often in
account_page_cleaned().  At this time newpage is charged to a memcg so
the memcg's dirty page count is decremented which causes underflow
because the count was not previously incremented by migration.  This
underflow causes balance_dirty_pages() to see a very large unsigned
number of dirty memcg pages which leads to aggressive throttling of
buffered writes by processes in non root memcg.

This issue:
 - can harm performance of non root memcg buffered writes.
 - can report too small (even negative) values in
   memory.stat[(total_)dirty] counters of all memcg, including the root.

To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
page_memcg() and set_page_memcg() helpers.

Test:
    0) setup and enter limited memcg
    mkdir /sys/fs/cgroup/test
    echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/test/cgroup.procs

    1) buffered writes baseline
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    2) buffered writes with compaction antagonist to induce migration
    yes 1 > /proc/sys/vm/compact_memory &
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    kill %
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    3) buffered writes without antagonist, should match baseline
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

                       (speed, dirty residue)
             unpatched                       patched
    1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
    2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
    3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages

    Notice that unpatched baseline performance (1) fell after
    migration (3): 841 -> 114 MB/s.  In the patched kernel, post
    migration performance matches baseline.

Fixes: c4843a7593 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
..
kasan kasan: fix last shadow judgement in memory_is_poisoned_16() 2015-09-17 21:16:07 -07:00
backing-dev.c Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block 2015-09-10 18:56:14 -07:00
balloon_compaction.c
bootmem.c bootmem: avoid freeing to bootmem after bootmem is done 2015-09-08 15:35:28 -07:00
cleancache.c
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
cma.c
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
compaction.c mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
debug-pagealloc.c
debug.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
dmapool.c mm: add support for __GFP_ZERO flag to dma_pool_alloc() 2015-09-08 15:35:28 -07:00
early_ioremap.c mm/early_ioremap: add explicit #include of asm/early_ioremap.h 2015-09-11 15:21:34 -07:00
fadvise.c
failslab.c
filemap.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
frame_vector.c [media] mm: Provide new get_vaddr_frames() helper 2015-08-16 13:02:47 -03:00
frontswap.c
gup.c mm: make GUP handle pfn mapping unless FOLL_GET is requested 2015-09-04 16:54:41 -07:00
highmem.c
huge_memory.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
hugetlb_cgroup.c
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2015-10-01 21:42:35 -04:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
interval_tree.c
Kconfig media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: use seq_hex_dump() to dump buffers 2015-09-10 13:29:01 -07:00
ksm.c
list_lru.c list_lru: don't call list_lru_from_kmem if the list_head is empty 2015-09-08 15:35:28 -07:00
maccess.c lib: move strncpy_from_unsafe() into mm/maccess.c 2015-08-31 12:36:10 -07:00
madvise.c mm: madvise allow remove operation for hugetlbfs 2015-09-08 15:35:28 -07:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memblock.c mm/memblock.c: fix comment in __next_mem_range() 2015-09-08 15:35:28 -07:00
memcontrol.c memcg: zap try_get_mem_cgroup_from_page 2015-09-10 13:29:01 -07:00
memory_hotplug.c libnvdimm for 4.3: 2015-09-08 14:35:59 -07:00
memory-failure.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
memory.c mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() 2015-09-10 13:29:01 -07:00
mempolicy.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
mempool.c mm/mempool: allow NULL `pool' pointer in mempool_destroy() 2015-09-08 15:35:28 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c memcg: fix dirty page migration 2015-10-01 21:42:35 -04:00
mincore.c
mlock.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified 2015-09-22 15:09:53 -07:00
mmu_context.c
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c
mprotect.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mremap.c mremap: simplify the "overlap" check in mremap_to() 2015-09-04 16:54:41 -07:00
msync.c
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() 2015-09-10 13:29:01 -07:00
oom_kill.c mm, oom: remove unnecessary variable 2015-09-08 15:35:28 -07:00
page_alloc.c Merge branch 'akpm' (patches from Andrew) 2015-09-08 17:52:23 -07:00
page_counter.c
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm, page_isolation: make set/unset_migratetype_isolate() file-local 2015-09-08 15:35:28 -07:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
page-writeback.c Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block 2015-09-10 18:56:14 -07:00
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c percpu: clean up of schunk->map[] assignment in pcpu_setup_first_chunk 2015-07-21 11:31:00 -04:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
shmem.c shmem: recalculate file inode when fstat 2015-09-08 15:35:28 -07:00
slab_common.c memcg: export struct mem_cgroup 2015-09-08 15:35:28 -07:00
slab.c mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) 2015-10-01 21:42:35 -04:00
slab.h mm/slab.h: fix argument order in cache_from_obj's error message 2015-09-04 16:54:41 -07:00
slob.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
slub.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c
swap_state.c mm: swap: zswap: maybe_preload & refactoring 2015-09-08 15:35:28 -07:00
swap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
swapfile.c mm: /proc/pid/smaps:: show proportional swap share of the mapping 2015-09-08 15:35:28 -07:00
truncate.c
userfaultfd.c userfaultfd: avoid mmap_sem read recursion in mcopy_atomic 2015-09-04 16:54:41 -07:00
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c vmscan: fix sane_reclaim helper for legacy memcg 2015-09-22 15:09:53 -07:00
vmstat.c
workingset.c
zbud.c mm: zbud: constify the zbud_ops 2015-09-08 15:35:28 -07:00
zpool.c zpool: add zpool_has_pool() 2015-09-10 13:29:01 -07:00
zsmalloc.c mm: zpool: constify the zpool_ops 2015-09-08 15:35:28 -07:00
zswap.c zswap: change zpool/compressor at runtime 2015-09-10 13:29:01 -07:00