kernel_optimize_test/mm
Linus Torvalds 254090925e mm: don't try to NUMA-migrate COW pages that have other uses
commit 80d47f5de5e311cbc0d01ebb6ee684e8f4c196c6 upstream.

Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:

 "All the details are in the bug, but the bottom line is that somehow,
  this patch causes corruption when the numa balancing feature is
  enabled AND we don't use process affinity AND we use GUP to pin pages
  so our accelerator can DMA to/from system memory.

  Either disabling numa balancing, using process affinity to bind to
  specific numa-node or reverting this patch causes the bug to
  disappear"

and Oded bisected the issue to commit 09854ba94c ("mm: do_wp_page()
simplification").

Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW.  But it appears it
does.  Suspicious.

However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.

The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings migth be
paged out and the only reference to them would be in the page count.

Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.

Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information).  Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.

The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.

Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/
Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:00:57 +01:00
..
kasan kasan: fix incorrect arguments passing in kasan_add_zero_shadow 2021-01-27 11:55:23 +01:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 11:32:37 +01:00
balloon_compaction.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm, compaction: make fast_isolate_freepages() stay within zone 2021-03-04 11:38:38 +01:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:30:35 +01:00
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() 2021-02-10 09:29:21 +01:00
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c mm/gup: fix try_grab_compound_head() race with split_huge_page() 2021-07-14 16:55:42 +02:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault 2022-01-27 10:54:36 +01:00
huge_memory.c mm/userfaultfd: fix uffd-wp special cases for fork() 2021-07-25 14:36:18 +02:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c hugetlbfs: flush TLBs correctly after huge_pmd_unshare 2021-11-26 10:39:21 +01:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c
ioremap.c
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug
khugepaged.c mm: khugepaged: skip huge page collapse for special files 2021-11-06 14:10:09 +01:00
kmemleak.c mm/kmemleak: avoid scanning potential huge holes 2022-02-08 18:30:35 +01:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c
madvise.c mm/madvise: replace ptrace attach requirement for process_madvise 2021-03-17 17:06:37 +01:00
Makefile
mapping_dirty_helpers.c
memblock.c memblock: ensure there is no overflow in memblock_overlaps_region() 2021-12-17 10:14:42 +01:00
memcontrol.c mm: memcg: synchronize objcg lists with a dedicated spinlock 2022-02-23 12:00:56 +01:00
memfd.c
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 12:27:59 +02:00
memory-failure.c mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() 2021-12-29 12:26:05 +01:00
memory.c mm/userfaultfd: fix uffd-wp special cases for fork() 2021-07-25 14:36:18 +02:00
mempolicy.c mm: mempolicy: fix THP allocations escaping mempolicy restrictions 2021-12-29 12:26:06 +01:00
mempool.c
memremap.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memtest.c
migrate.c mm, thp: use head page in __migration_entry_wait() 2021-06-30 08:47:26 -04:00
mincore.c
mlock.c
mm_init.c
mmap.c mm/mmap.c: fix mmap return value when vma is merged after call_mmap() 2020-12-06 10:19:07 -08:00
mmu_gather.c
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-30 14:32:06 +02:00
mmzone.c
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:00:57 +01:00
mremap.c
msync.c
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c mm, oom: do not trigger out_of_memory from the #PF 2021-11-18 14:04:30 +01:00
page_alloc.c mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages 2022-01-27 10:53:44 +01:00
page_counter.c
page_ext.c
page_idle.c
page_io.c swap: fix swapfile read/write offset 2021-03-07 12:34:15 +01:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
pagewalk.c
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: ptdump: fix build failure 2021-04-21 13:00:57 +02:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/thp: fix page_address_in_vma() on file THP tails 2021-06-30 08:47:27 -04:00
rodata_test.c
shmem.c shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode 2022-01-27 10:53:44 +01:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:39:19 +01:00
slob.c
slub.c mm, slub: fix incorrect memcg slab count for bulk free 2021-10-27 09:56:53 +02:00
sparse-vmemmap.c
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:50:45 +02:00
swap_cgroup.c
swap_slots.c
swap_state.c mm: fix some broken comments 2020-10-16 11:11:19 -07:00
swap.c
swapfile.c mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare 2021-06-23 14:42:53 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c
userfaultfd.c
util.c mm: don't allow oversized kvmalloc() calls 2021-10-06 15:56:03 +02:00
vmacache.c
vmalloc.c mm/vmalloc.c: fix potential memory leak 2021-01-19 18:27:21 +01:00
vmpressure.c
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-09-18 13:40:36 +02:00
vmstat.c mm/vmstat.c: use helper macro abs() 2020-10-16 11:11:17 -07:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page 2021-07-14 16:56:51 +02:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration() 2021-11-18 14:04:26 +01:00
zswap.c