kernel_optimize_test/mm
Baoquan He d2e5724117 mm_zone: add function to check if managed dma zone exists
commit 62b3107073646e0946bd97ff926832bafb846d17 upstream.

Patch series "Handle warning of allocation failure on DMA zone w/o
managed pages", v4.

**Problem observed:
On x86_64, when crash is triggered and entering into kdump kernel, page
allocation failure can always be seen.

 ---------------------------------
 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ......
  __alloc_pages+0x24d/0x2c0
  ......
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ------------------------------------

***Root cause:
In the current kernel, it assumes that DMA zone must have managed pages
and try to request pages if CONFIG_ZONE_DMA is enabled. While this is not
always true. E.g in kdump kernel of x86_64, only low 1M is presented and
locked down at very early stage of boot, so that this low 1M won't be
added into buddy allocator to become managed pages of DMA zone. This
exception will always cause page allocation failure if page is requested
from DMA zone.

***Investigation:
This failure happens since below commit merged into linus's tree.
  1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
  7c321eb2b8 x86/kdump: Remove the backup region handling
  6f599d8423 x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before them, on x86_64, the low 640K area will be reused by kdump kernel.
So in kdump kernel, the content of low 640K area is copied into a backup
region for dumping before jumping into kdump. Then except of those firmware
reserved region in [0, 640K], the left area will be added into buddy
allocator to become available managed pages of DMA zone.

However, after above commits applied, in kdump kernel of x86_64, the low
1M is reserved by memblock, but not released to buddy allocator. So any
later page allocation requested from DMA zone will fail.

At the beginning, if crashkernel is reserved, the low 1M need be locked
down because AMD SME encrypts memory making the old backup region
mechanims impossible when switching into kdump kernel.

Later, it was also observed that there are BIOSes corrupting memory
under 1M. To solve this, in commit f1d4d47c5851, the entire region of
low 1M is always reserved after the real mode trampoline is allocated.

Besides, recently, Intel engineer mentioned their TDX (Trusted domain
extensions) which is under development in kernel also needs to lock down
the low 1M. So we can't simply revert above commits to fix the page allocation
failure from DMA zone as someone suggested.

***Solution:
Currently, only DMA atomic pool and dma-kmalloc will initialize and
request page allocation with GFP_DMA during bootup.

So only initializ DMA atomic pool when DMA zone has available managed
pages, otherwise just skip the initialization.

For dma-kmalloc(), for the time being, let's mute the warning of
allocation failure if requesting pages from DMA zone while no manged
pages.  Meanwhile, change code to use dma_alloc_xx/dma_map_xx API to
replace kmalloc(GFP_DMA), or do not use GFP_DMA when calling kmalloc() if
not necessary.  Christoph is posting patches to fix those under
drivers/scsi/.  Finally, we can remove the need of dma-kmalloc() as people
suggested.

This patch (of 3):

In some places of the current kernel, it assumes that dma zone must have
managed pages if CONFIG_ZONE_DMA is enabled.  While this is not always
true.  E.g in kdump kernel of x86_64, only low 1M is presented and locked
down at very early stage of boot, so that there's no managed pages at all
in DMA zone.  This exception will always cause page allocation failure if
page is requested from DMA zone.

Here add function has_managed_dma() and the relevant helper functions to
check if there's DMA zone with managed pages.  It will be used in later
patches.

Link: https://lkml.kernel.org/r/20211223094435.248523-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20211223094435.248523-2-bhe@redhat.com
Fixes: 6f599d8423 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: John Donnelly  <john.p.donnelly@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:53:44 +01:00
..
kasan kasan: fix incorrect arguments passing in kasan_add_zero_shadow 2021-01-27 11:55:23 +01:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 11:32:37 +01:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c cma: don't quit at first error when activating reserved areas 2020-08-12 10:57:57 -07:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm, compaction: make fast_isolate_freepages() stay within zone 2021-03-04 11:38:38 +01:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
debug_vm_pgtable.c mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage() 2021-07-14 16:56:13 +02:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() 2021-02-10 09:29:21 +01:00
frame_vector.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c mm/gup: fix try_grab_compound_head() race with split_huge_page() 2021-07-14 16:55:42 +02:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled 2021-09-18 13:40:36 +02:00
huge_memory.c mm/userfaultfd: fix uffd-wp special cases for fork() 2021-07-25 14:36:18 +02:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c hugetlbfs: flush TLBs correctly after huge_pmd_unshare 2021-11-26 10:39:21 +01:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
khugepaged.c mm: khugepaged: skip huge page collapse for special files 2021-11-06 14:10:09 +01:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm/madvise: replace ptrace attach requirement for process_madvise 2021-03-17 17:06:37 +01:00
Makefile mm,kmemleak-test.c: move kmemleak-test.c to samples dir 2020-10-13 18:38:27 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c memblock: ensure there is no overflow in memblock_overlaps_region() 2021-12-17 10:14:42 +01:00
memcontrol.c memcg: prohibit unconditional exceeding the limit of dying tasks 2021-11-18 14:04:29 +01:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 12:27:59 +02:00
memory-failure.c mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() 2021-12-29 12:26:05 +01:00
memory.c mm/userfaultfd: fix uffd-wp special cases for fork() 2021-07-25 14:36:18 +02:00
mempolicy.c mm: mempolicy: fix THP allocations escaping mempolicy restrictions 2021-12-29 12:26:06 +01:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm, thp: use head page in __migration_entry_wait() 2021-06-30 08:47:26 -04:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm/mmap.c: fix mmap return value when vma is merged after call_mmap() 2020-12-06 10:19:07 -08:00
mmu_gather.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-30 14:32:06 +02:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm: Introduce arch_validate_flags() 2020-09-04 12:46:07 +01:00
mremap.c mm/mremap: start addresses are properly aligned 2020-08-07 11:33:27 -07:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c mm, oom: do not trigger out_of_memory from the #PF 2021-11-18 14:04:30 +01:00
page_alloc.c mm_zone: add function to check if managed dma zone exists 2022-01-27 10:53:44 +01:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c mm/page_ext.c: drop pfn_present() check when onlining 2020-04-07 10:43:40 -07:00
page_idle.c mm/page_idle.c: skip offline pages 2020-06-08 11:05:55 -07:00
page_io.c swap: fix swapfile read/write offset 2021-03-07 12:34:15 +01:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
pagewalk.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: ptdump: fix build failure 2021-04-21 13:00:57 +02:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/thp: fix page_address_in_vma() on file THP tails 2021-06-30 08:47:27 -04:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c Revert "mm/shmem: fix shmem_swapin() race with swapoff" 2021-07-25 14:36:20 +02:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:39:19 +01:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm, slub: fix incorrect memcg slab count for bulk free 2021-10-27 09:56:53 +02:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:50:45 +02:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm: fix some broken comments 2020-10-16 11:11:19 -07:00
swap.c mm: move call to compound_head() in release_pages() 2020-10-13 18:38:33 -07:00
swapfile.c mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare 2021-06-23 14:42:53 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm: don't allow oversized kvmalloc() calls 2021-10-06 15:56:03 +02:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm/vmalloc.c: fix potential memory leak 2021-01-19 18:27:21 +01:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-09-18 13:40:36 +02:00
vmstat.c mm/vmstat.c: use helper macro abs() 2020-10-16 11:11:17 -07:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page 2021-07-14 16:56:51 +02:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration() 2021-11-18 14:04:26 +01:00
zswap.c mm/zswap: allow setting default status, compressor and allocator in Kconfig 2020-04-07 10:43:41 -07:00