tmp_suning_uos_patched/arch/s390/mm
Alexander Gordeev ecb71f7bd5 s390/mm: fix 2KB pgtable release race
commit c2c224932fd0ee6854d6ebfc8d059c2bcad86606 upstream.

There is a race between concurrent 2KB-pgtable release paths when
both the upper and lower halves of the containing parent page are
freed, one via page_table_free_rcu() + __tlb_remove_table(),
and the other via page_table_free(). The race might lead to
corruption because page_table_free() removes the list item
concurrently with __free_page() in __tlb_remove_table().

Assume that first the lower and then the upper 2KB-pgtable of a
page is freed. Since both halves of the page are allocated, the
tracking byte (bits 24-31 of the page _refcount) initially has
the value 0x03:

CPU0				CPU1
----				----

page_table_free_rcu() // lower half
{
	// _refcount[31..24] == 0x03
	...
	atomic_xor_bits(&page->_refcount,
			0x11U << (0 + 24));
	// _refcount[31..24] <= 0x12
	...
	table = table | (1U << 0);
	tlb_remove_table(tlb, table);
}
...
__tlb_remove_table()
{
	// _refcount[31..24] == 0x12
	mask = _table & 3;
	// mask <= 0x01
	...

				page_table_free() // upper half
				{
					// _refcount[31..24] == 0x12
					...
					atomic_xor_bits(
						&page->_refcount,
						1U << (1 + 24));
					// _refcount[31..24] <= 0x10
					// mask <= 0x10
					...
	atomic_xor_bits(&page->_refcount,
			mask << (4 + 24));
	// _refcount[31..24] <= 0x00
	// mask <= 0x00
	...
	if (mask != 0) // == false
		break;
	fallthrough;
	...
					if (mask & 3) // == false
						...
					else
	__free_page(page);			list_del(&page->lru);
	^^^^^^^^^^^^^^^^^^	RACE!		^^^^^^^^^^^^^^^^^^^^^
}					...
				}
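
The arithmetic in the diagram can be replayed single-threaded in userspace. The following is only a sketch: `xor_track()` and `replay_prefix_race()` are hypothetical names, and `xor_track()` is a stand-in built on C11 atomics that assumes the kernel's atomic_xor_bits() returns the new value, as the `<=` comments in the diagram suggest. It records the tracking byte after each of the three XOR steps of the pre-fix code.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace stand-in for atomic_xor_bits() on the
 * tracking byte (bits 24-31 of _refcount). Returns the NEW byte.
 */
static unsigned int xor_track(atomic_uint *refcount, unsigned int bits)
{
	unsigned int shifted = bits << 24;

	return (atomic_fetch_xor(refcount, shifted) ^ shifted) >> 24;
}

/* Replay the pre-fix interleaving from the diagram, step by step. */
static void replay_prefix_race(unsigned int seen[3])
{
	atomic_uint refcount;

	/* Both 2KB halves allocated: tracking byte is 0x03. */
	atomic_init(&refcount, 0x03U << 24);

	/* CPU0: page_table_free_rcu(), lower half (bit 0). */
	seen[0] = xor_track(&refcount, 0x11U << 0);
	/* CPU1: pre-fix page_table_free(), upper half (bit 1). */
	seen[1] = xor_track(&refcount, 0x01U << 1);
	/* CPU0: __tlb_remove_table() clears its pending bit. */
	seen[2] = xor_track(&refcount, 0x01U << 4);
}
```

In this replay CPU1 sees 0x10 (lower nibble clear, so it goes on to list_del()), while CPU0 already sees 0x00 and proceeds to __free_page(). Nothing in the tracking byte tells CPU0 that CPU1 is still working on the page.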

The problem is that page_table_free() clears the lower nibble
before it is done with the page, so __tlb_remove_table() observes
zero too early and releases the page. With this update,
page_table_free() uses logic similar to page_table_free_rcu() +
__tlb_remove_table() and marks the fragment as pending removal
in the upper nibble until after the list_del().

In other words, the parent page is considered unreferenced and
safe to release only when the lower nibble is already cleared and
unsetting a bit in the upper nibble turns that nibble to zero.
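
The rule above can be modelled in userspace as a two-step protocol on the tracking byte. This is a minimal sketch, not the patch itself: `mark_pending()`, `clear_pending()` and `xor_bits()` are hypothetical names, C11 atomics replace the kernel's atomic_xor_bits() (assumed to return the new value), and list handling and locking are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TRACK_SHIFT 24	/* tracking byte: bits 24-31 of _refcount */

/* Hypothetical stand-in for atomic_xor_bits(): returns the NEW value. */
static unsigned int xor_bits(atomic_uint *v, unsigned int bits)
{
	return atomic_fetch_xor(v, bits) ^ bits;
}

static unsigned int track_byte(atomic_uint *refcount)
{
	return atomic_load(refcount) >> TRACK_SHIFT;
}

/*
 * Step 1: clear the "allocated" bit in the lower nibble and set the
 * "pending" bit in the upper nibble in one atomic operation.
 * Returns true when no half is allocated any more, i.e. the point
 * where the caller would list_del() the page.
 */
static bool mark_pending(atomic_uint *refcount, unsigned int bit)
{
	unsigned int mask;

	mask = xor_bits(refcount, 0x11U << (bit + TRACK_SHIFT));
	return ((mask >> TRACK_SHIFT) & 3) == 0;
}

/*
 * Step 2: after the list update is done, drop the pending bit.
 * Returns true only for the caller that turns the whole tracking
 * byte to zero; that caller alone may release the page.
 */
static bool clear_pending(atomic_uint *refcount, unsigned int bit)
{
	unsigned int mask;

	mask = xor_bits(refcount, 0x10U << (bit + TRACK_SHIFT));
	return (mask >> TRACK_SHIFT) == 0;
}
```

Replaying the earlier interleaving with both halves going through these two steps, the byte moves 0x03, 0x12, 0x30, 0x20, 0x00. Only the last clear_pending() call returns true, so the page is released once, and only after the list_del() has finished.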

Cc: stable@vger.kernel.org
Suggested-by: Vlastimil Babka <vbabka@suse.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:54:25 +01:00
..
cmm.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
dump_pagetables.c s390/mm,ptdump: sort markers 2020-09-16 14:08:47 +02:00
extmem.c s390/extmem: remove stale -ENOSPC comment and handling 2020-07-03 10:49:16 +02:00
fault.c s390: mm: Fix secure storage access exception handling 2021-07-14 16:55:43 +02:00
gmap.c s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap() 2021-11-18 14:04:10 +01:00
hugetlbpage.c s390/mm: fix huge pte soft dirty copying 2020-07-09 15:18:23 +02:00
init.c s390/pv: fix the forcing of the swiotlb 2021-09-18 13:40:36 +02:00
kasan_init.c s390/kasan: fix large PMD pages address alignment check 2021-09-15 09:50:27 +02:00
maccess.c s390/maccess: add no DAT mode to kernel_write 2020-06-29 16:26:36 +02:00
Makefile s390: add ARCH_HAS_DEBUG_WX support 2020-09-14 11:38:35 +02:00
mmap.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
page-states.c arch, mm: replace for_each_memblock() with for_each_mem_pfn_range() 2020-10-13 18:38:35 -07:00
pageattr.c s390/mm,ptdump: hold cpa mutex while walking for kernel page table dump 2020-09-14 11:38:34 +02:00
pgalloc.c s390/mm: fix 2KB pgtable release race 2022-01-27 10:54:25 +01:00
pgtable.c s390/mm: validate VMA in PGSTE manipulation functions 2021-12-01 09:19:10 +01:00
vmem.c arch, drivers: replace for_each_membock() with for_each_mem_range() 2020-10-13 18:38:35 -07:00