Go to file
Jann Horn 6c32496964 mm/slub: add missing TID updates on slab deactivation
commit eeaa345e128515135ccb864c04482180c08e3259 upstream.

The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
the TID stays the same. However, two places in __slab_alloc() currently
don't update the TID when deactivating the CPU slab.

If multiple operations race the right way, this could lead to an object
getting lost; or, in an even more unlikely situation, it could even lead to
an object being freed onto the wrong slab's freelist, messing up the
`inuse` counter and eventually causing a page to be freed to the page
allocator while it still contains slab objects.

(I haven't actually tested these cases though, this is just based on
looking at the code. Writing testcases for this stuff seems like it'd be
a pain...)

The race leading to state inconsistency is (all operations on the same CPU
and kmem_cache):

 - task A: begin do_slab_free():
    - read TID
    - read pcpu freelist (==NULL)
    - check `slab == c->slab` (true)
 - [PREEMPT A->B]
 - task B: begin slab_alloc_node():
    - fastpath fails (`c->freelist` is NULL)
    - enter __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - take local_lock_irqsave()
    - read c->freelist as NULL
    - get_freelist() returns NULL
    - write `c->slab = NULL`
    - drop local_unlock_irqrestore()
    - goto new_slab
    - slub_percpu_partial() is NULL
    - get_partial() returns NULL
    - slub_put_cpu_ptr() (enables preemption)
 - [PREEMPT B->A]
 - task A: finish do_slab_free():
    - this_cpu_cmpxchg_double() succeeds()
    - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]

From there, the object on c->freelist will get lost if task B is allowed to
continue from here: It will proceed to the retry_load_slab label,
set c->slab, then jump to load_freelist, which clobbers c->freelist.

But if we instead continue as follows, we get worse corruption:

 - task A: run __slab_free() on object from other struct slab:
    - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
 - task A: run slab_alloc_node() with NUMA node constraint:
    - fastpath fails (c->slab is NULL)
    - call __slab_alloc()
    - slub_get_cpu_ptr() (disables preemption)
    - enter ___slab_alloc()
    - c->slab is NULL: goto new_slab
    - slub_percpu_partial() is non-NULL
    - set c->slab to slub_percpu_partial(c)
    - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
      from slab-2]
    - goto redo
    - node_match() fails
    - goto deactivate_slab
    - existing c->freelist is passed into deactivate_slab()
    - inuse count of slab-1 is decremented to account for object from
      slab-2

At this point, the inuse count of slab-1 is 1 lower than it should be.
This means that if we free all allocated objects in slab-1 except for one,
SLUB will think that slab-1 is completely unused, and may free its page,
leading to use-after-free.

Fixes: c17dda40a6 ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
Fixes: 03e404af26 ("slub: fast release on full slab")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-12 16:32:16 +02:00
arch xen/arm: Fix race in RB-tree based P2M accounting 2022-07-07 17:52:23 +02:00
block block: Fix handling of offline queues in blk_mq_alloc_request_hctx() 2022-06-22 14:13:17 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:13:17 +02:00
crypto crypto: memneq - move into lib/ 2022-06-22 14:13:18 +02:00
Documentation iio: adc: vf610: fix conversion mode sysfs node name 2022-06-29 08:59:50 +02:00
drivers clocksource/drivers/ixp4xx: remove EXPORT_SYMBOL_GPL from ixp4xx_timer_setup() 2022-07-07 17:52:23 +02:00
fs xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_range 2022-07-07 17:52:20 +02:00
include linux/dim: Fix divide by 0 in RDMA DIM 2022-07-07 17:52:17 +02:00
init Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug 2022-06-09 10:21:25 +02:00
ipc ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() 2022-06-09 10:21:17 +02:00
kernel tick/nohz: unexport __init-annotated tick_nohz_full_setup() 2022-07-02 16:39:23 +02:00
lib crypto: memneq - move into lib/ 2022-06-22 14:13:18 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm/slub: add missing TID updates on slab deactivation 2022-07-12 16:32:16 +02:00
net ipv6/sit: fix ipip6_tunnel_get_prl return value 2022-07-07 17:52:20 +02:00
samples samples/bpf, xdpsock: Fix race when running for fix duration of time 2022-04-08 14:40:21 +02:00
scripts modpost: fix section mismatch check for exported init/exit sections 2022-06-29 08:59:54 +02:00
security ima: remove the IMA_TEMPLATE Kconfig option 2022-06-09 10:21:25 +02:00
sound ALSA: hda/realtek: Add quirk for Clevo NS50PU 2022-06-29 08:59:45 +02:00
tools selftests/rseq: Change type of rseq_offset to ptrdiff_t 2022-07-07 17:52:22 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:25:48 +01:00
virt KVM: Prevent module exit until all VMs are freed 2022-04-08 14:40:38 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: add Amir as xfs maintainer for 5.10.y 2022-07-02 16:39:22 +02:00
Makefile Linux 5.10.129 2022-07-07 17:52:23 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.