Go to file
Waiman Long 2a9d3b5118 ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
[ Upstream commit d60c4d01a98bc1942dba6e3adc02031f5519f94b ]

When running the stress-ng clone benchmark with multiple testing threads,
it was found that there were significant spinlock contention in sget_fc().
The contended spinlock was the sb_lock.  It is under heavy contention
because the following code in the critcal section of sget_fc():

  hlist_for_each_entry(old, &fc->fs_type->fs_supers, s_instances) {
      if (test(old, fc))
          goto share_extant_sb;
  }

After testing with added instrumentation code, it was found that the
benchmark could generate thousands of ipc namespaces with the
corresponding number of entries in the mqueue's fs_supers list where the
namespaces are the key for the search.  This leads to excessive time in
scanning the list for a match.

Looking back at the mqueue calling sequence leading to sget_fc():

  mq_init_ns()
  => mq_create_mount()
  => fc_mount()
  => vfs_get_tree()
  => mqueue_get_tree()
  => get_tree_keyed()
  => vfs_get_super()
  => sget_fc()

Currently, mq_init_ns() is the only mqueue function that will indirectly
call mqueue_get_tree() with a newly allocated ipc namespace as the key for
searching.  As a result, there will never be a match with the exising ipc
namespaces stored in the mqueue's fs_supers list.

So using get_tree_keyed() to do an existing ipc namespace search is just a
waste of time.  Instead, we could use get_tree_nodev() to eliminate the
useless search.  By doing so, we can greatly reduce the sb_lock hold time
and avoid the spinlock contention problem in case a large number of ipc
namespaces are present.

Of course, if the code is modified in the future to allow
mqueue_get_tree() to be called with an existing ipc namespace instead of a
new one, we will have to use get_tree_keyed() in this case.

The following stress-ng clone benchmark command was run on a 2-socket
48-core Intel system:

./stress-ng --clone 32 --verbose --oomable --metrics-brief -t 20

The "bogo ops/s" increased from 5948.45 before patch to 9137.06 after
patch. This is an increase of 54% in performance.

Link: https://lkml.kernel.org/r/20220121172315.19652-1-longman@redhat.com
Fixes: 935c6912b1 ("ipc: Convert mqueue fs to fs_context")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09 10:21:17 +02:00
arch powerpc/4xx/cpm: Fix return value of __setup() handler 2022-06-09 10:21:16 +02:00
block block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern 2022-05-12 12:25:45 +02:00
certs certs: Trigger creation of RSA module signing key if it's not an RSA key 2021-09-15 09:50:29 +02:00
crypto crypto: ecrdsa - Fix incorrect use of vli_cmp 2022-06-06 08:42:43 +02:00
Documentation spi: qcom-qspi: Add minItems to interconnect-names 2022-06-09 10:20:59 +02:00
drivers pinctrl: renesas: core: Fix possible null-ptr-deref in sh_pfc_map_resources() 2022-06-09 10:21:16 +02:00
fs proc: fix dentry/inode overinstantiating under /proc/${pid}/net 2022-06-09 10:21:17 +02:00
include scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac() 2022-06-09 10:21:15 +02:00
init random: handle latent entropy and command line from random_init() 2022-05-30 09:33:44 +02:00
ipc ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() 2022-06-09 10:21:17 +02:00
kernel sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq 2022-06-09 10:21:02 +02:00
lib lib/crypto: add prompts back to crypto libraries 2022-06-06 08:42:42 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:42:43 +02:00
net net/smc: postpone sk_refcnt increment in connect() 2022-06-09 10:21:12 +02:00
samples samples/bpf, xdpsock: Fix race when running for fix duration of time 2022-04-08 14:40:21 +02:00
scripts scripts/faddr2line: Fix overlapping text section failures 2022-06-09 10:21:08 +02:00
security lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2022-06-09 10:21:09 +02:00
sound ASoC: atmel-classd: Remove endianness flag on class d component 2022-06-09 10:21:16 +02:00
tools kselftest/cgroup: fix test_stress.sh to use OUTPUT dir 2022-06-09 10:21:07 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:25:48 +01:00
virt KVM: Prevent module exit until all VMs are freed 2022-04-08 14:40:38 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: add git tree for random.c 2022-05-30 09:33:24 +02:00
Makefile Linux 5.10.120 2022-06-06 08:42:45 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.