tmp_suning_uos_patched/kernel/sched
Peter Zijlstra a3e6bd0c71 sched/rt: Fix double enqueue caused by rt_effective_prio
commit f558c2b834ec27e75d37b1c860c139e7b7c3a8e4 upstream.

Double enqueues in rt runqueues (list) have been reported while running
a simple test that spawns a number of threads doing a short sleep/run
pattern while being concurrently setscheduled between rt and fair class.

  WARNING: CPU: 3 PID: 2825 at kernel/sched/rt.c:1294 enqueue_task_rt+0x355/0x360
  CPU: 3 PID: 2825 Comm: setsched__13
  RIP: 0010:enqueue_task_rt+0x355/0x360
  Call Trace:
   __sched_setscheduler+0x581/0x9d0
   _sched_setscheduler+0x63/0xa0
   do_sched_setscheduler+0xa0/0x150
   __x64_sys_sched_setscheduler+0x1a/0x30
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  list_add double add: new=ffff9867cb629b40, prev=ffff9867cb629b40,
		       next=ffff98679fc67ca0.
  kernel BUG at lib/list_debug.c:31!
  invalid opcode: 0000 [#1] PREEMPT_RT SMP PTI
  CPU: 3 PID: 2825 Comm: setsched__13
  RIP: 0010:__list_add_valid+0x41/0x50
  Call Trace:
   enqueue_task_rt+0x291/0x360
   __sched_setscheduler+0x581/0x9d0
   _sched_setscheduler+0x63/0xa0
   do_sched_setscheduler+0xa0/0x150
   __x64_sys_sched_setscheduler+0x1a/0x30
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae

__sched_setscheduler() uses rt_effective_prio() to handle proper queuing
of priority boosted tasks that are setscheduled while being boosted.
rt_effective_prio() is however called twice per
__sched_setscheduler() call: first directly by __sched_setscheduler()
before dequeuing the task and then by __setscheduler() to actually do
the priority change. If the priority of the pi_top_task is concurrently
being changed however, it might happen that the two calls return
different results. If, for example, the first call returned the same rt
priority the task was running at and the second one a fair priority, the
task won't be removed from the rt list (on_list still set) and then
enqueued in the fair runqueue. When eventually setscheduled back to rt
it will be seen as enqueued already and the WARNING/BUG will be issued.
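
The sequence described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct toy_task, toy_effective_prio(), toy_enqueue() and toy_setscheduler_twice() are hypothetical names, the runqueue is reduced to a single on_rt_list flag, and the concurrent change of the pi_top_task priority is modeled by passing two different donor priorities for the two lookups.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RT_PRIO 100	/* assumption: prio below this is rt, else fair */

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	int prio;		/* effective priority currently applied */
	bool on_rt_list;	/* models the rt entity's on_list flag */
	int double_adds;	/* enqueues that found on_rt_list already set */
};

/* Simplified model of rt_effective_prio(): a boosted task runs at the
 * donor's priority when that is numerically lower than the requested one. */
static int toy_effective_prio(int requested, int donor_prio)
{
	return donor_prio < requested ? donor_prio : requested;
}

static void toy_enqueue(struct toy_task *p, bool move)
{
	if (p->prio >= TOY_MAX_RT_PRIO || !move)
		return;			/* fair class or no move: rt list untouched */
	if (p->on_rt_list)
		p->double_adds++;	/* where the kernel would WARN/BUG */
	p->on_rt_list = true;
}

/* Buggy flow: the donor priority is sampled twice and may differ. */
static void toy_setscheduler_twice(struct toy_task *p, int requested,
				   int donor_first, int donor_second)
{
	bool move;

	assert(p);
	/* first lookup only decides whether the task has to move */
	move = toy_effective_prio(requested, donor_first) != p->prio;

	/* dequeue: the rt list entry is only removed when the task moves */
	if (p->prio < TOY_MAX_RT_PRIO && move)
		p->on_rt_list = false;

	/* second lookup decides the priority that is actually applied */
	p->prio = toy_effective_prio(requested, donor_second);
	toy_enqueue(p, move);
}
```

Starting from an rt task at priority 50 that is on the list, a call with requested priority 120 and donor priorities 50 then 120 leaves the task at a fair priority with on_rt_list still set. A later call that moves it back to priority 50 then increments double_adds, which corresponds to the reported double add.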

Fix this by calling rt_effective_prio() only once and then reusing the
return value. While at it, refactor the code for clarity. Concurrent
priority inheritance handling is still safe and will eventually converge
to a new state by following the inheritance chain(s).
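
The idea of the fix, one lookup whose result feeds both the move decision and the applied priority, can be sketched in user space as follows. This is an illustrative model, not the actual patch: struct toy_task, toy_effective_prio() and toy_setscheduler_once() are hypothetical names, the runqueue is reduced to an on_rt_list flag, and the priority threshold of 100 is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RT_PRIO 100	/* assumption: prio below this is rt, else fair */

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	int prio;		/* effective priority currently applied */
	bool on_rt_list;	/* models the rt entity's on_list flag */
	int double_adds;	/* enqueues that found on_rt_list already set */
};

/* Simplified model of rt_effective_prio(): a boosted task runs at the
 * donor's priority when that is numerically lower than the requested one. */
static int toy_effective_prio(int requested, int donor_prio)
{
	return donor_prio < requested ? donor_prio : requested;
}

/* Fixed flow: sample the effective priority once and reuse the value. */
static void toy_setscheduler_once(struct toy_task *p, int requested,
				  int donor_prio)
{
	int newprio;
	bool move;

	/* precondition of the model: list state matches the current class */
	assert(p->on_rt_list == (p->prio < TOY_MAX_RT_PRIO));

	newprio = toy_effective_prio(requested, donor_prio);
	move = newprio != p->prio;

	/* dequeue: leave the rt list only if the task really moves */
	if (p->prio < TOY_MAX_RT_PRIO && move)
		p->on_rt_list = false;

	/* the same value is applied, so it cannot disagree with 'move' */
	p->prio = newprio;

	/* enqueue: add to the rt list only for an rt priority and a move */
	if (newprio < TOY_MAX_RT_PRIO && move) {
		if (p->on_rt_list)
			p->double_adds++;
		p->on_rt_list = true;
	}
}
```

Because a single sampled value is used throughout, a donor priority that changes afterwards can only affect a later call. In this model the list flag therefore always agrees with the class the task ends up in, and double_adds stays at zero.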

Fixes: 0782e63bc6 ("sched: Handle priority boosted tasks proper in setscheduler()")
[squashed Peterz changes; added changelog]
Reported-by: Mark Simmons <msimmons@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210803104501.38333-1-juri.lelli@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-12 13:22:19 +02:00
autogroup.c
autogroup.h
clock.c
completion.c
core.c sched/rt: Fix double enqueue caused by rt_effective_prio 2021-08-12 13:22:19 +02:00
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq_schedutil.c cpufreq: Introduce governor flags 2020-11-10 18:31:17 +01:00
cpufreq.c
cpupri.c
cpupri.h
cputime.c
deadline.c sched/rt: Fix Deadline utilization tracking during policy change 2021-07-14 16:56:09 +02:00
debug.c sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling 2021-06-16 12:01:46 +02:00
fair.c sched/fair: Fix CFS bandwidth hrtimer expiry type 2021-07-25 14:36:17 +02:00
features.h sched,fair: Alternative sched_slice() 2021-05-11 14:47:31 +02:00
idle.c rcu/nocb: Perform deferred wake up before last idle's need_resched() check 2021-03-04 11:38:35 +01:00
isolation.c
loadavg.c sched: nohz: stop passing around unused "ticks" parameter. 2020-07-22 10:22:04 +02:00
Makefile
membarrier.c sched/membarrier: fix missing local execution of ipi_sync_rq_state() 2021-03-17 17:06:35 +01:00
pelt.c
pelt.h sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling 2021-06-16 12:01:46 +02:00
psi.c psi: Fix race between psi_trigger_create/destroy 2021-07-14 16:56:10 +02:00
rt.c sched/rt: Fix RT utilization tracking during policy change 2021-07-14 16:56:09 +02:00
sched-pelt.h
sched.h sched/uclamp: Ignore max aggregation if rq is idle 2021-07-20 16:05:58 +02:00
smp.h
stats.c
stats.h
stop_task.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
swait.c
topology.c Scheduler changes for v5.10: 2020-10-12 12:56:01 -07:00
wait_bit.c
wait.c rq-qos: fix missed wake-ups in rq_qos_throttle try two 2021-07-19 09:45:00 +02:00